Can you jailbreak Anthropic's latest AI safety measure? Researchers want you to try, and they're offering up to $20,000 if you succeed.
Today, Claude model-maker Anthropic released a new system of Constitutional Classifiers that it says can "filter the overwhelming majority" of jailbreak attempts. The classifiers were trained on synthetic data, and now that the system is live, the company is inviting the public to test it against that bounty.
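Anthropic hasn't published the system's implementation, but the basic shape the announcement describes, lightweight classifiers wrapped around a model to screen what goes in and what comes out, can be sketched in a few lines. The Python below is a toy illustration only: the `classify` and `guarded_generate` functions, the keyword list standing in for a trained classifier, and the placeholder model are all hypothetical, not Anthropic's method.

```python
# Minimal sketch of classifier-guarded generation.
# Everything here is a hypothetical stand-in, not Anthropic's code:
# a real system would use classifiers trained on synthetic data,
# not keyword matching.

from dataclasses import dataclass

# Toy "constitution": categories of content the guard should refuse.
BLOCKED_TOPICS = ["build a weapon", "synthesize a toxin", "write malware"]


@dataclass
class Verdict:
    allowed: bool
    reason: str


def classify(text: str) -> Verdict:
    """Stand-in for a trained classifier. A production guard would
    return a calibrated score from a model, not a substring match."""
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return Verdict(False, f"matched blocked topic: {topic!r}")
    return Verdict(True, "no blocked topic detected")


def guarded_generate(prompt: str, model) -> str:
    """Screen the prompt, generate a response, then screen the output."""
    in_verdict = classify(prompt)
    if not in_verdict.allowed:
        return f"[refused by input filter: {in_verdict.reason}]"
    response = model(prompt)
    out_verdict = classify(response)
    if not out_verdict.allowed:
        return f"[refused by output filter: {out_verdict.reason}]"
    return response


if __name__ == "__main__":
    echo_model = lambda p: f"Here is an answer to: {p}"  # placeholder model
    print(guarded_generate("What's the capital of France?", echo_model))
    print(guarded_generate("Explain how to build a weapon.", echo_model))
```

The two checkpoints are the point of the design: a jailbreak prompt that slips past the input screen can still be caught when the model's completion is screened on the way out, which is the kind of layered filtering a would-be $20,000 bounty winner has to defeat.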