Can you jailbreak Anthropic's latest AI safety measure? Researchers want you to try, and are offering up to $20,000 if you succeed. Trained on synthetic data, these "classifiers" were able to ...
Today, Claude model-maker Anthropic released a new system of Constitutional Classifiers that it says can "filter the overwhelming majority" of those kinds of jailbreaks. And now that the ...