The new Claude safeguards have technically already been broken, but Anthropic says the breach was due to a glitch and is inviting hackers to try again.
The company originally offered hackers $15,000 to crack the system. No one claimed the prize, despite participants spending an estimated 3,000 hours trying. But Anthropic still wants you to try to beat it: in an X post on Wednesday, the company said it is "now offering $10K to the first person to pass all eight levels," with $20K going to the first person to achieve a universal jailbreak.
Anthropic developed the defense, called Constitutional Classifiers, to block universal AI jailbreaks against Claude. The system is built on a constitution: a set of natural-language principles that "define the classes of content that are allowed and disallowed (for example, recipes for mustard are allowed, but recipes for mustard gas are not)," Anthropic noted. Researchers then use those principles to generate synthetic training data for classifiers that screen both the prompts sent to Claude and the responses it produces.
Anthropic says the system can "filter the overwhelming majority of jailbreaks" while keeping refusals of harmless requests to a minimum.
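Conceptually, the setup behaves like a pair of classifiers wrapped around the model: one screens the incoming prompt against the constitution's disallowed content classes, the other screens the generated response before it reaches the user. The sketch below is a minimal illustration of that wrapper pattern only; it is not Anthropic's implementation, and the keyword-based scoring, class names, and functions here are invented stand-ins for the learned classifiers the company describes.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "constitution" here is just a list of natural-language principles that
# name disallowed content classes. Anthropic trains classifiers on synthetic
# data generated from such principles; this toy version substitutes a trivial
# keyword check so the control flow is runnable end to end.

@dataclass
class Principle:
    name: str
    disallowed_keywords: List[str]  # hypothetical stand-in for a learned content class

CONSTITUTION = [
    Principle("chemical-weapons", ["mustard gas", "nerve agent"]),
    # Benign neighbours such as plain mustard recipes are deliberately not listed.
]

def violates_constitution(text: str, constitution: List[Principle]) -> bool:
    """Toy classifier: flag text that contains any disallowed keyword."""
    lowered = text.lower()
    return any(kw in lowered for p in constitution for kw in p.disallowed_keywords)

def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with input screening and output screening."""
    if violates_constitution(prompt, CONSTITUTION):
        return "Request declined: the prompt falls into a disallowed content class."
    response = model(prompt)
    if violates_constitution(response, CONSTITUTION):
        return "Response withheld: the draft output fell into a disallowed content class."
    return response

if __name__ == "__main__":
    fake_model = lambda p: f"Here is a harmless answer to: {p}"
    print(guarded_generate("Share a recipe for mustard.", fake_model))      # allowed
    print(guarded_generate("Share a recipe for mustard gas.", fake_model))  # blocked at input
```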