Anthropic dares you to try to jailbreak Claude AI

Anthropic developed a defense against universal AI jailbreaks for Claude called Constitutional Classifiers - here's how it ...

ZDNet · 2d

Anthropic offers $20,000 to whoever can jailbreak its new AI safety system

After improving it, Anthropic ran a test of 10,000 synthetic jailbreaking attempts on an October version of Claude 3.5 Sonnet with and without classifier protection using known successful attacks.

TechRadar · 5d

Anthropic has a new security system it says can stop almost all AI jailbreaks

Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet. “Constitutional classifiers” are an attempt to teach LLMs value systems. Tests resulted in more than an 80% ...

VentureBeat · 5d

Anthropic claims new AI security method blocks 95% of jailbreaks, invites red teamers to try

Claude 3.5 Sonnet. It does this while minimizing over-refusals (rejection of prompts that are actually benign) and doesn’t require large compute. The Anthropic Safeguards Research Team has ...

heise online · 4d

Anthropic: users to put jailbreak protection for AI chatbot to the test

In Anthropic's internal test, the unprotected version of Claude 3.5 Sonnet is said to have blocked only 14 percent of unauthorized requests. A version protected with the filter system, on the ...
