Was Claude Fable 5 actually jailbroken?

A researcher known as Pliny claimed a jailbreak within 48 hours of launch, using methods like multi-step decomposition and narrative framing. Anthropic disputed this, saying the technique only coaxed the model to keep responding past its own refusals, which is a known limitation in nearly all large language models. Anthropic says no one has found a universal jailbreak that broadly defeats Fable 5's safeguards.

How does Fable 5's safety system work?

Fable 5 runs each request through classifiers, small models trained to spot sensitive topics like cybersecurity, biology, and chemistry. When a classifier fires, the request is not refused. It is answered by Claude Opus 4.8, an older and less capable model. Anthropic says this happens in under 5% of sessions.

Who reported the jailbreak that got Fable 5 suspended?

The Wall Street Journal reported that researchers at Amazon, Anthropic's major investor, found the jailbreak and reported it directly to the US Commerce Department rather than disclosing it to Anthropic first. Earlier coverage from Axios named only 'another company.' The Commerce Department then issued the export-control order.

What is the difference between a narrow and a universal jailbreak?

A narrow jailbreak extracts specific information in a specific setup. A universal jailbreak broadly defeats a model's safeguards across many tasks. Anthropic argued the Fable 5 case was narrow at most, while the US government treated even a narrow risk as enough to suspend the model.

Is Claude Fable 5 coming back?

Anthropic disagreed with the order in unusually blunt language and clearly wants the models back online. The dispute reads as fixable rather than permanent. As of 13 June 2026 there is no announced date for restored access to Fable 5 or Mythos 5.

The Full Story Behind the Fable 5 Suspension: Inside the Jailbreak

In three days, Anthropic's most guarded AI model was launched, declared jailbroken, accused of quietly downgrading its own users, and then switched off by the US government. This is how Claude Fable 5 went from flagship to suspended, and what it revealed about how AI safety really works and who controls a frontier model once it ships.

It is the deeper companion to our shorter take on why the Fable 5 suspension matters.

Day one: a model built to be hard to break

Anthropic released Fable 5 on 9 June as the most capable model it had ever put in public hands, and leaned hard on safety. The design matters, because it sits under everything that followed. Each request runs through classifiers, small models trained to spot sensitive topics like cybersecurity, biology, and chemistry. When one fires, the request is not refused. It is quietly handed to the older, weaker Claude Opus 4.8, which answers instead. Anthropic says this happens in under 5% of sessions.

So the safety layer was never a locked door. It was a router. Most of the time you got the full Fable 5; some of the time, on a guess, a weaker model, and you were not always told which one replied.

Forty-eight hours in: the Fable 5 jailbreak claim

A jailbreak is a trick that pushes a model past its safety rules. Within two days of launch, a researcher known as Pliny claimed he had broken Fable 5. By his account it was a stack of techniques: splitting a banned request across multiple steps and agents so no single one looked harmful, wrapping the pieces in academic or fictional framing, and using character tricks to slip past filters. He also claimed to have leaked the model's full system prompt. Screenshots spread fast.

Anthropic pushed back. It said this was not a real jailbreak, just coaxing the model past its own refusals, a weakness in almost every large language model, and that nothing genuinely dangerous had been unlocked. It drew a line between a narrow jailbreak, which pries loose specific information in a specific setup, and a universal one that breaks the safeguards across the board. By its account, no one had found a universal jailbreak for Fable 5.

Both can be true. The trick probably handed no one a real weapon, but it exposed the core weakness of this kind of safety: break a harmful request into small, innocent-looking steps and you can slip past filters that would catch it asked directly. Each step reads as harmless. Only the whole is dangerous, and the classifiers judge the steps. That is the state of AI alignment today, the work of keeping a model inside its limits. Better than last year. Not a wall.

12 June: the government pulls the plug

Then it stopped being about Anthropic and its users. On 12 June, the US government issued an export-control order citing a narrow jailbreak, the kind that asks the model to read a codebase and find software flaws. It targeted foreign-national access, and the only way to comply was to disable Fable 5 and its sibling Mythos 5 for every customer worldwide. The full account of that order sits in our companion piece on the suspension.

Anthropic's official graphic for its public statement on the US government directive to suspend access to Fable 5 and Mythos 5 Anthropic's public statement on the order to suspend access to Fable 5 and Mythos 5. Source: Anthropic

Three days after launch, the most powerful model Anthropic had ever shipped was gone.

The trigger was a rival, and what it signals

The order came wrapped in national-security language, but the reality reported since is stranger. According to the Wall Street Journal, the jailbreak that alarmed the government was found by researchers at Amazon, Anthropic's major investor, who reportedly took it to the Commerce Department rather than to Anthropic first. Early speculation had blamed OpenAI. The administration had also already asked Anthropic to delay the launch and been refused, so the report became the lever to stop the models anyway.

It is hard not to read this as a competitor using the government's power to take a rival's flagship offline, though no one has said so on the record. Anthropic argued the flaw was narrow, but once a model is framed as a national-security risk, the call leaves the engineers. That makes a frontier model less an ordinary product than a strategic asset, one that can be switched off overnight for reasons that have nothing to do with its code. It is one of the first times an export-control order has pulled a commercial model worldwide, and the mechanism will be used again. We will keep following it.

The Full Story Behind the Fable 5 Suspension: Inside the Jailbreak

Day one: a model built to be hard to break

Forty-eight hours in: the Fable 5 jailbreak claim

12 June: the government pulls the plug

The trigger was a rival, and what it signals

References

Frequently Asked Questions

Need help building this for your business?

Related articles

Anthropic Fable 5 Suspended After 3 Days: Don't Build on One AI Model

AI That Codes for Days Is Here. What Claude Fable 5 Means for Your Business

Claude Oceanus: Why Anthropic's Delay Helps Malaysian Business