Mythos Slipped the Cage: Notes on Glasswing’s First Real Test

A little over two weeks ago, Anthropic announced Claude Mythos Preview and put it behind a deliberately small door called Project Glasswing — a phased rollout to “critical industry partners and open source developers” with the explicit goal of giving defenders a head start on a model that, by Anthropic’s own description, can find and exploit zero-days in shipping software. The bet was that if you give a sharp tool to the people patching things first, the asymmetry tilts toward defense long enough to matter.

Last week we learned the door had been propped open from the start. A small group of users on a private forum got into Mythos through a third-party vendor environment, by, as Fortune phrased it, “guessing where it was located,” on the same day the limited-access program was announced. Anthropic has confirmed it’s investigating. As I write this, no one is claiming a catastrophic outcome. What we do have is the first real-world stress test of the Glasswing premise, and it’s worth being honest about what that test showed.

What Anthropic actually built

Mythos Preview (codename Capybara) is a general-purpose frontier model that posts numbers most of us hadn’t expected to see this year — SWE-bench at 93.9%, USAMO at 97.6%, and a generational jump on cyber tasks specifically. The vibe in the public materials is unusually direct for Anthropic: red.anthropic.com calls it “strikingly capable at computer security tasks,” notes that during testing it discovered “thousands of high-severity vulnerabilities” across “every major operating system and web browser,” and frames Glasswing as a deliberate attempt to bias the rollout toward defenders.

That framing matters because it’s not the standard “we hope it goes well” disclosure. Anthropic is explicitly saying this model raises the offensive ceiling enough that the order of access changes the threat model. That’s a real claim, and the third-party numbers below largely back it up.

The numbers from AISI

The UK AI Security Institute’s evaluation, published April 13, is the cleanest third-party look so far. Two findings stuck with me:

On expert-level capture-the-flag challenges — tasks that no model could complete at all before April 2025 — Mythos Preview succeeds 73% of the time. That’s not “AI is getting better at security CTFs.” That’s a category change.

On AISI’s “The Last Ones” range — a 32-step simulated corporate-network attack they estimate would take a human professional roughly 20 hours — Mythos became the first model to solve it end-to-end, doing so in 3 of 10 attempts, with an average of 22/32 steps completed. Claude Opus 4.6, the previous best, averaged 16. AISI’s chart shows performance still scaling up at the 100M-token budget they tested; they expect more compute to keep extracting more capability. Translation for defenders: the bottleneck right now is inference budget, not capability.

The honest caveat AISI prints in plain English is that their ranges lack active defenders, EDR, and meaningful detection penalties. Mythos can run a kill chain end-to-end against a soft target. Whether it can do so against a hardened, monitored estate is the next evaluation, not this one.

How the leak happened (and didn’t)

The leak details we have are thin but instructive. Per the SiliconANGLE and CBS reports, the access path was a third-party vendor environment, not an Anthropic-side credential break. Per Fortune, the discovery vector was effectively guessing: pattern-matching where a limited-access endpoint might be hosted, then trying it. That’s the oldest move in the book: you don’t pick the lock, you find the side door nobody remembered was keyed to the master.

This is the bit I keep returning to. Glasswing’s threat model assumes the perimeter you have to defend includes every partner you handed access to. Containing a model this capable is hard. Vendor-environment hygiene is hard in a different, much more boring way. The boring way is the one that broke first.

What this means for defenders this week

A few things I’m doing or recommending around our estate, none of them novel, all of them more urgent than they were on April 7:

The Cyber Essentials basics that NCSC and AISI both pointed at — patch cadence, access control, configuration baselining, real logging — are now the difference between “vulnerable to a skilled human attacker over a weekend” and “vulnerable to an autonomous agent over a coffee break.” If your patch SLA is 30 days for highs, that window is now quite a bit more expensive.
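To make the SLA point concrete, here is a minimal sketch that flags open high and critical findings older than a configurable patch SLA and totals how many days they have been exposed. The `Finding` record, its fields, the severity labels, and the example data are all hypothetical assumptions for illustration, not any scanner’s export format.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape for illustration; real scanner exports differ.
    host: str
    finding_id: str
    severity: str      # assumed labels: "low", "medium", "high", "critical"
    first_seen: date   # date the scanner first reported the finding

IN_SCOPE = {"high", "critical"}

def overdue(findings, sla_days, today):
    """Return in-scope findings that have been open longer than sla_days."""
    cutoff = today - timedelta(days=sla_days)
    return [f for f in findings if f.severity in IN_SCOPE and f.first_seen < cutoff]

def exposure_days(findings, today):
    """Sum of days each in-scope finding has been open so far (a crude exposure metric)."""
    return sum((today - f.first_seen).days for f in findings if f.severity in IN_SCOPE)

# Made-up example data, not real findings.
TODAY = date(2025, 6, 30)
FINDINGS = [
    Finding("web01", "F-1", "high", date(2025, 5, 1)),
    Finding("web02", "F-2", "critical", date(2025, 6, 20)),
    Finding("db01", "F-3", "medium", date(2025, 4, 1)),
    Finding("vpn01", "F-4", "high", date(2025, 6, 5)),
]

if __name__ == "__main__":
    for sla in (30, 7):
        ids = [f.finding_id for f in overdue(FINDINGS, sla, TODAY)]
        print(f"SLA {sla}d overdue: {ids}")
    print("exposure days:", exposure_days(FINDINGS, TODAY))
```

With a 30-day SLA only F-1 counts as overdue; tightening to 7 days pulls F-2 and F-4 in as well. The exposure-days total is one blunt way to show what a long SLA window actually costs.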

If you’re a partner in any frontier-model preview, treat the access credentials as a Tier-0 secret on par with domain admin. The Glasswing leak is going to make every vendor questionnaire about model access materially more painful for the next twelve months, and rightly so.

Detection assumptions need a refresh. Most of our detection content is tuned to human pacing and human mistakes. An agent that runs 22 steps of a kill chain in a single autonomous session won’t produce the small, slow tells we instrument for. The next round of detection engineering is going to be about behavior-rate signals, not signatures.
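A minimal sketch of what a behavior-rate signal could look like, as opposed to a signature: count how many distinct action categories a single principal performs inside a sliding time window, and alert when that breadth is implausible at human pacing. The action labels, the 300-second window, and the threshold of 4 are illustrative assumptions rather than tuned values, and mapping raw telemetry onto action categories is left out entirely.

```python
from collections import Counter, defaultdict, deque

class BehaviorRateDetector:
    """Alert when one principal performs too many distinct action types within a sliding window.

    The window length and threshold are illustrative assumptions, not tuned values.
    """

    def __init__(self, window_seconds: float, max_distinct_actions: int):
        self.window = window_seconds
        self.max_distinct = max_distinct_actions
        self._events = defaultdict(deque)    # principal -> deque of (timestamp, action)
        self._counts = defaultdict(Counter)  # principal -> action -> occurrences in window

    def observe(self, principal: str, action: str, ts: float) -> bool:
        """Record one event and return True if the principal now exceeds the threshold.

        Assumes timestamps for a given principal arrive in non-decreasing order.
        """
        events = self._events[principal]
        counts = self._counts[principal]
        events.append((ts, action))
        counts[action] += 1
        # Evict events that are window_seconds old or older.
        while events and events[0][0] <= ts - self.window:
            _, old_action = events.popleft()
            counts[old_action] -= 1
            if counts[old_action] == 0:
                del counts[old_action]
        return len(counts) > self.max_distinct

# Hypothetical action categories; mapping raw telemetry onto them is out of scope here.
STEPS = ["recon", "cred_access", "priv_esc", "lateral_move", "persistence", "exfil"]

if __name__ == "__main__":
    agent = BehaviorRateDetector(window_seconds=300, max_distinct_actions=4)
    human = BehaviorRateDetector(window_seconds=300, max_distinct_actions=4)
    # Agent: one step every 10 seconds. Human: one step every 20 minutes.
    print([agent.observe("svc-agent", a, i * 10) for i, a in enumerate(STEPS)])
    print([human.observe("alice", a, i * 1200) for i, a in enumerate(STEPS)])
```

The same six steps trip the detector when they arrive ten seconds apart and stay silent at human pacing. Counting distinct action types rather than raw events is a deliberate choice: a noisy but repetitive job stays quiet, while a fast walk across many techniques does not. A real deployment would likely need per-role baselines instead of one global threshold.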

What I’m watching

Three things over the next couple of weeks. First, whether Anthropic publishes a real post-mortem on the vendor-side leak: not a “we are investigating” line, but the kind of write-up that lets the rest of us learn from a partner’s misconfiguration. Second, whether the UK government’s reported discussions about limited Mythos access produce any public structure for state-level defender programs; that’s the natural next ring outside Glasswing. Third, whether AISI’s hardened-range follow-up actually shows the capability gap I expect, because if Mythos still solves a defended estate at non-trivial rates, the calculus described in recent foreign-policy commentary stops being theoretical and starts dictating procurement decisions.

For now, my read is unchanged from a month ago: the model is real, the defender-first framing is the right framing, and the Glasswing leak is a caution about implementation rather than a refutation of the strategy. The asymmetry window is still there. It’s just smaller than Anthropic wanted it to be.
