{"id":25,"date":"2026-04-25T20:41:38","date_gmt":"2026-04-25T20:41:38","guid":{"rendered":"https:\/\/inhochoi.com\/index.php\/2026\/04\/25\/mythos-slipped-the-cage-notes-on-glasswings-first-real-test\/"},"modified":"2026-04-25T20:41:38","modified_gmt":"2026-04-25T20:41:38","slug":"mythos-slipped-the-cage-notes-on-glasswings-first-real-test","status":"publish","type":"post","link":"https:\/\/inhochoi.com\/index.php\/2026\/04\/25\/mythos-slipped-the-cage-notes-on-glasswings-first-real-test\/","title":{"rendered":"Mythos Slipped the Cage: Notes on Glasswing&#8217;s First Real Test"},"content":{"rendered":"<p>A little over two weeks ago, Anthropic announced Claude Mythos Preview and put it behind a deliberately small door called Project Glasswing \u2014 a phased rollout to &#8220;critical industry partners and open source developers&#8221; with the explicit goal of giving defenders a head start on a model that, by Anthropic&#8217;s own description, can find and exploit zero-days in shipping software. The bet was that if you give a sharp tool to the people patching things first, the asymmetry tilts toward defense long enough to matter.<\/p>\n<p>Last week the door turned out to be propped open. A small group of users on a private forum stumbled into Mythos through a third-party vendor environment \u2014 by, as Fortune phrased it, &#8220;guessing where it was located&#8221; \u2014 on the same day the limited-access program was announced. Anthropic has confirmed it&#8217;s investigating. As I write this, no one is claiming a catastrophic outcome. What we do have is the first real-world stress test of the Glasswing premise, and it&#8217;s worth being honest about what that test showed.<\/p>\n<h2>What Anthropic actually built<\/h2>\n<p>Mythos Preview (codename Capybara) is a general-purpose frontier model that posts numbers most of us hadn&#8217;t expected to see this year \u2014 SWE-bench at 93.9%, USAMO at 97.6%, and a generational jump on cyber tasks specifically. 
The vibe in the public materials is unusually direct for Anthropic: red.anthropic.com calls it &#8220;strikingly capable at computer security tasks,&#8221; notes that during testing it discovered &#8220;thousands of high-severity vulnerabilities&#8221; across &#8220;every major operating system and web browser,&#8221; and frames Glasswing as a deliberate attempt to bias the rollout toward defenders.<\/p>\n<p>That framing matters because it&#8217;s not the standard &#8220;we hope it goes well&#8221; disclosure. Anthropic is explicitly saying this model raises the offensive ceiling enough that the order of access changes the threat model. That&#8217;s a real claim, and it&#8217;s one the next two sections actually back up.<\/p>\n<h2>The numbers from AISI<\/h2>\n<p>The UK AI Security Institute&#8217;s evaluation, published April 13, is the cleanest third-party look so far. Two findings stuck with me:<\/p>\n<p>On expert-level capture-the-flag challenges \u2014 tasks that no model could complete at all before April 2025 \u2014 Mythos Preview succeeds 73% of the time. That&#8217;s not &#8220;AI is getting better at security CTFs.&#8221; That&#8217;s a category change.<\/p>\n<p>On AISI&#8217;s &#8220;The Last Ones&#8221; range \u2014 a 32-step simulated corporate-network attack they estimate would take a human professional roughly 20 hours \u2014 Mythos became the first model to solve it end-to-end, doing so in 3 of 10 attempts, with an average of 22\/32 steps completed. Claude Opus 4.6, the previous best, averaged 16. AISI&#8217;s chart shows performance still scaling up at the 100M-token budget they tested; they expect more compute to keep extracting more capability. Translation for defenders: the bottleneck right now is inference budget, not capability.<\/p>\n<p>The honest caveat AISI prints in plain English is that their ranges lack active defenders, EDR, and meaningful detection penalties. Mythos can chain a kill chain on a soft target. 
Whether it does so against a hardened, monitored estate is the next evaluation, not this one.<\/p>\n<h2>How the leak happened (and didn&#8217;t)<\/h2>\n<p>The leak details we have are thin but instructive. Per the SiliconANGLE and CBS reports, the access path was a third-party vendor environment, not an Anthropic-side credential break. Per Fortune, the discovery vector was effectively guessing \u2014 pattern-matching where a limited-access endpoint might be hosted, then trying it. That&#8217;s the oldest move in the book: you don&#8217;t break the lock, you find the door no one remembered to lock.<\/p>\n<p>This is the bit I keep returning to. Glasswing&#8217;s threat model assumes the perimeter you have to defend includes every partner you handed access to. Securing the model itself is hard. Vendor-environment hygiene is hard in a different, much more boring way. The boring way is the one that broke first.<\/p>\n<h2>What this means for defenders this week<\/h2>\n<p>A few things I&#8217;m doing or recommending around our estate, none of them novel, all of them more urgent than they were on April 7:<\/p>\n<p>The Cyber Essentials basics that NCSC and AISI both pointed at \u2014 patch cadence, access control, configuration baselining, real logging \u2014 are now the difference between &#8220;vulnerable to a skilled human attacker over a weekend&#8221; and &#8220;vulnerable to an autonomous agent over a coffee break.&#8221; If your patch SLA is 30 days for highs, that window is now considerably more expensive to leave open.<\/p>\n<p>If you&#8217;re a partner in any frontier-model preview, treat the access credentials as a Tier-0 secret on par with domain admin. The Glasswing leak is going to make every vendor questionnaire about model access materially more painful for the next twelve months, and rightly so.<\/p>\n<p>Detection assumptions need a refresh. Most of our detection content is tuned to human pacing and human mistakes. 
An agent that runs 22 steps of a kill chain in a single autonomous session won&#8217;t make the small, slow tells we instrument for. The next round of detection engineering is going to be about behavior-rate signals, not signatures.<\/p>\n<h2>What I&#8217;m watching<\/h2>\n<p>Three things over the next couple of weeks. First, whether Anthropic publishes a real post-mortem on the vendor-side leak \u2014 not a &#8220;we are investigating&#8221; line, but the kind of write-up that lets the rest of us learn from a partner&#8217;s misconfiguration. Second, whether the UK government&#8217;s reported discussions about limited Mythos access produce any public structure for state-level defender programs; that&#8217;s the natural next ring outside Glasswing. Third, whether AISI&#8217;s hardened-range follow-up actually shows the capability gap I expect \u2014 because if Mythos still solves a defended estate at non-trivial rates, the calculus described in the foreign-policy commentary stops being theoretical and starts dictating procurement decisions.<\/p>\n<p>For now, my read is unchanged from a month ago: the model is real, the defender-first framing is the right framing, and the Glasswing leak is a caution about implementation rather than a refutation of the strategy. The asymmetry window is still there. 
It&#8217;s just smaller than Anthropic wanted it to be.<\/p>\n<h2>Sources<\/h2>\n<ul>\n<li><a href=\"https:\/\/red.anthropic.com\/2026\/mythos-preview\/\">Claude Mythos Preview \u2014 red.anthropic.com<\/a><\/li>\n<li><a href=\"https:\/\/www.aisi.gov.uk\/blog\/our-evaluation-of-claude-mythos-previews-cyber-capabilities\">Our evaluation of Claude Mythos Preview&#8217;s cyber capabilities \u2014 AISI<\/a><\/li>\n<li><a href=\"https:\/\/foreignpolicy.com\/2026\/04\/20\/claude-mythos-preview-anthropic-project-glasswing-cybersecurity-ai-hacking-danger\/\">Anthropic&#8217;s Claude Mythos Preview Changes Cyber Calculus \u2014 Foreign Policy<\/a><\/li>\n<li><a href=\"https:\/\/fortune.com\/2026\/04\/23\/anthropic-mythos-leak-dario-amodei-ceo-cybersecurity-hackers-exploits-ai\/\">A group of users leaked Anthropic&#8217;s AI model Mythos by reportedly guessing where it was located \u2014 Fortune<\/a><\/li>\n<li><a href=\"https:\/\/www.cbsnews.com\/news\/anthropic-investigates-mythos-ai-breach\/\">Anthropic investigating possible breach of its Mythos AI model \u2014 CBS News<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A little over two weeks ago, Anthropic announced Claude Mythos Preview and put it behind a deliberately small door called Project Glasswing \u2014 a phased rollout to &#8220;critical industry partners and open source developers&#8221; with the explicit goal of giving defenders a head start on a model that, by Anthropic&#8217;s own description, can find and 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-25","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/posts\/25","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/comments?post=25"}],"version-history":[{"count":0,"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/posts\/25\/revisions"}],"wp:attachment":[{"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/media?parent=25"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/categories?post=25"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/inhochoi.com\/index.php\/wp-json\/wp\/v2\/tags?post=25"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}