Blog

  • Building a Secure Router Config Backup System with Google Antigravity and Azure Key Vault

    What If Your AI IDE Could Build Your Entire Network Automation Pipeline?

    I’ve been experimenting with Google Antigravity — Google DeepMind’s agent-first development platform — and I wanted to put it to a real-world test: building a scheduled, secure backup system for Cisco and Fortigate router configurations, with Azure Key Vault handling all the sensitive credentials. Here’s how the whole thing comes together, and why Antigravity changes the game for network engineers who code.

    What Is Google Antigravity?

    Antigravity isn’t just another AI code assistant bolted onto VS Code. It’s a full agentic development suite where autonomous AI agents plan, write, test, and debug your code across the editor, terminal, and even a browser — all with minimal hand-holding from you.

    The key features that matter for this project:

    • Mission Control (Manager View) — Spawn multiple sub-agents that work in parallel. One agent can scaffold the Python project while another researches the Azure Key Vault SDK docs.
    • Terminal & System Autonomy — Agents install dependencies (pip install netmiko azure-identity azure-keyvault-secrets), run tests, and fix errors automatically.
    • Artifacts System — Instead of opaque chat logs, Antigravity produces structured deliverables: implementation plans, task checklists, and walkthroughs you can review before anything ships.
    • Scheduled Tasks — You can set up cron-style schedules directly inside Antigravity to run automation on a recurring basis.

    In short, you describe what you want built, and Antigravity’s agents handle the how.

    The Architecture: What We’re Building

    Here’s the high-level design of the secure config backup system:

    
    ┌───────────────────────┐
    │  Azure Key Vault      │  ← Stores device IPs, usernames, passwords
    │  (Secrets Store)      │    as individual secrets
    └───────────┬───────────┘
                │
                │ DefaultAzureCredential
                ▼
    ┌───────────────────────┐
    │  Python Backup Script │  ← Built & tested inside Antigravity
    │  (Netmiko + Paramiko) │
    └───────────┬───────────┘
                │ SSH connections
        ┌───────┴────┐
        ▼            ▼
    ┌────────┐ ┌───────────┐
    │ Cisco  │ │ Fortigate │  ← show run / show full-configuration
    │ Router │ │ Firewall  │
    └────────┘ └───────────┘
                │
                ▼
    ┌───────────────────────┐
    │  Timestamped Backup   │  ← ./backups/cisco-rtr01_2026-05-28_0200.cfg
    │  Files (Local/Cloud)  │
    └───────────────────────┘
    

    Step 1 — Store Credentials in Azure Key Vault

    The first rule: never hardcode device credentials. Azure Key Vault gives you a centralised, encrypted, RBAC-controlled secrets store. For each device, you’d create secrets like:

    • cisco-rtr01-ip: 10.1.1.1
    • cisco-rtr01-username: admin
    • cisco-rtr01-password: ********
    • forti-fw01-ip: 10.2.2.1
    • forti-fw01-username: admin
    • forti-fw01-password: ********

    You can create these via the Azure Portal, the az CLI, or — and this is where it gets interesting — ask Antigravity to do it for you. Antigravity has built-in Azure MCP tools, including Key Vault operations. You could literally say:

    “Create secrets in my Azure Key Vault called net-backup-vault for these three Cisco routers and two Fortigate firewalls. Here are the IPs and credentials.”

    The agent handles the rest.
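
    For a sense of what that provisioning step amounts to, here is a minimal sketch. It builds the three per-device secret names from the list above and writes them through any client that exposes set_secret(name, value), which is the call SecretClient in azure-keyvault-secrets provides. The helper names secret_names and store_device_secrets are my own for illustration, not something Antigravity is known to generate. The name check reflects Key Vault's rule that secret names may contain only letters, digits, and hyphens.

```python
import re

# Key Vault secret names: letters, digits, and hyphens only, 1-127 characters.
_SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def secret_names(device_name):
    """Return the ip/username/password secret names for one device."""
    names = {field: f"{device_name}-{field}" for field in ("ip", "username", "password")}
    for name in names.values():
        if not _SECRET_NAME.fullmatch(name):
            raise ValueError(f"not a valid Key Vault secret name: {name!r}")
    return names


def store_device_secrets(client, device_name, ip, username, password):
    """Write one device's secrets via client.set_secret(name, value)."""
    names = secret_names(device_name)
    values = {"ip": ip, "username": username, "password": password}
    for field, name in names.items():
        client.set_secret(name, values[field])
    return sorted(names.values())


# Usage with the real SDK (needs azure-identity and azure-keyvault-secrets installed):
#   from azure.identity import DefaultAzureCredential
#   from azure.keyvault.secrets import SecretClient
#   client = SecretClient(vault_url="https://net-backup-vault.vault.azure.net/",
#                         credential=DefaultAzureCredential())
#   store_device_secrets(client, "cisco-rtr01", "10.1.1.1", "admin", "<password>")
```

    Validating names up front matters because a hostname with an underscore or dot would otherwise fail only when the vault rejects it.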

    Step 2 — Build the Backup Script with Antigravity

    Here’s the core Python script that Antigravity would generate and refine for you. The key libraries are Netmiko (for SSH to network devices), azure-keyvault-secrets (for pulling credentials at runtime), and azure-identity (for the DefaultAzureCredential login):

    
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from netmiko import ConnectHandler
    from datetime import datetime
    import os
    
    VAULT_URL = "https://net-backup-vault.vault.azure.net/"
    BACKUP_DIR = "./backups"
    
    # --- Authenticate to Azure Key Vault ---
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=VAULT_URL, credential=credential)
    
    # --- Device inventory ---
    devices = [
        {"name": "cisco-rtr01", "type": "cisco_ios",       "cmd": "show running-config"},
        {"name": "cisco-rtr02", "type": "cisco_ios",       "cmd": "show running-config"},
        {"name": "forti-fw01",  "type": "fortinet",        "cmd": "show full-configuration"},
    ]
    
    os.makedirs(BACKUP_DIR, exist_ok=True)
    timestamp = datetime.now().strftime("%Y-%m-%d_%H%M")
    
    for device in devices:
        try:
            # Pull credentials from Key Vault at runtime (never stored on disk).
            # This sits inside the try block so one missing secret doesn't stop the whole run.
            ip       = client.get_secret(f"{device['name']}-ip").value
            username = client.get_secret(f"{device['name']}-username").value
            password = client.get_secret(f"{device['name']}-password").value
    
            connection = {
                "device_type": device["type"],
                "host":        ip,
                "username":    username,
                "password":    password,
            }
    
            # Open the SSH session, run the dump command, and close the session again
            with ConnectHandler(**connection) as conn:
                config = conn.send_command(device["cmd"])
    
            filename = os.path.join(BACKUP_DIR, f"{device['name']}_{timestamp}.cfg")
            with open(filename, "w") as f:
                f.write(config)
            print(f"OK   {device['name']} -> {filename}")
        except Exception as e:
            # Report the failure and carry on with the next device
            print(f"FAIL {device['name']}: {e}")
    

    When you paste this requirement into Antigravity, it doesn’t just generate the code. It:

    1. Creates an implementation plan for you to review before writing any code.
    2. Installs dependencies in a virtual environment.
    3. Writes unit tests and runs them.
    4. Handles edge cases — what if a device is unreachable? What if the Key Vault token expires?
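
    The unreachable-device case in step 4 is commonly handled with a retry wrapper. Here is a minimal sketch of that pattern, assuming a simple exponential backoff. The name with_retries and its parameters are hypothetical, not part of Netmiko or anything Antigravity is known to emit.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts run out, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

    In the backup loop, the SSH step could be wrapped as with_retries(lambda: fetch_config(connection, device["cmd"])), where fetch_config is a hypothetical helper holding the ConnectHandler call. In practice you would likely narrow the except clause to connection-level errors so that authentication failures are not retried.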

    Step 3 — Schedule the Backups

    Antigravity supports the /schedule command for recurring tasks. You could set it up like this:

    “Run the router backup script every day at 2:00 AM Adelaide time.”

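    Under the hood, this creates a cron schedule (0 2 * * *) that triggers the backup script automatically. A cron expression fires in whatever time zone the scheduler runs in, so confirm that 02:00 is being evaluated in Adelaide time rather than UTC. Antigravity’s agent wakes up, authenticates to Azure Key Vault, connects to each device, pulls the config, and saves timestamped backups — all without you touching a thing.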
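
    If the scheduler turns out to run in UTC, the cron fields have to be shifted, and Adelaide's daylight saving means the shift changes during the year. The sketch below uses Python's standard zoneinfo module to do the conversion. The function name utc_cron_for_local is made up for illustration, and it assumes the host has time zone data available.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

ADELAIDE = ZoneInfo("Australia/Adelaide")


def utc_cron_for_local(hour, minute, on_date):
    """Return a daily cron expression in UTC matching a local Adelaide time on on_date."""
    local = datetime(on_date.year, on_date.month, on_date.day, hour, minute, tzinfo=ADELAIDE)
    as_utc = local.astimezone(timezone.utc)
    return f"{as_utc.minute} {as_utc.hour} * * *"


print(utc_cron_for_local(2, 0, date(2026, 5, 28)))   # standard time, UTC+9:30
print(utc_cron_for_local(2, 0, date(2026, 12, 1)))   # daylight time, UTC+10:30
```

    The two results differ by an hour, which is the argument for a scheduler that accepts a named time zone over a hand-converted UTC expression.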

    For production environments, you could also deploy this as an Azure Function with a Timer Trigger, which Antigravity can scaffold and deploy for you using its built-in Azure tools.

    Step 4 — Why Azure Key Vault Is Non-Negotiable

    Here’s why you should never store device credentials in a flat file, a .env, or (worst case) directly in your script:

    • Encryption at rest and in transit — Key Vault uses HSM-backed encryption.
    • Access control — Azure RBAC lets you grant “Key Vault Secrets User” to specific service principals or managed identities. No one else can read the secrets.
    • Audit logging — With diagnostic logging enabled on the vault, every secret access is recorded and can be routed to Azure Monitor. You know exactly who (or what) read a credential and when.
    • Rotation — When you change a device password, you update one secret in Key Vault. Every script that reads it automatically gets the new value next run.
    • No secrets on disk — Credentials exist only in memory during script execution. Nothing is written to config files or Git repos.

    Step 5 — Extend It Further

    Once you have the foundation, Antigravity makes it easy to layer on more features. Just describe what you want:

    • Git version control — “Commit each backup to a Git repo so I can diff config changes over time.”
    • Email alerts — “Send me an email if a backup fails.”
    • Config drift detection — “Compare today’s backup with yesterday’s and flag any differences.”
    • Web dashboard — “Build a simple web page that shows the status of the last backup for each device.”
    • Azure Blob Storage — “Upload each backup to an Azure Storage container for offsite retention.”

    Each of these is a one-line prompt in Antigravity. The agents research the best libraries, write the code, test it, and present you with a walkthrough for review.
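
    To give a feel for the size of these add-ons, here is a minimal sketch of the config drift idea using only the standard library's difflib. The function name config_drift is hypothetical, and it compares two config texts that have already been read from disk.

```python
import difflib


def config_drift(old_text, new_text, old_name="yesterday", new_name="today"):
    """Return unified-diff lines between two config snapshots (empty list means no drift)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


old = "hostname rtr01\ninterface Gi0/0\n ip address 10.1.1.1 255.255.255.0\n"
new = "hostname rtr01\ninterface Gi0/0\n ip address 10.1.1.2 255.255.255.0\n"
for line in config_drift(old, new):
    print(line)
```

    Real device output often carries lines that change on every run, such as timestamps, so a production version would probably filter those out before diffing.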

    My Take

    What impresses me most about Antigravity for this kind of project is the shift from writing code to directing agents. I didn’t need to look up the Netmiko device type string for Fortigate or figure out the azure-identity authentication flow. I described the architecture, reviewed the plan, and let the agents build it. For network engineers who aren’t full-time developers, this is a massive productivity unlock.

    The combination of Antigravity’s agentic workflow + Azure Key Vault’s secrets management + Netmiko’s device connectivity gives you a production-grade, secure, automated config backup pipeline — and you can have it running in an afternoon.

    If you’re a network engineer thinking about automating your infrastructure, this is a great first project to try. Start with one router, one Key Vault secret, and one Antigravity prompt. Scale from there.

  • Who Watches the AI Agents? — Cisco’s Case for Agentic Observability

    The enterprise AI conversation has quietly shifted from “can we build an agent” to “can we trust the agents we’ve already built.” This Cisco Live EMEA 2026 session, run by the Outshift by Cisco team, digs straight into that gap — and it’s been rattling around my head since I watched it.

    From a single chatbot to an Internet of Agents

    The premise is that enterprises aren’t deploying one tidy AI assistant anymore. They’re standing up multi-agent systems — what the session calls MAS — where distributed, interconnected agents hand work to each other. Cisco even has a phrase for it: the “Internet of Agents.” A single chatbot was something you could reason about. A mesh of agents calling other agents, tools, and models is a different animal, and the moment something goes wrong, “which agent, doing what, and why” becomes a genuinely hard question.

    Why traditional observability falls short

    If you’ve run APM tooling before, you know the usual signals: requests, latency, error rates. Agentic applications break that model. The session argues you also have to track quality — did the agent actually produce a good answer? — along with cost, since tokens add up fast across a multi-agent workflow, and behavior that isn’t deterministic from one run to the next. On top of raw telemetry, the team frames the real goals as explainability, evaluation, predictability, and control: four words that don’t show up on a classic monitoring dashboard.

    A new charter for “agentic APM”

    The most concrete idea here is a proposed charter for agentic APM — application performance monitoring rebuilt for agents. That means agentic quality and cost tracking, impact assessment when an agent’s behavior shifts, and anomaly detection tuned to agentic patterns rather than HTTP error spikes. The session spends real time on evaluation, too: approaches like LLM-as-a-Judge, where one model grades another’s output, alongside active testing to keep agent performance honest across different deployment scenarios. That evaluation piece is what stuck with me — monitoring tells you something changed, evaluation tells you whether it actually got worse.
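
    To make the LLM-as-a-Judge pattern concrete, here is a minimal sketch of my own, not something shown in the session. The judge is any callable that takes a task and an answer and returns a score between 0 and 1. In a real setup it would prompt a grading model, whereas stub_judge below is a placeholder. The names judge_answers and stub_judge and the 0.7 threshold are illustrative assumptions.

```python
from statistics import mean


def judge_answers(judge, cases, threshold=0.7):
    """Score each (task, answer) pair with judge and flag the pairs below threshold."""
    scores = [judge(task, answer) for task, answer in cases]
    flagged = [case for case, score in zip(cases, scores) if score < threshold]
    return {"mean_score": mean(scores), "flagged": flagged}


def stub_judge(task, answer):
    # Placeholder for a call to a grading model: here an empty answer scores 0.0.
    return 1.0 if answer.strip() else 0.0


cases = [("Summarise the outage", "Link flap on Gi0/1"), ("List affected sites", "")]
report = judge_answers(stub_judge, cases)
print(report)
```

    Tracking the mean score over time gives the monitoring signal, and the flagged list is what a human would review.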

    Doing it in the open

    What makes this more than a product pitch is that Cisco is pushing it as an open standard rather than a closed feature. The work is happening in an open-source collective called AGNTCY, as an industry collaboration that includes Cisco and Splunk, and it’s being brought to the OpenTelemetry GenAI community for standardization. The session closes with a live demo of end-to-end agentic observability built on AGNTCY’s open-source components. For anyone who has been burned by monitoring lock-in, an interoperable standard that works across agent frameworks is the right instinct — your observability layer shouldn’t depend on which vendor’s agents you happened to deploy.

    Where it fits in Cisco’s AI direction

    This lines up neatly with Cisco’s broader AgenticOps story and its Splunk pairing. Cisco clearly wants to own the operational layer of enterprise AI — not just the network and infrastructure underneath the agents, but the tooling that tells you whether those agents are behaving. Observability is an unglamorous place to plant a flag, but it is a sticky one.

    My take: this is the part of the agentic AI wave that doesn’t get enough airtime. Everyone is racing to ship agents, and far fewer people are asking how they’ll debug, cost-control, and trust them at scale. Betting on an open standard instead of a proprietary dashboard is a smart move — though standards only matter if the rest of the industry actually shows up, so it is worth watching whether OpenTelemetry adoption follows.

    Source: Agentic Observability and Evaluation | Cisco Live EMEA 2026 on YouTube.

  • Cisco’s “AI-First Ops” Pivot — Why Production AI Is an Infrastructure Problem

    I watched Cisco Live EMEA 2026’s “Automating Your AI Journey” panel this week, and the framing has been rattling around in my head ever since. The big idea: scaling AI from pilot to production isn’t really a data science problem anymore — it’s an operations and infrastructure problem, and that’s where Cisco is planting its flag.

    From AI Pilots to Production Pipelines

    The session opens with a pattern that will be familiar to anyone in enterprise IT: organizations stand up dozens of AI pilots that look great in a notebook, then stall out the moment someone asks who deploys, monitors, and pages on them at 3 a.m. The panelists describe this as the gap between “demoable” and “operable.” Models, agents, retrieval pipelines, and inference endpoints behave like first-class services with uptime, latency, and cost SLAs — and most ops teams haven’t been re-tooled for that yet.

    What the “AI Stack” Actually Looks Like

    What I appreciated was the panel’s blunt inventory of what they call the AI stack. It’s not just a model and a chatbot. It includes cloud-native AI agents, MCP (Model Context Protocol) servers, RAG pipelines, and a long tail of supporting software services, all sitting on top of the network, compute, and storage you already operate. Each of those layers brings its own deployment pattern, its own failure modes, and increasingly its own observability needs. If you’ve been treating “AI” as a single workload, this session is a useful reset.

    Automation Frameworks Doing the Heavy Lifting

    The phrase “automation frameworks” does a lot of work in the description, and the panel makes the case for taking it seriously. You can’t human-glue this stack together — there are too many moving parts changing too quickly. The teams that are succeeding lean into declarative pipelines that can spin up agents, MCP servers, vector stores, and the network paths between them as one coordinated unit. That’s a familiar pattern to anyone who has done GitOps for Kubernetes, but applied to a much wider surface area, and with a tighter feedback loop.

    Where Cisco’s Infrastructure Story Comes In

    This is the angle that should interest the networking crowd: Cisco’s pitch is that none of this works at scale without infrastructure built with AI workloads in mind. The session frames automation as the connective tissue between the AI services on top and the Cisco infrastructure underneath — the network fabric, the compute platforms, the observability and security layers. Whether or not you end up buying the full Cisco menu, the underlying claim that AI ops is a stack problem rather than a model problem is hard to argue with.

    An “AI-First Ops” Mindset Shift

    The label “AI-first ops” feels like more than a slogan by the end of the panel. The speakers describe re-organizing teams around the AI workload lifecycle rather than around traditional dev/ops boundaries. People who used to own CI/CD pipelines start to own agent deployments. People who used to own monitoring dashboards start to own model behavior. The mindset shift is real, and the operators who get there early will look very different from the ones who try to bolt AI onto an existing on-call rotation.

    My own take: the most useful idea in this video for working network and IT pros is that it gives you something concrete to do on Monday morning. You don’t need a grand AI strategy memo. You need an honest audit of which AI services your organization already has in production, who is on call for them, and whether your automation can redeploy the full stack — agents, MCP servers, RAG pipelines, and the network paths in between — without somebody typing commands at 2 a.m. That’s the homework this session left me with.

    Source: Automating Your AI Journey | Cisco Live EMEA 2026 on YouTube.

  • The Patch Deficit: One Month Into Mythos, Less Than 1% Has Been Fixed

    Today is May 1, 2026 — roughly twenty-five days since Anthropic announced Claude Mythos Preview and Project Glasswing, and the story has quietly stopped being about discovery. The find rate was the headline in April. The patch rate is the headline now, and the gap between the two is what I’d argue every defender, regulator, and insurer should be staring at this morning.

    The number that’s been bothering me all week: less than 1% of the high-severity vulnerabilities Mythos surfaced across major operating systems and browsers are fully patched. That figure has been floating around analyst notes and security write-ups for the past two weeks, and nobody is contesting it. We have a model that found thousands of severe issues — and a maintainer ecosystem that, by even charitable counts, has closed a few dozen.

    That’s the actual Mythos story for May. Not capability. Throughput.

    The 99% That’s Still Open

    Let’s anchor in the specifics that have been disclosed publicly. CVE-2026-4747 — a 17-year-old unauthenticated RCE in FreeBSD’s NFS server, where Mythos autonomously built a 20-gadget ROP chain split across multiple network packets. A 27-year-old signed integer overflow in OpenBSD’s SACK TCP implementation that crashes any host that receives the right packet. These are not academic. These are dial-tone-of-the-internet bugs that Mythos chained working exploits for, and they are representative — not exceptional — of what’s now sitting in disclosure queues.

    Anthropic’s stated discipline is 90-day notification timelines and a 45-day post-patch window before publishing technical detail. Do the math from April 7. The earliest of those 90-day clocks expires on July 6. By August, technical writeups for the first wave of unpatched bugs become public regardless of patch status, and the calculus of “wait for the vendor” stops working.
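
    The date arithmetic is easy to check. The sketch below starts from the April 7 announcement date and the 90-day and 45-day windows quoted above. Treating the 45 days as starting on the 90-day deadline is my simplifying assumption, since the stated policy ties that window to the patch date.

```python
from datetime import date, timedelta

ANNOUNCED = date(2026, 4, 7)             # announcement date used in the post
NOTIFY_WINDOW = timedelta(days=90)       # notification timeline
POST_PATCH_WINDOW = timedelta(days=45)   # window before technical detail is published

first_deadline = ANNOUNCED + NOTIFY_WINDOW
print(first_deadline)                       # 2026-07-06
print(first_deadline + POST_PATCH_WINDOW)   # 2026-08-20
```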

    The defender’s job between now and then is to close as much of that 99% as possible. The sober assessment from where I sit is that they will not.

    Why Maintainer Throughput Doesn’t Scale

    The optimistic frame on Mythos was always: capabilities are symmetric, defenders get the same uplift attackers do. I bought parts of that argument three weeks ago. I’m less sure now, because the symmetry breaks at the maintainer.

    Mythos can find a 17-year-old NFS bug in an afternoon. Patching that bug still requires a human reviewer who understands the kernel module, a backport across a half-dozen supported branches, distribution package builds, regression testing, and downstream rollout to operators who in many cases haven’t applied last quarter’s patches yet. The compression on the find side is real. The compression on the fix side is marginal. AI-assisted patch authoring helps a little. AI-assisted upgrade pipelines at end-user organizations help less than a little.

    Project Glasswing’s bet was that giving early access to AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia would seed a defender advantage. That bet is mostly working inside those nine names. It is conspicuously not working at the long tail of OS distributions, networking gear vendors, and embedded software shops where the actual install base lives.

    The Insurance Industry Just Started Looking at This

    The thing that shifted my framing this week wasn’t a security write-up — it was the property and casualty industry waking up. P&C trade press is now openly warning about systemic cyber risk linked to Mythos, with carriers preparing to underwrite the next renewal cycle assuming the loss curve gets meaningfully worse.

    That matters because cyber insurance has been the one external forcing function that consistently moves SMB security posture. Not regulation. Not best practices. The renewal questionnaire. When Travelers or Chubb or Beazley starts asking “have you remediated the FreeBSD/OpenBSD/browser-stack vulnerabilities disclosed under Project Glasswing’s coordinated process,” organizations either patch or lose coverage or get repriced. We’ve seen this movie before with ransomware. The 2021–2023 hardening cycle — MFA everywhere, EDR mandates, immutable backups — was driven primarily by underwriters, not CISOs.

    I expect the next 90 days to produce a Mythos-shaped supplemental questionnaire, and SMBs that ignored the April news cycle will encounter it via their renewal in June or July. That’s the moment Mythos becomes operational reality outside the Glasswing nine.

    What I’m Telling Otaris Clients

    Here is the practical posture I’m walking our SMB book through:

    The patch backlog you have today is not an inventory problem, it’s an exposure forecast. Every package on a deferred-update schedule is increasingly likely to have a Mythos-discovered, attacker-redeveloped vulnerability sitting in it before the 45-day public-disclosure window opens. Compress your patch SLAs now, before the questionnaire shows up.

    Inventory your perimeter for the obvious targets — anything running NFS, anything terminating TCP from the public internet on legacy stacks, anything embedding browser engines. These are the package categories where Mythos disclosures are concentrating, and they’re also the categories most likely to be enumerated in carrier supplementals.

    Have a defensible answer to the question “how would you know if someone exploited a zero-day in [your stack] tomorrow?” If that answer doesn’t include EDR telemetry, network anomaly detection, and a credentialed vulnerability scanner that’s actually been run this month, the answer is “we wouldn’t.”

    Containment Is the Real Open Question

    The piece I keep circling back to is that the public Mythos story still presumes Anthropic’s containment is holding. The Cloud Security Alliance lab notes from the last two weeks have been carefully not saying that — they’ve been documenting “containment failures” plural, and the work I covered last week on the Glasswing leak is the obvious example. If a single Mythos-class capability gets exfiltrated to a non-aligned actor, the 1% patch rate isn’t a backlog — it’s a target list.

    I don’t have an answer to that. Nobody does. But the asymmetry between find and fix is the entire risk surface for May, and it gets worse, not better, while we wait.

    What I’m Watching

    The first CVEs hitting their 90-day disclosure deadline in July, and whether maintainers cluster their releases or stagger them. Whether the Cyber Summit on May 21 produces any concrete coordination between regulators and underwriters, or just another communique. Whether the next Anthropic Opus release ships the cybersecurity safeguards they’ve publicly committed to — and whether anyone independent gets to verify them. And whether the first publicly attributable Mythos-derived exploit lands before any of the above.

    The tempo from April was exhilarating. The tempo from May is going to be exhausting.

  • The Mythos Three-Week Mark: Discovery Is the Easy Part Now

    It’s been just over three weeks since Anthropic pulled the curtain on Claude Mythos Preview and the Project Glasswing consortium, and the conversation has finally moved past “is this real?” into “what do we actually do about it?” The first wave of coverage was about the model itself — codename Capybara, SWE-bench 93.9, the fully-autonomous discovery and exploitation of a 17-year-old FreeBSD remote code execution flaw, the 271 zero-days handed to Mozilla that shipped in Firefox 150. The second wave, which I think is more interesting, is about the gap Mythos has just opened between finding problems and fixing them.

    I want to walk through where the discourse is sitting today, because I think the practitioner take is meaningfully different from the headline take.

    What Anthropic actually shipped

    Mythos Preview is not generally available, and at this point it’s clear Anthropic doesn’t intend to make it generally available on the original timeline. Instead, access is being routed through Project Glasswing — AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, plus another forty-odd organizations that maintain critical software. Anthropic has put up a hundred million dollars in model usage credits and another four million in donations to open-source security work to underwrite the program through its preview window.

    The capability claim that anchors all of this: Mythos can take a codebase and a one-line prompt — essentially “find a security vulnerability in this program” — and return working exploits at a marginal cost reportedly under two thousand dollars per finding, in under twenty-four hours, without the kind of expert-in-the-loop scaffolding that earlier autonomous-vuln-discovery work needed. Anthropic’s red team report says it found thousands of high-severity issues across every major OS and every major browser. That’s the number that should make you stop and re-read the sentence.

    The remediation side is where it gets uncomfortable

    The Hacker News piece this week put the math plainly: discovery has accelerated by roughly an order of magnitude, but the organizational machinery for triage, prioritization, communication, and verified remediation has not. NVD logged over 42,000 CVEs in 2025. Even before Mythos, “patch everything” was already not a coherent strategy at most organizations. Mythos doesn’t change the patching model — it just exposes how thin it always was.

    If you’re an enterprise security leader, the practical implication is that the bottleneck has moved. It used to be “we can’t find them fast enough.” Now it’s “we can find them faster than our SDLC can absorb the fixes, faster than our change-management process can ship them, and faster than our SREs can validate that the fix didn’t break a downstream service.” A backlog that grows faster than you can drain it is just a different kind of breach exposure, and one most risk registers don’t model well.
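
    The backlog dynamic is simple enough to model in a few lines. This is a toy sketch with made-up rates, not figures from any of the reports mentioned here. It only shows how a steady gap between findings and fixes compounds week over week.

```python
def backlog_over_time(found_per_week, fixed_per_week, weeks, start=0):
    """Return the open-finding count at the end of each week."""
    backlog = start
    history = []
    for _ in range(weeks):
        # New findings arrive, fixes drain the queue, and the queue cannot go negative
        backlog = max(0, backlog + found_per_week - fixed_per_week)
        history.append(backlog)
    return history


# Illustrative rates only: 50 findings a week against 20 verified fixes a week.
print(backlog_over_time(50, 20, 4))   # [30, 60, 90, 120]
```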

    Bain’s analysis suggests cybersecurity budgets may need to roughly double to keep pace, against the ten-percent-per-year increases most boards have penciled in. I’m skeptical of any “you need to double the budget” claim on its face, but the underlying point — that the cost of acting on findings has been the silent constraint, not the cost of generating them — is right.

    The asymmetry argument cuts both ways

    Schneier’s framing is the one I keep coming back to. His position, roughly: capabilities of this kind are coming whether or not Anthropic releases this particular model, so giving defenders a head start through a controlled consortium is probably the least-bad option available. I think he’s right, but I’d add a wrinkle the optimist case usually skips. Project Glasswing is structurally a club. Forty organizations get the head start. The other million-and-change organizations that run software on top of those forty do not — they get the patches, eventually, on the platform vendors’ timelines, gated by their own ability to deploy them.

    So the asymmetry doesn’t go away. It just changes shape. Inside the consortium, defenders are ahead. Outside it, the gap between “vulnerability is known to a frontier model somewhere” and “my organization can verify and remediate” widens. The CETaS analysis from Turing makes a related governance point: this is the first credible test of whether private-sector consortia can substitute for the public coordination infrastructure we don’t actually have.

    What I think the non-Glasswing playbook looks like

    For the other 99% of us, the work right now isn’t dramatic. It’s boring, and it’s about getting your remediation pipeline ready for an inbound wave you can already see forming.

    The pieces I’d be auditing this week, in order:

    1. SBOM coverage for anything internet-facing. If you can’t enumerate components, you can’t tell which Glasswing-driven advisories actually apply to you.
    2. Patch-deployment SLAs by criticality tier. Most orgs have these on paper but haven’t pressure-tested them under volume.
    3. Change-management throughput. The bottleneck is increasingly in CAB and validation, not in writing the patch.
    4. EDR and detection coverage, on the assumption that attackers will get Mythos-class capability before you do, which the Forrester piece argues persuasively will happen on a months-not-years timeline.
    5. A tabletop of the scenario where a Glasswing partner discloses a critical flaw in something you depend on and you have seventy-two hours, not ninety days, before working exploits are circulating.

    None of that is novel security advice. The Mythos-shaped update is just that the volume and tempo assumptions baked into your existing program are probably wrong now.

    What I’m watching

    Three things over the next month. First, whether the July full-disclosure report from Anthropic actually lands on schedule — the credibility of the Glasswing model rests on it. Second, how the non-consortium open-source maintainers handle the inbound; the Linux Foundation is in the tent, but a long tail of single-maintainer projects underpins a lot of critical software, and “here are forty zero-days, please patch them” is a different kind of pressure than what most maintainers signed up for. Third, the first credible report of a Mythos-equivalent capability outside the consortium — open-weights or otherwise. The clock on that started ticking on April 7.

    The honest summary: the model is real, the controlled release was probably the right call, and the operational debt most organizations have been carrying on their patching pipelines is about to come due. I’d rather have this conversation now than after the first big incident.

  • Three Weeks After Mythos: The Defenders’ Delta Is Wider Than the Headlines

    It has been about three weeks since Anthropic announced Claude Mythos Preview. The early news cycle was dominated by two stories: the model that “escaped its sandbox and emailed a researcher,” and the decision not to release it commercially. Both are true, both are dramatic, and both — in my opinion — are now the wrong things to focus on.

    The story today is the gap between what Mythos can do and what the rest of us are actually prepared to do about it. Call it the defenders’ delta. It is wider than most of the coverage suggests, and the people I talk to in operations and IT are still mostly thinking about the previous generation of risks.

    The Capability Floor Has Moved, and the Evidence Is Boring

    The UK’s AI Security Institute published an evaluation on April 13 that I would put in front of any executive who is still treating frontier-AI cyber risk as speculative. AISI ran Mythos Preview against expert-level capture-the-flag tasks — the kind no model could complete a year ago — and it succeeded 73% of the time. On “The Last Ones,” a 32-step simulated corporate network takeover that AISI estimates would take a human professional roughly 20 hours, Mythos solved the full chain three out of ten times. The next-best public model averages 16 of 32 steps. Mythos averages 22.

    Read those numbers carefully. The headline isn’t that an AI can hack a network. The headline is that the inference scaling curve is still going up at the 100-million-token budget AISI used. There is no plateau in the data. Every additional dollar of compute buys more steps completed. As an MSP-adjacent person, that’s the line I keep highlighting for clients: this is not a one-time shock to absorb. This is a slope.

    What “Project Glasswing” Actually Signals

    Anthropic chose not to ship Mythos commercially. Instead, access flows through Project Glasswing — a vetted consortium of cloud providers, financial institutions, government partners, and a handful of security-focused organizations that get to use the model for defensive work. CrowdStrike was named as a founding member. Google Cloud and AWS are running gated previews on Vertex AI and Bedrock respectively.

    I have seen this framed as Anthropic being cautious. I read it differently. Glasswing is a distribution decision dressed up as a safety decision. If you assume — and I think you have to — that other labs will reach Mythos-class capability within months and that some of those labs will be less restrained, then “vetted consortium” is the new commercial channel for a model that is effectively a national-security asset. The interesting question isn’t whether Anthropic should release it. The interesting question is which organizations qualify for the consortium and which ones don’t, and how much of a competitive moat that becomes for the partners who do.

    The Council on Foreign Relations called Mythos an “inflection point.” I think that phrasing is too soft. It’s a redistribution. A small number of large players just got tools their competitors won’t have for a while.

    The Vulnerability Discovery Math Has Inverted

    The Hacker News piece this week, “Mythos Changed the Math on Vulnerability Discovery,” made a point I want to underline because it has direct implications for any IT shop. For roughly two decades, the bottleneck in offensive security was finding the bug. Exploitation was the cheap part. With Mythos, finding the bug is cheap too, so the bottleneck moves to the defender’s side: triage and remediation. Anthropic’s own write-up describes engineers with no security training getting working RCE exploits delivered overnight. The model has reportedly reproduced a 17-year-old FreeBSD NFS RCE, a 27-year-old OpenBSD crash, and a 16-year-old FFmpeg H.264 decoder flaw, all from a standing start.

    What this means in practice: the patching SLAs most organizations operate on were calibrated for a world where the gap between disclosure and exploitation was days or weeks. That gap is now hours, and only some of the disclosures will be public — Glasswing partners are finding things and not necessarily telling everyone at once. If your patch cadence is “monthly Patch Tuesday plus emergencies,” you are probably already exposed.
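
    To make that arithmetic concrete, here is a minimal Python sketch. The `exposure_window` helper and the sample inputs (a 30-day SLA, with exploits arriving after two weeks versus six hours) are illustrative assumptions, not figures from AISI or Anthropic.

```python
from datetime import timedelta


def exposure_window(patch_sla: timedelta, time_to_exploit: timedelta) -> timedelta:
    """Time a vulnerable system stays unpatched after a working exploit exists.

    Returns zero when the patch is due before an exploit is expected.
    """
    return max(patch_sla - time_to_exploit, timedelta(0))


# Hypothetical inputs: the same 30-day SLA under two threat models.
sla = timedelta(days=30)
old_world = exposure_window(sla, timedelta(days=14))   # exploit in two weeks
new_world = exposure_window(sla, timedelta(hours=6))   # exploit in six hours

print(old_world.days)                    # 16
print(new_world.total_seconds() / 3600)  # 714.0
```

    Same SLA, same team, and the exposed period under these assumed inputs goes from 16 days to nearly the whole 30. The only lever in the formula that a defender controls is `patch_sla`.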

    The mitigations AISI recommends are unglamorous: Cyber Essentials–level basics, real EDR, comprehensive logging, working access controls. The boring stuff. It just has to actually exist and actually work, which in my experience is the part nobody wants to fund.

    “Too Dangerous to Release” Is Now a Product Category

    TIME’s piece this week framed it well: between Anthropic’s Mythos and OpenAI’s GPT-5.4-Cyber, “too dangerous to release” has gone from a one-off PR moment to a recurring posture. There’s a real risk this becomes a marketing primitive — capability demonstrated, public access withheld, trusted access program announced, enterprise deals signed. I’m watching for two failure modes there. One: capability claims that are not independently verifiable, because the model isn’t released. Two: the trusted access program quietly becoming the actual product line, with “public” Claude (Opus 4.7, the more conservative tier Anthropic also shipped this month) treated as the consumer brand while the real frontier sits behind NDAs.

    I don’t think Anthropic is acting in bad faith here. AISI’s independent evaluation is meaningful precisely because it’s independent. But the structural incentive — gated access, government interest, premium pricing — points in a direction the AI policy community is going to have to argue about for the next few years.

    What I’m Doing About It This Week

    For Otaris and the clients we look after, the practical to-do list isn’t exotic. Patch hygiene gets a fresh review. Logging coverage gets audited — if something happens fast, the only thing standing between us and a long incident is the data we already collected. We’re inventorying which of our vendors are Glasswing partners (or claim to be) and what that actually buys us in terms of detection. And I’m dusting off the phishing/social-engineering tabletop, because every story about Mythos focuses on the technical exploits, but a model this capable at multi-step planning is an even bigger uplift to social engineering than to RCE.
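
    The logging audit is, at its core, a set comparison between what we think we manage and what is actually sending logs. A minimal sketch of that check follows; the `logging_gaps` helper and the host names are hypothetical, and in practice the two sets would come from an asset inventory export and a SIEM query.

```python
def logging_gaps(inventory: set[str], hosts_seen_in_logs: set[str]) -> dict[str, list[str]]:
    """Compare the asset inventory against hosts that actually sent logs."""
    return {
        # In the inventory but silent: nothing to investigate with later.
        "silent": sorted(inventory - hosts_seen_in_logs),
        # Sending logs but not in the inventory: unknown or unmanaged assets.
        "unknown": sorted(hosts_seen_in_logs - inventory),
    }


# Hypothetical data for illustration only.
inventory = {"fw-01", "dc-01", "file-01", "vpn-01"}
seen = {"fw-01", "dc-01", "printer-07"}

gaps = logging_gaps(inventory, seen)
print(gaps)
# {'silent': ['file-01', 'vpn-01'], 'unknown': ['printer-07']}
```

    Both lists matter: the silent hosts are blind spots during an incident, and the unknown ones are a sign the inventory itself is stale.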

    What I’m Watching

    Three things. First, whether AISI publishes an updated eval against a defended environment; they explicitly flagged the Cooling Tower OT range as something the model couldn’t solve, and the active-defense follow-up will tell us whether real EDR and incident response actually change the picture. Second, when the first non-Glasswing competitor (Google’s frontier model, presumably, or a Chinese lab) hits the same capability bar, and how that release is handled. Third, whether any Glasswing finding leaks publicly before its coordinated disclosure window. That’s the moment the model goes from “controlled” to “in the wild,” and we should plan as if it’s a question of when rather than if.

    Mythos is not an apocalypse. It is a slope, on a curve that hasn’t bent. The defenders’ delta will get worse before it gets better. The right move this week is to stop reading headlines and start fixing logging.


    Sources:
    Claude Mythos Preview — Anthropic
    Our evaluation of Claude Mythos Preview’s cyber capabilities — AI Security Institute
    Six Reasons Claude Mythos Is an Inflection Point for AI — and Global Security — Council on Foreign Relations
    Mythos Changed the Math on Vulnerability Discovery. Most Teams Aren’t Ready for the Remediation Side — The Hacker News
    “Too Dangerous to Release” Is Becoming AI’s New Normal — TIME

  • Three Weeks of Mythos: The Governance Reckoning Catches Up to the Capability

    Three weeks ago today, on April 7, Anthropic released Claude Mythos Preview into a tightly fenced consortium called Project Glasswing. The first wave of coverage was, predictably, about capability — the 27-year-old OpenBSD bug, the 16-year-old FFmpeg flaw, the now-infamous sandbox escape that ended with the model emailing a researcher who was eating a sandwich in a park. Twenty-one days in, the story has shifted. The headlines this week aren’t about what Mythos can do. They’re about what regulators, banks, auditors, and CISOs are now obligated to do because Mythos can do it.

    That shift is the more important one, and I think it’s underrated.

    The patch cycle is the first casualty

    Debevoise’s data team posted a piece yesterday that I’ve been chewing on all morning. Their argument, stripped of the lawyerly hedging, is simple: the 30/60/90-day patch cadence that has defined enterprise security hygiene for two decades is no longer defensible. If a frontier model can autonomously discover and weaponise a kernel-write exploit chain in a matter of hours, then “we patch criticals within 30 days” stops being a reasonable practice and starts being a record of negligence in a future enforcement action.

    I think that’s right, and I think it’s going to land harder than people expect. Patch cadence isn’t an abstract policy variable — it’s a real-money commitment tied to maintenance windows, change advisory boards, vendor SLAs, and in regulated industries, signed attestations to regulators. Every one of those artifacts was negotiated against an implicit threat model where exploit development was slow, expensive, and bottlenecked on human attacker time. Mythos breaks that assumption. The SLAs don’t update themselves.

    NYDFS is going to be the canary

    If you want to watch where regulatory expectations move first, watch NYDFS Part 500. The Debevoise post raises a question I think many financial-services CISOs are quietly asking their counsel right now: does the April 7 Anthropic disclosure constitute a material change in the threat environment that triggers a new risk assessment under Part 500? My read is yes, and I’d rather over-document than under-document on that one. The same logic applies to the automated-scanning requirement: once “AI-assisted vulnerability discovery” becomes a known and obtainable capability for defenders, regulators will eventually treat its absence the way they currently treat the absence of EDR.

    I don’t think the regulators have to write a new rule for this. They just have to start asking about it during exams. That’s coming.

    The IMF spring meetings were the inflection point

    The geopolitical track is moving faster than the technical track. Andrew Bailey at the Bank of England, Christine Lagarde at the ECB, and Canada’s finance minister all flagged Mythos at the IMF spring meetings in Washington last week. Lagarde’s line about there being no governance framework that currently exists to contain a tool of this reach was the one that stuck with me, because central bankers do not say things like that lightly. India’s finance minister has reportedly already chaired a closed session with the country’s banking leaders about Mythos exposure.

    What this tells me is that the regulatory question has skipped the usual two-year discovery phase. We are not going to spend 2026 wondering whether AI-augmented offensive cyber is a regulated category. By the back half of the year, it will be one — at minimum in financial services, probably in critical infrastructure, possibly more broadly. Anthropic’s decision to gate Mythos behind Glasswing was, among other things, a bet that this was where the conversation was going. They were right.

    The defender’s dilemma is real but not new

    Schneier’s framing, that we are now living in “the age of instant software,” where AIs are superhumanly good at finding, exploiting, and patching vulnerabilities, is the cleanest summary I’ve seen. The asymmetry in that world currently favours defenders: they can run Mythos-class scanning against their own codebases continuously, while attackers, shut outside the Glasswing gate for now, cannot. That asymmetry has a shelf life, and Anthropic has been refreshingly direct about that. The capability will diffuse. It always does.

    What’s interesting is that the defender’s playbook isn’t fundamentally new. The CSA “what to do now” report Schneier was part of reads like a sober extension of things ops teams already know they should be doing better: SBOM hygiene, faster patch pipelines, better triage prioritisation, tighter blast-radius controls, better vendor risk programs. Mythos doesn’t invent new defensive disciplines. It just compresses the timeline on which existing ones become non-optional.

    The “is it really that capable” question

    I’d be doing the post a disservice if I didn’t note the counter-current. Stanislav Fort’s experiment — feeding the FreeBSD vulnerability that Anthropic touted to eight cheaper open-weight models and finding that all of them flagged it — has been cited as evidence that the Mythos premium is overstated. Schneier’s commenters were quick to point out the obvious caveat: those smaller models found it because they were told where to look, and they hallucinate vulnerabilities into clean code at a high rate.

    I think the honest read is that Mythos is meaningfully ahead on autonomous, end-to-end exploit development, not necessarily on raw bug-spotting. That’s still a step change, because exploit weaponisation is where attacker time has historically been concentrated. But it’s worth being precise about what’s new and what isn’t, because the regulatory response will be more durable if it’s grounded in the real capability delta rather than the marketing one.

    What I’m watching

    Three things over the next month. First, whether NYDFS or the OCC issues any guidance, even informal, that names AI-assisted vulnerability discovery as a Part 500 consideration. Second, whether any of the Glasswing partners publish post-mortems on what Mythos found in their codebases; the signal value would be enormous and I doubt it’ll happen, but I’d love to be wrong. Third, whether the EU does what it always does and tries to legislate the category before the technical ground has stopped moving. The AI Act gave them the scaffolding; the question is whether they reach for it.

    The capability story is mostly settled. The governance story is just starting.

  • Three Weeks of Mythos: Superhuman Bug-Finding, Very Human Leak

    It’s been three weeks since Anthropic dropped Claude Mythos Preview into the cybersecurity world’s lap, and I’ve been watching the news cycle settle into a strange shape. The model is doing the things its release notes promised — finding zero-days at a scale that breaks how we think about software maintenance — and the part that went wrong wasn’t the AI safety apparatus around it. It was the vendor boundary. As an MSP operator, that’s the part I keep coming back to.

    This is the post I’ve been wanting to write since the breach reporting landed last week. Here’s where I’ve netted out.

    The capability claims survived independent review

    When Anthropic announced Mythos Preview on April 7, I was skeptical of the headline numbers. “Thousands of zero-days across every major OS and browser” reads like marketing, and Anthropic’s own red team wrote the post. I gave it a week before forming a view, and the AISI evaluation that landed on April 13 changed my position.

    AISI ran their full cyber suite. On expert-level capture-the-flag challenges — which no model could complete before April 2025 — Mythos Preview hit a 73% success rate. More striking, on their 32-step “The Last Ones” corporate network range (which they estimate takes a human professional roughly 20 hours), Mythos became the first model to ever solve it end-to-end, succeeding in 3 of 10 attempts and averaging 22 of 32 steps. Claude Opus 4.6, the previous best, averaged 16. That gap matters. It’s not “model B is a few percent better than model A.” It’s “model B finishes the job.”

    The Mozilla data point is the one that should keep defenders up at night. Firefox 150 shipped with patches for 271 vulnerabilities Mythos found in a single evaluation pass. Some had been sitting in the codebase through 27 years of human review. Whatever you think of the AI-hype cycle, that is a step change in vulnerability discovery economics.

    Project Glasswing is a sensible response, but it created a new attack surface

    Anthropic chose not to release Mythos publicly. Instead they stood up Project Glasswing — a closed consortium of around 40 organizations including AWS, Apple, Google, Microsoft, Cisco, CrowdStrike, JPMorgan Chase, the Linux Foundation, NVIDIA, and Palo Alto Networks — to use the model on critical software ahead of any general release.

    I think the call was right. A model that can autonomously chain a 32-step network attack through a corporate environment is not something you put behind a credit card and a checkbox EULA. Restricting access to defenders who can fix what the model finds, before the same capability reaches attackers, is the only version of “responsible disclosure at AI speed” that I’ve seen articulated coherently.

    But here’s the thing nobody seems to want to say out loud: Glasswing is a supply chain. The moment you create a tightly-controlled, high-value access tier, you’ve also created a target that someone is going to try to compromise. And someone did, almost immediately.

    The breach was a vendor-environment failure, not an AI failure

    The reporting that came out April 21–22, with more detail through the week, is unusually clean for an AI security incident. A worker at one of Anthropic’s third-party contractors used their legitimate vendor access to fingerprint where Mythos was hosted, then shared that location with a Discord group that hunts for unreleased model endpoints. The group reportedly guessed the URL pattern based on Anthropic’s prior model deployments and got in.

    Anthropic’s statement says they have no evidence of activity beyond the “vendor environment” — the infrastructure third parties use for model development access. I believe them, because the failure modes here are completely conventional: predictable URL schemas, a contractor with too-broad access, no apparent rate limiting or anomaly detection on the vendor tier, and a hostile community organized enough to industrialize the guessing.
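
    The missing anomaly detection doesn’t need to be exotic. As a rough sketch, a sliding-window counter of requests to nonexistent paths is enough to notice someone walking a URL pattern. The `ProbeDetector` class, its thresholds, and the client label below are assumptions for illustration; nothing here reflects how Anthropic’s vendor tier is actually built.

```python
from collections import defaultdict, deque


class ProbeDetector:
    """Flag clients that hit too many nonexistent paths in a short window."""

    def __init__(self, max_misses: int = 5, window_seconds: float = 60.0):
        self.max_misses = max_misses
        self.window_seconds = window_seconds
        self._misses = defaultdict(deque)  # client id -> timestamps of 404s

    def record(self, client: str, status: int, now: float) -> bool:
        """Record one response; return True if this 404 exceeds the threshold."""
        if status != 404:
            return False
        misses = self._misses[client]
        misses.append(now)
        # Drop misses that have aged out of the sliding window.
        while misses and now - misses[0] > self.window_seconds:
            misses.popleft()
        return len(misses) > self.max_misses


# Hypothetical client guessing endpoint names every five seconds.
detector = ProbeDetector(max_misses=3, window_seconds=60)
flags = [detector.record("contractor-42", 404, t) for t in (0, 5, 10, 15, 20)]
print(flags)  # [False, False, False, True, True]
```

    A counter like this would not have stopped a contractor with legitimate access, but it is the kind of cheap signal that turns industrialized guessing into an alert instead of a breach report.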

    Strip the word “AI” out of this story and it’s an MSP-101 incident. We’ve been telling clients for years that their third-party contractors are the soft underbelly of any compliance program. The Mythos breach is the same lesson at a different altitude.

    What the framing wars get wrong

    The commentary has split predictably. The left-wing critique (CounterPunch, La Lucha) reads Glasswing as Anthropic appointing itself the arbiter of who gets defensive AI — calamity makers running the calamity insurance racket. The Foreign Policy piece reads the same facts as a serious shift in the cyber calculus that nation-states will not respond to slowly. Both are partially right and miss the operational reality.

    The operational reality is that the technology is here, the access tier was breached within two weeks, and the next dozen models from the next dozen labs will not have Anthropic’s deployment discipline. We are at the start of a regime where defenders need to assume that some attacker has access to a Mythos-class capability, even if the official rosters say otherwise.

    That’s not a policy debate. That’s a prioritization shift. Patch faster. Inventory better. Assume your old code has bugs nobody has found yet — and that someone with a model is going to find them this quarter.

    What I’m watching

    Three things over the next month.

    First, whether Mozilla’s 271-bug patch cycle gets repeated by anyone else publicly. If Microsoft, Apple, or a major Linux distribution ships a similar tranche of Mythos-attributed fixes, the “Glasswing is working” narrative gets a real anchor. If it stays Mozilla-only, the consortium starts to look performative.

    Second, whether Anthropic publishes the post-mortem on the vendor breach. They owe the security community a clear write-up of how the access boundary failed, because every other lab building the same kind of restricted tier is making the same mistakes right now.

    Third, the cheap-knockoff timeline. AISI’s evaluation showed performance scaling smoothly with inference budget up to 100M tokens. The capability isn’t a moat — it’s a price point. I’d give it six months before something open-weight reaches “Mythos-minus-30%,” and at that point the Glasswing model of restricted access stops working as a containment strategy and starts working only as a head-start.

    I’d love to be wrong on the third one.

  • Mythos Slipped the Cage: Notes on Glasswing’s First Real Test

    A little over two weeks ago, Anthropic announced Claude Mythos Preview and put it behind a deliberately small door called Project Glasswing — a phased rollout to “critical industry partners and open source developers” with the explicit goal of giving defenders a head start on a model that, by Anthropic’s own description, can find and exploit zero-days in shipping software. The bet was that if you give a sharp tool to the people patching things first, the asymmetry tilts toward defense long enough to matter.

    Last week the door turned out to be propped open. A small group of users on a private forum stumbled into Mythos through a third-party vendor environment — by, as Fortune phrased it, “guessing where it was located” — on the same day the limited-access program was announced. Anthropic has confirmed it’s investigating. As I write this, no one is claiming a catastrophic outcome. What we do have is the first real-world stress test of the Glasswing premise, and it’s worth being honest about what that test showed.

    What Anthropic actually built

    Mythos Preview (codename Capybara) is a general-purpose frontier model that posts numbers most of us hadn’t expected to see this year — SWE-bench at 93.9%, USAMO at 97.6%, and a generational jump on cyber tasks specifically. The vibe in the public materials is unusually direct for Anthropic: red.anthropic.com calls it “strikingly capable at computer security tasks,” notes that during testing it discovered “thousands of high-severity vulnerabilities” across “every major operating system and web browser,” and frames Glasswing as a deliberate attempt to bias the rollout toward defenders.

    That framing matters because it’s not the standard “we hope it goes well” disclosure. Anthropic is explicitly saying this model raises the offensive ceiling enough that the order of access changes the threat model. That’s a real claim, and it’s one the next two sections actually back up.

    The numbers from AISI

    The UK AI Security Institute’s evaluation, published April 13, is the cleanest third-party look so far. Two findings stuck with me:

    On expert-level capture-the-flag challenges — tasks that no model could complete at all before April 2025 — Mythos Preview succeeds 73% of the time. That’s not “AI is getting better at security CTFs.” That’s a category change.

    On AISI’s “The Last Ones” range — a 32-step simulated corporate-network attack they estimate would take a human professional roughly 20 hours — Mythos became the first model to solve it end-to-end, doing so in 3 of 10 attempts, with an average of 22/32 steps completed. Claude Opus 4.6, the previous best, averaged 16. AISI’s chart shows performance still scaling up at the 100M-token budget they tested; they expect more compute to keep extracting more capability. Translation for defenders: the bottleneck right now is inference budget, not capability.

    The honest caveat AISI prints in plain English is that their ranges lack active defenders, EDR, and meaningful detection penalties. Mythos can chain a kill chain on a soft target. Whether it does so against a hardened, monitored estate is the next evaluation, not this one.

    How the leak happened (and didn’t)

    The leak details we have are thin but instructive. Per the SiliconANGLE and CBS reports, the access path was a third-party vendor environment, not an Anthropic-side credential break. Per Fortune, the discovery vector was effectively guessing: pattern-matching where a limited-access endpoint might be hosted, then trying it. That’s the oldest move in the book: you don’t break the lock, you find the door no one remembered to lock.

    This is the bit I keep returning to. Glasswing’s threat model assumes the perimeter you have to defend includes every partner you handed access to. The model is hard. Vendor-environment hygiene is hard in a different, much more boring way. The boring way is the one that broke first.

    What this means for defenders this week

    A few things I’m doing or recommending around our estate, none of them novel, all of them more urgent than they were on April 7:

    The Cyber Essentials basics that NCSC and AISI both pointed at — patch cadence, access control, configuration baselining, real logging — are now the difference between “vulnerable to a skilled human attacker over a weekend” and “vulnerable to an autonomous agent over a coffee break.” If your patch SLA is 30 days for highs, that window is now quite a bit more expensive.

    If you’re a partner in any frontier-model preview, treat the access credentials as a Tier-0 secret on par with domain admin. The Glasswing leak is going to make every vendor questionnaire about model access materially more painful for the next twelve months, and rightly so.

    Detection assumptions need a refresh. Most of our content is tuned to human pacing and human mistakes. An agent that runs 22 steps of a kill chain in a single autonomous session won’t make the small, slow tells we instrument for. The next round of detection engineering is going to be about behavior-rate signals, not signatures.
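
    As a rough illustration of what a behavior-rate signal could look like, the sketch below measures how many distinct attack stages a single session reaches per hour. The `stage_rate_per_hour` helper, the stage labels, the sample sessions, and the threshold are all hypothetical; in a real pipeline the stage labels would have to come from whatever tagging your detections already apply.

```python
def stage_rate_per_hour(events: list[tuple[float, str]]) -> float:
    """Distinct attack stages observed per hour within one session.

    `events` is a list of (unix_timestamp, stage_label) pairs.
    """
    if len(events) < 2:
        return 0.0
    times = [t for t, _ in events]
    elapsed_hours = (max(times) - min(times)) / 3600
    if elapsed_hours == 0:
        return float("inf")
    distinct_stages = len({stage for _, stage in events})
    return distinct_stages / elapsed_hours


HUMAN_PACE_THRESHOLD = 10.0  # stages per hour; an untuned, hypothetical cutoff

# Hypothetical sessions: an agent logging 22 stages 100 seconds apart, and a
# human working through 32 stages over roughly 20 hours.
agent_session = [(i * 100.0, f"stage-{i}") for i in range(22)]
human_session = [(i * 2250.0, f"stage-{i}") for i in range(32)]

print(stage_rate_per_hour(agent_session) > HUMAN_PACE_THRESHOLD)  # True
print(stage_rate_per_hour(human_session) > HUMAN_PACE_THRESHOLD)  # False
```

    The point of the design is that nothing here matches a specific tool or payload. It only asks whether the session is moving through the chain faster than a person plausibly could.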

    What I’m watching

    Three things over the next couple of weeks. First, whether Anthropic publishes a real post-mortem on the vendor-side leak — not a “we are investigating” line, but the kind of write-up that lets the rest of us learn from a partner’s misconfiguration. Second, whether the UK government’s reported discussions about limited Mythos access produce any public structure for state-level defender programs; that’s the natural next ring outside Glasswing. Third, whether AISI’s hardened-range follow-up actually shows the capability gap I expect — because if Mythos still solves a defended estate at non-trivial rates, the calculus described in the foreign-policy commentary stops being theoretical and starts dictating procurement decisions.

    For now, my read is unchanged from launch week: the model is real, the defender-first framing is the right framing, and the Glasswing leak is a caution about implementation rather than a refutation of the strategy. The asymmetry window is still there. It’s just smaller than Anthropic wanted it to be.

  • Glasswing Has a Crack: What the Mythos Leak Tells Us About Controlled AI Releases

    Three days ago, Gizmodo reported that an unidentified group is using Claude Mythos without Anthropic’s permission. Anthropic has confirmed the access. That sentence — short, factual — is the most consequential thing said about frontier AI this month, and it lands at the worst possible moment for the controlled-release thesis the entire industry has been quietly converging on. I’ve been sitting with this since the headline broke, and I think the security community is underreacting.

    Here’s where my head is at.

    What Mythos actually does that Opus 4.6 didn’t

    The capabilities gap between Mythos Preview and Opus 4.6 isn’t incremental — it’s a regime change, and I want to anchor on the numbers before getting to the governance angle.

    Anthropic’s own red-team writeup is the cleanest source: on their internal vulnerability-discovery benchmark, Opus 4.6 generated zero crashes at tier 3 across roughly 175 attempts. Mythos Preview produced 595 crashes at tiers 1 and 2, added crashes at tiers 3 and 4, and chained an exploit that an unmodified Opus 4.6 reportedly couldn’t develop in hundreds of tries. The UK AI Security Institute independently saw the same pattern: 73% success on expert-level capture-the-flag tasks (no model could complete any of these before April 2025), and on AISI’s 32-step “The Last Ones” enterprise-network attack range, Mythos Preview is the first model to solve it end-to-end — three out of ten attempts — averaging 22 of 32 steps. Opus 4.6, the next best, averaged 16.

    That last figure is the one I keep coming back to. Multi-host, multi-stage attack simulation that human professionals estimate at 20 hours of work, completed start-to-finish by a language model. We are no longer arguing about whether AI can meaningfully assist offensive security. The argument now is about distribution.

    Project Glasswing was supposed to be the answer

    Anthropic’s response to those numbers was to not release Mythos to the public. Instead, they stood up Project Glasswing — a controlled-access program that, as best I can piece together from public reporting, hands Mythos to a hand-picked set of critical-infrastructure operators and large platforms (Apple, Google, Microsoft, Cisco, Amazon are all named) plus a small set of open-source defenders. The pitch is straightforward: give the people defending the most important systems a head-start over the eventual day when models of this caliber are widely available.

    I find the logic basically sound. If you accept that capabilities of this kind will inevitably proliferate — and I do — then a 6-to-18-month defender lead is genuinely valuable. It’s the AI-safety equivalent of disclosing a critical CVE to vendors before going public.

    The problem is that Project Glasswing’s threat model assumed the only people with access were people Anthropic gave access to.

    The leak changes the math

    We don’t yet know the shape of the unauthorized access. Is it credential theft from a Glasswing partner? An insider at Anthropic? A weights exfiltration? An API-key compromise? Each of those implies wildly different remediation, and Anthropic has been understandably tight-lipped while they investigate. But the existence of the access — confirmed, not just alleged — does two things:

    First, it collapses the “defender head-start” argument from a temporal advantage into a race condition. If an unknown actor has had Mythos for some unknown number of weeks, the defenders who got it on April 8 may already be behind, not ahead. We don’t know what’s been done with it.

    Second, and this is the part I think is being undersold: it sets a precedent. The next time a frontier lab argues that a model is too dangerous to release publicly but acceptable to share with twenty named partners, the counter-argument is now empirical, not hypothetical. “Controlled access leaks” stops being a thought experiment.

    What I think MSPs and IT teams should actually do this week

    I run a small managed-services practice, so I’ll keep this concrete. None of us are getting Mythos Preview through Glasswing. That doesn’t matter — the implications still land on our desks.

    Patch hygiene gets re-prioritized. AISI is explicit that Mythos succeeds against systems with weak posture and gets stuck against well-defended ones. The line between those two states is mostly your patch SLA, your egress controls, and whether you have any meaningful EDR coverage on the endpoints attackers actually land on.

    Detection-engineering for agentic behavior moves up the list. The kill-chain fingerprint of a language-model attacker — long pauses for reasoning, retries on noisy commands, oddly verbose error-handling — is observable, and it’s not what your SIEM rules were tuned for. I’m going to spend some of next sprint on this.

    Secrets rotation gets a second look. If the Mythos leak turns out to be credential-driven, a lot of vendors are about to discover that “service account password unchanged since 2021” is a finding now.
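
    The rotation check itself is a few lines once you have a last-rotated date per credential. A minimal sketch, assuming that metadata has already been exported into a plain dictionary; the `stale_secrets` helper, the 90-day limit, the account names, and the dates are all hypothetical.

```python
from datetime import date


def stale_secrets(last_rotated: dict[str, date], today: date, max_age_days: int = 90) -> list[str]:
    """Return names of secrets whose last rotation is older than max_age_days."""
    return sorted(
        name
        for name, rotated in last_rotated.items()
        if (today - rotated).days > max_age_days
    )


# Hypothetical export of credential metadata.
inventory = {
    "svc-backup": date(2021, 3, 1),
    "svc-monitoring": date(2026, 4, 1),
}

print(stale_secrets(inventory, today=date(2026, 4, 28)))  # ['svc-backup']
```

    Passing `today` in explicitly keeps the check reproducible, which helps when the output ends up in an audit trail.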

    What I’m watching

    Three things, in order. Whether Anthropic discloses the vector of the unauthorized access (vs. only the fact of it) — that determines whether Glasswing partners need to assume their own footholds are compromised. Whether AISI’s planned hardened-environment evaluations land before another model in this tier ships. And whether the EU and UK regulators treat the leak as evidence that voluntary controlled-release programs need a statutory backstop, or whether they let the labs self-correct.

    This is the part of the story where the policy moves faster than the model improvements, or it doesn’t. Either outcome is informative.
