Opus 4.7: The Coders’ Messiah, While Mythos Lurks in Shadows

Ah, the grand theater of progress! Anthropic, that cunning maestro of artificial whimsy, has unleashed upon the world Claude Opus 4.7, a model so refined it could debug the very fabric of reality, or so they claim. With a flourish of their digital wand, they extend their lead over OpenAI and Google, leaving both rivals to ponder their mortal coils in silence. Yet, like a magician guarding his secrets, Anthropic draws a line in the sand: some toys are for the public, others remain locked in the vault, whispered about in hushed tones.

Opus 4.7, they say, is a “notable improvement” over its predecessor, Opus 4.6. A phrase so modest, one might think it was uttered by a bureaucrat in a Kafkaesque office. Developers, those modern-day alchemists, rejoice! For this model can handle their most vexing coding tasks with the confidence of a cat strolling through a mouse convention. Long-running jobs? It tackles them with the rigor of a Soviet bureaucrat filling out paperwork. Instructions? Followed with the literalism of a lawyer parsing a contract. And self-verification? Oh, it checks its work like a paranoid accountant counting rubles.

Pricing, they assure us, remains unchanged: $5 per million input tokens and $25 per million output tokens. A bargain, no doubt, for the privilege of conversing with a digital savant. The model is already strutting its stuff across Anthropic’s API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Even GitHub Copilot has rolled it out for its Pro+, Business, and Enterprise users, because why should the proletariat have all the fun?
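For those who prefer rubles-and-kopecks concreteness, a back-of-the-envelope sketch of what those rates imply. The request sizes below are invented purely for illustration:

```python
# Back-of-the-envelope cost estimate at the quoted Opus 4.7 rates.
# The request sizes in the example are made up for illustration.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical coding-agent turn: 40k tokens of context in, 2k out.
cost = request_cost(40_000, 2_000)
print(f"${cost:.2f}")  # $0.25
```

At these rates the input side dominates long-context coding work, which is why the token-count caveats in the migration notes below matter more than they first appear.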

A Benchmark Lead, But a Tightrope Walk

Opus 4.7, with the grace of a prima ballerina, retakes the top spot among publicly available frontier models. It outscores OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro on key benchmarks, a victory so narrow it could be measured with a micrometer. VentureBeat’s Carl Franzen, ever the pragmatist, notes that Opus 4.7 leads GPT-5.4 on directly comparable benchmarks by a tally of just seven to four. The gap, it seems, is shrinking faster than a Moscow apartment in winter.

On SWE-bench Pro and SWE-bench Verified, the model shines like a beacon in the darkness, handling complex engineering work with the finesse of a master craftsman. Early-access testers, those lucky few, report gains so outsized they could only be described as miraculous. Cursor’s Michael Truell claims the model cleared 70% on CursorBench, up from 58% for Opus 4.6. XBOW’s Oege de Moor, with a flourish, reports a leap from 54.5% to 98.5% on their visual-acuity benchmark, a change so dramatic it could only be compared to a Bolshevik revolution in the world of autonomous penetration testing. Rakuten’s Yusuke Kaji, not to be outdone, declares the model resolved three times more production tasks than its predecessor. Truly, a triumph of silicon over flesh!

Vision, too, has been upgraded. Opus 4.7 can process images up to 2,576 pixels on the long edge, a resolution so high it could spot a flea on a camel’s back. This opens the door to use cases that depend on fine visual detail, such as parsing dense screenshots and extracting structured data from complex technical diagrams. A boon for those who toil in the trenches of digital minutiae.

The Weaknesses, Laid Bare

Ah, but even the mightiest of models has its Achilles’ heel. The release notes, with a candor rare in this age of corporate obfuscation, admit where Opus 4.7 falls short. GPT-5.4 still leads in agentic search, multilingual question answering, and some terminal-based coding tasks. And in a twist worthy of a Bulgakov novel, Opus 4.7 scored fractionally lower than its predecessor on cybersecurity vulnerability reproduction, a regression Anthropic attributes to its new automated cyber safeguards. Progress, it seems, is a double-edged sword.

Migration, too, is not without its pitfalls. The model’s updated tokenizer maps inputs to 1.0-1.35 times as many tokens as Opus 4.6, and it thinks harder at higher effort levels, producing more output tokens in later turns. Developers, those poor souls, may need to re-tune their prompts, for Opus 4.7 takes instructions with the literalism of a bureaucrat interpreting a decree. And Anthropic’s alignment assessment? It rates the model as “largely well-aligned and trustworthy, though not fully ideal in its behavior.” A phrase so vague, it could describe a character from The Master and Margarita.
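What that tokenizer change means for a budget can be sketched in a few lines. The baseline token count here is invented; only the 1.0–1.35 multiplier and the $5-per-million input rate come from the figures above:

```python
# Sketch of the input-cost impact of the updated tokenizer.
# The input rate is unchanged between Opus 4.6 and 4.7; only the
# token count shifts, by a factor of 1.0 to 1.35 per the release notes.
INPUT_RATE = 5.00 / 1_000_000  # dollars per input token

def input_cost_range(opus46_tokens: int, low: float = 1.0, high: float = 1.35):
    """Return (best-case, worst-case) input cost under the new tokenizer."""
    return (opus46_tokens * low * INPUT_RATE,
            opus46_tokens * high * INPUT_RATE)

# A hypothetical prompt that measured 100k tokens under Opus 4.6:
best, worst = input_cost_range(100_000)
print(f"${best:.2f} to ${worst:.2f}")  # $0.50 to $0.68
```

A 35% swing on the input side is the kind of thing a budget spreadsheet notices even if a developer does not, hence Anthropic’s advice to measure on real traffic before migrating.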

The Mythos Shadow

But the true drama lies in what Anthropic is not shipping. Opus 4.7, they insist, is “less broadly capable than our most powerful model, Claude Mythos Preview.” Ah, Mythos! That elusive, mythical beast, unveiled under Project Glasswing and restricted to a mere 40 vetted enterprise and government partners. A system so powerful it could autonomously discover and exploit zero-day vulnerabilities at a scale that makes human researchers look like amateurs. Yet, it remains locked away, a secret shared only with the likes of Apple, Google, Microsoft, Amazon Web Services, CrowdStrike, and JPMorgan Chase. A coalition so exclusive, it could be mistaken for a cabal of sorcerers.

Gizmodo’s Jake Peterson, ever the cynic, observes that the Opus 4.7 announcement doubles as marketing for the system Anthropic refuses to sell. Legitimate security researchers, those modern-day knights, can apply for broader access through the Cyber Verification Program, a controlled on-ramp for vulnerability research, penetration testing, and red-teaming work. A gesture so magnanimous, it could only be described as a bone thrown to the masses.

The dual-track strategy, it seems, has implications beyond the AI industry. Bitcoin, that digital gold, trades near $74,500, steady as a rock in a storm. Yet, the $200 billion locked in smart contracts across Ethereum, Solana, and other chains sits behind defenses that Anthropic has warned become “considerably weaker” against model-assisted adversaries. A reminder that in this game of cat and mouse, the cat has just grown sharper claws.

What Developers Get Today

Alongside Opus 4.7, Anthropic has rolled out a new “xhigh” effort level, giving developers finer control over the trade-off between reasoning depth and latency. Task budgets, too, have entered public beta, allowing developers to cap token spend on autonomous agents and prevent runaway bills. In Claude Code, a new /ultrareview slash command runs a dedicated review session, flagging bugs and design issues with the precision of a senior reviewer. And “auto mode,” that marvel of convenience, has been extended to Max plan subscribers. A veritable feast of features, though one wonders if it’s enough to satisfy the insatiable appetite of the developer class.
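The task-budget idea reduces to a simple guard, sketched below. Everything here, the agent step function, the budget figure, the return values, is invented to show the shape of the control; it is not Anthropic’s actual API:

```python
# Illustrative token-budget guard for an autonomous agent loop.
# `run_agent_step` stands in for a real model call and returns the
# number of tokens that step consumed. All names are hypothetical.
def run_with_budget(run_agent_step, max_tokens: int, max_steps: int = 50):
    spent = 0
    for _ in range(max_steps):
        spent += run_agent_step()
        if spent >= max_tokens:
            # Stop before the next turn rather than run up the bill.
            return ("budget_exhausted", spent)
    return ("completed", spent)

# A fake agent that burns 3,000 tokens per step, against a 10k budget:
status, spent = run_with_budget(lambda: 3_000, max_tokens=10_000)
print(status, spent)  # budget_exhausted 12000
```

Note that a per-step check can still overshoot by up to one step’s worth of tokens, which is presumably why a platform-side cap is more attractive than rolling your own.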

For those weighing the upgrade, Anthropic’s advice is to start with high or xhigh effort for coding and agentic use cases, measure token usage on real traffic, and consult the migration guide before rolling the model into production. The headline, it seems, is that frontier capability arrives on a two-month cadence, at unchanged prices, while the truly transformative version remains behind closed doors. A tale as old as time itself: the haves and the have-nots, separated by a digital velvet rope.

2026-04-16 23:37