Anthropic Pushes Claude Toward Long-Haul Work With Opus 4.6
The AI startup says its latest flagship model can sustain agent-driven tasks for longer, reason across million-token contexts and operate more reliably in large codebases.
[Image source: Chetan Jha/MITSMR India]
Anthropic has released Claude Opus 4.6, an upgrade to its most capable artificial-intelligence model, sharpening its focus on sustained, economically useful work rather than conversational performance.
The company said in its release that the new version plans tasks “more carefully,” maintains agent-driven workflows for longer stretches, and operates more reliably across large codebases. The improvements are aimed at professional users such as software engineers, analysts and researchers, rather than casual consumers.
One headline feature is scale. For the first time in Anthropic’s Opus line, the model offers a one-million-token context window in beta. This allows the system to absorb and reason over enormous volumes of text in a single session.
Anthropic said this represents a qualitative change. According to the company, Opus 4.6 performs far better than its predecessors on long-context retrieval tests and suffers much less from what engineers call "context rot," the tendency of a model's accuracy to degrade as its context window fills up.
The company is keen to emphasize practical impact. Opus 4.6, it says, can apply its abilities to “running financial analyses, doing research, and using and creating documents, spreadsheets, and presentations.” Within Cowork, Anthropic’s autonomous work environment, the model can combine these skills to operate with minimal supervision.
Benchmarks are deployed liberally. Anthropic claims state-of-the-art performance on several evaluations, including Terminal-Bench 2.0 for agentic coding and Humanity’s Last Exam, a multidisciplinary reasoning test.
On GDPval-AA, which measures performance on economically valuable knowledge work, the firm says Opus 4.6 "outperforms the industry's next-best model by around 144 Elo points." In plainer terms, a gap of that size means Opus 4.6 would come out ahead in roughly 70 percent of head-to-head comparisons.
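The roughly 70 percent figure comes from the standard Elo expected-score formula, E = 1 / (1 + 10^(-D/400)), where D is the rating gap between two competitors. The short Python sketch below assumes GDPval-AA uses the conventional 400-point Elo scale. That is an assumption, since the release excerpt quoted here does not spell out its parameters. The function name is illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected head-to-head score for the higher-rated side.

    Assumes the conventional Elo logistic model with a 400-point scale
    (an assumption about how GDPval-AA is scored).
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


# 144-point gap reported by Anthropic for GDPval-AA
p = elo_expected_score(144)
print(f"{p:.3f}")  # → 0.696, i.e. roughly 70 percent
```

Under the same assumption, the next-best model would be expected to win the remaining 30 percent or so of comparisons. A gap of zero gives exactly 50 percent.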
Anthropic also highlights improvements in judgment. The company says its own engineers found that the model "brings more focus to the most challenging parts of a task without being told to" and "handles ambiguous problems with better judgment."
The trade-off is that deeper thinking can add latency and cost, a problem the firm addresses with new effort controls that let developers dial reasoning up or down.
Safety remains a central theme. The release said Opus 4.6 showed “an overall safety profile as good as, or better than, any other frontier model in the industry,” with low rates of misaligned behaviour and fewer unnecessary refusals. Given the model’s stronger cybersecurity skills, Anthropic said it has introduced new probes and safeguards and is exploring real-time interventions to block abuse.
Commercially, the update is incremental rather than disruptive. Pricing remains unchanged for most use cases, and the model is available immediately through Anthropic’s products, its API and major cloud platforms. Yet the company’s strongest endorsement is how it uses the system itself. “We build Claude with Claude,” the release said, describing how engineers rely on the model daily for real work.
The launch follows a period of market unease around AI’s impact on enterprise software. Last month, Anthropic’s introduction of new automation tools within Claude Cowork intensified investor concerns that AI agents could displace specialised software across legal, sales, marketing and finance functions.
Investor reaction was swift. A Goldman Sachs basket of US software stocks fell 6%, marking its worst single-day decline since April, while nearly $285 billion in market value was wiped out across software, financial services and asset management companies. Shares of major software firms including Salesforce and ServiceNow dropped sharply as well.
The impact spilled over to Indian technology stocks, with US-listed ADRs of Infosys and Wipro falling 6% and 5%, respectively.
