When was Claude Opus 4.7 released?

Released on April 16, 2026. The API model ID is claude-opus-4-7, available simultaneously across four channels: the official Anthropic API, AWS Bedrock, Azure, and Google Vertex. It is the strongest model Anthropic has made public as of May 2026.

How much does Claude Opus 4.7 cost?

$5 per million input tokens and $25 per million output tokens, the same as Opus 4.6. A prompt-cache hit saves 90%, and the batch API is half price. But 4.7 uses a new tokenizer, so the same text consumes 1.0 to 1.35x the tokens, which means your real bill could rise 0-35%. On Artificial Analysis, running the same benchmark suite, Anthropic actually saved 11% ($4,406 vs $4,970), showing that efficiency gains can partly offset the tokenizer hike.

How much stronger is Opus 4.7 than Opus 4.6?

SWE-bench Verified 87.6% vs 80.8% (+6.8), SWE-bench Pro 64.3% vs 53.4% (+10.9, the biggest jump), Terminal-Bench 69.4% vs 65.4%, OSWorld 78% vs 72.7%, CharXiv vision 82.1% vs 69.1% (+13), hallucination rate 36% vs 61% (an important improvement). All three directions, agent, vision, and long-running tasks, clearly improved.

Is Opus 4.7 stronger than GPT-5.4 and Gemini 3.1 Pro?

It depends on the use case. Agent coding: Opus 4.7 leads on SWE-bench Pro at 64.3% (GPT-5.4 is 57.7%, Gemini 3.1 Pro 54.2%). Terminal code: GPT-5.5 scores 82.7% on Terminal-Bench 2.0, clearly above Opus 4.7's 69.4%. Web research: GPT-5.4 Pro takes 89.3%, well ahead of Opus 4.7's 79.3%. Scientific reasoning GPQA: the three are nearly tied at 94%. In one line: Opus 4.7 is the king of agent coding and long-running workflows, but it is not first on every single benchmark.

What new features does Opus 4.7 have?

Four: (1) high-resolution image support, up to 2576px / 3.75MP, a big improvement for computer use, screenshot understanding, and document processing; (2) a new xhigh effort level for coding and agent use; (3) task budgets (beta), giving an agent a token budget to allocate itself; (4) adaptive thinking as the only reasoning mode, with extended thinking's budget_tokens config removed. temperature / top_p / top_k can no longer be set either.

Should I upgrade to Opus 4.7?

If you use Opus 4.6 for agent coding / computer use / long agentic loops: upgrade, the +10.9 on SWE-bench Pro is a step change. If you use Sonnet 4.6 and cost isn't a concern: worth a try, the quality is a clear notch higher. If you use GPT-5.4 Pro heavily for web research: don't switch, GPT is stronger there. If you use DeepSeek R1 / Qwen to save cost: don't switch, Opus 4.7 is $5/$25 and DeepSeek is $0.55/$2.19, a 9x difference.

深度评测 · 2026-05-12 · by @zayuerweb-dev

Claude Opus 4.7 Review: SWE-bench 87.6%, Same Price, Who Should Upgrade

Claude Opus 4.7, which Anthropic shipped on 2026-04-16, is the most substantial Claude release of the past year. SWE-bench Pro jumped 10.9 points in a single version, the hallucination rate dropped from 61% to 36%, high-resolution image support arrived, and the price held steady (though a new tokenizer means a hidden 0-35% increase). This review uses Anthropic's official docs, hands-on testing from Vellum, and Artificial Analysis data to put Opus 4.7 next to 4.6, GPT-5.4, Gemini 3.1 Pro, and Sonnet 4.6. By the end you'll know whether to upgrade, how to upgrade, and which cases actually argue against it.

30-second verdict

Agent coding / long agentic loops: upgrade. SWE-bench Pro 64.3% is first in the industry.
Computer use / screenshot understanding: upgrade. OSWorld 78%, 2576px high-resolution image support.
Knowledge work (docs, slides, charts): upgrade. CharXiv vision 82.1% (+13 points).
Web research / long tool chains: not necessary. GPT-5.4 Pro still leads on BrowseComp at 89.3%.
Terminal coding (CLI-heavy): not necessary. GPT-5.5 hits 82.7% on Terminal-Bench, well above Opus 4.7's 69.4%.
Cost-sensitive batch jobs: do not. At $5/$25 it's 9x DeepSeek R1 and 1.67x Sonnet 4.6.
When in doubt: Opus 4.7 for the hard tasks, Sonnet 4.6 for routine calls, DeepSeek R1 for batch.

Compare every model live on Check.AI →

Core specs

Item	Claude Opus 4.7
API model ID	`claude-opus-4-7`
Release date	2026-04-16
Context window	1,000,000 tokens
Max output	128,000 tokens
Input price	$5.00 / million tokens
Output price	$25.00 / million tokens
Cache hit	input price × 0.1 (90% off)
Batch API	half price on input/output
High-resolution images	2576px / 3.75MP (previous gen 1568px / 1.15MP)
Availability	Anthropic API, AWS Bedrock, Azure, Google Vertex

Key benchmarks vs the last gen and rivals

Benchmark	Opus 4.7	Opus 4.6	GPT-5.4	Gemini 3.1 Pro
SWE-bench Verified (agent coding)	87.6%	80.8%	N/A	80.6%
SWE-bench Pro (harder)	64.3%	53.4%	57.7%	54.2%
Terminal-Bench 2.0 (CLI)	69.4%	65.4%	82.7% (GPT-5.5)	68.5%
MCP-Atlas (multi-tool calling)	77.3%	75.8%	68.1%	73.9%
Finance Agent v1.1	64.4%	60.1%	61.5% (Pro)	59.7%
OSWorld-Verified (computer use)	78.0%	72.7%	75.0%	N/A
BrowseComp (web research)	79.3%	83.7%	89.3% (Pro)	85.9%
GPQA Diamond (scientific reasoning)	94.2%	91.3%	94.4% (Pro)	94.3%
CharXiv (visual reasoning)	82.1%	69.1%	N/A	N/A
Hallucination rate (lower is better)	36%	61%	N/A	N/A

Data from Anthropic's official docs, Vellum evaluations, and the Artificial Analysis Intelligence Index, current as of May 2026. "N/A" means that source did not publish the figure. GPT-5.4 Pro is OpenAI's higher-effort version, at a higher price.

5 changes that actually matter

1. SWE-bench Pro +10.9 points: the agent-coding inflection point

SWE-bench Verified has been pushed past 80% to the point that nobody cares. SWE-bench Pro is the real agent-coding benchmark for 2026: harder, demands multi-step planning, requires cross-file coordination. The 10.9-point jump from Opus 4.6 to 4.7 (53.4% to 64.3%) is the largest single-version gain across all frontier models in the past year, leaving GPT-5.4's 57.7% and Gemini 3.1 Pro's 54.2% well behind.

What it means in practice: where Claude Code used to land a large refactor on the first try about 60% of the time, it's now 75%+. One fewer retry pays for the upgrade.

2. Hallucination rate cut from 61% to 36%

This is the most dramatic number Anthropic published. On the same test suite, Opus 4.6 hallucinated 61% of the time; 4.7 only 36%. The mechanism is that the model is more willing to say "I don't know" rather than make something up. For production that matters most where a wrong answer costs more than no answer: automated support, legal RAG, medical assistance. For those, 4.7 is a mandatory upgrade.

3. High-resolution image support (computer use is finally usable)

The image ceiling rose from 1568px / 1.15MP to 2576px / 3.75MP. Coordinates now map 1:1 to pixels, so no scale-factor conversion. That's a step change for three cases:

Computer use: full-screen captures aren't blurry, and button targeting is far more accurate.
Document / form understanding: scanned PDFs and contract screenshots are much more readable.
Artifact / chart analysis: CharXiv vision rose from 69.1% to 82.1% (+13).

4. New tokenizer: your bill could rise 0-35%

The price sheet still says $5/$25, but the same Chinese text, code, or data now uses 1.0 to 1.35x the tokens on 4.7. In other words:

Plain short English: essentially no difference.
Chinese, code, data: possibly 20-35% more.
The real effect may be offset by Opus 4.7's smaller output (35% fewer output tokens on the same Artificial Analysis benchmark suite).

Best practice: before upgrading, run 100-500 of your real requests and measure the bill change yourself. Don't take "the price is unchanged" at face value.

5. xhigh effort + task budgets (new tools for agent workflows)

Anthropic added an xhigh effort level (harder-working than high, spends more tokens but is steadier). There's also a new task_budget beta header that gives an agent a total token budget to allocate itself. The model can see the countdown, so it prioritizes and wraps up on time.

It doesn't mean much for indie developers, but it's a step change for enterprise agent workflows (CI/CD integration, automated PR review).

3 breaking API changes to read before upgrading

Extended thinking is gone. Setting thinking: {"type": "enabled", "budget_tokens": N} returns a 400. Use thinking: {"type": "adaptive"} + effort: "high" instead.
temperature / top_p / top_k are all gone. Setting a non-default value returns a 400. Control behavior through the prompt.
Thinking content isn't returned by default. Products that stream the reasoning process in the UI will see long blank stretches. You have to explicitly turn on display: "summarized".

Adaptive thinking is also off by default: set nothing and it won't think at all. That's the biggest behavioral difference from 4.6. Claude Code, Cursor, and Cline have already updated; if you wrote your own SDK integration, you'll need to change it.

Who should upgrade, who shouldn't, who can skip

🟢 Upgrade

Using Opus 4.6 for Claude Code, agent coding, long agentic loops.
Running computer use, screenshot understanding, document extraction.
Doing RAG / support where you'd rather not answer than answer wrong.
Using a multi-tool agent (MCP-Atlas 77.3%, first in the industry).

🟡 Worth upgrading, but A/B test first

On Sonnet 4.6 and wanting a quality bump: Opus is 1.67x the price, so check whether your task complexity justifies it.
Web research / multi-search apps: GPT-5.4 Pro still leads BrowseComp at 89.3%.
Chinese-heavy traffic: the tokenizer change adds +20-35% for Chinese, so run the numbers.

🔴 Don't bother

Using DeepSeek R1 / Qwen3 / GLM-4.6 for cost-sensitive batch: Opus is 5-10x their price.
Pure terminal CLI heavy use: GPT-5.5 leads Terminal-Bench by a wide margin at 82.7%.
Already on GPT-5.4 Pro for web research / deep search: same generation, no reason to switch.

Real cost estimate (same workload)

Assume a code-review agent handling 500 PRs a month, each averaging 40K tokens in, 4K tokens out, and 3 tool calls.

Model	Monthly cost	SWE-bench Pro	Recommendation
Opus 4.7	~$150	64.3%	Critical PRs + complex refactors
Opus 4.6	~$130	53.4%	No reason to keep it, upgrade to 4.7
Sonnet 4.6	~$90	~50%	Routine PRs, the value pick
GPT-5.4	~$75	57.7%	CLI / terminal tasks
DeepSeek R1	~$15	~52%	Cost-sensitive batch

Estimates for reference only, before prompt cache and batch discounts. Heavy cache reuse can lower Opus 4.7's real cost by 40-60%.

What to watch over the next 6 months

When Sonnet 4.7 arrives. The historical pattern: a Sonnet version follows an Opus release by 2-4 months. Expected Q3 2026.
Whether Gemini 3.5 / GPT-6 overtake it. All three have clustered above 80% on SWE-bench Verified; the next jump comes down to who breaks 90% first.
The price war. DeepSeek R2 is expected in Q3 and could widen the 1:9 value gap again.
Whether task budget / xhigh become an industry standard. If OpenAI and Google follow, agent workflows will standardize around them.
Whether the tokenizer "hidden hike" becomes the new normal. Sticker price unchanged but more tokens used: other vendors may copy it.

FAQ

When was Opus 4.7 released? April 16, 2026. API ID claude-opus-4-7.

Did the price change? Not on the surface ($5/$25), but the new tokenizer uses 1-1.35x the tokens for Chinese/code, so your real bill could rise 0-35%.

Do I have to upgrade? For agent coding / computer use / RAG, yes. For low-value batch and CLI-heavy work, no.

How does it compare to GPT-5.4? Opus is stronger on SWE-bench Pro (64.3% vs 57.7%); GPT is stronger on BrowseComp (89.3% vs 79.3%). On GPQA the three are nearly tied.

Does upgrading require code changes? Yes. Extended thinking budget and temperature/top_p/top_k are all gone, and thinking content isn't returned by default.

Does the 1M context cost extra? No. The 1M context is standard pricing, with no long-context premium.

→ Compare Opus 4.7 vs other models live on Check.AI

Sources

Claude Opus 4.7 official release and specs · 2026-05-22
SWE-bench / benchmark comparison · 2026-05-22
Vellum hands-on testing · 2026-05-22