# Anthropic ships Sonnet 4.7 with native code execution

> Sonnet 4.7 shipped this morning with Python execution baked directly into the API — no separate sandbox, no harness wiring, one new boolean parameter. Same model pricing as 4.6 plus a per-execution surcharge.

- **Pillar:** News
- **Author:** Aditya Marin Gasga (Founding Editor)
- **Published:** 2026-05-31T00:00:00.000Z
- **Tags:** anthropic, sonnet, claude, tool-use, launch

## TL;DR

Anthropic released Sonnet 4.7 this morning with native Python code execution as a first-class API feature. Set code_execution: true on the request and the model runs Python inline in a sandboxed environment, returning both the code and its output. Same model pricing as Sonnet 4.6, plus a $0.01-per-execution sandbox surcharge. Replaces the 'wrap Claude in a Python sandbox' pattern for simple tasks; doesn't replace dedicated agent platforms for multi-step workflows.

## Key takeaways

1. Sonnet 4.7 ships native Python execution as a first-class API feature — same pricing as 4.6 plus $0.01/execution sandbox surcharge.
2. Replaces the 'wrap Claude in a Python sandbox' pattern that every coding-agent stack currently has — fewer moving pieces, lower latency.
3. Doesn't replace dedicated agent platforms like Devin for complex multi-step workflows; this is for one-shot or short-sequence code execution.
4. Existing Sonnet 4.6 customers get 4.7 with no migration — same API surface, just one new optional parameter.
5. First Anthropic release to bundle tool execution into the model call itself; expect OpenAI and Google to match within weeks.

import PullQuote from '~/components/article/PullQuote.astro';

Anthropic released Sonnet 4.7 this morning, and the headline feature is one that sounds modest in description and lands as a meaningful shift in practice: native Python code execution, built directly into the API.

Add `code_execution: true` to your Claude request. The model runs Python inline in a sandboxed environment, returns both the code it wrote and the output of running it, and threads that output back into its reasoning. No separate sandbox provisioning, no harness wiring, no managing execution state between turns. One new optional parameter on an API surface that's otherwise byte-compatible with Sonnet 4.6.

That's the whole change at the API level. The implication is bigger.

## What's new

Until this morning, every team running Claude for any task that needs to actually run code — data analysis, chart generation, format conversions, math beyond arithmetic, anything involving libraries — was operating one of three patterns. They either rolled their own Python sandbox (Docker container, Modal endpoint, AWS Lambda), wrapped Claude with an agent framework that provided one (LangChain, Anthropic's own demo harness, OpenAI's function-calling pattern), or just had Claude generate code and asked the user to run it themselves.

All three patterns work. They also each require ongoing maintenance, security review, and a non-trivial latency budget — typically 300-800ms per execution round-trip just for the sandbox setup. For the common case (model generates code, model wants the output, model continues reasoning), that latency compounds across every step. A multi-step analysis that involves three code executions can spend 2-3 seconds in sandbox overhead before the model has done any actual thinking.

Sonnet 4.7's native execution collapses that to a single API call. The sandbox runs in Anthropic's own infrastructure, alongside the model inference, with no round-trip back to your stack. In Anthropic's own [release notes](https://www.anthropic.com/news), they cite typical-case latency improvements of 60-70% on multi-step analytical tasks. We weren't able to replicate that number on arbitrary workloads — our internal tests showed closer to a 40% improvement on average — but the direction is clearly right and the engineering simplification is the bigger win.

The sandbox itself is restrictive in the ways you'd expect. Python 3.13, pre-installed scientific Python stack (numpy, pandas, scipy, matplotlib), no network access, no filesystem persistence between calls, 60-second execution ceiling per call, 4GB memory cap. Anthropic explicitly positions this as "useful tool execution," not "general compute platform." If your use case needs a long-running process, a custom binary, or anything that talks to the internet, you're still building your own sandbox.

## What it costs

Model usage is unchanged from Sonnet 4.6: $3/Mtok input, $0.30/Mtok cached input, $15/Mtok output. Code execution adds a separate $0.01-per-execution sandbox surcharge — billed regardless of whether the Python code completed successfully. For workloads that execute code once per request, this is rounding error on the token cost. For workloads that loop (the model writes code, sees output, rewrites, retries), it can become the dominant cost line. Worth budgeting explicitly.

There's no minimum commitment, no separate enabling step, and the feature is on by default for every Anthropic API customer. Existing Sonnet 4.6 customers can flip the boolean on their next call and have it work.

## What it doesn't replace

The temptation will be to read "native code execution" as "no need for agent platforms." That overshoots. Sonnet 4.7's execution is one-shot or short-loop — write code, run it, see output, write more code. It's not a multi-step agent in the [Devin](/tools/ai-agents-compared-2026) sense, where the model needs to navigate a codebase, manage file state across many turns, open PRs, respond to test failures over hours of execution.

What it replaces is the wrapper pattern. If you're currently maintaining a Python sandbox solely to give Claude a place to run code, you can probably delete that infrastructure on Monday and replace it with the API flag. If your sandbox is doing something more — running on your private data, integrating with your services, persisting state across requests — keep it.

It also doesn't change anything about the actual model capability. Sonnet 4.7 is the same underlying model as Sonnet 4.6 with the execution layer added; benchmark scores are within margin of error on every test set Anthropic published. If you were on Sonnet 4.6 for the quality, you stay on it for the quality. The 4.7 release is about packaging, not capability.

<PullQuote pillar="news">The 4.7 release is about packaging, not capability.</PullQuote>

## What to do today

If you're running Sonnet in production with any code-execution path, the migration check takes ten minutes:

1. Open one of your in-production Claude calls that currently routes through your own Python sandbox.
2. Add `code_execution: true` to the request, drop the sandbox call.
3. Run it. Compare the output.
4. If the output matches, you have a path to delete a non-trivial chunk of infrastructure.

If you've been wanting to add code execution but the sandbox-wrangling held you back, the cost is now one API parameter. Worth the experiment.

The broader story to watch: this is the first release from a frontier vendor where a major tool — code execution — is bundled inside the model call rather than orchestrated externally. OpenAI's Code Interpreter has been around since 2023, but only inside ChatGPT; the API didn't have an equivalent. We'd expect both OpenAI and Google to match within weeks, and we'd expect the next tools to be bundled the same way — web search inside the model call, file reading inside the model call, eventually browser use inside the model call. The trend is consolidation of the tool-use boundary back into the model API surface.

Whether that's good for the agent platform ecosystem (Manus, Devin, the OpenAI agent products) is a real question. The honest answer is probably "fine for now, harder in a year." This release doesn't threaten them; the next three might.