Grok 4.3, released by xAI in late April 2026, is a reasoning model built around cost efficiency rather than topping benchmarks. It offers a 1M-token context, strong agentic and factual-accuracy scores, and pricing roughly 5–10x cheaper than top-tier models like Claude Opus, but it trails the leaders on the hardest coding benchmarks and shipped with some regressions. It's a strong default for cost-sensitive, high-volume workloads, not the pick for squeezing out the absolute best result.
Key takeaways
- Grok 4.3 launched in late April 2026 at roughly $1.25 / $2.50 per million input/output tokens — aggressively cheap for its intelligence tier.
- Its biggest gain is agentic task performance, where it leapt well past its predecessor and several pricier rivals.
- It carries a 1M-token context window and added native video input, useful for long-document and multimodal work.
- It does not lead the hardest coding benchmarks: Claude's Opus line still leads on SWE-bench by double digits.
- The honest positioning: pick Grok 4.3 for cost-sensitive scale, not for the absolute ceiling on the toughest tasks.
Grok 4.3 is the clearest statement xAI has made about what kind of company it wants to be. It is not the smartest model on the board — it doesn’t claim to be — and that’s the point. Released to xAI’s public API on April 30, 2026, Grok 4.3 is a bet that the market is tired of paying frontier prices for frontier benchmarks it doesn’t actually use, and would rather have something nearly as good for a fraction of the cost.
On that bet, it mostly delivers. But “mostly” is doing real work in that sentence, and the gaps matter as much as the wins.
The pitch is price
Start with the number, because xAI clearly wants you to. Grok 4.3 runs around $1.25 per million input tokens and $2.50 per million output tokens. That’s not just cheap in the abstract — it’s cheap relative to its intelligence tier, which is the comparison that matters. For a typical agentic workload (a lot of context going in, a short structured answer coming out), reporting puts Grok 4.3 at roughly 5–10x cheaper than Claude’s Opus line and several times cheaper than Google’s Gemini Pro tier at the same call volume.
If you’re running a model in production at scale (thousands of calls a day, not a handful of chats), that ratio is the whole ballgame. To put illustrative numbers on it: a model that’s 90% as good at 15% of the cost wins the deployment decision almost every time, because the 10% gap rarely shows up in the kind of high-volume, well-scoped tasks that production systems actually run.
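To make the scale argument concrete, here is a minimal sketch of the arithmetic. The Grok 4.3 prices are the ones quoted above; the workload shape (5,000 calls a day, 20,000 tokens in, 500 tokens out) is a hypothetical example, and the pricier model is represented only by the 5x and 10x multiples from the reporting, not by any vendor's actual price list. It also ignores caching and any long-context surcharge.

```python
from dataclasses import dataclass

# Prices quoted in the article, in USD per million tokens.
GROK_INPUT_PER_M = 1.25
GROK_OUTPUT_PER_M = 2.50


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload shape; all three numbers are illustrative."""
    calls_per_day: int
    input_tokens_per_call: int
    output_tokens_per_call: int


def daily_cost(w: Workload, input_per_m: float, output_per_m: float) -> float:
    """Daily spend in USD at a flat per-token price (no caching, no surcharges)."""
    per_call = (
        w.input_tokens_per_call * input_per_m
        + w.output_tokens_per_call * output_per_m
    ) / 1_000_000
    return per_call * w.calls_per_day


# Assumed "typical agentic" shape: lots of context in, a short answer out.
agentic = Workload(calls_per_day=5_000, input_tokens_per_call=20_000, output_tokens_per_call=500)
grok_daily = daily_cost(agentic, GROK_INPUT_PER_M, GROK_OUTPUT_PER_M)

print(f"Grok 4.3: ${grok_daily:,.2f}/day")
for multiple in (5, 10):
    # Stand-in for a pricier model, using only the 5-10x range from the reporting.
    print(f"Model at {multiple}x the cost: ${grok_daily * multiple:,.2f}/day")
```

Under those assumptions the workload costs about $131 a day on Grok 4.3, against roughly $656 to $1,313 a day for a model priced 5 to 10 times higher. Over a month that gap runs from the mid five figures down to about four thousand dollars, which is why the ratio tends to decide the deployment question.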
Where it genuinely improved
The headline capability gain is agentic performance — the model’s ability to plan and execute multi-step tasks with tool use. On the benchmarks that measure this, Grok 4.3 didn’t inch forward; it jumped, clearing several pricier competitors. For anyone building autonomous agents or multi-step workflows, that’s the upgrade that counts, and it’s the reason the release got attention beyond the price tag.
It also carries a 1M-token context window and native video input. The long context makes it well-suited to long-document analysis, such as feeding it a stack of contracts or a large codebase in a single pass, and the video support puts it among the models that can reason over video as a first-class input rather than a bolted-on afterthought. It can even generate structured files like PDFs and spreadsheets directly from a prompt.
On factual reliability, it scored unusually well, with very low hallucination rates in independent testing — a meaningful trait for any workload where being confidently wrong is expensive.
Where it falls short
Here’s the part the launch materials don’t lead with. Grok 4.3 does not top the hardest benchmarks. On the toughest agentic coding evaluations — SWE-bench and its harder variants — reporting indicates Claude’s Opus models still lead by double digits. And the release wasn’t perfectly clean: independent coverage flagged coding regressions and some erratic behavior, the kind of rough edges that suggest a model shipped fast and may need further tuning passes.
It’s also not the fastest to respond. Independent measurements put its time-to-first-token at the higher end of its tier, even though it generates output quickly once it starts. For interactive, latency-sensitive uses, that lag is noticeable.
That’s not a criticism so much as a description of a deliberate tradeoff.
Excellent at exactly what it was built for. Unremarkable — occasionally weak — everywhere else.
How to actually think about it
The clean way to place Grok 4.3 is to stop asking “is it the best?” and start asking “best at what, for whom?”
If your use case is high-volume and cost-sensitive — processing large documents, running agents at scale, structured extraction where the task is well-defined and the model just needs to be reliable and affordable — Grok 4.3 is a strong default, and the price makes it hard to argue with. If you need the absolute ceiling on the hardest coding, math, or reasoning problems, the top-tier models still earn their premium, and Grok 4.3 isn’t where you go.
That’s a narrower pitch than “the smartest AI,” but it’s an honest one, and it’s arguably a smarter business position than chasing a benchmark crown that changes hands every few weeks. xAI is betting that most real-world AI spend isn’t about the hardest 10% of tasks; it’s about doing the ordinary 90% cheaply and reliably. Grok 4.3 is built for that 90%.
The bet is sound on paper; the catch is that “cheapest capable model” is the least defensible title in AI. It has changed hands roughly every quarter, and OpenAI and Google can both undercut a price without blinking. So the real test isn’t whether teams adopt Grok 4.3. At this price, plenty will. It’s whether they’re still on it after the next price cut lands, or whether they’ve already moved to whatever’s cheap and capable that month. Price is a feature anyone can copy. Stickiness is the thing xAI still has to build.
Frequently asked questions
When was Grok 4.3 released?
xAI opened Grok 4.3 to its public API on April 30, 2026, after a mid-April beta, with general availability following in early May 2026.
How much does Grok 4.3 cost?
Public API pricing is around $1.25 per million input tokens and $2.50 per million output tokens, with cached input far cheaper. Requests over roughly 200,000 tokens are billed at a higher rate. That puts it among the lower-cost models at its intelligence level.
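For budgeting, the pricing structure described above can be sketched as a small estimator. This is only a rough model under stated assumptions: the answer above does not give the cached-input rate or the long-context rate, so both are required parameters here rather than built-in values, and the sketch assumes the higher rate applies to the whole request once input passes the threshold. The function and parameter names are hypothetical; check xAI's current price list before relying on any of it.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    *,
    cached_discount: float,          # assumption: fraction of input price charged for cached tokens
    long_context_multiplier: float,  # assumption: surcharge factor above the threshold
    long_context_threshold: int = 200_000,  # "roughly 200,000 tokens" per the answer above
    input_per_m: float = 1.25,
    output_per_m: float = 2.50,
) -> float:
    """Estimate the USD cost of one request.

    Assumes the long-context multiplier applies to the entire request once
    input_tokens exceeds the threshold; the real billing rule may differ.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    multiplier = long_context_multiplier if input_tokens > long_context_threshold else 1.0
    fresh_input = input_tokens - cached_input_tokens
    cost = (
        fresh_input * input_per_m
        + cached_input_tokens * input_per_m * cached_discount
        + output_tokens * output_per_m
    ) / 1_000_000
    return cost * multiplier


# Placeholder rates for illustration only; these are NOT xAI's published numbers.
PLACEHOLDER = {"cached_discount": 0.25, "long_context_multiplier": 2.0}

short_call = request_cost(100_000, 1_000, **PLACEHOLDER)
long_call = request_cost(400_000, 1_000, **PLACEHOLDER)
print(f"100k-token request: ${short_call:.4f}")
print(f"400k-token request: ${long_call:.4f}")
```

Making the unknown rates explicit arguments keeps the guesswork visible: swapping in real figures changes the numbers, not the code.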
Is Grok 4.3 better than Claude or GPT?
Not on the hardest benchmarks. Reporting indicates Claude's Opus models still lead the toughest agentic coding benchmarks like SWE-bench by double digits. Grok 4.3 closes much of the gap on general reasoning while being several times cheaper, which is its actual selling point.
What's new in Grok 4.3 beyond price?
A large jump in agentic task performance, a 1M-token context window, and native video input as a first-class modality. It can also generate structured documents like PDFs and spreadsheets directly from prompts.
Should I switch to Grok 4.3?
If your workload is high-volume and cost-sensitive — long context in, short structured answers out — Grok 4.3 is a strong default. If you need best-in-class results on the hardest coding or reasoning tasks, the top-tier models still justify their price.