Model cost calculator
Type in your monthly token volumes and cache hit rate. The table below shows what each frontier model would actually cost you per month — including the cache discount that headline pricing tables ignore.
Pricing verified 2026-05-31. Tap any row to open the current vendor pricing page and sanity-check it.
Ranked cheapest → priciest
How the math works
For each model, the input cost depends on whether each input token is served from the prefix cache or billed at the full rate. So the effective input rate is a weighted average of the cached and full prices:
effective_input_$_per_Mtok =
    cache_hit_rate × cached_input_price
  + (1 − cache_hit_rate) × full_input_price

monthly_cost =
    input_tokens × effective_input_$_per_Mtok / 1_000_000
  + output_tokens × output_price / 1_000_000

Here cache_hit_rate is a fraction between 0 and 1, and every price is in dollars per million tokens.

If you're running RAG or any workflow with a stable system prompt, your real hit rate is often 60-90%, far above the 0% the vendor price table implicitly assumes. That gap is the whole reason a Sonnet workload that "should" cost $X can cost closer to $X/3 in practice. The calculator above models that gap explicitly.
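The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names are hypothetical, and the prices in the example are round placeholder numbers, not any vendor's real rates.

```python
def effective_input_price(cache_hit_rate, cached_input_price, full_input_price):
    """Weighted-average input price in $ per million tokens.

    cache_hit_rate is a fraction in [0, 1]; both prices are $ per Mtok.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * cached_input_price
            + (1.0 - cache_hit_rate) * full_input_price)


def monthly_cost(input_tokens, output_tokens, cache_hit_rate,
                 cached_input_price, full_input_price, output_price):
    """Monthly cost in dollars for the given token volumes."""
    input_rate = effective_input_price(
        cache_hit_rate, cached_input_price, full_input_price)
    return (input_tokens * input_rate / 1_000_000
            + output_tokens * output_price / 1_000_000)


if __name__ == "__main__":
    # Placeholder prices (assumed for illustration only), in $ per Mtok.
    prices = dict(cached_input_price=0.40, full_input_price=4.00,
                  output_price=20.00)
    # 500M input tokens and 50M output tokens per month.
    for rate in (0.0, 0.8):
        cost = monthly_cost(500_000_000, 50_000_000, rate, **prices)
        print(f"hit rate {rate:.0%}: ${cost:,.2f}/month")
```

With these placeholder numbers, moving from a 0% to an 80% hit rate cuts the input portion of the bill from $2,000 to $560, while the $1,000 output portion is unchanged. The overall saving therefore depends heavily on your input-to-output ratio.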
Caveats before you trust these numbers
- Prices verified against vendor pricing pages on 2026-05-31. Tap any model row in the table to open its current vendor page and sanity-check before committing budget.
- API list pricing only. Enterprise discounts, committed-spend contracts, and provisioned-throughput pricing all differ.
- Long-context surcharges (some vendors charge a higher rate above 200K or 500K tokens per request) aren't modeled here. If you're routinely sending huge prompts, treat the result as a floor.
- Tool calls, embeddings for retrieval, fine-tuning, and image/audio modalities are billed separately and aren't included.
- Cache hit rate is the % of input tokens served from a prefix cache. Output is never cached.
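Because the hit rate is defined over input tokens rather than requests, it should be token-weighted when you estimate it from your own usage logs. A rough sketch follows, assuming you can extract a (cached input tokens, total input tokens) pair per request. The function name and the tuple layout are hypothetical, and the way cached tokens are reported differs between vendors.

```python
def token_weighted_hit_rate(requests):
    """Fraction of all input tokens that were served from the prefix cache.

    requests: iterable of (cached_input_tokens, total_input_tokens) pairs,
    one per request (an assumed log format, for illustration).
    """
    cached_sum = 0
    total_sum = 0
    for cached, total in requests:
        if cached < 0 or total < 0 or cached > total:
            raise ValueError(f"invalid token counts: {(cached, total)}")
        cached_sum += cached
        total_sum += total
    # With no input tokens at all, report 0 rather than dividing by zero.
    return cached_sum / total_sum if total_sum else 0.0


if __name__ == "__main__":
    usage = [(9_000, 10_000), (0, 2_000), (7_000, 8_000)]
    print(f"{token_weighted_hit_rate(usage):.0%}")
```

In this example 16,000 of 20,000 input tokens are cached, so the rate to enter is 80%. Averaging the three per-request percentages instead would give about 59%, which understates the discount because the uncached request is also the smallest.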