For each model, the per-request input cost depends on how many of your input tokens are served from the prefix cache. So the effective input rate is a weighted average of the cached and full input prices:

effective_input_$_per_Mtok =
    cache_hit_rate × cached_input_price
  + (1 − cache_hit_rate) × full_input_price

monthly_cost =
    input_tokens × effective_input_$_per_Mtok / 1_000_000
  + output_tokens × output_price / 1_000_000
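If you'd rather check the arithmetic yourself, here is a minimal Python sketch of the two formulas above. The function names are ours, and the prices in the example (3.00 full input, 0.30 cached input, 15.00 output, all in $ per million tokens) are illustrative placeholders, not quotes from any vendor's price table.

```python
def effective_input_price(cache_hit_rate: float,
                          cached_input_price: float,
                          full_input_price: float) -> float:
    """Blended input price in $ per million tokens."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * cached_input_price
            + (1.0 - cache_hit_rate) * full_input_price)


def monthly_cost(input_tokens: int,
                 output_tokens: int,
                 cache_hit_rate: float,
                 cached_input_price: float,
                 full_input_price: float,
                 output_price: float) -> float:
    """Monthly cost in dollars; all prices are $ per million tokens."""
    effective = effective_input_price(
        cache_hit_rate, cached_input_price, full_input_price
    )
    return (input_tokens * effective / 1_000_000
            + output_tokens * output_price / 1_000_000)


if __name__ == "__main__":
    # Placeholder workload and prices, chosen only to make the math easy.
    cost = monthly_cost(
        input_tokens=500_000_000,
        output_tokens=50_000_000,
        cache_hit_rate=0.80,
        cached_input_price=0.30,
        full_input_price=3.00,
        output_price=15.00,
    )
    print(f"${cost:,.2f}")  # prints $1,170.00
```

With these placeholder numbers the blended input rate is 0.84 per million tokens instead of 3.00, so input costs $420 rather than $1,500; the $750 of output is unaffected by caching.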

If you're running RAG or any workflow with a stable system prompt, your real hit rate is often 60-90%, far above the 0% the vendor price table implicitly assumes. That gap is the main reason an input-heavy Sonnet workload that "should" cost $X often costs closer to $X/3 in practice. The calculator above models that gap explicitly instead of assuming every token is billed at the full rate.
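To see where a figure like $X/3 can come from, here is a rough sketch that sweeps the hit rate. It assumes cached input is billed at 10% of the full input price; that ratio (`cached_price_ratio`) is an assumption for illustration, so swap in the actual ratio for your model. It covers the input side of the bill only.

```python
def input_cost_ratio(cache_hit_rate: float,
                     cached_price_ratio: float = 0.10) -> float:
    """Effective input price as a fraction of the full (list) input price.

    cached_price_ratio is cached_input_price / full_input_price.
    The 0.10 default is an assumed value, not a vendor quote.
    """
    return cache_hit_rate * cached_price_ratio + (1.0 - cache_hit_rate)


if __name__ == "__main__":
    for rate in (0.0, 0.60, 0.75, 0.90):
        ratio = input_cost_ratio(rate)
        print(f"hit rate {rate:.0%}: input cost is {ratio:.1%} of list")
```

Under that assumption, hit rates of 60%, 75%, and 90% bring input cost down to roughly 46%, 33%, and 19% of list price. The total bill shrinks by less than that whenever output tokens make up a meaningful share of spend, since output is always billed in full.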

Caveats before you trust these numbers

Prices verified against vendor pricing pages on 2026-06-04. Tap any model row in the table to open its current vendor page and sanity-check before committing budget.
API list pricing only. Enterprise discounts, committed-spend contracts, and provisioned-throughput pricing all differ.
Long-context surcharges (some vendors charge a higher rate above 200K or 500K tokens per request) aren't modeled here. If you're routinely sending huge prompts, treat the result as a floor.
Tool calls, embeddings for retrieval, fine-tuning, and image/audio modalities are billed separately and aren't included.
Cache hit rate is the percentage of input tokens served from a prefix cache. Output tokens are never cached, so they are always billed at the full output price.

Get pinged when these prices move

We rerun this calculator and email the diff whenever any vendor changes API pricing. No other email.

Free. Unsubscribe in one click.
