Tools Interactive · Diagnostic

Prompt-cache hit-rate diagnostic

Paste your system prompt. The diagnostic scans for the patterns that silently destroy your cache hit rate: timestamps, UUIDs, per-user template variables, and the small structural mistakes that fragment your prefix across requests. Returns a score, the specific findings, and the exact fix for each.

This runs entirely in your browser. Nothing leaves the page: no API call, no upload, no logging.

Try an example:

Your system prompt

0 chars · ~0 tokens · below cache threshold

How prefix caching actually works

Every frontier API (Anthropic, OpenAI, Google) caches based on an exact prefix match. When you send a request, the provider checks whether the start of your input matches a recent cached prefix. As long as it does (character for character) you pay the cached input rate (typically 10-25% of full). The moment a character differs, you pay the full rate for that token and every token after it.

So one stray $datetime.now() in the middle of an otherwise static system prompt doesn't just leak that token's cost. It leaks every token from that point to the end. On a 5,000-token prompt with the timestamp at position 200, that's 4,800 tokens paying full rate, every call.

The diagnostic above scans for the specific patterns that cause this and shows you where they are. The fix is almost always the same: move the dynamic content out of the system prompt and into the user message, where it doesn't break anything.

What this diagnostic does and doesn't catch

Catches reliably: ISO timestamps, datetime/Date/time function calls, literal UUIDs and UUID-generation calls, random function calls, per-user/session template variables ({{user_id}}, {{tenant_id}}, etc.), trailing whitespace, mixed indentation, raw JSON.stringify without sorted keys.

Doesn't catch: custom string-substitution patterns specific to your template engine, dynamic data injected via your application code before the prompt reaches the LLM call, anything that varies between deployment environments (a config value that differs dev vs prod), or anything where the cache-killing happens server-side before you see the rendered prompt.

Recommended workflow: paste your rendered prompt (the actual string your application sends), not the template. If the rendered version has the problem, the template is the cause.

Cache-friendly prompt patterns

Static system prompt, version-controlled. The entire system prompt is one string in your repo. Render it from a single source of truth.
Dynamic content in the user message. Timestamps, user IDs, session data, anything per-request: all of it lives in the user turn, not in system.
Canonical serialization for any embedded data. If you have to embed JSON in the prompt (rare), use a deterministic key order: JSON.stringify(obj, Object.keys(obj).sort()) or a library like fast-json-stable-stringify.
Tool list in a stable order. Some agent frameworks shuffle the tool array; sort it deterministically before passing.
Explicit cache_control markers. Anthropic's cache_control: { "type": "ephemeral" } tells the API where to cut the cache boundary. OpenAI does it implicitly. Be explicit when you can.

Get pinged when cache pricing changes

When a vendor changes cache pricing or behavior, we update the diagnostic and email the diff. No other email.

Free. Unsubscribe in one click.

Related reading

Get pinged when cache pricing changes

Use this tool anywhere