# The 5-minute weekly prompt-cache audit your team will actually do

> Most cache audits never happen because nobody owns them. Here's the version that survives — 5 minutes a week, one CSV file, and a GitHub Action that runs it whether the owner remembers or not.

- **Pillar:** Playbooks
- **Author:** Aditya Marin Gasga (Founding Editor)
- **Published:** 2026-05-31T00:00:00.000Z
- **Tags:** caching, production, ops, github-actions, playbook

## TL;DR

Once a week: dump your live rendered system prompt to a text file, run it through a 30-line scoring script (or paste into the diagnostic UI), log the score in a CSV in your repo, alert Slack if it drops below 85. The whole audit takes 5 minutes by hand and 0 minutes once a GitHub Action runs it on cron. The setup pays for itself the first time it catches a regression.

## Key takeaways

1. Most cache audits fail because nobody is named the owner — set a single named person before the cadence, not after.
2. Friday morning, 5 minutes, one Slack message: more frequent gets ignored, less frequent lets regressions compound.
3. The whole loop is three numbers — score, delta from last week, and whether either is below threshold.
4. Automate the boring part with GitHub Actions on cron, so the audit happens even if the owner forgets that Friday.
5. The first run on prod prompts you've never audited will usually find a 30-50% cost leak. Subsequent runs catch regressions before they're large.

import PullQuote from '~/components/article/PullQuote.astro';

Most production teams I've talked to about prompt caching have the same story. They read about cache discounts, they set up logging that reports hit-rate, they look at the dashboard for a week, the number looks fine, and then they stop looking. Six months later, an invoice comes in and someone has to spend a Wednesday afternoon figuring out where the money went.

The audits that catch this kind of drift do exist. They're not technically hard. They fail for the same reason most operational disciplines fail at small companies: nobody owned them. Without a named owner and a calendar slot, "we should audit our prompts" becomes one of the thirty things that should be done weekly and aren't.

This piece is the version of the audit I've seen actually stick. It assumes you've read [Your stated cache hit rate is probably lying to you](/models/your-cache-hit-rate-is-lying) — that piece is the why; this piece is the how. The whole five-step loop is mapped at the top of this piece — by hand the first time, on a cron by the end.

## The shape of an audit that actually runs

Three rules from watching this fail and then watching it work.

**Rule one: one named owner.** Not "the platform team," not "whoever is on-call." A specific person. The owner doesn't have to do the audit themselves — the GitHub Action below does most of it — but they own the existence of the audit, the threshold, and the response when it drops.

**Rule two: weekly cadence, Friday morning.** Daily is too often for what is mostly a stability metric; nobody reads daily reports unless they're on fire. Monthly is too long; by the time you find a regression that's been live for three weeks, you've already paid for it. Friday morning hits the right note: it's recent enough to fix before the weekend, and it's separated from the deploy-heavy middle of the week so the report doesn't get drowned in other noise.

**Rule three: exactly one artifact.** Not a dashboard, not a Notion page, not a Linear ticket. A single CSV in your repo, one row per week, three columns: date, score, delta. The whole point is that the artifact is so small it can't decay. When the audit eventually changes hands, the new owner reads three columns and is fully briefed.

## The 5-minute weekly audit

Five steps. By hand the first time, automated by the end of this playbook.

### Step 1: Dump your live rendered system prompt to a file

Not the template. Not the source. The actual string your application sends to the LLM API on a representative request. The whole gap this audit closes is between "what we think the prompt is" and "what the prompt actually is at runtime."

For most stacks, the easiest dump is a small wrapper around your prompt-rendering function:

```bash
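#!/usr/bin/env bash
# scripts/dump-prompt.sh
# Renders the production system prompt and writes it to stdout.
# Assumes buildSystemPrompt() returns a string synchronously. Make the file executable (chmod +x).
set -euo pipefail
node -e "import('./src/prompts/build.js').then(m => process.stdout.write(m.buildSystemPrompt({ env: 'production' })))"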
```

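If your prompt is built dynamically from config + templates + tool definitions, render it with production config. As a sanity check, compare the size of the dump against the input token count your API calls report. Bytes and tokens are not the same unit: for English text a token is roughly four characters, so the character count divided by four should land in the same ballpark as the reported tokens on a request where the system prompt dominates. If the two are far apart, you're not auditing the same string the API sees.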
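
A quick way to run that sanity check is sketched below. It assumes the common rule of thumb of about four characters per token for English text; the ratio, the 15% tolerance, and the function names are illustrative assumptions, and a real tokenizer will give a tighter answer.

```js
// Rough sanity check: does the dumped prompt look like the string the API saw?
// Assumption: ~4 characters per token (a rule of thumb for English text, not a tokenizer).
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.round(text.length / CHARS_PER_TOKEN);
}

// reportedTokens: the input token count taken from your API usage logs.
// tolerance: allowed relative gap (0.15 = 15%), deliberately loose for a heuristic.
function looksLikeSamePrompt(text, reportedTokens, tolerance = 0.15) {
  const estimated = estimateTokens(text);
  const gap = Math.abs(estimated - reportedTokens) / reportedTokens;
  return { estimated, reportedTokens, gap, ok: gap <= tolerance };
}

// Stand-in for the real dump: 2,900 characters of placeholder text.
const dumped = 'You are a support assistant. '.repeat(100);
console.log(looksLikeSamePrompt(dumped, 700));
```

A large gap usually means the dump is missing something the runtime adds (tool definitions, injected context) or that the reported count includes a long user message.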

### Step 2: Score it

Paste the dump into the [prompt-cache diagnostic](/tools/prompt-cache-diagnostic) — it returns a 0-100 score plus a per-pattern breakdown. Three minutes manual, including the time to read the findings.

For CI, here's a self-contained Node script that runs the same scoring logic:

```js
// scripts/score-cache.mjs
import { readFileSync } from 'node:fs';

const PATTERNS = [
  { id: 'iso-datetime',         regex: /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}/g, points: 25 },
  { id: 'datetime-call',        regex: /(?:datetime\.now\(\)|Date\.now\(\)|new Date\(\)|time\.time\(\))/g, points: 20 },
  { id: 'uuid-literal',         regex: /\b[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\b/gi, points: 30 },
  { id: 'uuid-call',            regex: /(?:uuid\.uuid4\(\)|crypto\.randomUUID\(\)|randomUUID\(\))/g, points: 25 },
  { id: 'random-call',          regex: /(?:Math\.random\(\)|random\.(?:random|choice|randint)\()/g, points: 20 },
  { id: 'tpl-user-id',          regex: /\{\{?\s*(?:user_?id|session_?id|tenant_?id)\s*\}?\}/gi, points: 18 },
  { id: 'tpl-current-time',     regex: /\{\{?\s*(?:timestamp|now|current_?time)\s*\}?\}/gi, points: 22 },
];

const prompt = readFileSync(process.argv[2] || '/dev/stdin', 'utf8');
let score = 100;
const findings = [];

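for (const p of PATTERNS) {
  // Count every occurrence of this pattern in the rendered prompt.
  const matches = (prompt.match(p.regex) || []).length;
  if (!matches) continue;
  // Cap each pattern's deduction at 40 so one repeated leak can't zero the score alone.
  const deduct = Math.min(matches * p.points, 40);
  score -= deduct;
  findings.push({ id: p.id, count: matches, deducted: deduct });
}

score = Math.max(0, score);
console.log(JSON.stringify({ score, findings, chars: prompt.length }, null, 2));
// Non-zero exit below the threshold lets CI treat a low score as a failure.
process.exit(score < 85 ? 1 : 0);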
```

Drop that into `scripts/score-cache.mjs`. Pass a prompt file as the first argument (or pipe one to stdin) and it prints a JSON report, then exits non-zero if the score is below 85. That non-zero exit is what your CI uses to decide whether to alert.
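
To see how the deductions play out, here is a trimmed-down sketch of the same scoring loop as a pure function, run against a made-up prompt. It carries only two of the patterns from the script above, and both sample prompts are invented for illustration.

```js
// Trimmed sketch of the scoring loop: two patterns only, same per-pattern cap of 40.
const SAMPLE_PATTERNS = [
  { id: 'iso-datetime', regex: /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}/g, points: 25 },
  { id: 'tpl-user-id', regex: /\{\{?\s*(?:user_?id|session_?id|tenant_?id)\s*\}?\}/gi, points: 18 },
];

function scorePrompt(prompt, patterns = SAMPLE_PATTERNS) {
  let score = 100;
  const findings = [];
  for (const p of patterns) {
    const count = (prompt.match(p.regex) || []).length;
    if (count === 0) continue;
    const deducted = Math.min(count * p.points, 40); // cap each pattern at 40 points
    score -= deducted;
    findings.push({ id: p.id, count, deducted });
  }
  return { score: Math.max(0, score), findings };
}

// Invented sample prompts: one static, one with a timestamp and a per-user variable.
const clean = 'You are a helpful assistant for Acme support.';
const leaky = 'Current time: 2026-06-05T09:30:00. You are helping user {{user_id}}.';

console.log(scorePrompt(clean).score); // 100
console.log(scorePrompt(leaky).score); // 57 (100 - 25 - 18)
```

One thing this makes visible: the smallest deduction in the full pattern table is 18 points, so with this script any single finding lands the score at 82 or lower. In practice a passing run is a run with no findings.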

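The threshold of 85 is a rule-of-thumb cutoff in this scoring scheme, not a number published by Anthropic or OpenAI. Both providers cache on a stable prompt prefix, so a score at or above 85 is meant to indicate that nothing volatile is breaking that prefix and you'll still see the full discount on the cached portion. Below 85, you're paying a measurable surcharge. Below 65, you're paying a lot.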

### Step 3: Log it

One file in your repo. One row per week. Three columns:

```csv
# docs/cache-audit-log.csv
date,score,delta
2026-05-23,97,+0
2026-05-30,97,+0
2026-06-06,82,-15
2026-06-13,82,+0
```

That's it. The whole audit history fits on one screen for years. The delta column makes regressions obvious without thinking — the only week that requires action is the one where delta is negative and the new score is below threshold.

Commit the file when you update it. The git history doubles as your audit trail.
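
If you'd rather build the row in Node than in shell, a minimal sketch might look like the following. The function names are mine, and treating the first run as a delta of 0 is an assumption about how you want an empty log handled.

```js
// Builds one CSV row for the audit log. All names here are illustrative.
function formatDelta(delta) {
  // Non-negative deltas get an explicit "+", matching rows like "97,+0".
  return delta >= 0 ? `+${delta}` : String(delta);
}

function buildLogRow(date, score, previousScore) {
  // On the very first run there is no previous score, so the delta is 0.
  const delta = previousScore == null ? 0 : score - previousScore;
  return `${date},${score},${formatDelta(delta)}`;
}

// Reads the most recent score out of the existing log text, skipping the
// header and any comment lines (only rows that start with a date count).
function lastScore(csvText) {
  const rows = csvText
    .trim()
    .split('\n')
    .filter((line) => /^\d{4}-\d{2}-\d{2},/.test(line));
  if (rows.length === 0) return null;
  return Number(rows[rows.length - 1].split(',')[1]);
}

const log = 'date,score,delta\n2026-05-23,97,+0\n2026-05-30,97,+0\n';
console.log(buildLogRow('2026-06-06', 82, lastScore(log))); // 2026-06-06,82,-15
```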

### Step 4: Alert Slack on a real drop

Define "real drop" tightly so the channel doesn't get noise. The version that worked best for me: alert only when **the score is below 85 AND it fell by at least 5 points since last week (delta ≤ −5)**. That filters out the natural fluctuation around the threshold (a prompt that's been hovering at 84 isn't news; one that dropped from 95 to 78 is).
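
Written as code, the rule is a two-condition check. This is a sketch with names of my choosing; the 85 and 5 come straight from the rule above.

```js
// Alert rule sketch: fire only when the score is below threshold AND it fell by 5+ points.
const THRESHOLD = 85;
const MIN_DROP = 5;

function shouldAlert(score, delta) {
  return score < THRESHOLD && delta <= -MIN_DROP;
}

console.log(shouldAlert(78, -17)); // true: 95 -> 78 crosses the threshold with a big drop
console.log(shouldAlert(84, 0));   // false: hovering just under the threshold
console.log(shouldAlert(90, -8));  // false: a drop, but still above the threshold
```

One consequence worth knowing: a prompt that stays broken only alerts in the week it dropped. After that, the CSV is what keeps the low score visible.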

Webhook payload — the Slack message a team is most likely to read:

```json
{
  "text": "🚨 Weekly cache audit caught a regression",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Cache score dropped to 57* (was 100 last week)\n\n*Findings:*\n• 1× ISO datetime injection (−25)\n• 1× per-user template variable (−18)\n\n*Likely culprit:* recent deploy 2026-06-05.\n*Owner:* @aditya"
      }
    },
    {
      "type": "actions",
      "elements": [
        { "type": "button", "text": { "type": "plain_text", "text": "Open diagnostic" }, "url": "https://signal-ai-8sw.pages.dev/tools/prompt-cache-diagnostic" },
        { "type": "button", "text": { "type": "plain_text", "text": "Open log" }, "url": "https://github.com/your-org/your-repo/blob/main/docs/cache-audit-log.csv" }
      ]
    }
  ]
}
```

Two design notes. First, the owner is @-mentioned in the message itself; the channel knows who responds. Second, the message has a one-click path to the diagnostic and to the audit log — the responder doesn't have to context-switch to find anything.
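
If you want the full message rather than a one-liner, the payload can be generated from the scorer's JSON report. The sketch below is one way to do it; the function name, the example URLs, and the member ID are placeholders, not anything your workspace will recognize.

```js
// Builds a Slack message from the scoring report. All names and URLs passed in are placeholders.
function buildSlackPayload({ score, previousScore, findings, owner, diagnosticUrl, logUrl }) {
  const lines = findings.map((f) => `• ${f.count}× ${f.id} (−${f.deducted})`);
  const body = [
    `*Cache score dropped to ${score}* (was ${previousScore} last week)`,
    '',
    '*Findings:*',
    ...lines,
    '',
    `*Owner:* ${owner}`,
  ].join('\n');

  return {
    text: 'Weekly cache audit caught a regression', // fallback text for notifications
    blocks: [
      { type: 'section', text: { type: 'mrkdwn', text: body } },
      {
        type: 'actions',
        elements: [
          { type: 'button', text: { type: 'plain_text', text: 'Open diagnostic' }, url: diagnosticUrl },
          { type: 'button', text: { type: 'plain_text', text: 'Open log' }, url: logUrl },
        ],
      },
    ],
  };
}

const payload = buildSlackPayload({
  score: 57,
  previousScore: 100,
  findings: [
    { id: 'iso-datetime', count: 1, deducted: 25 },
    { id: 'tpl-user-id', count: 1, deducted: 18 },
  ],
  owner: '<@U0000000>', // hypothetical Slack member ID
  diagnosticUrl: 'https://example.com/tools/prompt-cache-diagnostic',
  logUrl: 'https://example.com/your-org/your-repo/blob/main/docs/cache-audit-log.csv',
});
console.log(JSON.stringify(payload, null, 2));
```

A note on the owner field: in messages sent through Slack's API, a mention generally needs the `<@MEMBER_ID>` form to actually notify someone, so a plain `@name` may render as ordinary text. Check it with a test message before relying on it.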

### Step 5: Make it run without you

The whole audit lives in a GitHub Action that runs every Friday morning regardless of whether anyone remembers. The owner's job is to read the Slack alert when one fires, not to remember to run the script.

```yaml
# .github/workflows/weekly-cache-audit.yml
name: Weekly prompt cache audit
on:
  schedule:
    - cron: '0 14 * * 5'    # Friday 14:00 UTC = 7am PDT / 10am EDT (one hour earlier in winter)
  workflow_dispatch: {}     # also allow manual trigger

permissions:
  contents: write           # lets the job push the updated CSV back to the repo

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 22 }
      - name: Install deps
        run: npm ci
      - name: Dump production prompt
        run: bash scripts/dump-prompt.sh > /tmp/prompt.txt
      - name: Score it
        id: score
        run: |
          # The scorer exits 1 below threshold; don't let that fail the job before we log and alert.
          node scripts/score-cache.mjs /tmp/prompt.txt > /tmp/result.json || true
          score=$(node -e "console.log(JSON.parse(require('fs').readFileSync('/tmp/result.json', 'utf8')).score)")
          echo "score=$score" >> "$GITHUB_OUTPUT"
          echo "Audit score: $score / 100"
      - name: Append to log
        id: log
        env:
          SCORE: ${{ steps.score.outputs.score }}
        run: |
          today=$(date +%F)
          last_score=$(tail -n 1 docs/cache-audit-log.csv | cut -d, -f2)
          # First run: the last line is the header (or empty), so treat the delta as 0.
          case "$last_score" in ''|*[!0-9]*) last_score=$SCORE ;; esac
          delta=$((SCORE - last_score))
          # %+d writes an explicit sign, so rows read 97,+0 or 82,-15.
          printf '%s,%s,%+d\n' "$today" "$SCORE" "$delta" >> docs/cache-audit-log.csv
          # Step 4 rule: below threshold AND a drop of at least 5 points.
          if [ "$SCORE" -lt 85 ] && [ "$delta" -le -5 ]; then alert=true; else alert=false; fi
          echo "alert=$alert" >> "$GITHUB_OUTPUT"
          git config user.name "cache-audit-bot"
          git config user.email "bot@example.com"
          git add docs/cache-audit-log.csv
          git diff --quiet --staged || (git commit -m "log: weekly cache audit ${today}" && git push)
      - name: Alert Slack on real drop
        if: steps.log.outputs.alert == 'true'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_URL }}
          REPORT: ${{ steps.score.outputs.score }}
        run: |
          curl -X POST -H 'Content-Type: application/json' \
            -d "$(jq -n --arg s "$REPORT" '{text: ("🚨 Cache audit dropped to " + $s + "/100, see logs")}')" \
            "$SLACK_WEBHOOK"
```

Set `SLACK_WEBHOOK_URL` in your repo secrets (Settings → Secrets and variables → Actions). Push the workflow. You're done.

The Action does four things, in this order: dumps the prompt, scores it, appends to the CSV (committing back to main so the log persists), and alerts Slack if the score is below 85 and fell by at least 5 points. You can run it manually from the Actions tab to confirm the pipeline works end-to-end before waiting for Friday.

## When the audit catches something — the 15-minute response

The owner gets a Slack alert. The response loop:

1. **Open the linked diagnostic.** Re-render the prompt locally with `scripts/dump-prompt.sh` and paste it in. The runner's `/tmp/prompt.txt` disappears when the job ends unless you add a step that uploads it as a workflow artifact. The diagnostic surfaces each finding with the specific fix.
2. **Find the deploy that introduced it.** Compare the failing prompt to the last good version — usually a git blame on whatever template or config file the diagnostic flagged. The culprit is almost always a PR from the last 7 days.
3. **Fix in place or revert.** Move the dynamic content out of the system prompt (into the user message, or into a separate runtime metadata field). Or revert the offending change if the new feature isn't load-bearing yet.
4. **Re-run the audit manually.** Confirm the score is back above 85 before declaring done. Comment the diagnostic score on the original PR.
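
The fix in step 3 usually looks something like the sketch below: the system prompt becomes a constant, and anything that changes per request moves into the user message. The message shape is a generic `role`/`content` chat format and every name is invented for illustration; some SDKs take the system prompt as a separate parameter, so adapt accordingly.

```js
// Before: the timestamp and user id are baked into the system prompt,
// so its text changes on every request.
function buildLeakyMessages(user, question, now) {
  return [
    { role: 'system', content: `You are Acme's support assistant. Time: ${now}. User: ${user.id}.` },
    { role: 'user', content: question },
  ];
}

// After: the system prompt is a constant; per-request details ride in the user message.
const SYSTEM_PROMPT = "You are Acme's support assistant. Answer concisely and cite help-center articles.";

function buildStableMessages(user, question, now) {
  const context = `[context] time=${now} user_id=${user.id}`;
  return [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: `${context}\n\n${question}` },
  ];
}

const a = buildStableMessages({ id: 'u_1' }, 'How do I reset my password?', '2026-06-05T09:30:00Z');
const b = buildStableMessages({ id: 'u_2' }, 'Where is my invoice?', '2026-06-06T11:00:00Z');
console.log(a[0].content === b[0].content); // true: identical system prompt across requests
```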

The whole response loop has taken 15-30 minutes every time I've seen it run. The cost saved per regression caught is typically in the thousands of dollars per month on a real production workload.
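
To put a rough number on a regression for your own traffic, a back-of-envelope estimate is enough. Every figure in the sketch below is a placeholder (request volume, prompt size, price, discount, cached fraction), and it deliberately ignores cache-write surcharges and output tokens, so treat the result as an order of magnitude.

```js
// Back-of-envelope estimate of what the system prompt costs per month.
// Every number used below is a placeholder; plug in your own traffic and provider pricing.
function monthlySystemPromptCost({
  requestsPerMonth,
  promptTokens,
  pricePerMillionTokens,
  cachedFraction, // share of prompt tokens served from cache (0 to 1)
  cachedDiscount, // discount on cached tokens (0.5 = half price)
}) {
  const fullPrice = (requestsPerMonth * promptTokens * pricePerMillionTokens) / 1_000_000;
  // Cached tokens are billed at (1 - cachedDiscount) of the normal input price.
  return fullPrice * (1 - cachedFraction * cachedDiscount);
}

const traffic = {
  requestsPerMonth: 2_000_000, // placeholder
  promptTokens: 3_000,         // placeholder
  pricePerMillionTokens: 1,    // placeholder price in dollars
  cachedDiscount: 0.5,         // placeholder discount
};

const healthy = monthlySystemPromptCost({ ...traffic, cachedFraction: 0.75 });
const leaking = monthlySystemPromptCost({ ...traffic, cachedFraction: 0 });
console.log({ healthy, leaking, monthlyLeak: leaking - healthy });
// { healthy: 3750, leaking: 6000, monthlyLeak: 2250 }
```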

<PullQuote pillar="playbooks">Operational discipline is what you don't notice. The bill you don't have to investigate next quarter is the audit working.</PullQuote>

## What you'll find on the first run

The first time you run this on a system prompt that's never been audited, expect a score in the 50-80 range. Most prompts have at least one cache leak the team didn't know about: a leftover from a prototype, a logging helper that was supposed to be temporary, a per-user variable from an A/B test that ended six months ago. Cleaning those up takes one afternoon and pays for itself within the first week.

After that, weekly audits typically catch one regression every 4-8 weeks — usually small (10-20 point drops), almost always a side effect of a feature deploy that nobody connected to caching. The score will mostly hover near 100 with occasional dips. That's exactly what success looks like.

The audit isn't impressive. It's three numbers in a CSV and a Slack message that rarely fires. That's the entire point — operational discipline is what you don't notice. The bill you don't have to investigate next quarter is the audit working.

If you set this up and the first run finds something material, [the cost calculator](/tools/model-cost-calculator) will tell you what fixing it is worth. The first audit I ever ran on a real customer prompt found two issues totaling roughly $4,200/month in unnecessary spend. Thirty minutes of setup, fifteen minutes of fix. Run yours on Friday.