Claude Prompt Caching: How It Works and How Much It Saves (2026)
Prompt caching is one of the most effective ways to cut Claude API costs. Here is exactly how it works, what it costs to set up, and worked examples showing real savings.
What Is Prompt Caching?
Prompt caching is an API feature that stores repeated context (like a large system prompt or shared document) in fast-retrieval storage. Instead of paying the full input token rate every time you include that content, cached reads cost 90% less. The trade-off: writing content to the cache the first time costs 25% more than the standard input rate. After that, every read is dramatically cheaper.
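At the API level, caching is opt-in: you mark the content you want cached with a `cache_control` breakpoint, and everything up to that point becomes cacheable. Here is a minimal sketch of a request body with the system prompt marked for caching; the model id and prompt text are illustrative, and the field shape follows the public Messages API:

```python
# Sketch: marking a large system prompt as cacheable in a Messages API
# request body. The cache_control breakpoint tells the API to cache
# everything up to and including that block.

def build_request(system_prompt: str, user_message: str) -> dict:
    """Build a request body with the system prompt marked for caching."""
    return {
        "model": "claude-sonnet-4-6",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Cache breakpoint: the first call writes the cache (25% premium);
                # later calls within the TTL read it at 90% off.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("You are a support agent for Acme...", "Where is my order?")
```

The breakpoint goes on the large, stable content (the system prompt); the per-request user message stays outside the cached prefix.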
Prompt Caching Pricing
Taking Sonnet 4.6 as the example: standard input is $3.00 per 1M tokens, the first cache write is $3.75 per 1M (the 25% premium), and every subsequent cache read is $0.30 per 1M (90% off). Rates for each model:
| Model | Standard Input ($/1M tokens) | Cache Write ($/1M) | Cache Read ($/1M) |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $6.25 | $0.50 |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $0.30 |
| Claude Haiku 3.5 | $0.80 | $1.00 | $0.08 |
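The table translates directly into a cost comparison. A small sketch that computes uncached vs. cached cost for a workload, using the rates above (function and variable names are my own):

```python
# Cost of a cached workload vs. an uncached one, using the per-model
# rates from the table above (dollars per million input tokens).

RATES = {  # (standard, cache_write, cache_read)
    "opus-4.6":   (5.00, 6.25, 0.50),
    "sonnet-4.6": (3.00, 3.75, 0.30),
    "haiku-3.5":  (0.80, 1.00, 0.08),
}

def caching_cost(model: str, cached_tokens: int, calls: int) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) for `calls` requests that each
    resend `cached_tokens` tokens of shared context."""
    standard, write, read = RATES[model]
    uncached = calls * cached_tokens * standard / 1e6
    # One cache write on the first call, cache reads on the rest.
    cached = cached_tokens * write / 1e6 + (calls - 1) * cached_tokens * read / 1e6
    return uncached, cached

# 1,000 calls, each reusing a 10,000-token system prompt on Sonnet 4.6:
print(caching_cost("sonnet-4.6", 10_000, 1_000))  # uncached $30.00 vs cached ≈ $3.03
```

This assumes the shared prefix stays warm between calls; see the TTL section below for when that holds.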
Worked Savings Examples
Customer support app with 10,000-token system prompt
Without caching
1,000 API calls x 10,000 tokens x $3/1M = $30.00
With caching
1 cache write: 10,000 x $3.75/1M = $0.038
999 cache reads: 10,000 x $0.30/1M = $2.997
Total: $3.03
90% saving - from $30 to $3.03
Code review app with 50,000-token codebase as context
Without caching
200 reviews x 50,000 tokens x $3/1M = $30.00
With caching
1 cache write: 50,000 x $3.75/1M = $0.19
199 cache reads: 50,000 x $0.30/1M = $2.99
Total: $3.17
89% saving - from $30 to $3.17
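A natural follow-up question: how many requests does it take before the 25% write premium pays for itself? The arithmetic below (Sonnet 4.6 rates, 50,000-token context as in the example above) shows caching is already cheaper from the second request:

```python
# Break-even check: the cache write costs 25% more than standard input,
# but each subsequent read saves 90%. Sonnet 4.6 rates, $ per 1M tokens.

STANDARD, WRITE, READ = 3.00, 3.75, 0.30

def cost(calls: int, tokens: int, cached: bool) -> float:
    """Total input cost for `calls` requests each carrying `tokens` of context."""
    if not cached:
        return calls * tokens * STANDARD / 1e6
    return tokens * WRITE / 1e6 + (calls - 1) * tokens * READ / 1e6

# One call: caching loses (you pay the write premium for nothing).
# Two calls: caching already wins, and the gap widens from there.
for calls in (1, 2, 3):
    print(calls, cost(calls, 50_000, cached=False), cost(calls, 50_000, cached=True))
```

So the only workload where caching costs you money outright is a prefix that is never reused within the TTL.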
Cache TTL: The 5-Minute Rule
Cache entries have a 5-minute Time To Live (TTL). Each request that uses the cache refreshes the TTL by another 5 minutes. This means:
- Active apps: As long as requests come in at least once every 5 minutes, the cache stays warm indefinitely.
- Bursty apps: If you have a gap of more than 5 minutes between requests, the cache expires and the next request pays cache write rate again.
- Overnight jobs: for these, the Batch API is a better fit than caching - there is no warm cache to maintain across long idle gaps.
Prompt Caching vs Batch API: Which to Use?
Use prompt caching when...
- You need real-time responses (<1 second)
- You have a large static system prompt repeated every request
- Your app has consistent traffic with short gaps between requests
- You want to cache shared documents, codebases, or few-shot examples
Use Batch API when...
- Results within 24 hours are acceptable
- Processing large volumes at lowest possible cost (50% off)
- Overnight runs, bulk analysis, evaluation jobs
- Traffic is bursty with long idle periods (cache would expire)
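The two checklists above collapse into a rough rule of thumb. A sketch with hypothetical names and thresholds - tune them to your own workload:

```python
# Rough decision helper mirroring the checklists above: latency need
# first, then traffic shape relative to the 5-minute cache TTL.

def recommend(needs_realtime: bool, max_gap_minutes: float) -> str:
    """Suggest a cost strategy from latency needs and typical idle gaps."""
    if needs_realtime:
        # Real-time responses rule out the Batch API's 24-hour window.
        if max_gap_minutes < 5:
            return "prompt caching"  # cache stays warm between calls
        return "prompt caching (expect re-writes after idle gaps)"
    if max_gap_minutes >= 5:
        return "batch API"  # bursty and latency-tolerant: take the 50% off
    return "either (compare 50% batch discount vs. 90% cache-read discount)"

print(recommend(needs_realtime=True, max_gap_minutes=2))   # → "prompt caching"
print(recommend(needs_realtime=False, max_gap_minutes=60)) # → "batch API"
```

Note the two discounts are not mutually exclusive in principle - the point here is only which one your traffic pattern can actually collect.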