Claude API Pricing: Opus, Sonnet, Haiku, Caching, and Batch 2025

A complete breakdown of how Anthropic charges for Claude API access, which model to use for different tasks, and how to take full advantage of prompt caching and batch discounts. Updated March 2025.

Claude API Pricing Overview

Anthropic charges per million tokens processed. A token is roughly 4 characters of English text. Pricing splits between input tokens (your prompt, context, and history) and output tokens (the model's response). Output tokens are consistently more expensive because generating text requires more computation than reading it.

| Model | Input /M tokens | Output /M tokens | Context |
|---|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 | 200K |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3 Sonnet | $3.00 | $15.00 | 200K |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K |
| Claude 3 Haiku | $0.25 | $1.25 | 200K |

All Claude models share the same 200K token context window regardless of tier. This is a meaningful practical advantage: you do not need to pay for a more expensive model just to access longer context. Every model from Haiku to Opus accepts up to 200,000 tokens of context per call.
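The per-token arithmetic above is easy to wire into a helper. The sketch below is illustrative (the model keys and function name are our own, not SDK identifiers), using the March 2025 rates from the table:

```python
# Prices in USD per million tokens, from the table above (March 2025 rates).
PRICES = {
    "claude-3-opus":     (15.00, 75.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku":  (0.80, 4.00),
    "claude-3-haiku":    (0.25, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call at standard (non-cached, non-batch) rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,200-token prompt with a 500-token reply on Sonnet costs about $0.0141.
```

Multiplying a per-call figure like this by expected daily volume is usually the quickest sanity check before committing to a model tier.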

Opus vs Sonnet vs Haiku: Which Should You Use?

Claude 3 Opus ($15/M input): Frontier tasks only

Opus is Anthropic's most capable model but also its most expensive at $15/M input and $75/M output. It outperforms Sonnet on the most complex reasoning tasks, subtle literary analysis, and nuanced instruction following. Most teams should rarely or never use Opus in production pipelines. Reserve it for tasks where Sonnet consistently fails and the quality difference is measurable and valuable.

Claude 3.5 Sonnet ($3/M input): The default choice

Sonnet is the workhorse model for most production applications. At $3/M input and $15/M output, it sits at a sensible price point for tasks requiring high quality: complex writing, nuanced coding, long document analysis, and agentic tasks that need reliable multi-step reasoning. For most teams, Sonnet handles 80% of tasks that might be reflexively sent to Opus.

Claude 3.5 Haiku ($0.80/M input): High-volume production tasks

Haiku is the cost-optimised model for high-volume applications. At $0.80/M input it is 3.75x cheaper than Sonnet while maintaining strong performance on classification, extraction, summarisation, and structured output generation. For applications processing millions of requests daily, Haiku can reduce costs by 70-80% compared to Sonnet with minimal quality loss on the right tasks.

Prompt Caching: Anthropic's Biggest Cost Feature

Anthropic's prompt caching is one of the most cost-effective features available from any AI provider. Cache writes cost slightly more than standard input tokens, but cache reads cost dramatically less. For Sonnet:

| Token Type | Claude 3.5 Sonnet | Claude 3.5 Haiku |
|---|---|---|
| Standard input | $3.00/M | $0.80/M |
| Cache write | $3.75/M | $1.00/M |
| Cache read | $0.30/M | $0.08/M |

Cache reads are 10x cheaper than standard input reads on Sonnet. A cached block costs $0.30/M versus $3.00/M for standard input. The cache is maintained for a minimum of 5 minutes and up to several hours depending on usage patterns.

To use prompt caching effectively: mark the beginning of your prompt (system prompt, static context, retrieval results that do not change between requests) with the cache_control parameter. The minimum cacheable block is 1,024 tokens for Sonnet and Opus, and 2,048 tokens for Haiku.

For an application with a 5,000-token system prompt that makes 50,000 API calls per day: without caching, that is 250 million input tokens per day, or $750/day at Sonnet rates. With caching and a 90% cache hit rate, 225 million tokens become cache reads ($67.50) and the remaining 25 million misses are billed at the cache-write rate ($93.75), for a total of approximately $161/day. Caching saves nearly $590 per day in this scenario.
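The same arithmetic generalises to any hit rate. A minimal sketch, assuming (as above) that cache misses are billed at the cache-write rate and hits at the cache-read rate:

```python
def cached_input_cost(tokens_m: float, hit_rate: float,
                      write_rate: float, read_rate: float) -> float:
    """Daily input cost in USD for a cacheable prompt.

    tokens_m   -- cacheable input volume in millions of tokens per day
    hit_rate   -- fraction of calls served from cache (0.0 to 1.0)
    write_rate -- $/M for cache writes (misses), e.g. 3.75 for Sonnet
    read_rate  -- $/M for cache reads (hits), e.g. 0.30 for Sonnet
    """
    misses = tokens_m * (1 - hit_rate)
    hits = tokens_m * hit_rate
    return misses * write_rate + hits * read_rate

# Scenario from the text: 250M cacheable tokens/day on Sonnet, 90% hit rate.
no_cache = 250 * 3.00                                  # $750.00/day
with_cache = cached_input_cost(250, 0.90, 3.75, 0.30)  # $161.25/day
```

Note the break-even behaviour: at a 0% hit rate caching costs slightly more than standard input (every call pays the write premium), so it only pays off when the same prefix is reused within the cache lifetime.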

Batch API: 50% Off for Async Processing

Anthropic's Message Batches API processes requests asynchronously within a 24-hour window and charges 50% of the standard per-token rate. This applies to both input and output tokens across all models.

At batch rates, Claude 3.5 Sonnet drops to $1.50/M input and $7.50/M output. Haiku drops to $0.40/M input and $2.00/M output. For offline processing workloads that do not need real-time responses, batch API cuts costs in half with no degradation in output quality.

Good batch API use cases include: training data generation, bulk document analysis, offline classification pipelines, content moderation at scale, and scheduled analytics jobs. Requests that must respond to live users in real-time are not suitable for batch.
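The Batches API takes a list of independent requests, each with a `custom_id` for matching results back to inputs. The helper below is an illustrative sketch (the model ID, prompt wording, and function name are our own) of building that list for an offline classification job:

```python
def build_batch_requests(documents: list[str]) -> list[dict]:
    """Build the request list for Anthropic's Message Batches API.

    Each entry pairs a custom_id with the same params a normal
    Messages API call would take.
    """
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-haiku-20241022",  # example model ID
                "max_tokens": 256,
                "messages": [
                    {"role": "user", "content": f"Classify this document: {doc}"}
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]
```

With the official SDK this list would be submitted via `client.messages.batches.create(requests=...)`; results arrive asynchronously within the 24-hour window, billed at the 50%-off batch rate.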

Estimating Your Monthly Claude API Bill

Example calculation for a typical RAG application using Claude 3.5 Sonnet:

  • 1,000 requests per day
  • System prompt: 2,000 tokens (cacheable)
  • User query: 200 tokens per request
  • Average output: 500 tokens per response

Without caching: daily input = 1,000 x 2,200 = 2.2M tokens, output = 1,000 x 500 = 0.5M tokens. Cost = (2.2 x $3) + (0.5 x $15) = $6.60 + $7.50 = $14.10/day = approximately $423/month.

With prompt caching on the 2,000-token system prompt (95% cache hit rate): per day, 1.9M system-prompt tokens are cache reads, 0.1M misses are billed at the cache-write rate, and the 0.2M query tokens remain standard input. Output is unchanged. Cost = (0.1 x $3.75) + (1.9 x $0.30) + (0.2 x $3) + (0.5 x $15) = $0.375 + $0.57 + $0.60 + $7.50 = approximately $9.05/day = approximately $271/month. Caching saves roughly $150/month in this example.
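The whole estimate can be reproduced with a small function. This is a sketch under the same assumptions as the worked example: Sonnet rates, cache misses billed at the write rate, query tokens never cached, a 30-day month:

```python
def monthly_cost(calls_per_day: int, cached_tokens: int, query_tokens: int,
                 output_tokens: int, hit_rate: float = 0.95, days: int = 30,
                 standard: float = 3.00, write: float = 3.75,
                 read: float = 0.30, output_rate: float = 15.00) -> float:
    """Monthly USD cost for a RAG app with a cached system prompt."""
    m = 1_000_000
    cached_m = calls_per_day * cached_tokens / m   # cacheable input, M tokens/day
    query_m = calls_per_day * query_tokens / m     # always standard input
    output_m = calls_per_day * output_tokens / m
    daily = (cached_m * (1 - hit_rate) * write     # cache misses (write rate)
             + cached_m * hit_rate * read          # cache hits (read rate)
             + query_m * standard                  # uncached query tokens
             + output_m * output_rate)             # output tokens
    return daily * days

# The example above: 1,000 calls/day, 2,000 cached + 200 query tokens in,
# 500 tokens out, 95% hit rate -> roughly $271/month.
```

Setting `hit_rate=0.0` and `write=standard` reproduces the no-caching baseline of about $423/month, which makes the function handy for side-by-side comparisons.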