
What is Claude Prompt Caching? How does it work?


Anthropic just rolled out prompt caching in the Claude API, which cuts API input costs by up to 90% and reduces latency by up to 80%. Claude 3.5 Sonnet is one of the most advanced LLMs available, and it is also one of the most expensive. Sonnet currently costs $3 per million input tokens and $15 per million output tokens, which is substantial.

Who is Prompt Caching useful for?

Prompt caching is useful for applications that repeatedly send the same prompt content with every request, for example a long system prompt, a large reference document, or a detailed set of tool definitions.

How does Claude Prompt Caching Work?

To use Claude prompt caching, you call the Anthropic API and add the cache_control attribute to the content you want to cache, together with a beta header. This is what they look like:

"cache_control": {"type": "ephemeral"}

"anthropic-beta": "prompt-caching-2024-07-31"

When you make an API call with these additions, the API checks whether the designated parts of your prompt are already cached from a recent query. If so, it reuses the cached prefix, which speeds up processing and reduces input costs.
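As a concrete illustration, here is a minimal sketch of such a call using Python and the raw HTTP Messages API; the model ID, document text, and question are placeholders you would replace with your own.

```python
import os
import requests

# Minimal sketch: cache a large system prompt with the prompt-caching beta.
url = "https://api.anthropic.com/v1/messages"
headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",  # enables the caching beta
    "content-type": "application/json",
}
payload = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large reference document, e.g. a full codebase or book>",
            "cache_control": {"type": "ephemeral"},  # everything up to here gets cached
        }
    ],
    "messages": [{"role": "user", "content": "Summarize chapter 3."}],
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
print(response.json())
```

The usage block in the response reports cache_creation_input_tokens and cache_read_input_tokens, which you can check to confirm whether a request wrote to or read from the cache.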

Costs for Prompt Caching

The initial API call costs $3.75 per million tokens, a 25% premium over the base input price, which accounts for writing the prompt to the cache. After that, all subsequent reads of the cached prompt cost one-tenth of the normal price, $0.30 per million tokens.
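To make the numbers concrete, here is a rough back-of-the-envelope comparison for a hypothetical 10,000-token prompt prefix reused across 100 calls (output tokens ignored):

```python
# Rough cost comparison for a 10,000-token prompt prefix reused across 100 calls.
# Prices per million input tokens (Claude 3.5 Sonnet): base $3.00, cache write $3.75, cache read $0.30.
PROMPT_TOKENS = 10_000
CALLS = 100

uncached = CALLS * PROMPT_TOKENS / 1e6 * 3.00                # every call pays full price
cached = (PROMPT_TOKENS / 1e6 * 3.75                         # first call writes the cache
          + (CALLS - 1) * PROMPT_TOKENS / 1e6 * 0.30)        # remaining calls read it
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}, savings: {1 - cached / uncached:.0%}")
# -> uncached: $3.00, cached: $0.33, savings: 89%
```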

Prompt caching works in multi-turn conversations too. You can progressively move the cache_control breakpoints forward to cache previous turns as the conversation advances. This is useful in combination with features like Tool Use, which may add many tokens to the context window each turn.
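Here is a hedged sketch of that pattern, assuming a simple chat loop in which each new user turn carries the breakpoint, so the accumulated history becomes the cached prefix for the next request; the helper function and messages are illustrative, not part of Anthropic's SDK.

```python
import os
import requests

API_URL = "https://api.anthropic.com/v1/messages"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",
    "content-type": "application/json",
}

def ask(history, question):
    """Append a user turn, mark it as the cache breakpoint, and call the API."""
    history.append({
        "role": "user",
        "content": [{"type": "text", "text": question,
                     "cache_control": {"type": "ephemeral"}}],
    })
    body = {"model": "claude-3-5-sonnet-20240620", "max_tokens": 1024, "messages": history}
    reply = requests.post(API_URL, headers=HEADERS, json=body, timeout=60).json()
    history.append({"role": "assistant", "content": reply["content"]})
    return reply

history = []
ask(history, "What does prompt caching do?")
# On the second call, the first turn is read from the cache instead of reprocessed.
# In a longer loop you would drop cache_control from older turns, since only
# four breakpoints are allowed per request.
ask(history, "And how does the 5-minute cache lifetime work?")
```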

Faster Response Latencies

You can get up to 79% faster responses for cached prompts, with a potential 90% cost reduction. Anthropic expects further latency improvements over the coming weeks, particularly for shorter prompts of a few thousand tokens in length. A few details to keep in mind:

– Cache lifetime (TTL) is 5 minutes, resetting with each use

– Prompts are cached at 1024-token boundaries

– You can define up to 4 cache breakpoints in a prompt (see the sketch after this list)

– Support for caching prompts shorter than 1024 tokens is coming soon
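For illustration, here is a sketch of a request body that uses three of those breakpoints: one on the tool definitions, one on the system prompt, and one on the latest user turn. The tool, document text, and messages are placeholders.

```python
# Sketch of a request body using three of the four allowed cache breakpoints.
# All long strings below are placeholders.
payload = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "search_docs",
            "description": "Search the product documentation.",
            "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
            "cache_control": {"type": "ephemeral"},   # breakpoint 1: tool definitions
        }
    ],
    "system": [
        {
            "type": "text",
            "text": "<long system prompt and reference material>",
            "cache_control": {"type": "ephemeral"},   # breakpoint 2: system prompt
        }
    ],
    "messages": [
        {"role": "user", "content": "Earlier question..."},
        {"role": "assistant", "content": "Earlier answer..."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "New question about the docs.",
                 "cache_control": {"type": "ephemeral"}},  # breakpoint 3: conversation so far
            ],
        },
    ],
}
```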

For more details, see Anthropic's prompt caching announcement and the API documentation.
