Prompt caching lets a model reuse the processed form of a stable context prefix instead of recomputing it on every turn, turning the context ordering decisions — stable content first, volatile content last — directly into lower latency and spend. This page will cover token economics for agentic workloads: cache hit rate, what invalidates a cache, request batching, and model selection and routing.
For a concrete instance — why keeping CLAUDE.md and tool schemas stable at the front of the window makes a session cheap — see the section’s running example, Claude Code: writing a page for this wiki.
This page is a stub. In the meantime:
- Anthropic — Prompt caching
- Anthropic — Message Batches
- Anthropic — Pricing