AI API cost glossary
Cached tokens
Cached tokens are repeated prompt tokens that qualify for a provider's prompt-caching mechanism and may be billed at a separate, typically discounted, cache-hit price.
Why it matters for API cost
Keeping the prompt prefix stable can lower the cost of repeated input that is eligible for caching, but cache-write pricing, retention periods, minimum cacheable sizes, and the rules for what counts as a hit differ by provider.
Formula
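One way to express the input-side cost, as a sketch that assumes the provider bills cache hits at their own per-token price. The symbols are illustrative and not taken from any provider's documentation: T_input is the total number of input tokens, T_repeated is the number of repeated prompt tokens, h is the measured share of those that receives cache-hit billing (between 0 and 1), p_input is the regular input price per token, and p_hit is the cache-hit price per token. Cache-write and retention charges are left out and would be added on top where they apply.

```latex
\[
C_{\text{input}} = \left(T_{\text{input}} - T_{\text{hit}}\right) p_{\text{input}} + T_{\text{hit}}\, p_{\text{hit}},
\qquad
T_{\text{hit}} = h \cdot T_{\text{repeated}}
\]
```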
Example
Do not assume a 100% hit rate. Measure the share of repeated prompt tokens that actually receives cache-hit billing.
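To make the effect of the hit rate concrete, here is a minimal Python sketch. The function name, the token counts, and the prices are all hypothetical and not taken from any provider's price list. It assumes cache hits are billed at a separate price per million tokens, and it ignores cache-write and retention charges.

```python
def estimate_input_cost(
    total_input_tokens: int,
    repeated_tokens: int,
    hit_rate: float,
    input_price_per_mtok: float,
    cache_hit_price_per_mtok: float,
) -> float:
    """Estimate input cost when only part of the repeated prompt is billed as cache hits.

    hit_rate is the measured share of repeated tokens that received
    cache-hit billing (0.0 to 1.0). Prices are per million tokens.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    if not 0 <= repeated_tokens <= total_input_tokens:
        raise ValueError("repeated_tokens must be between 0 and total_input_tokens")

    cached_tokens = repeated_tokens * hit_rate
    uncached_tokens = total_input_tokens - cached_tokens
    return (
        uncached_tokens * input_price_per_mtok
        + cached_tokens * cache_hit_price_per_mtok
    ) / 1_000_000


# Hypothetical workload: 10M input tokens, of which 8M are a repeated prefix.
# Hypothetical prices: $2.00 per million input tokens, $0.50 per million cache hits.
for rate in (1.0, 0.6):
    cost = estimate_input_cost(10_000_000, 8_000_000, rate, 2.00, 0.50)
    print(f"hit rate {rate:.0%}: ${cost:.2f}")
```

With these made-up numbers, a measured 60% hit rate gives $12.80, against $8.00 under the optimistic 100% assumption, so a budget built on a perfect hit rate would understate input cost by more than a third.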
Frequently asked questions
Are cache writes priced the same as cache hits?
Not necessarily. Some providers publish separate prices for cache writes and cache hits, or charge for retention.
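The sketch below shows why the distinction matters. It uses a simplified model that is an assumption, not any provider's actual billing rule: every request that misses the cache pays a write price for the shared prefix, and every request that hits pays the hit price. Function names and prices are hypothetical, and retention charges are not modeled.

```python
def prefix_cost_with_cache(
    prefix_tokens: int,
    miss_requests: int,
    hit_requests: int,
    write_price_per_mtok: float,
    hit_price_per_mtok: float,
) -> float:
    """Cost of a shared prompt prefix when misses are billed as cache writes
    and hits are billed at the cache-hit price (prices per million tokens)."""
    per_token_total = (
        miss_requests * write_price_per_mtok
        + hit_requests * hit_price_per_mtok
    )
    return prefix_tokens * per_token_total / 1_000_000


def prefix_cost_without_cache(
    prefix_tokens: int,
    requests: int,
    input_price_per_mtok: float,
) -> float:
    """Cost of the same prefix billed at the regular input price every time."""
    return prefix_tokens * requests * input_price_per_mtok / 1_000_000


# Hypothetical prices per million tokens: input $2.00, cache write $2.50, cache hit $0.20.
# Hypothetical workload: a 5,000-token prefix sent in 1,000 requests.
baseline = prefix_cost_without_cache(5_000, 1_000, 2.00)
mostly_hits = prefix_cost_with_cache(5_000, 100, 900, 2.50, 0.20)
all_misses = prefix_cost_with_cache(5_000, 1_000, 0, 2.50, 0.20)

print(f"no caching:      ${baseline:.2f}")
print(f"90% cache hits:  ${mostly_hits:.2f}")
print(f"0% cache hits:   ${all_misses:.2f}")
```

Under these assumed prices, caching pays off when most requests hit, but if writes cost more than regular input and hits are rare, the cached path can end up more expensive than not caching at all.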
Related guides
How to Calculate LLM Tokens and Estimate API Cost
Learn how to estimate LLM tokens, convert input and output usage into API cost, and build a realistic monthly AI budget.
OpenAI API Pricing Explained: Tokens, Cache, Batch, and Cost
Understand OpenAI API input, output, cached input, batch, image, audio, and embedding pricing with practical formulas.