
AI API cost glossary

Cached tokens

Cached tokens are repeated prompt tokens that qualify for a provider's prompt-caching mechanism and may be billed at a separate cache-hit rate instead of the standard input rate.

Why it matters for API cost

Keeping the prompt prefix stable across requests can reduce the cost of eligible repeated input, but cache-write pricing, retention periods, minimum prompt sizes, and hit rules differ by provider.

Formula

cached input cost = eligible cache-hit tokens ÷ 1,000,000 × cached-input rate
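
As a minimal sketch, the formula can be written as a small Python function. The function name `cached_input_cost` and the numbers in the usage line are illustrative assumptions, not any provider's published prices.

```python
def cached_input_cost(cache_hit_tokens: int, cached_rate_per_million: float) -> float:
    """Cost of eligible cache-hit tokens at a per-million-token cached-input rate."""
    if cache_hit_tokens < 0 or cached_rate_per_million < 0:
        raise ValueError("tokens and rate must be non-negative")
    return cache_hit_tokens / 1_000_000 * cached_rate_per_million


# Hypothetical numbers: 2,000,000 cache-hit tokens at a rate of 0.50 per million tokens.
example_cost = cached_input_cost(2_000_000, 0.50)
print(example_cost)
```

The result is in whatever currency the rate is quoted in. Only tokens that were actually billed as cache hits belong in the first argument.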

Example

Do not assume a 100% hit rate. Measure the share of repeated prompt tokens that is actually billed at the cache-hit rate.
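
One way to measure that share is to compare, per request, the tokens you expected to be cached with the tokens the provider reports as cache hits. The sketch below assumes a hypothetical `UsageRecord` with two counters. Real usage fields are named differently by each provider, and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Hypothetical per-request counters; real field names vary by provider.
    repeated_prompt_tokens: int  # tokens in the stable prefix you expected to be cached
    cache_hit_tokens: int        # tokens the provider actually billed as cache hits


def cache_hit_share(records: list[UsageRecord]) -> float:
    """Fraction of repeated prompt tokens that received cache-hit billing."""
    repeated = sum(r.repeated_prompt_tokens for r in records)
    if repeated == 0:
        return 0.0  # nothing was expected to be cached
    hits = sum(r.cache_hit_tokens for r in records)
    return hits / repeated


# Illustrative data: the first request misses the cache, the next two hit it.
records = [
    UsageRecord(repeated_prompt_tokens=8_000, cache_hit_tokens=0),
    UsageRecord(repeated_prompt_tokens=8_000, cache_hit_tokens=8_000),
    UsageRecord(repeated_prompt_tokens=8_000, cache_hit_tokens=8_000),
]
share = cache_hit_share(records)
print(f"cache-hit share: {share:.1%}")
```

In this made-up sample the share is two thirds rather than 100%, because the first request with a new prefix cannot hit the cache.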

Frequently asked questions

Are cache writes priced the same as cache hits?

Not necessarily. Some providers publish separate write and hit rates or retention charges.
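
Where a provider does publish separate rates, writes and hits can be costed separately. The following is a sketch under stated assumptions: the function name, the rates, and the token counts are hypothetical, and any retention charge is left out because its billing basis differs by provider.

```python
def cache_cost_breakdown(
    write_tokens: int,
    hit_tokens: int,
    write_rate_per_million: float,
    hit_rate_per_million: float,
) -> dict[str, float]:
    """Split caching cost into write and hit components (retention not modeled)."""
    write_cost = write_tokens / 1_000_000 * write_rate_per_million
    hit_cost = hit_tokens / 1_000_000 * hit_rate_per_million
    return {"write": write_cost, "hit": hit_cost, "total": write_cost + hit_cost}


# Hypothetical rates: writes at 4.00 and hits at 0.50 per million tokens.
breakdown = cache_cost_breakdown(
    write_tokens=1_000_000,
    hit_tokens=4_000_000,
    write_rate_per_million=4.00,
    hit_rate_per_million=0.50,
)
print(breakdown)
```

Keeping the two components separate makes it easier to see whether write charges outweigh the savings from hits for a given traffic pattern.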