AI API cost glossary

Cached tokens

Cached tokens are prompt tokens repeated from an earlier request that qualify for a provider's prompt-caching mechanism. A provider may bill them at a separate cache-hit rate instead of its standard input rate.

Why it matters for API cost

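A stable prompt prefix can lower the cost of repeated input that qualifies for caching. How much it saves depends on provider-specific rules: how cache writes are priced, how long entries are retained, the minimum cacheable prompt size, and what counts as a hit.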

cached input cost = eligible cache-hit tokens ÷ 1,000,000 × cached-input rate

Do not assume a 100% hit rate. Measure the share of repeated prompt tokens that actually receives cache-hit billing.
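
As a rough illustration of the formula and the hit-rate caveat above, the sketch below splits measured input tokens into cache-hit and uncached portions and prices each separately. The function names and the rates (2.00 and 0.50 per million tokens) are hypothetical placeholders, not any provider's published prices. The token counts are assumed to come from your own usage records.

```python
def input_cost(total_input_tokens: int,
               cache_hit_tokens: int,
               input_rate: float,
               cached_input_rate: float) -> float:
    """Estimate input cost when part of the prompt is billed at a cache-hit rate.

    Rates are per 1,000,000 tokens. All names and rates here are
    illustrative assumptions, not a provider's actual billing API.
    """
    if not 0 <= cache_hit_tokens <= total_input_tokens:
        raise ValueError("cache_hit_tokens must be between 0 and total_input_tokens")
    uncached_tokens = total_input_tokens - cache_hit_tokens
    uncached_cost = uncached_tokens / 1_000_000 * input_rate
    # Same formula as above: eligible cache-hit tokens / 1,000,000 x cached-input rate
    cached_cost = cache_hit_tokens / 1_000_000 * cached_input_rate
    return uncached_cost + cached_cost


def hit_share(total_input_tokens: int, cache_hit_tokens: int) -> float:
    """Share of input tokens that actually received cache-hit billing."""
    if total_input_tokens == 0:
        return 0.0
    return cache_hit_tokens / total_input_tokens


# Placeholder figures: 10M input tokens, of which 6M were billed as cache hits.
total, hits = 10_000_000, 6_000_000
print(f"hit share: {hit_share(total, hits):.0%}")
print(f"input cost: {input_cost(total, hits, 2.00, 0.50):.2f}")
```

With these placeholder figures the hit share is 60% and the estimated input cost is 11.00, compared with 20.00 if nothing were cached and 5.00 if a 100% hit rate were assumed.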

Are cache writes priced the same as cache hits?

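Not necessarily. Some providers publish one rate for cache writes and a different rate for cache hits, and some also charge for retention. Check each provider's pricing page before assuming a single cached rate.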
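
To make the write-versus-hit distinction concrete, here is a minimal sketch that prices cache writes and cache hits at separate rates. The function name and the rates (2.50 for writes, 0.20 for hits, per million tokens) are hypothetical and chosen only for illustration. Retention charges are left out because they are billed differently from one provider to the next.

```python
def cache_cost(write_tokens: int,
               hit_tokens: int,
               write_rate: float,
               hit_rate: float) -> float:
    """Estimate caching cost with separate write and hit rates.

    Rates are per 1,000,000 tokens. Retention or storage charges are
    not modeled here; add them separately if your provider bills them.
    """
    if write_tokens < 0 or hit_tokens < 0:
        raise ValueError("token counts must be non-negative")
    write_cost = write_tokens / 1_000_000 * write_rate
    hit_cost = hit_tokens / 1_000_000 * hit_rate
    return write_cost + hit_cost


# Placeholder scenario: a 50,000-token prefix is written to the cache once,
# then reused by 99 later requests that all hit the cache.
prefix_tokens = 50_000
cost = cache_cost(
    write_tokens=prefix_tokens,
    hit_tokens=prefix_tokens * 99,
    write_rate=2.50,   # hypothetical write rate
    hit_rate=0.20,     # hypothetical hit rate
)
print(f"estimated cache cost: {cost:.3f}")
```

If the cache entry expires between requests, some of those reuses become new writes, so the real split between write tokens and hit tokens should come from measured usage rather than from an assumed request pattern.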

How to Calculate LLM Tokens and Estimate API Cost

Learn how to estimate LLM tokens, convert input and output usage into API cost, and build a realistic monthly AI budget.

OpenAI API Pricing Explained: Tokens, Cache, Batch, and Cost

Understand OpenAI API input, output, cached input, batch, image, audio, and embedding pricing with practical formulas.