AI API Cost Glossary
Short, practical definitions for the units and assumptions used in AI API cost estimates.
Input tokens
Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.
Output tokens
Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.
Cached tokens
Cached tokens are repeated prompt tokens that qualify for a provider's prompt-caching mechanism and may be billed at a separate cache-hit price instead of the standard input price.
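As a rough illustration of how a separate cache-hit price changes input cost, here is a minimal Python sketch. The function name, parameter names, and prices are hypothetical and not taken from any provider's price list; it also assumes prices are quoted per million tokens and that cached tokens are a subset of the input tokens.

```python
def input_cost_with_cache(input_tokens, cached_tokens,
                          price_per_million, cached_price_per_million):
    """Estimate input cost when part of the prompt is served from cache.

    All prices are hypothetical and assumed to be quoted per 1M tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * price_per_million
             + cached_tokens * cached_price_per_million)
    return total / 1_000_000


# 10,000 input tokens, 8,000 of them cache hits, with made-up prices.
print(input_cost_with_cache(10_000, 8_000, 2.0, 0.5))
```

Some providers also charge for writing to the cache or apply minimum prompt lengths; this sketch leaves those rules out.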
Embedding tokens
Embedding tokens are input tokens processed by an embedding model to create vector representations for search, clustering, or retrieval.
Context window
A context window is the amount of tokenized input and output a model can consider within a request or conversation, subject to provider-specific rules.
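One simple way to apply this definition in an estimate is to check whether a planned request fits. The Python sketch below assumes the window is shared by input tokens and the maximum output tokens requested; the function and parameter names are hypothetical, and some providers count these differently.

```python
def fits_context_window(input_tokens, max_output_tokens, context_window):
    """Return True if input plus reserved output fits in the window.

    Assumes input and output share one window, which is a simplification.
    """
    return input_tokens + max_output_tokens <= context_window


# A 128,000-token window with 8,000 tokens reserved for the response.
print(fits_context_window(120_000, 8_000, 128_000))
print(fits_context_window(121_000, 8_000, 128_000))
```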
Tokens per minute
Tokens per minute is a throughput or rate-limit measure describing how many input or output tokens an account or model can process during a minute.
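To compare expected traffic against a tokens-per-minute limit, a minimal Python sketch might look like the following. It assumes input and output tokens count toward one combined limit, which varies by provider; all names and numbers are hypothetical.

```python
def estimated_tokens_per_minute(requests_per_minute,
                                input_tokens_per_request,
                                output_tokens_per_request):
    """Estimate combined token throughput for steady traffic."""
    tokens_per_request = input_tokens_per_request + output_tokens_per_request
    return requests_per_minute * tokens_per_request


def within_tpm_limit(estimated_tpm, tpm_limit):
    """Return True if the estimate does not exceed the limit."""
    return estimated_tpm <= tpm_limit


# 60 requests per minute at 1,200 input and 300 output tokens each.
tpm = estimated_tokens_per_minute(60, 1_200, 300)
print(tpm, within_tpm_limit(tpm, 100_000))
```

Real traffic is bursty, so an average that fits under the limit can still hit it during peaks.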
Requests per day
Requests per day is the number of billable API calls made during a day. TokenMath commonly derives it from requests per active user multiplied by active users.
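The derivation described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the sample figures are made up for illustration.

```python
def requests_per_day(active_users, requests_per_active_user):
    """Requests per day = active users x requests per active user."""
    if active_users < 0 or requests_per_active_user < 0:
        raise ValueError("inputs must be non-negative")
    return active_users * requests_per_active_user


# 2,500 daily active users making 12 requests each.
print(requests_per_day(2_500, 12))
```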
Cost per request
Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.
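For a text model, the sum above can be sketched in Python like this. The prices are hypothetical and assumed to be quoted per million tokens; the sketch covers only input and output tokens and ignores other billable items such as cached tokens or tool usage.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_million, output_price_per_million):
    """Input token cost plus output token cost for one API call.

    Prices are hypothetical and assumed to be quoted per 1M tokens.
    """
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# 1,200 input tokens and 300 output tokens with made-up prices.
print(round(cost_per_request(1_200, 300, 3.0, 15.0), 6))
```

Multiplying this figure by requests per day gives a daily cost estimate under the same assumptions.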