AI API Cost Glossary

Short, practical definitions for the units and assumptions used in AI API cost estimates.

Input tokens

Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.


Output tokens

Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.


Cached tokens

Cached tokens are repeated prompt tokens that qualify for a provider's prompt-caching mechanism and may be billed at a separate cache-hit price instead of the standard input price.
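
As a rough illustration of how a cache-hit price changes input cost, here is a minimal sketch. It assumes cached tokens are a subset of the input tokens and that both prices are quoted per million tokens. The function name, parameter names, and prices are hypothetical, and any cache-write surcharge a provider may apply is ignored.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_tokens: int,
    input_price_per_million: float,
    cached_price_per_million: float,
) -> float:
    """Estimate input cost when part of the prompt is served from cache.

    Assumption: cached_tokens is counted inside input_tokens, and both
    prices are in currency units per one million tokens.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price_per_million
        + cached_tokens * cached_price_per_million
    )
    return total / 1_000_000


# Hypothetical prices: 3.00 per million standard input, 0.30 per million cached.
cached_example = input_cost_with_cache(10_000, 8_000, 3.00, 0.30)
uncached_example = input_cost_with_cache(10_000, 0, 3.00, 0.30)
```

With these made-up prices, the request with 8,000 cached tokens costs well under a third of the fully uncached one, which is why the cached share is worth tracking separately in an estimate.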


Embedding tokens

Embedding tokens are input tokens processed by an embedding model to create vector representations for search, clustering, or retrieval.


Context window

A context window is the maximum number of tokens, input and output combined, that a model can consider within a request or conversation, subject to provider-specific rules.
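
A simple pre-flight check can show how the limit is used in an estimate. This sketch assumes input and output share one token budget. Some providers apply separate input and output caps, so treat the function and its parameter names as hypothetical.

```python
def fits_context_window(
    input_tokens: int,
    max_output_tokens: int,
    context_window: int,
) -> bool:
    """Return True if the prompt plus the reserved output fits the window.

    Assumption: input and output tokens count against one shared limit.
    """
    return input_tokens + max_output_tokens <= context_window


# Hypothetical 128,000-token window with 4,096 tokens reserved for output.
fits = fits_context_window(120_000, 4_096, 128_000)
too_large = fits_context_window(126_000, 4_096, 128_000)
```

Here the first request fits and the second does not, because 126,000 input tokens leave less room than the 4,096 output tokens reserved.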


Tokens per minute

Tokens per minute is a throughput or rate-limit measure describing how many input or output tokens an account or model can process during a minute.
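
To compare expected traffic against a tokens-per-minute limit, an estimate along these lines may help. It assumes the limit counts input and output tokens together, which is not true for every provider. The names and figures are illustrative only.

```python
def estimated_tokens_per_minute(
    requests_per_minute: float,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
) -> float:
    """Estimate token throughput from request rate and average request size.

    Assumption: input and output tokens count toward the same limit.
    """
    tokens_per_request = input_tokens_per_request + output_tokens_per_request
    return requests_per_minute * tokens_per_request


def within_limit(estimated_tpm: float, tpm_limit: float) -> bool:
    """Return True if the estimated throughput stays at or below the limit."""
    return estimated_tpm <= tpm_limit


# Hypothetical load: 60 requests per minute, 2,000 input and 500 output tokens each.
tpm = estimated_tokens_per_minute(60, 2_000, 500)
ok = within_limit(tpm, 200_000)
```

This uses averages, so bursty traffic can still exceed a limit that the average stays under.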


Requests per day

Requests per day is the number of billable API calls made during a day. TokenMath commonly derives it by multiplying the number of active users by the requests each active user makes per day.
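
The derivation described above can be sketched in a few lines. The function and parameter names are hypothetical, and the figures are invented for illustration.

```python
def requests_per_day(active_users: int, requests_per_active_user: float) -> float:
    """Estimate daily billable API calls.

    Assumption: requests_per_active_user is a daily average per active user.
    """
    return active_users * requests_per_active_user


# Hypothetical traffic: 1,200 active users making 8 requests each per day.
daily_requests = requests_per_day(1_200, 8)
```

With these inputs the estimate is 9,600 requests per day.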


Cost per request

Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.
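
For a text model, the common case in the definition above can be written as a short calculation. This is a minimal sketch that assumes prices are quoted per million tokens. The function name, parameter names, and prices are hypothetical and do not reflect any specific provider.

```python
def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one API call to a text model.

    Assumption: only input and output tokens are billed, and both
    prices are in currency units per one million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output tokens.
request_cost = cost_per_request(2_000, 500, 3.00, 15.00)
```

With these made-up prices the call costs about 0.0135, of which the 500 output tokens account for more than the 2,000 input tokens. Other billable usage, such as cached or tool-related tokens, would need extra terms.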
