Embedding Cost Explained: Tokens, Chunks, and Re-indexing
Embedding costs are usually small per million tokens, but duplicate chunks, frequent re-indexing, and uncontrolled document growth can multiply the total over time. Estimate the initial corpus and the ongoing refresh workload separately.
Calculate corpus token volume
Estimate tokens per chunk and multiply by the chunk count, then add the tokens from document updates and chunk overlap. Overlap improves retrieval context, but every overlapping token is embedded, and billed, more than once.
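The calculation can be sketched in Python. This is a minimal sketch under stated assumptions: chunks are fixed-size token windows, consecutive chunks share `overlap` tokens, and per-document token counts are already known (for example, from the provider's tokenizer). The function names and sample document sizes are hypothetical.

```python
def chunk_count(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover one document."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # how far each new chunk advances
    # Integer ceiling division: extra chunks needed after the first one.
    return 1 + (doc_tokens - chunk_size + stride - 1) // stride

def embedded_tokens(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens sent to the embedding API for one document, overlap included."""
    n = chunk_count(doc_tokens, chunk_size, overlap)
    if n == 0:
        return 0
    stride = chunk_size - overlap
    last_start = (n - 1) * stride
    # Every chunk except the last is full; the last covers the remaining tail.
    return (n - 1) * chunk_size + (doc_tokens - last_start)

def corpus_tokens(doc_sizes: list[int], chunk_size: int, overlap: int) -> int:
    """Total embedded tokens for a corpus, given per-document token counts."""
    return sum(embedded_tokens(n, chunk_size, overlap) for n in doc_sizes)

# Hypothetical per-document token counts.
docs = [1000, 900, 300]
raw = sum(docs)
billed = corpus_tokens(docs, chunk_size=500, overlap=100)
print(f"raw: {raw} tokens, embedded: {billed} tokens "
      f"({billed / raw - 1:.0%} overhead from overlap)")
```

Here a 1,000-token document split into 500-token chunks with 100 tokens of overlap becomes three chunks totaling 1,200 embedded tokens, so overlap alone adds 20% for that document.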
A full refresh reprocesses the entire corpus. Incremental indexing embeds only new or changed content, and it is often the largest ongoing operational saving.
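A rough comparison of the two refresh strategies, assuming a hypothetical 50M-token corpus, 2% of tokens new or changed per refresh, and one refresh per day at the text-embedding-3-small rate from the pricing table below. All workload numbers are illustrative assumptions, and the incremental figure ignores the cost of detecting which content changed.

```python
def refresh_cost(tokens_per_refresh: int, refreshes: int, usd_per_million: float) -> float:
    """Embedding API cost in USD for a number of refresh runs."""
    return tokens_per_refresh * refreshes * usd_per_million / 1_000_000

# Hypothetical workload; replace with your own measurements.
corpus = 50_000_000          # total embedded tokens, overlap included
changed_fraction = 0.02      # share of tokens new or changed per refresh
refreshes_per_month = 30     # one refresh per day
price = 0.02                 # text-embedding-3-small, USD per 1M tokens

full = refresh_cost(corpus, refreshes_per_month, price)
incremental = refresh_cost(round(corpus * changed_fraction), refreshes_per_month, price)
print(f"full: ${full:.2f}/month, incremental: ${incremental:.2f}/month")
# full: $30.00/month, incremental: $0.60/month
```

Under these assumptions a full refresh re-embeds 1.5B tokens a month versus 30M for incremental indexing, a 50x difference driven entirely by the changed fraction.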
Separate embedding and vector storage
The embedding API charge pays for creating vectors; the vector database adds separate storage, indexing, query, replica, and network costs. Budget for both layers.
Worked example
text-embedding-3-small: 1M input tokens
Using the versioned rate below ($0.02 per 1M tokens), this example workload of 1M input tokens is estimated at $0.02. The estimate isolates provider usage only and excludes taxes, regional premiums, retries, storage, network traffic, and unrelated infrastructure.
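The worked example reduces to tokens × rate ÷ 1,000,000. A minimal sketch, with rates copied from the pricing table below; they will go stale, so re-check them before forecasting. The dictionary keys and function name are illustrative labels, not an official API.

```python
# USD per 1M input tokens, copied from the pricing table (checked Jun 21, 2026).
# Keys are illustrative labels; re-check the provider pages before forecasting.
RATES_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "gemini-embedding-001": 0.15,
}

def embedding_cost(model: str, input_tokens: int) -> float:
    """Provider usage only: excludes taxes, regional premiums, retries, storage, and network."""
    return input_tokens * RATES_PER_MILLION[model] / 1_000_000

for model in RATES_PER_MILLION:
    print(f"{model}: ${embedding_cost(model, 1_000_000):.2f} for 1M input tokens")
```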
Current pricing references
These versioned records support the examples above. Check the date and provider source before using them in a production forecast.
| Provider / model | Input price | Output price | Status | Source | Checked |
|---|---|---|---|---|---|
| OpenAI text-embedding-3-small | $0.02 per 1M tokens | n/a | Verified | OpenAI embedding model pricing | Jun 21, 2026 |
| OpenAI text-embedding-3-large | $0.13 per 1M tokens | n/a | Verified | OpenAI embedding model pricing | Jun 21, 2026 |
| Google Gemini Embedding 001 | $0.15 per 1M tokens | n/a | Verified | Google Gemini Developer API pricing | Jun 21, 2026 |
Frequently asked questions
Do vector dimensions change embedding API cost?
Under token-based pricing, the API charge is driven by input tokens, not by vector dimensions. Dimensions do affect vector storage, and the available dimensions can vary by model or configuration.
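To see where dimensions show up instead, the raw vector payload can be estimated as vectors × dimensions × bytes per value. This sketch assumes uncompressed float32 values (4 bytes each) and ignores index structures, metadata, and replicas, which a vector database adds on top. For reference, 1536 and 3072 are the default output sizes of text-embedding-3-small and text-embedding-3-large; the function name is hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Uncompressed vector payload; 4 bytes per value assumes float32.

    Excludes index structures, metadata, and replicas.
    """
    return num_vectors * dimensions * bytes_per_value

# 1M vectors at a few example dimension settings.
for dims in (512, 1536, 3072):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB of raw float32 vectors")
```

Doubling the dimensions doubles raw storage while leaving a token-priced embedding bill unchanged.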
Should unchanged documents be re-embedded?
Usually not. Content hashes and incremental updates can prevent unnecessary API and indexing work.
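One way to apply content hashing, sketched under assumptions: each document has a stable ID, and the SHA-256 hash of its text from the previous run is stored somewhere (here, a plain dict). `content_hash` and `plan_updates` are hypothetical helpers, not part of any library.

```python
import hashlib

def content_hash(text: str) -> str:
    """SHA-256 of the text that would be embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_updates(docs: dict[str, str],
                 stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (IDs to embed, IDs to delete) by comparing against the last run's hashes."""
    to_embed = [doc_id for doc_id, text in docs.items()
                if stored_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in docs]
    return to_embed, to_delete

# Hashes saved from the previous indexing run (a dict stands in for real storage).
previous = {doc_id: content_hash(text)
            for doc_id, text in {"a": "alpha", "b": "beta", "c": "gamma"}.items()}
# Current corpus: "b" edited, "c" deleted, "d" added.
current = {"a": "alpha", "b": "beta v2", "d": "delta"}

to_embed, to_delete = plan_updates(current, previous)
print(to_embed, to_delete)  # ['b', 'd'] ['c']
```

Only `b` (edited) and `d` (new) are sent for embedding, and `c` is removed from the index. If the embedding model or chunking settings change, every stored vector is stale regardless of the hashes, so a real pipeline might fold those settings into the hashed value.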
Related glossary terms
Input tokens
Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.
Output tokens
Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.
Cost per request
Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.