

Embedding Cost Explained: Tokens, Chunks, and Re-indexing

Embedding rates per million tokens are usually low, but duplicate chunks, frequent re-indexing, and uncontrolled document growth compound the total. Estimate the initial corpus and the ongoing refresh workload separately.


Calculate corpus token volume

Estimate tokens per chunk, multiply by chunk count, then add the tokens from document updates and chunk overlap. Overlap improves retrieval context, but the overlapping text is embedded more than once.

A full refresh reprocesses the entire corpus. Incremental indexing embeds only changed or new content and is often the largest operational saving.

embedding cost = total embedded tokens ÷ 1,000,000 × model input rate
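The formula above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a billing calculator: it treats the corpus as one continuous token stream cut into fixed-size sliding-window chunks, and the function names and the 500-token chunk size with 50-token overlap are illustrative values, not taken from any provider.

```python
import math


def chunk_count(unique_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover unique_tokens.

    Assumption: the corpus is treated as a single continuous stream.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if unique_tokens <= 0:
        return 0
    if unique_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # new tokens contributed by each later chunk
    return 1 + math.ceil((unique_tokens - chunk_size) / stride)


def embedded_tokens(unique_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens actually sent to the API: unique text plus repeated overlap."""
    n = chunk_count(unique_tokens, chunk_size, overlap)
    if n == 0:
        return 0
    return unique_tokens + (n - 1) * overlap


def embedding_cost(total_tokens: int, rate_per_million: float) -> float:
    """embedding cost = total embedded tokens / 1,000,000 * model input rate."""
    return total_tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    tokens = embedded_tokens(1_000_000, chunk_size=500, overlap=50)
    print(tokens, round(embedding_cost(tokens, 0.02), 6))
```

With these illustrative values, 1,000,000 unique tokens become 1,111,100 embedded tokens, so a 10% overlap adds roughly 11% to the embedding bill. Chunking each document separately would add slightly more, because every document ends with its own partial chunk.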

Separate embedding and vector storage

The embedding API charge covers generating the vectors; the vector database adds separate costs for storage, indexing, queries, replicas, and network traffic. Budget both layers.
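For the storage layer, a lower bound on raw vector size can be sketched as below. The assumptions are significant: vectors are stored as 4-byte float32 values, and index structures, metadata, and database-specific overhead are ignored, so real usage will be higher. The function name and the 1,536-dimension example are illustrative.

```python
def raw_vector_bytes(
    vector_count: int,
    dimensions: int,
    bytes_per_value: int = 4,  # assumption: float32 storage
    replicas: int = 1,
) -> int:
    """Lower-bound storage for the vectors alone, before index overhead."""
    if min(vector_count, dimensions, bytes_per_value, replicas) < 0:
        raise ValueError("all inputs must be non-negative")
    return vector_count * dimensions * bytes_per_value * replicas


if __name__ == "__main__":
    size = raw_vector_bytes(1_000_000, 1536)
    print(f"{size / 1e9:.3f} GB")  # decimal gigabytes, one replica
```

One million 1,536-dimension vectors come to about 6.1 GB before indexing or replication. Multiply by your database's storage rate rather than the embedding rate, since the two are billed independently.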

Worked example

text-embedding-3-small: 1M input tokens

Using the versioned rates below ($0.02 per 1M tokens), this example workload is estimated at $0.02: 1,000,000 ÷ 1,000,000 × $0.02. The figure isolates provider usage only and does not include taxes, regional premiums, retries, storage, network traffic, or unrelated infrastructure.
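To extend the same arithmetic to a recurring refresh, the sketch below compares a full re-embed with an incremental one. The 50M-token corpus and the 5% change rate are hypothetical inputs chosen for illustration; substitute your own measurements.

```python
def refresh_costs(
    corpus_tokens: int,
    changed_fraction: float,
    rate_per_million: float,
) -> tuple[float, float]:
    """Return (full_refresh_cost, incremental_cost) for one refresh cycle."""
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    full = corpus_tokens / 1_000_000 * rate_per_million
    # Incremental indexing embeds only the changed or new share of the corpus.
    incremental = full * changed_fraction
    return full, incremental


if __name__ == "__main__":
    # Hypothetical workload: 50M tokens, 5% changed, $0.02 per 1M tokens.
    full, incremental = refresh_costs(50_000_000, 0.05, 0.02)
    print(f"full: ${full:.2f}  incremental: ${incremental:.2f}")
```

Under these assumptions a full refresh costs about $1.00 per cycle and an incremental one about $0.05. The ratio matters more than the absolute numbers, because it holds at any corpus size or rate.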

Current pricing references

These versioned records support the examples above. Check the date and provider source before using them in a production forecast.

| Provider / model | Input or unit | Output | Status | Source |
| --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-small | $0.02 per 1M tokens | - | Verified | OpenAI embedding model pricing (checked Jun 21, 2026) |
| OpenAI text-embedding-3-large | $0.13 per 1M tokens | - | Verified | OpenAI embedding model pricing (checked Jun 21, 2026) |
| Google Gemini, Gemini Embedding 001 | $0.15 per 1M tokens | - | Verified | Google Gemini Developer API pricing (checked Jun 21, 2026) |

Frequently asked questions

Do vector dimensions change embedding API cost?

Generally not. Token-based API pricing is driven by input tokens, not by the size of the returned vector. Dimensions do affect vector storage cost, and the available dimensions vary by model or configuration.

Should unchanged documents be re-embedded?

Usually not. Content hashes and incremental updates can prevent unnecessary API and indexing work.
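A minimal sketch of hash-based change detection, using Python's standard `hashlib`. The chunk IDs, the `stored_hashes` mapping, and the function names are hypothetical; in practice the hashes would be kept alongside the vectors in your own metadata store.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable SHA-256 fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_embed(chunks: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of chunks that are new or whose text has changed.

    chunks maps chunk ID to current text; stored_hashes maps chunk ID to the
    hash recorded when that chunk was last embedded.
    """
    return [
        chunk_id
        for chunk_id, text in chunks.items()
        if stored_hashes.get(chunk_id) != content_hash(text)
    ]


if __name__ == "__main__":
    stored = {"doc1#0": content_hash("Refund policy: 30 days.")}
    current = {
        "doc1#0": "Refund policy: 30 days.",   # unchanged, skipped
        "doc1#1": "Shipping takes 3-5 days.",  # new, needs embedding
    }
    print(chunks_to_embed(current, stored))
```

Only the returned IDs need to go to the embedding API. Chunks that were deleted from the source still have to be removed from the index separately, which this sketch does not handle.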
