Use-case cost estimate

How Much Does a RAG App Cost?

A retrieval-augmented generation budget has at least three layers: embedding the corpus, storing vectors, and generating answers. Query, reranking, hosting, and network charges can add more.

Default workload assumptions

These values make the example reproducible. They are planning assumptions, not measured usage from your application.

Documents: 10,000
Chunks per document: 5
Tokens per chunk: 250
Corpus refreshes per month: 1
RAG answers per month: 10,000
Vector dimensions: 1,536
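
As a rough sketch, the workload assumptions above can be turned into raw quantities before any pricing is applied. The 4-bytes-per-dimension (float32) figure is an assumption that this page does not state. The result also excludes chunk overlap, metadata, replicas, and index overhead, so it is a lower bound on size and is not expected to reproduce the dollar figures in the estimate.

```python
# Workload assumptions copied from the list above.
documents = 10_000
chunks_per_document = 5
tokens_per_chunk = 250
refreshes_per_month = 1
vector_dimensions = 1_536

# Assumption (not stated on this page): each dimension is a 4-byte float32.
BYTES_PER_DIMENSION = 4

# Chunks and tokens processed by one full corpus refresh.
total_chunks = documents * chunks_per_document
tokens_per_refresh = total_chunks * tokens_per_chunk
monthly_embedded_tokens = tokens_per_refresh * refreshes_per_month

# Raw vector payload only: no metadata, replicas, or index overhead.
raw_vector_bytes = total_chunks * vector_dimensions * BYTES_PER_DIMENSION
raw_vector_gib = raw_vector_bytes / 2**30

print(total_chunks)              # 50000
print(monthly_embedded_tokens)   # 12500000
print(round(raw_vector_gib, 3))  # 0.286
```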

Calculator-style cost example

Estimate

Embedding refresh: $0.275 (one full corpus refresh per month)
Vector storage assumption: $0.116 ($0.25/GiB-month assumption; not a provider quote)
Generated answers: $12.80 (10,000 RAG answers per month)

Estimated monthly cost: $13.19
Estimated yearly cost: $158.29

text-embedding-3-small: Verified · Last verified Jun 21, 2026 · OpenAI embedding model pricing
GPT-4.1 mini: Verified · Last verified Jun 21, 2026 · OpenAI model pricing

Vector storage uses a $0.25 per GiB-month planning assumption.
Query, compute, reranking, hosting, and network costs are excluded.

Formula

monthly RAG estimate = embedding refresh + vector storage assumption + answer-generation cost
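
The formula can be checked against the three line items in the estimate with a minimal sketch. The function name `monthly_rag_estimate` is illustrative, not part of any tool described here. The yearly figure is computed from the unrounded monthly total, which is how $158.29 arises rather than 12 × $13.19 = $158.28.

```python
def monthly_rag_estimate(embedding_refresh, vector_storage, answer_generation):
    """Sum the three monthly components, all in US dollars."""
    return embedding_refresh + vector_storage + answer_generation


# Line items taken from the estimate on this page.
monthly = monthly_rag_estimate(
    embedding_refresh=0.275,
    vector_storage=0.116,
    answer_generation=12.80,
)

# Annualize the unrounded monthly total, then round only for display.
yearly = monthly * 12

print(f"${monthly:.2f}")  # $13.19
print(f"${yearly:.2f}")   # $158.29
```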

Main cost drivers

Corpus size, chunk overlap, and refresh policy
Vector dimensions, metadata, replicas, and index overhead
Retrieved context tokens per answer
Query, reranking, and managed database charges

Ways to reduce cost

Embed only changed documents
Reduce redundant chunk overlap
Retrieve fewer high-quality passages
Tune dimensions, replicas, and metadata intentionally
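
One way to embed only changed documents is to keep a content hash per document and re-embed when the hash differs. The sketch below assumes documents are available as an ID-to-text mapping and that hashes from the previous refresh were stored somewhere. The names `content_hash` and `documents_to_embed` are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 fingerprint of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def documents_to_embed(current_docs: dict[str, str],
                       stored_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose text has changed."""
    return [
        doc_id
        for doc_id, text in current_docs.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]


# Hashes recorded at the previous refresh (illustrative data).
stored = {"a": content_hash("alpha"), "b": content_hash("beta")}

# Current corpus: "a" is unchanged, "b" was edited, "c" is new.
current = {"a": "alpha", "b": "beta v2", "c": "gamma"}

changed = documents_to_embed(current, stored)
print(changed)  # ['b', 'c']
```

Only the returned IDs would need to be re-chunked and re-embedded, so the embedding line item scales with the changed share of the corpus instead of its full size.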

Frequently asked questions

Is vector storage pricing a provider quote here?

No. The example uses a clearly labeled storage-rate assumption so the storage utility can demonstrate sizing. Replace it with your vendor's rate.
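
To swap in a vendor rate, one option is to back out the stored size implied by the example and re-price it. In this sketch the $0.40 vendor rate is a made-up placeholder, and the implied size is approximate because the $0.116 figure is already rounded.

```python
# Figures from the example on this page.
ASSUMED_RATE_PER_GIB_MONTH = 0.25
EXAMPLE_STORAGE_COST = 0.116

# Stored size implied by the example, roughly 0.464 GiB.
implied_gib = EXAMPLE_STORAGE_COST / ASSUMED_RATE_PER_GIB_MONTH


def storage_cost(stored_gib, rate_per_gib_month):
    """Monthly storage cost in dollars for a given size and rate."""
    return stored_gib * rate_per_gib_month


# Hypothetical vendor rate; replace it with the rate your provider quotes.
vendor_rate = 0.40

print(round(implied_gib, 3))                             # 0.464
print(round(storage_cost(implied_gib, vendor_rate), 4))  # 0.1856
```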

Does the estimate include vector queries?

No. Query units, compute, reranking, and network costs vary by service and are not represented in the current pricing catalog.

Related pricing pages

OpenAI API pricing

Review source-linked OpenAI model and service rates.

Gemini API pricing

Compare Flash, Pro, image, and embedding records.

Anthropic Claude pricing

Review Haiku, Sonnet, and Opus pricing snapshots.

Related glossary terms

Input tokens

Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.

Requests per day

Requests per day is the number of billable API calls made during a day. TokenMath commonly derives it from requests per active user multiplied by active users.

Cost per request

Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.
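
The two definitions above combine into a daily cost figure. The following is a minimal sketch for a text model; every number in it (token counts, per-million-token rates, user counts) is a hypothetical placeholder, not a price from any provider.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_rate_per_million, output_rate_per_million):
    """Input token cost plus output token cost for one API call, in dollars."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


def requests_per_day(requests_per_active_user, active_users):
    """Billable calls per day, derived from per-user activity."""
    return requests_per_active_user * active_users


# Hypothetical workload and rates, chosen only to keep the arithmetic easy.
per_request = cost_per_request(
    input_tokens=1_000,
    output_tokens=200,
    input_rate_per_million=1.00,
    output_rate_per_million=2.00,
)
daily_requests = requests_per_day(requests_per_active_user=4, active_users=250)
daily_cost = per_request * daily_requests

print(round(per_request, 6))  # 0.0014
print(daily_requests)         # 1000
print(round(daily_cost, 2))   # 1.4
```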
