Use-case cost estimate
How Much Does a RAG App Cost?
A retrieval-augmented generation budget has at least three layers: embedding the corpus, storing vectors, and generating answers. Query, reranking, hosting, and network charges can add more.
Default workload assumptions
These values make the example reproducible. They are planning assumptions, not measured usage from your application.
- Documents: 10,000
- Chunks per document: 5
- Tokens per chunk: 250
- Corpus refreshes per month: 1
- RAG answers per month: 10,000
- Vector dimensions: 1,536
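The workload assumptions above fix the two sizes the rest of the estimate depends on: tokens embedded per month and vectors stored. A minimal sketch of that arithmetic follows. The variable names are illustrative, and the 4 bytes per dimension (float32) is an assumption that the page does not state.

```python
# Workload assumptions taken from the list above.
documents = 10_000
chunks_per_document = 5
tokens_per_chunk = 250
refreshes_per_month = 1
vector_dimensions = 1_536

# Assumption (not stated on the page): each dimension is a 4-byte float32.
BYTES_PER_DIMENSION = 4

# One vector is stored per chunk.
chunks = documents * chunks_per_document

# Tokens sent to the embedding model each month.
embedded_tokens_per_month = chunks * tokens_per_chunk * refreshes_per_month

# Raw vector payload only: no metadata, index overhead, or replicas.
raw_vector_bytes = chunks * vector_dimensions * BYTES_PER_DIMENSION
raw_vector_gib = raw_vector_bytes / 2**30

print(f"vectors stored:            {chunks:,}")
print(f"embedded tokens per month: {embedded_tokens_per_month:,}")
print(f"raw vector size:           {raw_vector_gib:.3f} GiB")
```

Multiply the token count by your embedding rate per million tokens, and the GiB figure by your storage rate, to get the first two line items for your own workload.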
Calculator-style cost example
Estimate
- Embedding refresh: $0.275 (one full corpus refresh per month)
- Vector storage assumption: $0.116 ($0.25/GiB-month assumption; not a provider quote)
- Generated answers: $12.80 (10,000 RAG answers per month)
Estimated monthly cost: $13.19
Estimated yearly cost: $158.29
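The totals follow directly from the three line items in the example. The sketch below sums them with `decimal.Decimal` to avoid binary floating-point rounding. The dictionary keys and the `to_cents` helper are names introduced here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Line items as shown in the example (USD per month).
line_items = {
    "embedding_refresh": Decimal("0.275"),
    "vector_storage": Decimal("0.116"),
    "generated_answers": Decimal("12.80"),
}

def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to two decimal places for display."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Keep the unrounded monthly sum and round only when displaying.
monthly = sum(line_items.values(), Decimal("0"))
yearly = monthly * 12

print(f"monthly: ${to_cents(monthly)}")
print(f"yearly:  ${to_cents(yearly)}")
```

The yearly figure multiplies the unrounded monthly sum. Multiplying the displayed $13.19 by 12 instead would give $158.28, one cent lower.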
- text-embedding-3-small: OpenAI embedding model pricing, last verified Jun 21, 2026.
- GPT-4.1 mini: OpenAI model pricing, last verified Jun 21, 2026.
- Vector storage uses a $0.25 per GiB-month planning assumption.
- Query, compute, reranking, hosting, and network costs are excluded.
Formula
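One way to write the estimate, consistent with the three line items in the example. The symbols are introduced here for illustration, and every rate is left as a parameter rather than a quoted price.

```latex
\begin{aligned}
C_{\text{embed}}   &= \frac{D \cdot K \cdot T \cdot R}{10^{6}} \cdot p_{\text{embed}} \\
C_{\text{store}}   &= S \cdot p_{\text{store}} \\
C_{\text{answers}} &= A \cdot c_{\text{answer}} \\
C_{\text{month}}   &= C_{\text{embed}} + C_{\text{store}} + C_{\text{answers}} \\
C_{\text{year}}    &= 12 \cdot C_{\text{month}}
\end{aligned}
```

Here $D$ is the number of documents, $K$ the chunks per document, $T$ the tokens per chunk, $R$ the corpus refreshes per month, $p_{\text{embed}}$ the embedding price per million tokens, $S$ the stored size in GiB, $p_{\text{store}}$ the storage rate per GiB-month, $A$ the RAG answers per month, and $c_{\text{answer}}$ the generation cost of one answer (input token cost plus output token cost).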
Main cost drivers
- Corpus size, chunk overlap, and refresh policy
- Vector dimensions, metadata, replicas, and index overhead
- Retrieved context tokens per answer
- Query, reranking, and managed database charges
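Of these drivers, retrieved context usually dominates generation cost because every retrieved chunk is billed as input tokens on every answer. The sketch below shows how cost per answer scales with the number of retrieved chunks. The function, the prompt overhead, the output length, and both rates are hypothetical placeholders, not quoted prices or values from this page.

```python
def cost_per_answer(
    retrieved_chunks: int,
    tokens_per_chunk: int,
    prompt_overhead_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
) -> float:
    """Input token cost plus output token cost for one RAG answer, in USD."""
    input_tokens = retrieved_chunks * tokens_per_chunk + prompt_overhead_tokens
    input_cost = input_tokens * input_rate_per_million / 1_000_000
    output_cost = output_tokens * output_rate_per_million / 1_000_000
    return input_cost + output_cost

# Placeholder values for illustration only; substitute your model's rates.
shared = dict(
    tokens_per_chunk=250,          # matches the workload assumption above
    prompt_overhead_tokens=200,    # hypothetical: instructions plus question
    output_tokens=300,             # hypothetical answer length
    input_rate_per_million=0.40,   # hypothetical USD per 1M input tokens
    output_rate_per_million=1.60,  # hypothetical USD per 1M output tokens
)

answers_per_month = 10_000
for k in (8, 4):
    per_answer = cost_per_answer(retrieved_chunks=k, **shared)
    print(f"{k} chunks: ${per_answer:.5f} per answer, "
          f"${per_answer * answers_per_month:.2f} per month")
```

Under these placeholder numbers, halving the retrieved chunks removes 1,000 input tokens from every answer, which is why retrieval depth is worth tuning before switching models.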
Ways to reduce cost
- Embed only changed documents
- Reduce redundant chunk overlap
- Retrieve fewer high-quality passages
- Tune dimensions, replicas, and metadata intentionally
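The first item, embedding only changed documents, can be sketched with content hashes: store a hash per document after each refresh and re-embed only documents whose hash differs. This is one possible approach, not a method the page prescribes. The function names and the in-memory dictionaries are hypothetical stand-ins for your own document store.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def select_changed(
    documents: dict[str, str],
    previous_hashes: dict[str, str],
) -> tuple[list[str], dict[str, str]]:
    """Return IDs of new or modified documents, plus the updated hash map."""
    current_hashes = {doc_id: content_hash(text) for doc_id, text in documents.items()}
    changed = [
        doc_id
        for doc_id, digest in current_hashes.items()
        if previous_hashes.get(doc_id) != digest
    ]
    return changed, current_hashes

# Hashes saved after the previous refresh.
previous = {"a": content_hash("alpha"), "b": content_hash("beta")}

# Current corpus: "a" unchanged, "b" edited, "c" newly added.
corpus = {"a": "alpha", "b": "beta v2", "c": "gamma"}

to_embed, latest = select_changed(corpus, previous)
print(to_embed)
```

Only the IDs in `to_embed` need to be sent to the embedding model. The sketch does not handle deletions; IDs present in `previous` but missing from `latest` would also need their vectors removed.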
Frequently asked questions
Is vector storage pricing a provider quote here?
No. The $0.25 per GiB-month rate is a labeled planning assumption used to demonstrate storage sizing, not a quote from any provider. Replace it with your vendor's rate.
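To substitute your own rate, a storage estimate can be sketched as a function of vector count, dimensions, and rate. The function name, the float32 default, and the metadata, index overhead, and replica parameters are assumptions introduced here; the page does not say how its storage size is derived.

```python
def storage_cost_per_month(
    vectors: int,
    dimensions: int,
    rate_per_gib_month: float,
    bytes_per_dimension: int = 4,        # assumption: float32 components
    metadata_bytes_per_vector: int = 0,  # assumption: no metadata by default
    index_overhead: float = 1.0,         # assumption: multiplier for index structures
    replicas: int = 1,
) -> float:
    """Estimated monthly storage cost in USD for a vector collection."""
    bytes_per_vector = dimensions * bytes_per_dimension + metadata_bytes_per_vector
    total_bytes = vectors * bytes_per_vector * index_overhead * replicas
    gib = total_bytes / 2**30
    return gib * rate_per_gib_month

# Raw float32 vectors only, using the page's workload and planning rate.
raw = storage_cost_per_month(vectors=50_000, dimensions=1_536, rate_per_gib_month=0.25)
print(f"raw vectors only: ${raw:.3f} per month")

# Same collection at a hypothetical vendor rate with two replicas.
replicated = storage_cost_per_month(
    vectors=50_000, dimensions=1_536, rate_per_gib_month=0.33, replicas=2
)
print(f"two replicas at $0.33/GiB-month: ${replicated:.3f} per month")
```

Raw vectors alone come to about $0.07 per month at the planning rate. The example's $0.116 is higher, which suggests it includes overhead that is not itemized; that reading is an inference rather than something the page states.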
Does the estimate include vector queries?
No. Query units, compute, reranking, and network costs vary by service and are not represented in the current pricing catalog.
Related glossary terms
Input tokens
Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.
Requests per day
Requests per day is the number of billable API calls made during a day. TokenMath commonly derives it from requests per active user multiplied by active users.
Cost per request
Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.