Gemini Flash vs Pro Pricing: Cost Comparison Guide
Gemini Flash and Pro target different cost and capability profiles. The defensible comparison applies the same input, output, and request volume to both models and checks whether long-context pricing changes the rate.
Compare total request cost, not input price alone
A workload with short prompts and long generated answers is driven heavily by output pricing. A retrieval workload with large context can be input-heavy. Use your actual ratio when comparing Flash and Pro.
If the provider publishes a higher long-context tier for a model, crossing the prompt-size threshold can change both the input and output rates for that request.
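As a minimal sketch of the comparison described above, the following Python computes the cost of one request and switches to a long-context tier when the prompt is large. The names `Rates`, `TieredPricing`, and `request_cost` are hypothetical, the rates and the 200,000-token threshold are placeholders rather than published prices, and the rule that the tier applies only when the prompt exceeds the threshold is an assumption to check against the provider's pricing page.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_UNIT = 1_000_000  # rates are quoted per 1M tokens


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


@dataclass(frozen=True)
class TieredPricing:
    base: Rates
    # Optional long-context tier (hypothetical shape, not a provider API).
    long_context: Optional[Rates] = None
    threshold_tokens: Optional[int] = None

    def rates_for(self, input_tokens: int) -> Rates:
        """Pick the tier for one request based on its prompt size."""
        if (
            self.long_context is not None
            and self.threshold_tokens is not None
            and input_tokens > self.threshold_tokens  # assumed rule: strictly above
        ):
            return self.long_context
        return self.base


def request_cost(pricing: TieredPricing, input_tokens: int, output_tokens: int) -> float:
    """Input cost plus output cost for a single request, in USD."""
    rates = pricing.rates_for(input_tokens)
    total = input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    return total / TOKENS_PER_UNIT


# Placeholder prices for illustration only.
example = TieredPricing(
    base=Rates(input_per_m=1.00, output_per_m=4.00),
    long_context=Rates(input_per_m=2.00, output_per_m=6.00),
    threshold_tokens=200_000,
)

short_prompt = request_cost(example, input_tokens=1_000, output_tokens=2_000)
long_prompt = request_cost(example, input_tokens=300_000, output_tokens=2_000)
```

With these placeholder numbers the short-prompt request is dominated by output cost, while the long-prompt request is dominated by input cost and is billed at the higher tier for both directions.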
Treat model quality as a separate decision
The lower-priced model is not automatically the lower-cost product choice. Retry rates, failure handling, latency, and task quality can change the effective cost per successful outcome.
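One rough way to reason about effective cost per successful outcome is to divide the cost of a single attempt by the fraction of attempts that succeed. This sketch assumes every attempt costs the same and that failed attempts are retried independently until one succeeds, so the expected number of attempts is 1 / success rate. The function name and all numbers are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful outcome under independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical figures: the cheaper model fails half the time on this task.
cheaper_model = cost_per_success(cost_per_attempt=0.002, success_rate=0.5)
pricier_model = cost_per_success(cost_per_attempt=0.003, success_rate=1.0)
```

In this illustration the cheaper model ends up costing more per successful outcome (0.004 versus 0.003), which is the situation the paragraph above warns about. Real success rates have to come from your own evaluation data.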
Worked example
Gemini 2.5 Flash: 1M input tokens + 1M output tokens
Using the versioned rates below, this example workload is estimated at $2.80: $0.30 for 1M input tokens plus $2.50 for 1M output tokens. The estimate covers provider usage only and excludes taxes, regional premiums, retries, storage, network traffic, and unrelated infrastructure.
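The same arithmetic can be applied to both models. The sketch below uses the Flash and Pro rates listed in the pricing table in this guide and treats the 1M + 1M tokens as an aggregate workload at base rates, with no long-context tier applied. The dictionary layout and the `workload_cost` helper are illustrative names, not part of any provider SDK.

```python
# USD per 1M tokens, copied from the pricing table in this guide.
RATES_USD_PER_1M = {
    "Gemini 2.5 Flash": {"input": 0.30, "output": 2.50},
    "Gemini 2.5 Pro": {"input": 1.25, "output": 10.00},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Provider usage cost in USD at base rates (no long-context tier)."""
    rates = RATES_USD_PER_1M[model]
    return (
        input_tokens / 1_000_000 * rates["input"]
        + output_tokens / 1_000_000 * rates["output"]
    )


for model in RATES_USD_PER_1M:
    cost = workload_cost(model, input_tokens=1_000_000, output_tokens=1_000_000)
    print(f"{model}: ${cost:.2f}")
```

For this workload the loop prints $2.80 for Flash and $11.25 for Pro. Substituting your own input and output volumes shows how the gap shifts with the input-to-output ratio.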
Current pricing references
These versioned records support the examples above. Check the date and provider source before using them in a production forecast.
| Provider / model | Input or unit | Output | Status | Source |
|---|---|---|---|---|
| Google Gemini / Gemini 2.5 Flash | $0.30 per 1M tokens | $2.50 per 1M tokens | Verified | Google Gemini API pricing (checked Jun 21, 2026) |
| Google Gemini / Gemini 2.5 Pro | $1.25 per 1M tokens | $10.00 per 1M tokens | Verified | Google Gemini API pricing (checked Jun 21, 2026) |
Frequently asked questions
Is Gemini Flash always cheaper than Pro?
Its listed token rates are lower in this snapshot, but total product cost also depends on quality, retries, context size, and operational requirements.
How should I compare long prompts?
Use the full prompt token count and apply any published long-context tier before multiplying by traffic.
Related glossary terms
Input tokens
Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.
Output tokens
Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.
Cost per request
Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.