How Much Does an AI Chatbot Cost to Run?

An AI chatbot budget combines model inference with retrieval, storage, observability, and operational overhead. Start with cost per conversation turn, then scale by active users and turns per user.

Formula-driven examples. Source-linked pricing snapshots.

Model the conversation turn

Include the system prompt, conversation history, retrieved context, user message, and generated response. History growth can make later turns more expensive than the first.
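To see how history growth changes the input side of a turn, here is a minimal sketch. The function name and every token count are illustrative assumptions, not measurements. It also assumes the full history of earlier user messages and responses is resent on every turn, and that retrieved context is fetched fresh each turn rather than stored in the history.

```python
def input_tokens_per_turn(num_turns, system_prompt=300, retrieved_context=800,
                          user_message=50, response=200):
    """Estimate input tokens billed on each turn of one conversation.

    All token counts are illustrative assumptions. The full history of
    prior user messages and responses is assumed to be resent every turn;
    retrieved context is assumed not to be kept in the history.
    """
    per_turn = []
    history = 0  # tokens from earlier user messages and responses
    for _ in range(num_turns):
        per_turn.append(system_prompt + history + retrieved_context + user_message)
        # After the turn completes, the message and the reply join the history.
        history += user_message + response
    return per_turn


tokens = input_tokens_per_turn(5)
print(tokens)  # [1150, 1400, 1650, 1900, 2150]
```

Under these assumptions the fifth turn sends almost twice the input tokens of the first, which is why a single first-turn estimate tends to understate cost for long conversations.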

Base volume on turns per active user per day, not on total signups. An overly optimistic activation assumption distorts both cost and revenue planning.

monthly model cost = cost per turn × turns per active user per day × active users per day × 30
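The formula translates directly into a small function. This is a sketch only: the function name, parameter names, and the input values in the usage line are hypothetical and should be replaced with your own measurements.

```python
def monthly_model_cost(cost_per_turn, turns_per_active_user_per_day,
                       active_users_per_day, days_per_month=30):
    """Monthly model cost, following the formula above (30-day month by default)."""
    return (cost_per_turn
            * turns_per_active_user_per_day
            * active_users_per_day
            * days_per_month)


# Hypothetical inputs: $0.002 per turn, 5 turns per active user, 1,000 active users per day.
estimate = monthly_model_cost(cost_per_turn=0.002,
                              turns_per_active_user_per_day=5,
                              active_users_per_day=1000)
print(f"${estimate:,.2f} per month")  # $300.00 per month
```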

Add non-model costs

Production chatbots may also pay for embeddings, vector search, reranking, moderation, transcription, image generation, logging, support, and retries. Keep these line items visible instead of hiding them in a single model-rate assumption.
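One way to keep these line items visible is to store each one separately and report its share of the total. The sketch below is an illustration: the function name is an assumption, and every dollar amount is a hypothetical placeholder, not a quoted price.

```python
def summarize_budget(line_items):
    """Return the monthly total and each line item's share of it."""
    total = sum(line_items.values())
    if total == 0:
        return 0.0, {name: 0.0 for name in line_items}
    shares = {name: amount / total for name, amount in line_items.items()}
    return total, shares


# Hypothetical monthly amounts in dollars; replace with your own figures.
budget = {
    "model inference": 300.0,
    "embeddings": 20.0,
    "vector search": 50.0,
    "logging": 30.0,
}

total, shares = summarize_budget(budget)
print(f"total: ${total:,.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Reporting shares alongside the total makes it easier to notice when a non-model item starts to rival inference spend.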

Worked example

GPT-4.1 mini: 1M input tokens + 1M output tokens

Using the versioned rates below ($0.40 per 1M input tokens and $1.60 per 1M output tokens), this example workload is estimated at $0.40 + $1.60 = $2.00. This isolates provider usage only and does not include taxes, regional premiums, retries, storage, network traffic, or unrelated infrastructure.
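The same arithmetic can be repeated for each model in the pricing snapshot. In this sketch the rates are copied from the table in this guide (checked Jun 21, 2026); the dictionary and function names are assumptions made for illustration.

```python
# (input rate, output rate) in dollars per 1M tokens, from this guide's pricing snapshot.
RATES_PER_MILLION = {
    "GPT-4.1 mini": (0.40, 1.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def token_cost(model, input_tokens, output_tokens):
    """Provider usage cost only: input token cost plus output token cost."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return ((input_tokens / 1_000_000) * input_rate
            + (output_tokens / 1_000_000) * output_rate)


for model in RATES_PER_MILLION:
    print(f"{model}: ${token_cost(model, 1_000_000, 1_000_000):.2f}")
# GPT-4.1 mini: $2.00
# Gemini 2.5 Flash: $2.80
# Claude Haiku 4.5: $6.00
```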

Current pricing references

These versioned records support the examples above. Check the date and provider source before using them in a production forecast.

Provider	Model	Input (per 1M tokens)	Output (per 1M tokens)	Status	Source	Checked
OpenAI	GPT-4.1 mini	$0.40	$1.60	Verified	OpenAI model pricing	Jun 21, 2026
Google Gemini	Gemini 2.5 Flash	$0.30	$2.50	Verified	Google Gemini API pricing	Jun 21, 2026
Anthropic	Claude Haiku 4.5	$1.00	$5.00	Verified	Anthropic Claude API pricing	Jun 21, 2026

Frequently asked questions

What is the biggest chatbot cost driver?

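Common drivers at scale are generated output, context resent on every turn, and request volume. Output tokens cost more than input tokens for every model in the pricing references above, so long responses add up quickly. The dominant factor depends on the model and conversation design.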

How much safety buffer should I add?

Use observed variance when available. Early planning often benefits from a modest buffer for longer responses, prompt wrappers, and traffic uncertainty.
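One possible way to turn observed variance into a buffer is to scale the standard deviation of daily spend by a chosen multiplier and express it as a fraction of the mean. This is a sketch under stated assumptions, not a recommended rule: the function name, the multiplier `k = 2`, and the sample daily costs are all hypothetical.

```python
import statistics


def buffer_fraction(observed_daily_costs, k=2.0):
    """Buffer as a fraction of mean daily cost: k sample standard deviations over the mean.

    Needs at least two observations and a nonzero mean. The multiplier k
    is an assumption to tune, not a standard value.
    """
    mean = statistics.mean(observed_daily_costs)
    spread = statistics.stdev(observed_daily_costs)
    return k * spread / mean


# Hypothetical daily model spend in dollars.
daily = [100.0, 110.0, 90.0, 105.0, 95.0]
print(f"buffer: {buffer_fraction(daily):.1%}")  # buffer: 15.8%
```

Before any usage data exists, this calculation has nothing to work from, which is when a flat planning buffer is the practical fallback.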

Related calculators and guides

Estimate chatbot API cost

Project per-request through yearly model spend.

Plan a fixed API budget

Convert a monthly budget into request limits.

Related glossary terms

Input tokens

Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.

Output tokens

Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.

Cost per request

Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.
