How to Estimate AI SaaS API Costs and Gross Margin
AI SaaS pricing needs a usage model, not a single model rate. Tie inference cost to active behavior, add variable infrastructure, and compare the result with revenue per paying account.
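As a rough illustration of that comparison, the sketch below computes gross margin for one paying account from its revenue and variable cost. The function name and every dollar figure are hypothetical placeholders, not provider data.

```python
def gross_margin(revenue_per_account: float, variable_cost_per_account: float) -> float:
    """Return gross margin as a fraction of revenue for one paying account."""
    if revenue_per_account <= 0:
        raise ValueError("revenue_per_account must be positive")
    return (revenue_per_account - variable_cost_per_account) / revenue_per_account


# Hypothetical monthly figures for one paying account (assumptions, not measured data).
revenue = 30.00         # subscription price
inference_cost = 4.50   # provider API usage
infra_cost = 1.50       # variable infrastructure such as storage and bandwidth

margin = gross_margin(revenue, inference_cost + infra_cost)
print(f"Gross margin: {margin:.1%}")
```

Running the same calculation per tier or per usage segment, rather than once for the whole customer base, shows where margin is thin.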
Build a per-active-user cost
Define requests per active user, input and output tokens per request, active days, and the mix of models or modalities. Segment power users because averages can hide a loss-making tail.
Apply the same assumptions to free trials, paid tiers, and internal usage. Rate limits and included credits can make variable cost more predictable.
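One way to turn those inputs into a number is sketched below. The `UsageProfile` fields, the two example segments, and their values are illustrative assumptions. The rates are the GPT-4.1 mini figures from the pricing table in this guide and should be re-checked before use.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    """Hypothetical usage assumptions for one segment of active users."""
    requests_per_active_day: float
    input_tokens_per_request: float
    output_tokens_per_request: float
    active_days_per_month: float


def monthly_cost_per_user(profile: UsageProfile,
                          input_rate_per_m: float,
                          output_rate_per_m: float) -> float:
    """Estimate monthly provider cost in USD for one active user.

    Rates are in USD per 1M tokens.
    """
    requests = profile.requests_per_active_day * profile.active_days_per_month
    input_cost = requests * profile.input_tokens_per_request / 1_000_000 * input_rate_per_m
    output_cost = requests * profile.output_tokens_per_request / 1_000_000 * output_rate_per_m
    return input_cost + output_cost


# Illustrative segments: the values are assumptions, not measurements.
typical = UsageProfile(20, 1_500, 400, 12)
power = UsageProfile(200, 3_000, 800, 22)

for name, profile in [("typical", typical), ("power", power)]:
    cost = monthly_cost_per_user(profile, input_rate_per_m=0.40, output_rate_per_m=1.60)
    print(f"{name}: ${cost:.2f} per month")
```

Keeping the power-user segment separate makes the tail visible: with these assumed inputs, a power user costs roughly 37 times as much as a typical user.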
Stress-test the assumptions
Run a base case, high-usage case, and provider-price-change case. Include retries, failed generations, background jobs, evaluation traffic, and customer support usage.
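The three cases can be sketched as multipliers applied to a base per-user cost. The multiplier values, the 10% overhead for retries and background traffic, and the function name are all assumptions chosen for illustration. Replace them with measured figures.

```python
def scenario_cost(base_cost: float,
                  usage_multiplier: float = 1.0,
                  overhead_rate: float = 0.0,
                  price_multiplier: float = 1.0) -> float:
    """Scale a base monthly cost by usage, non-billable-to-customer overhead, and price change.

    overhead_rate covers retries, failed generations, background jobs,
    evaluation traffic, and support usage as a fraction of base traffic.
    """
    return base_cost * usage_multiplier * (1 + overhead_rate) * price_multiplier


base_cost = 4.00  # hypothetical monthly provider cost per active user, in USD

# Hypothetical scenario definitions; every value here is an assumption.
scenarios = {
    "base": dict(usage_multiplier=1.0, overhead_rate=0.10, price_multiplier=1.0),
    "high_usage": dict(usage_multiplier=3.0, overhead_rate=0.10, price_multiplier=1.0),
    "price_change": dict(usage_multiplier=1.0, overhead_rate=0.10, price_multiplier=1.25),
}

results = {name: scenario_cost(base_cost, **params) for name, params in scenarios.items()}
for name, cost in results.items():
    print(f"{name}: ${cost:.2f}")
```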
Worked example
GPT-4.1 mini: 1M input tokens + 1M output tokens
Using the versioned rates below ($0.40 per 1M input tokens and $1.60 per 1M output tokens), this example workload is estimated at $0.40 + $1.60 = $2.00. The figure isolates provider usage only and excludes taxes, regional premiums, retries, storage, network traffic, and unrelated infrastructure.
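The same arithmetic can be repeated for each model in the pricing table in this guide. The sketch below hard-codes those rates as checked on Jun 21, 2026. The function and variable names are illustrative, and the rates should be re-verified against each provider before use.

```python
# Rates in USD per 1M tokens as (input, output), copied from this guide's
# pricing table. They may be out of date; re-check the provider source.
RATES = {
    "GPT-4.1 mini": (0.40, 1.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Provider usage cost in USD, excluding taxes, retries, and infrastructure."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)


# Example workload from the text: 1M input tokens plus 1M output tokens.
costs = {
    model: round(workload_cost(1_000_000, 1_000_000, in_rate, out_rate), 2)
    for model, (in_rate, out_rate) in RATES.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:.2f}")
```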
Current pricing references
These versioned records support the examples above. Check the date and provider source before using them in a production forecast.
| Provider / model | Input or unit | Output | Status | Source |
|---|---|---|---|---|
| OpenAI GPT-4.1 mini | $0.40 per 1M tokens | $1.60 per 1M tokens | Verified | OpenAI model pricing (checked Jun 21, 2026) |
| Google Gemini 2.5 Flash | $0.30 per 1M tokens | $2.50 per 1M tokens | Verified | Google Gemini API pricing (checked Jun 21, 2026) |
| Anthropic Claude Sonnet 4.6 | $3.00 per 1M tokens | $15.00 per 1M tokens | Verified | Anthropic Claude API pricing (checked Jun 21, 2026) |
Frequently asked questions
Should AI API cost be part of cost of goods sold?
Usage-driven inference and related infrastructure are commonly treated as variable service-delivery costs for unit-economics analysis. Confirm accounting treatment with a qualified professional.
How do I control heavy-user cost?
Use transparent quotas, credits, rate limits, model routing, caching, and tier-specific limits informed by measured usage distributions.
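A minimal sketch of a tier-specific limit is shown below, assuming a monthly token allowance with a soft threshold and a hard cap. The function name, the action labels, and the thresholds are hypothetical. A real policy would draw its limits from measured usage distributions.

```python
def usage_action(used_tokens: int, included_tokens: int, hard_cap_tokens: int) -> str:
    """Decide how to handle the next request for a user in the current billing period.

    Below the included allowance: serve normally.
    Between the allowance and the hard cap: route to a cheaper model.
    At or above the hard cap: block until the period resets or the user upgrades.
    """
    if hard_cap_tokens < included_tokens:
        raise ValueError("hard_cap_tokens must be at least included_tokens")
    if used_tokens < included_tokens:
        return "allow"
    if used_tokens < hard_cap_tokens:
        return "route_to_cheaper_model"
    return "block"


# Hypothetical tier: 2M tokens included, hard cap at 5M tokens per month.
for used in (500_000, 3_000_000, 5_000_000):
    print(used, usage_action(used, included_tokens=2_000_000, hard_cap_tokens=5_000_000))
```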
Related glossary terms
Input tokens
Input tokens are the tokenized units sent to a model, including instructions, user content, conversation history, retrieved context, and tool definitions.
Output tokens
Output tokens are the tokenized units generated by a language model, including visible responses and any billable reasoning or thinking tokens defined by the provider.
Cost per request
Cost per request is the sum of all billable usage generated by one API call, commonly input token cost plus output token cost for a text model.