Calculate and Compare AI Model Costs
Compare pricing across 40+ models from 10 providers. Toggle caching & batch discounts. Paste text to count tokens instantly.
AI providers charge per token, a fragment of text roughly equivalent to three-quarters of a word. Every request you send (input) and every response generated (output) is metered in tokens. Output tokens cost several times more than input tokens (roughly 3x to 8x among the models quoted below) because generation requires sequential computation, while input can be processed in parallel.
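As a minimal sketch of the metering described above, the cost of one request is the input tokens times the input price plus the output tokens times the output price, with both prices quoted per million tokens. The function name `request_cost` and the sample token counts are illustrative assumptions, not part of any provider's API.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens
# at $2 input / $8 output per million tokens.
cost = request_cost(2_000, 500, 2.00, 8.00)
print(f"${cost:.4f}")  # prints $0.0080
```

In this example the 500 output tokens cost as much as the 2,000 input tokens, which is why output length often dominates the bill.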
The market now spans over 40 commercially available models from 10+ providers, ranging from Mistral Nemo at $0.02 per million input tokens to GPT-5.5 Pro at $30.00 — a 1,500x spread. Choosing the right model for your workload is the single biggest lever on your AI bill.
Each provider uses a different tokenizer to split text into tokens. OpenAI uses tiktoken (based on byte-pair encoding), Anthropic uses its own BPE tokenizer, and Google uses SentencePiece. The same 1,000-word English paragraph might produce ~1,300 tokens on one model and ~1,200 on another — a difference of about 5-10%. For cost planning this variance is minor, but for precise budgeting at scale, paste your actual prompts into each provider's token counter.
Models like GPT-4.1 Nano ($0.10/M), Gemini 2.5 Flash ($0.15/M), Grok 4.1 Fast ($0.20/M), and DeepSeek V3 ($0.25/M), priced here per million input tokens, handle classification, formatting, extraction, and simple Q&A at a fraction of the cost of premium models. For high-volume, latency-tolerant workloads, these offer exceptional value.
GPT-4.1 ($2/$8 per million input/output tokens), GPT-5 ($0.63/$5), Claude Haiku 4.5 ($1/$5), Mistral Large 3 ($2/$6), and DeepSeek R1 ($0.55/$2.19) strike a balance between quality and cost for content generation, coding, and analysis. Most production applications land here.
Claude Sonnet 4.6 ($3/$15), GPT-5.4 ($2.50/$15), and Gemini 2.5 Pro ($1.25/$10) deliver the best output quality for complex reasoning. GPT-5.5 ($5/$30) and Claude Opus 4.7 ($5/$25) are the ceiling — reserved for tasks where accuracy is paramount and volume is low.
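To see how these price points translate into a monthly bill, the sketch below ranks the five models above for one workload. The prices are the ones quoted in the preceding paragraph. The workload (10,000 requests per month, 1,500 input and 500 output tokens each) and the helper names are hypothetical.

```python
# Dollars per million tokens as (input, output), taken from the paragraph above.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4": (2.50, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def monthly_cost(model: str, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Monthly dollar cost for `requests` identical requests to `model`."""
    in_price, out_price = PRICES[model]
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests


def rank_by_cost(requests: int, input_tokens: int, output_tokens: int):
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = [(m, monthly_cost(m, requests, input_tokens, output_tokens))
             for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical workload: 10,000 requests/month, 1,500 input + 500 output tokens each.
for model, cost in rank_by_cost(10_000, 1_500, 500):
    print(f"{model:<20} ${cost:,.2f}")
```

For this assumed workload, Gemini 2.5 Pro comes out cheapest at $68.75 per month and GPT-5.5 most expensive at $225.00. A workload with a different input-to-output ratio may rank the models differently.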
First, enable prompt caching to avoid reprocessing stable system prompts: Anthropic offers 90% off cached reads (with 1.25x write cost for 5-min TTL or 2x for 1-hour TTL), OpenAI offers up to 75% off, and Google offers similar discounts. Second, route non-urgent work through Batch APIs for a flat 50% discount on all tokens; Anthropic, OpenAI, and Google all support this. Third, start with the cheapest model that meets your quality threshold and upgrade only where output noticeably improves.
Some providers charge more when you use extended context windows. Google Gemini doubles input and output pricing when prompts exceed 200K tokens. Most other providers charge flat rates regardless of context usage. If you frequently use long contexts, factor this into your cost comparison.
The estimator uses ~1 token per 4 characters, the industry-standard heuristic. It's accurate within 5-10% for English text. Each provider uses a different tokenizer (OpenAI's tiktoken, Anthropic's BPE tokenizer, Google's SentencePiece), so actual counts vary slightly. For precise counts, use each provider's official tokenizer tool.
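The 4-characters-per-token heuristic can be sketched in a few lines. The function name `estimate_tokens` and the choice to round up are assumptions made for illustration. The calculator's actual rounding is not stated.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic from the text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Estimate token count as characters / 4, rounded up (rounding is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


sample = "Compare pricing across models before committing budget."
print(estimate_tokens(sample))
```

Because the estimate depends only on character count, it is best treated as a planning figure. Text with code, non-English languages, or unusual symbols may tokenize quite differently.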
All prices were verified against official provider pricing pages in April 2026. AI providers adjust rates frequently — OpenAI in particular has released multiple model generations (GPT-5, 5.4, 5.5) with different price points in 2025-2026. We recommend confirming on the provider's page before committing budget.
Prompt caching stores frequently used prompt prefixes so they don't need reprocessing. With Anthropic, cache writes cost 1.25x (5-min TTL) or 2x (1-hour TTL) of the base input price, but cache reads cost only 10% of it, so caching pays off after one read at the 5-minute TTL and after two reads at the 1-hour TTL. OpenAI offers up to 75% off cached input. Google offers similar discounts plus a per-hour storage fee for cached context.
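The break-even point can be checked with a short calculation. The sketch below works in units of the base input price for the cached prefix: without caching, one initial request plus n repeats costs 1 + n. With caching, it costs the write multiplier plus 0.10 per read. The function names are illustrative, and the model ignores storage fees and cache expiry.

```python
def cached_cost(reads: int, write_multiplier: float,
                read_multiplier: float = 0.10) -> float:
    """Relative cost of one cache write followed by `reads` cache reads."""
    return write_multiplier + reads * read_multiplier


def uncached_cost(reads: int) -> float:
    """Relative cost of sending the same prefix 1 + `reads` times at full price."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float,
                     read_multiplier: float = 0.10) -> int:
    """Smallest number of cache reads at which caching is strictly cheaper."""
    if read_multiplier >= 1.0:
        raise ValueError("caching never pays off if reads cost full price or more")
    reads = 0
    while cached_cost(reads, write_multiplier, read_multiplier) >= uncached_cost(reads):
        reads += 1
    return reads


print(break_even_reads(1.25))  # 5-minute TTL write cost -> prints 1
print(break_even_reads(2.0))   # 1-hour TTL write cost   -> prints 2
```

With a 2x write cost, a single read totals 2.1 units against 2.0 uncached, so the saving only appears on the second read.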
Batch APIs process requests asynchronously within 24 hours instead of in real time, in exchange for a flat 50% discount. Anthropic, OpenAI, and Google all offer batch endpoints. They are well suited to bulk content generation, evaluation runs, and any workload that doesn't need instant responses.
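A rough way to estimate the saving is to split a bill into the share that must stay real-time and the share that can wait for a batch run. The sketch below assumes the flat 50% discount described above. The function name and the sample figures are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # flat discount described in the text


def blended_cost(realtime_cost: float, batchable_fraction: float) -> float:
    """Cost after moving `batchable_fraction` of a real-time bill to a Batch API."""
    if not 0.0 <= batchable_fraction <= 1.0:
        raise ValueError("batchable_fraction must be between 0 and 1")
    urgent = realtime_cost * (1.0 - batchable_fraction)
    batched = realtime_cost * batchable_fraction * (1.0 - BATCH_DISCOUNT)
    return urgent + batched


# Hypothetical: a $1,000/month bill where 60% of the work can wait up to 24 hours.
print(f"${blended_cost(1_000, 0.6):,.2f}")  # prints $700.00
```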
Google Gemini models use tiered pricing: standard rates for prompts up to 200K tokens, and 2x rates for prompts exceeding 200K. This calculator uses the standard (under 200K) pricing. If you regularly use very long contexts with Gemini, your actual costs will be higher.
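To adjust an estimate for long prompts, the tiering can be sketched as a multiplier that switches on above the threshold. This sketch assumes the 2x rate applies to the whole request (input and output) once the prompt exceeds 200K tokens, as described above. Check the provider's pricing page for how the tiers are applied in practice. The function name is illustrative.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens, from the text
LONG_CONTEXT_MULTIPLIER = 2.0     # 2x rates above the threshold, from the text


def tiered_request_cost(input_tokens: int, output_tokens: int,
                        input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request under the assumed two-tier pricing."""
    multiplier = (LONG_CONTEXT_MULTIPLIER
                  if input_tokens > LONG_CONTEXT_THRESHOLD else 1.0)
    base = (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
    return multiplier * base


# Standard rates of $1.25 input / $10 output per million tokens.
print(f"${tiered_request_cost(100_000, 1_000, 1.25, 10.00):.3f}")  # prints $0.135
print(f"${tiered_request_cost(300_000, 1_000, 1.25, 10.00):.3f}")  # prints $0.770
```

Tripling the prompt from 100K to 300K tokens raises the cost by more than five times here, because the longer prompt also crosses into the higher tier.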
Reasoning models like DeepSeek R1, o3, o4-mini, and Gemini 2.5 Flash can produce internal "thinking" tokens during processing. For most models these are billed at the standard output rate. Gemini 2.5 Flash charges $3.50/M for thinking tokens vs $0.60/M for non-thinking output, nearly six times as much. This calculator uses standard output pricing.
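The effect of a separate thinking rate can be sketched as follows. This assumes visible output is billed at the standard output rate and thinking tokens at the thinking rate. Providers may instead bill all output at one rate when thinking is enabled, so treat the split as an assumption. The function name and token counts are hypothetical.

```python
from typing import Optional


def output_cost(visible_tokens: int, thinking_tokens: int,
                output_price_per_m: float,
                thinking_price_per_m: Optional[float] = None) -> float:
    """Dollar cost of the output side of one request.

    If no separate thinking price is given, thinking tokens are billed
    at the standard output rate.
    """
    if thinking_price_per_m is None:
        thinking_price_per_m = output_price_per_m
    return (visible_tokens * output_price_per_m
            + thinking_tokens * thinking_price_per_m) / 1_000_000


# Hypothetical response: 500 visible tokens plus 2,000 thinking tokens.
standard_only = output_cost(500, 2_000, 0.60)       # what a standard-rate estimate assumes
with_thinking = output_cost(500, 2_000, 0.60, 3.50)  # with the separate thinking rate
print(f"${standard_only:.4f} vs ${with_thinking:.4f}")  # prints $0.0015 vs $0.0073
```

Under these assumptions a standard-rate estimate understates the output cost of this request by almost five times.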