Price My Prompt — Calculate and Compare AI Model Costs

How AI LLM Pricing Works in 2026

AI providers charge per token, a fragment of text roughly equivalent to three-quarters of a word. Every request you send (input) and every response generated (output) is metered in tokens. Output tokens cost more than input tokens, roughly 3 to 8 times more across the models priced below, because generation requires sequential computation, while input can be processed in parallel.
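As a rough sketch of how that metering turns into a bill, the arithmetic looks like the following. The function names, the $3/$15 per-million prices, and the 100-requests-a-day, 30-day month are illustrative assumptions, not any provider's API.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(requests_per_day: int, days_per_month: int,
                 input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a month of identical requests."""
    per_request = request_cost(input_tokens, output_tokens,
                               input_price_per_m, output_price_per_m)
    return per_request * requests_per_day * days_per_month


# Assumed workload: 500 input and 1,000 output tokens per request,
# 100 requests a day for 30 days, at $3 input / $15 output per million.
per_request = request_cost(500, 1_000, 3.00, 15.00)
per_month = monthly_cost(100, 30, 500, 1_000, 3.00, 15.00)
print(f"${per_request:.4f} per request, ${per_month:.2f} per month")
```

Note how the output side dominates: in this example the 1,000 output tokens account for about 90% of the cost.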

The market now spans over 40 commercially available models from 10+ providers, ranging from Mistral Nemo at $0.02 per million input tokens to GPT-5.5 Pro at $30.00 — a 1,500x spread. Choosing the right model for your workload is the single biggest lever on your AI bill.

How tokenization differs across providers

Each provider uses a different tokenizer to split text into tokens. OpenAI uses tiktoken (based on byte-pair encoding), Anthropic uses its own BPE tokenizer, and Google uses SentencePiece. The same 1,000-word English paragraph might produce ~1,300 tokens on one model and ~1,200 on another — a difference of about 5-10%. For cost planning this variance is minor, but for precise budgeting at scale, paste your actual prompts into each provider's token counter.
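For a quick estimate before reaching for a real tokenizer, the 4-characters-per-token rule of thumb can be sketched as below. This is only the heuristic, not any provider's tokenizer; the function names and the 10% band are assumptions chosen to match the variance described above.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token count: about one token per four characters."""
    return math.ceil(len(text) / chars_per_token)


def estimate_range(text: str, variance: float = 0.10) -> tuple[int, int]:
    """Low and high estimates, allowing for tokenizer differences."""
    estimate = estimate_tokens(text)
    return round(estimate * (1 - variance)), round(estimate * (1 + variance))


prompt = "Summarize the following support ticket in two sentences."
low, high = estimate_range(prompt)
print(f"about {estimate_tokens(prompt)} tokens (roughly {low} to {high})")
```

The estimate is meant for English prose; code, non-English text, and unusual formatting can fall well outside the band.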

Budget models: when cheap is good enough

Models like GPT-4.1 Nano ($0.10/M), Grok 4.1 Fast ($0.20/M), DeepSeek V3 ($0.25/M), and Gemini 2.5 Flash ($0.15/M), priced here per million input tokens, handle classification, formatting, extraction, and simple Q&A at a fraction of the cost of larger models. For high-volume, latency-tolerant workloads, these offer exceptional value.

Mid-range models: the production sweet spot

GPT-4.1 ($2/$8 per million input/output tokens), GPT-5 ($0.63/$5), Claude Haiku 4.5 ($1/$5), Mistral Large 3 ($2/$6), and DeepSeek R1 ($0.55/$2.19) strike a balance of quality and cost for content generation, coding, and analysis. Most production applications land here.

Premium and flagship: when quality justifies the premium

Claude Sonnet 4.6 ($3/$15), GPT-5.4 ($2.50/$15), and Gemini 2.5 Pro ($1.25/$10) deliver the best output quality for complex reasoning. GPT-5.5 ($5/$30) and Claude Opus 4.7 ($5/$25) are the ceiling — reserved for tasks where accuracy is paramount and volume is low.
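To see how these tiers compare on a concrete workload, here is a small ranking sketch. The prices are the input/output figures quoted in this article; the workload (3,000 requests a month at 500 input and 1,000 output tokens each) and the function names are assumptions for illustration.

```python
# (input, output) price in dollars per million tokens, as quoted above.
PRICES = {
    "GPT-4.1": (2.00, 8.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Mistral Large 3": (2.00, 6.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def monthly_cost(prices: tuple[float, float], requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Monthly dollar cost for a fixed per-request token profile."""
    input_price, output_price = prices
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * requests


def rank_models(price_table: dict, requests: int,
                input_tokens: int, output_tokens: int) -> list:
    """Return (model, monthly cost) pairs, cheapest first."""
    costs = {name: monthly_cost(p, requests, input_tokens, output_tokens)
             for name, p in price_table.items()}
    return sorted(costs.items(), key=lambda item: item[1])


ranking = rank_models(PRICES, requests=3_000,
                      input_tokens=500, output_tokens=1_000)
for name, cost in ranking:
    print(f"{name:<20} ${cost:>7.2f}")
```

Because this workload is output-heavy, the ordering follows output price more closely than input price. An input-heavy workload (long documents, short answers) can rank the same models differently.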

Three ways to cut your AI LLM bill

First, enable prompt caching to avoid reprocessing stable system prompts — Anthropic offers 90% off cached reads (with 1.25x write cost for 5-min TTL or 2x for 1-hour TTL), OpenAI offers up to 75% off, and Google offers similar discounts. Second, route non-urgent work through Batch APIs for a flat 50% discount on all tokens — all three major providers support this. Third, start with the cheapest model that meets your quality threshold and only upgrade where output noticeably improves.

Context window pricing tiers

Some providers charge more when you use extended context windows. Google Gemini doubles input and output pricing when prompts exceed 200K tokens. Most other providers charge flat rates regardless of context usage. If you frequently use long contexts, factor this into your cost comparison.
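A tiered rate like this can be modeled with a simple threshold check. The sketch below assumes the whole request is billed at the higher rate once the prompt exceeds the threshold, which is one reading of the rule above; the function name and the example token counts are hypothetical, and the $1.25/$10 base prices are the Gemini 2.5 Pro figures quoted earlier.

```python
def tiered_request_cost(prompt_tokens: int, output_tokens: int,
                        input_price_per_m: float, output_price_per_m: float,
                        threshold: int = 200_000,
                        multiplier: float = 2.0) -> float:
    """Cost of one request when long prompts trigger a higher price tier.

    Assumption: if the prompt exceeds `threshold`, both input and output
    prices are multiplied for the entire request.
    """
    if prompt_tokens > threshold:
        input_price_per_m *= multiplier
        output_price_per_m *= multiplier
    return (prompt_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


short_prompt = tiered_request_cost(150_000, 2_000, 1.25, 10.00)
long_prompt = tiered_request_cost(250_000, 2_000, 1.25, 10.00)
print(f"150K prompt: ${short_prompt:.4f}   250K prompt: ${long_prompt:.4f}")
```

In this example the longer prompt has about 1.7x the tokens but costs more than 3x as much, which is why trimming a prompt to stay under the threshold can be worth the effort.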

Frequently asked questions

How accurate is the token estimate?

The estimator uses ~1 token per 4 characters, the industry-standard heuristic. It's accurate within 5-10% for English text. Each provider uses a different tokenizer (OpenAI's tiktoken, Anthropic's BPE tokenizer, Google's SentencePiece), so actual counts vary slightly. For precise counts, use each provider's official tokenizer tool.

Are these prices up to date?

All prices were verified against official provider pricing pages in April 2026. AI providers adjust rates frequently — OpenAI in particular has released multiple model generations (GPT-5, 5.4, 5.5) with different price points in 2025-2026. We recommend confirming on the provider's page before committing budget.

How does prompt caching actually work?

Prompt caching stores frequently used prompt prefixes so they don't need reprocessing. With Anthropic, cache writes cost 1.25x (5-min TTL) or 2x (1-hour TTL) the base input price, but cache reads cost only 10%, so caching pays off after one read at the 5-minute TTL or two reads at the 1-hour TTL. OpenAI offers up to 75% off cached input. Google offers similar discounts plus a per-hour storage fee for cached context.
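The break-even point for the Anthropic-style multipliers can be checked with a few lines of arithmetic. This sketch assumes the first request pays the write price in place of the normal input price and every later request inside the TTL is a cache read; the function names and the 2,000-token prefix are hypothetical.

```python
from typing import Optional


def break_even_reads(write_multiplier: float, read_multiplier: float = 0.10,
                     max_reads: int = 1_000) -> Optional[int]:
    """Smallest number of cache reads at which caching is cheaper.

    Costs are in units of the base input price for the cached prefix.
    Uncached: the first request plus `reads` follow-ups cost 1 + reads.
    Cached: one write plus `reads` discounted reads.
    """
    for reads in range(1, max_reads + 1):
        if write_multiplier + read_multiplier * reads < 1 + reads:
            return reads
    return None


def caching_savings(prefix_tokens: int, reads: int, input_price_per_m: float,
                    write_multiplier: float,
                    read_multiplier: float = 0.10) -> float:
    """Dollars saved on the cached prefix over one write and `reads` reads."""
    base = prefix_tokens * input_price_per_m / 1_000_000
    uncached = base * (1 + reads)
    cached = base * (write_multiplier + read_multiplier * reads)
    return uncached - cached


print(break_even_reads(1.25))  # 5-minute TTL write price
print(break_even_reads(2.0))   # 1-hour TTL write price
print(f"${caching_savings(2_000, 100, 3.00, 1.25):.4f} saved")
```

The savings only materialize if the reads actually land inside the TTL. A prefix that is written and then expires unused costs more than not caching at all.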

What's the Batch API and who offers it?

Batch APIs process requests asynchronously within 24 hours instead of in real time, in exchange for a flat 50% discount. Anthropic, OpenAI, and Google all offer batch endpoints. They work well for bulk content generation, evaluation runs, and any workload that doesn't need instant responses.
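Most workloads are a mix of urgent and non-urgent requests, so the realistic saving depends on how much traffic can wait. A minimal sketch, assuming the 50% discount applies evenly to whatever share of spend is moved to batch (the function name and the $49.50 bill are illustrative):

```python
def blended_cost(realtime_cost: float, batch_fraction: float,
                 batch_discount: float = 0.50) -> float:
    """Monthly cost when `batch_fraction` of spend moves to a batch API.

    `realtime_cost` is the bill with everything sent in real time.
    """
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    return realtime_cost * (1 - batch_fraction * batch_discount)


# Assumed example: a $49.50 monthly bill with 60% of traffic batchable.
print(f"${blended_cost(49.50, 0.6):.2f}")
```

Moving 60% of traffic to batch cuts the bill by 30%, not 50%, so it helps to estimate the batchable share before counting on the full discount.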

Why do some models show different pricing tiers for context length?

Google Gemini models use tiered pricing: standard rates for prompts up to 200K tokens, and 2x rates for prompts exceeding 200K. This calculator uses the standard (under 200K) pricing. If you regularly use very long contexts with Gemini, your actual costs will be higher.

What are "thinking tokens" and do they cost extra?

Reasoning models like DeepSeek R1, o3, o4-mini, and Gemini 2.5 Flash can produce internal "thinking" tokens during processing. For most models these are billed at the standard output rate. Gemini 2.5 Flash charges $3.50/M for thinking tokens vs $0.60/M for non-thinking output — a significant difference. This calculator uses standard output pricing.
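To get a feel for how much thinking tokens can move the bill, the sketch below prices the same response two ways using the Gemini 2.5 Flash rates quoted above. Billing thinking and visible tokens at separate rates follows the description in this answer; the function name and the 1,000 visible / 4,000 thinking token split are hypothetical.

```python
from typing import Optional


def output_cost(visible_tokens: int, thinking_tokens: int,
                output_price_per_m: float,
                thinking_price_per_m: Optional[float] = None) -> float:
    """Dollar cost of one response, including internal thinking tokens.

    If no separate thinking price is given, thinking tokens are billed
    at the standard output rate.
    """
    if thinking_price_per_m is None:
        thinking_price_per_m = output_price_per_m
    return (visible_tokens * output_price_per_m
            + thinking_tokens * thinking_price_per_m) / 1_000_000


# Assumed response: 1,000 visible tokens plus 4,000 thinking tokens.
standard = output_cost(1_000, 4_000, 0.60)
separate = output_cost(1_000, 4_000, 0.60, thinking_price_per_m=3.50)
print(f"standard rate: ${standard:.4f}   separate thinking rate: ${separate:.4f}")
```

Even at the standard rate, thinking tokens multiply the output bill because they are invisible in the response but still counted, so it is worth checking actual usage reports rather than estimating from visible output length.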