Free Tool
Enter your monthly token usage and instantly compare API costs across 35+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more.
LLM APIs charge per token, not per character or word. One token is roughly 4 characters of English text — a typical 1,000-word document is about 1,300 tokens.
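The rule of thumb above can be turned into a quick estimator. This is a rough sketch only: the constants come from the figures in the text (about 4 characters per token, about 1,300 tokens per 1,000 words), the function names are hypothetical, and real counts depend on each model's tokenizer and on the language of the text.

```python
import math

# Rule-of-thumb constants taken from the text above (English prose only).
CHARS_PER_TOKEN = 4
TOKENS_PER_WORD = 1.3  # 1,000 words is about 1,300 tokens


def estimate_tokens_from_text(text: str) -> int:
    """Rough token estimate from the character count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count."""
    return round(word_count * TOKENS_PER_WORD)


print(estimate_tokens_from_words(1_000))
print(estimate_tokens_from_text("How many tokens is this sentence?"))
```

For billing-grade numbers, count tokens with the provider's own tokenizer instead of this approximation.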
Input tokens: the prompt you send to the model, including system instructions, conversation history, context, and the user's message. Input tokens are usually cheaper than output tokens.
Output tokens: the model's response. Output tokens typically cost 3–5× more than input tokens because they are generated one at a time, while the input is processed in parallel in a single pass.
Multiply your daily token counts by 30. For a chatbot handling 1,000 conversations/day with average 500 input + 200 output tokens: 15M input + 6M output per month.
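The chatbot arithmetic above can be written as a small helper. This is a minimal sketch: the `MonthlyUsage` type and the `monthly_usage` function are illustrative names, and the 30-day month is the same simplification the text uses.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # simplification used in the text


@dataclass(frozen=True)
class MonthlyUsage:
    input_tokens: int
    output_tokens: int


def monthly_usage(
    conversations_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    days: int = DAYS_PER_MONTH,
) -> MonthlyUsage:
    """Scale per-conversation averages up to a monthly token volume."""
    return MonthlyUsage(
        input_tokens=conversations_per_day * avg_input_tokens * days,
        output_tokens=conversations_per_day * avg_output_tokens * days,
    )


# The chatbot example from the text: 1,000 conversations/day, 500 in + 200 out.
usage = monthly_usage(1_000, 500, 200)
print(f"{usage.input_tokens:,} input + {usage.output_tokens:,} output tokens per month")
```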
Neureus routes every request through a response cache and a prompt compression layer, and passes the savings on to you as a permanent 10% discount off OpenRouter's published rates.
All 35 models are available through one Neureus API key, priced 10% below OpenRouter.
| Model | Provider | Context | OpenRouter Input /M | OpenRouter Output /M | Neureus Input /M | Neureus Output /M |
|---|---|---|---|---|---|---|
| Llama 3.2 1B | Cloudflare AI | 128K | $0.027 | $0.20 | $0.024 | $0.18 |
| Llama 3.1 8B Instruct | Meta | 128K | $0.050 | $0.050 | $0.045 | $0.045 |
| Llama 3.2 3B | Cloudflare AI | 128K | $0.051 | $0.34 | $0.046 | $0.30 |
| Qwen 2.5 Coder 32B | Qwen | 33K | $0.070 | $0.16 | $0.063 | $0.14 |
| Gemini 2.0 Flash | Google | 1M | $0.10 | $0.40 | $0.090 | $0.36 |
| Mistral Small 3.1 | Mistral AI | 128K | $0.10 | $0.30 | $0.090 | $0.27 |
| GPT-4o mini | OpenAI | 128K | $0.15 | $0.60 | $0.14 | $0.54 |
| Gemini 2.5 Flash | Google | 1M | $0.15 | $0.60 | $0.14 | $0.54 |
| Command R | Cohere | 128K | $0.15 | $0.60 | $0.14 | $0.54 |
| DeepSeek V3 | DeepSeek | 66K | $0.27 | $1.10 | $0.24 | $0.99 |
| Llama 4 Scout 17B | Cloudflare AI | 128K | $0.27 | $0.85 | $0.24 | $0.77 |
| Llama 3.3 70B | Cloudflare AI | 128K | $0.29 | $2.25 | $0.26 | $2.03 |
| Codestral | Mistral AI | 256K | $0.30 | $0.90 | $0.27 | $0.81 |
| Llama 3.1 70B Instruct | Meta | 128K | $0.35 | $0.40 | $0.32 | $0.36 |
| Qwen 2.5 72B Instruct | Qwen | 128K | $0.35 | $0.40 | $0.32 | $0.36 |
| Mistral Small 3.1 24B | Cloudflare AI | 128K | $0.35 | $0.56 | $0.32 | $0.50 |
| GPT-4.1 mini | OpenAI | 1M | $0.40 | $1.60 | $0.36 | $1.44 |
| DeepSeek R1 32B | Cloudflare AI | 32K | $0.50 | $4.88 | $0.45 | $4.39 |
| Nemotron 120B | Cloudflare AI | 128K | $0.50 | $1.50 | $0.45 | $1.35 |
| DeepSeek R1 | DeepSeek | 66K | $0.55 | $2.19 | $0.50 | $1.97 |
| Llama 3.3 70B Instruct | Meta | 128K | $0.59 | $0.79 | $0.53 | $0.71 |
| QwQ 32B | Cloudflare AI | 32K | $0.66 | $1.00 | $0.59 | $0.90 |
| Qwen Coder 32B | Cloudflare AI | 32K | $0.66 | $1.00 | $0.59 | $0.90 |
| Claude Haiku 4.5 | Anthropic | 200K | $0.80 | $4.00 | $0.72 | $3.60 |
| Kimi K2 | Cloudflare AI | 128K | $0.95 | $4.00 | $0.85 | $3.60 |
| Sonar Large (Online) | Perplexity | 127K | $1.00 | $1.00 | $0.90 | $0.90 |
| o4-mini | OpenAI | 200K | $1.10 | $4.40 | $0.99 | $3.96 |
| Gemini 2.5 Pro | Google | 1M | $1.25 | $10.00 | $1.13 | $9.00 |
| GPT-4.1 | OpenAI | 1M | $2.00 | $8.00 | $1.80 | $7.20 |
| Mistral Large | Mistral AI | 128K | $2.00 | $6.00 | $1.80 | $5.40 |
| GPT-4o | OpenAI | 128K | $2.50 | $10.00 | $2.25 | $9.00 |
| Command R+ | Cohere | 128K | $2.50 | $10.00 | $2.25 | $9.00 |
| Claude Sonnet 4.6 | Anthropic | 200K | $3.00 | $15.00 | $2.70 | $13.50 |
| o3 | OpenAI | 200K | $10.00 | $40.00 | $9.00 | $36.00 |
| Claude Opus 4 | Anthropic | 200K | $15.00 | $75.00 | $13.50 | $67.50 |
Prices in USD per million tokens. Updated monthly from provider published rates. Neureus price = OpenRouter rate × 0.90.
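As a worked example of the pricing rule, the sketch below computes a monthly bill from per-million-token rates and then applies the 0.90 multiplier. The function name and constants are illustrative. The workload is the 15M input + 6M output chatbot example, and the rates are the GPT-4o mini row from the table. The table rounds the displayed Neureus rates, so multiplying the displayed Neureus columns directly can give a slightly different figure.

```python
TOKENS_PER_MILLION = 1_000_000
NEUREUS_MULTIPLIER = 0.90  # "Neureus price = OpenRouter rate x 0.90"


def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
) -> float:
    """Cost in USD given token volumes and per-million-token rates."""
    return (
        input_tokens / TOKENS_PER_MILLION * input_rate_per_m
        + output_tokens / TOKENS_PER_MILLION * output_rate_per_m
    )


# GPT-4o mini row: $0.15 input, $0.60 output per million tokens (OpenRouter).
openrouter_cost = monthly_cost(15_000_000, 6_000_000, 0.15, 0.60)
neureus_cost = openrouter_cost * NEUREUS_MULTIPLIER

print(f"OpenRouter: ${openrouter_cost:.3f}/month")
print(f"Neureus:    ${neureus_cost:.3f}/month")
```

For this workload the OpenRouter figure comes to about $5.85 per month and the discounted figure to about $5.27.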
The calculator uses published API pricing from each provider (via OpenRouter's rate card), updated monthly. Actual costs may vary slightly if providers change pricing mid-month or if you hit rate-limit tiers.
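To illustrate the kind of comparison a calculator like this performs, the sketch below ranks a few models by monthly cost for a fixed workload. It is not the calculator's actual implementation. The `RATES` list is a hand-copied subset of the OpenRouter columns in the table, and `rank_by_monthly_cost` is a hypothetical helper.

```python
# (model, OpenRouter input $/M, OpenRouter output $/M), copied from the table.
RATES = [
    ("Claude Sonnet 4.6", 3.00, 15.00),
    ("GPT-4o mini", 0.15, 0.60),
    ("DeepSeek V3", 0.27, 1.10),
    ("Gemini 2.0 Flash", 0.10, 0.40),
]


def rank_by_monthly_cost(rates, input_tokens, output_tokens):
    """Return (model, monthly cost in USD) pairs sorted from cheapest to priciest."""
    costs = [
        (name, (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000)
        for name, in_rate, out_rate in rates
    ]
    return sorted(costs, key=lambda pair: pair[1])


# Workload: 15M input + 6M output tokens per month.
ranking = rank_by_monthly_cost(RATES, 15_000_000, 6_000_000)
for name, cost in ranking:
    print(f"{name}: ${cost:,.2f}/month")
```

Cost is only one axis of the comparison. Context window and output quality also differ between these models.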
Input tokens include your system prompt, conversation history, any documents you send, and the user's message. Output tokens are the model's reply. Most LLMs charge more for output than input (usually 3–5× more).
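Because conversation history counts as input, input tokens grow with every turn of a chat. The sketch below shows that growth under stated assumptions: the full history is resent on each request, with no truncation, summarization, or provider-side prompt caching. The token counts are hypothetical and the function name is illustrative.

```python
def input_tokens_per_turn(
    system_tokens: int,
    user_tokens_per_turn: int,
    reply_tokens_per_turn: int,
    turns: int,
) -> list[int]:
    """Input tokens billed for each request when the full history is resent."""
    counts = []
    history = 0  # tokens from earlier user messages and model replies
    for _ in range(turns):
        counts.append(system_tokens + history + user_tokens_per_turn)
        history += user_tokens_per_turn + reply_tokens_per_turn
    return counts


# Hypothetical chat: 200-token system prompt, 50-token questions, 150-token replies.
per_turn = input_tokens_per_turn(200, 50, 150, turns=4)
print(per_turn)       # [250, 450, 650, 850]
print(sum(per_turn))  # 2200
```

In this example the fourth request sends more than three times the input tokens of the first, which is why average input size per conversation is usually well above the size of a single user message.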
Neureus runs a semantic response cache and prompt compression layer in front of every model. Repeated or similar queries are served from cache at $0 provider cost, and compressed prompts cost fewer tokens. The savings are passed to you as a flat 10% discount.
For cost-sensitive production workloads, Llama 3.3 70B and Mistral Large are strong choices. For the highest quality, consider GPT-4.1 or Claude Opus 4. For reasoning tasks, DeepSeek R1 and o3 are the strongest options in this list. Browse the full model catalog for capability details.
Neureus supports BYOK (bring your own key): you supply your provider key, Neureus routes requests through it, and you keep the direct billing relationship with the provider while still getting Neureus's caching and routing layer.
The free tier includes 5M AI tokens, with no credit card required. Access every model in this calculator through one API key, priced 10% below OpenRouter's published rates.