AI Gateway
Route to 10 AI providers and 35+ models from a single OpenAI-compatible endpoint with SSE streaming. The prompt preprocessor cuts 10–30% of tokens before billing. Paid models are always priced 10% below OpenRouter.
Change the model field to switch providers. The API, response format, and streaming behavior stay identical.
The 4-pass prompt preprocessor runs automatically before every request — normalizing, trimming, and optionally compressing your messages before they hit the provider.
// TypeScript — works with any provider
const response = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer nr_your_api_key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      // Switch providers by changing this string:
      model: 'claude-sonnet-4-6',
      // model: 'gpt-4o',
      // model: '@wai/llama-3.3-70b', // free
      // model: 'gemini-2.5-flash',
      // model: 'deepseek-r1',
      messages: [
        { role: 'user', content: 'Explain RAG in plain English' }
      ],
      stream: true, // SSE works across all providers
    }),
  }
);
// Response: OpenAI-compatible SSE stream
Workers AI models are always free. Paid models are priced 10% below OpenRouter.
claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5, gpt-4o, gpt-4o-mini, o3, o4-mini, gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, llama-3.3-70b, llama-3.1-8b, deepseek-r1, deepseek-v3, mistral-large-2411, mistral-small-3.1, codestral-2501, command-r-plus, command-r, qwen-2.5-72b, qwen-2.5-coder-32b, llama-3.1-sonar-large-128k-online, @wai/llama-3.3-70b, @wai/deepseek-r1-32b, @wai/nemotron-120b, @wai/qwq-32b
The prompt preprocessor cuts token count before provider billing. You pay for what the model actually processes, which is consistently less than sending raw prompts to OpenRouter.
| Model | OpenRouter | Neureus (realtime) | Neureus (batch) |
|---|---|---|---|
| GPT-4o | $5.00/1M | $4.50/1M | $3.00/1M |
| Claude Sonnet 4.6 | $3.00/1M | $2.70/1M | $1.80/1M |
| Gemini 2.5 Flash | $0.75/1M | $0.675/1M | ~$0.45/1M |
| Llama 3.3 70B (Workers AI) | $0.59/1M | Free | Free |
| DeepSeek R1 | $0.55/1M | ~$0.50/1M | ~$0.33/1M |
Batch pricing applies to async jobs (POST /ai/batch). Workers AI models are always free on all plans.
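
To make the table concrete, the sketch below estimates what a workload costs from a raw token count, a preprocessor reduction, and a per-million rate. The helper name `estimateCostUsd` is hypothetical and not part of any Neureus SDK, and the 20% reduction is an assumed midpoint of the stated 10–30% range. The rates are the Claude Sonnet 4.6 figures from the table.

```typescript
// Hypothetical helper, for illustration only.
// rawTokens:      tokens in the prompt before preprocessing
// ratePerMillion: USD per 1M tokens (taken from the pricing table)
// reduction:      fraction of tokens removed by the preprocessor (0.2 = 20%)
function estimateCostUsd(
  rawTokens: number,
  ratePerMillion: number,
  reduction: number = 0,
): number {
  if (reduction < 0 || reduction >= 1) {
    throw new RangeError('reduction must be in [0, 1)');
  }
  const billedTokens = rawTokens * (1 - reduction);
  return (billedTokens / 1_000_000) * ratePerMillion;
}

// 10M raw tokens on Claude Sonnet 4.6.
const openRouter = estimateCostUsd(10_000_000, 3.0);    // no preprocessing: about $30
const realtime = estimateCostUsd(10_000_000, 2.7, 0.2); // assumed 20% reduction: about $21.60
const batch = estimateCostUsd(10_000_000, 1.8, 0.2);    // assumed 20% reduction: about $14.40

console.log({ openRouter, realtime, batch });
```

The actual saving depends on how much the preprocessor removes from a given prompt, so measured token counts should replace the assumed reduction when available.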
On Scale plan and above: store your OpenAI or Anthropic API keys encrypted per tenant (AES-GCM). Route through Neureus's gateway at your own rate — the preprocessor still saves 10–30% tokens on top of your direct pricing.
POST /ai/providers/:provider/rotate: zero downtime
// Set your OpenAI key (Scale plan+)
await fetch(
  'https://app.neureus.ai/ai/providers/openai',
  {
    method: 'PUT',
    headers: {
      'Authorization': 'Bearer nr_key',
      'Content-Type': 'application/json', // the body is JSON
    },
    body: JSON.stringify({ apiKey: 'sk-your-key' }),
  }
);
// Rotate (re-encrypts under same DEK)
await fetch(
  'https://app.neureus.ai/ai/providers/openai/rotate',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer nr_key',
      'Content-Type': 'application/json', // the body is JSON
    },
    body: JSON.stringify({ newApiKey: 'sk-new-key' }),
  }
);
Change providers by changing the model ID. claude-* → Anthropic, gpt-* → OpenAI, @wai/* → Workers AI (free). Same API call, automatic dispatch.
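
The prefix rule above can be pictured as a simple lookup. This is a sketch of the idea only, not Neureus's dispatch code: `providerFor` and the provider labels are hypothetical, and only the three prefixes named above are covered.

```typescript
// Illustrative only: maps a model ID to a provider using the three
// prefix rules stated above. Anything else falls through to 'unknown'.
type Provider = 'anthropic' | 'openai' | 'workers-ai' | 'unknown';

const PREFIX_RULES: ReadonlyArray<[string, Provider]> = [
  ['claude-', 'anthropic'],
  ['gpt-', 'openai'],
  ['@wai/', 'workers-ai'],
];

function providerFor(model: string): Provider {
  for (const [prefix, provider] of PREFIX_RULES) {
    if (model.startsWith(prefix)) return provider;
  }
  return 'unknown';
}

console.log(providerFor('claude-sonnet-4-6'));
console.log(providerFor('@wai/llama-3.3-70b'));
```

Because the gateway does this dispatch server-side, client code only ever changes the `model` string.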
Set stream: true to get OpenAI-compatible Server-Sent Events from any provider — including Anthropic and Google — with a unified event format.
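
Reading that stream on the client usually means splitting on blank lines and decoding each `data:` payload. The sketch below assumes the gateway follows OpenAI's streaming conventions: JSON chunks carrying `choices[0].delta.content`, terminated by `data: [DONE]`. The page says the format is OpenAI-compatible but does not spell out the chunk shape, so treat the field names as assumptions. `extractDeltas` is a hypothetical helper.

```typescript
// Hypothetical helper: pulls text deltas out of a buffer of SSE data.
// Assumes OpenAI-style events: `data: {json}` separated by blank lines,
// with a final `data: [DONE]` sentinel.
interface ParsedSse {
  text: string;  // concatenated delta content from complete events
  done: boolean; // true once `[DONE]` was seen
  rest: string;  // trailing partial event to prepend to the next chunk
}

function extractDeltas(buffer: string): ParsedSse {
  const events = buffer.split(/\r?\n\r?\n/);
  const rest = events.pop() ?? ''; // the last piece may be incomplete
  let text = '';
  let done = false;

  for (const event of events) {
    for (const line of event.split(/\r?\n/)) {
      if (!line.startsWith('data:')) continue; // skip comments, event:, id:
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') { done = true; break; }
      if (payload === '') continue;
      const chunk = JSON.parse(payload);
      text += chunk.choices?.[0]?.delta?.content ?? '';
    }
    if (done) break;
  }
  return { text, done, rest };
}
```

In practice, decoded chunks from `response.body.getReader()` (via a `TextDecoder`) are appended to a buffer, passed through `extractDeltas`, and `rest` is carried into the next read until `done` is true.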
4-pass pipeline before every request: normalize → structure → trim (>6K tokens) → compress (opt-in, Llama 3.1 8B). Cuts token count 10–30%.
Async batch jobs via the OpenAI and Anthropic Batch APIs. Per the pricing table, batch rates are 40% below OpenRouter, which is about one third below Neureus realtime pricing. Webhook delivery on completion. Use for high-volume, non-urgent workloads.
Bring your own OpenAI or Anthropic keys on Scale plan+. Keys are stored per tenant, encrypted with AES-GCM. Rotate keys via POST /ai/providers/:provider/rotate without downtime.
300+ Cloudflare locations. Zero cold starts (Workers run on V8 isolates, not containers). p95 latency <80ms globally.
500 Neurons/month free. No credit card. Streaming, BYOK, and batch inference included.