AI Gateway

One API. Any Model.
Best Price.

Route to 10 AI providers and 35+ models from a single endpoint. OpenAI-compatible. SSE streaming. Prompt preprocessor saves 10–30% tokens before billing. Always 10% below OpenRouter.

10providers
35+models
300+edge locations
<80msp95 latency
0cold starts

One endpoint, any provider

Change the model field to switch providers. The API, response format, and streaming behavior stay identical.

The 4-pass prompt preprocessor runs automatically before every request — normalizing, trimming, and optionally compressing your messages before they hit the provider.

// TypeScript — works with any provider
const response = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer nr_your_api_key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      // Switch providers by changing this string:
      model: 'claude-sonnet-4-6',
      // model: 'gpt-4o',
      // model: '@wai/llama-3.3-70b',  // free
      // model: 'gemini-2.5-flash',
      // model: 'deepseek-r1',
      messages: [
        { role: 'user', content: 'Explain RAG in plain English' }
      ],
      stream: true,  // SSE works across all providers
    }),
  }
);
// Response: OpenAI-compatible SSE stream

All supported providers

Workers AI models are always free. Paid models are priced 10% below OpenRouter.

Anthropic Top quality
  • claude-opus-4-8
  • claude-sonnet-4-6
  • claude-haiku-4-5
OpenAI Most popular
  • gpt-4o
  • gpt-4o-mini
  • o3
  • o4-mini
Google Long context
  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.0-flash
Meta Llama Free via Workers AI
  • llama-3.3-70b
  • llama-3.1-8b
DeepSeek Reasoning
  • deepseek-r1
  • deepseek-v3
Mistral Code + edge
  • mistral-large-2411
  • mistral-small-3.1
  • codestral-2501
Cohere RAG-optimized
  • command-r-plus
  • command-r
Qwen Free via Workers AI
  • qwen-2.5-72b
  • qwen-2.5-coder-32b
Perplexity Online search
  • llama-3.1-sonar-large-128k-online
Cloudflare Workers AI Always free
  • @wai/llama-3.3-70b
  • @wai/deepseek-r1-32b
  • @wai/nemotron-120b
  • @wai/qwq-32b

Always 10% below OpenRouter

The prompt preprocessor cuts token count before provider billing. You pay for what the model actually processes — consistently less than sending raw to OpenRouter.

Model OpenRouter Neureus (realtime) Neureus (batch)
GPT-4o $5.00/1M $4.50/1M $3.00/1M
Claude Sonnet 4.6 $3.00/1M $2.70/1M $1.80/1M
Gemini 2.5 Flash $0.75/1M $0.675/1M ~$0.45/1M
Llama 3.3 70B (Workers AI) $0.59/1M Free Free
DeepSeek R1 $0.55/1M $0.50/1M ~$0.33/1M

Batch pricing applies to async jobs (POST /ai/batch). Workers AI models always free on all plans.

Bring your own provider keys

On Scale plan and above: store your OpenAI or Anthropic API keys encrypted per tenant (AES-GCM). Route through Neureus's gateway at your own rate — the preprocessor still saves 10–30% tokens on top of your direct pricing.

  • AES-GCM encryption per tenant — keys never stored in plaintext
  • Rotate keys via POST /ai/providers/:provider/rotate — zero downtime
  • BYOK overlays global secrets — set once, use across all API calls
// Set your OpenAI key (Scale plan+)
await fetch(
  'https://app.neureus.ai/ai/providers/openai',
  {
    method: 'PUT',
    headers: { 'Authorization': 'Bearer nr_key' },
    body: JSON.stringify({ apiKey: 'sk-your-key' }),
  }
);

// Rotate (re-encrypts under same DEK)
await fetch(
  'https://app.neureus.ai/ai/providers/openai/rotate',
  {
    method: 'POST',
    headers: { 'Authorization': 'Bearer nr_key' },
    body: JSON.stringify({ newApiKey: 'sk-new-key' }),
  }
);

What's included in the gateway

Multi-provider routing

Change providers by changing the model ID. claude-* → Anthropic, gpt-* → OpenAI, @wai/* → Workers AI (free). Same API call, automatic dispatch.

SSE streaming

Set stream: true to get OpenAI-compatible Server-Sent Events from any provider — including Anthropic and Google — with a unified event format.

Prompt preprocessor

4-pass pipeline before every request: normalize → structure → trim (>6K tokens) → compress (opt-in, Llama 3.1 8B). Cuts token count 10–30%.

Batch inference

Async batch jobs via OpenAI + Anthropic Batch APIs. 40% below realtime pricing. Webhook delivery on completion. Use for high-volume, non-urgent workloads.

BYOK encryption

Bring your own OpenAI or Anthropic keys on Scale plan+. Stored AES-GCM encrypted per tenant. Rotate keys via PUT /ai/providers/:provider/rotate without downtime.

Edge delivery

300+ Cloudflare locations. Zero cold starts (Workers run on V8 isolates, not containers). p95 latency <80ms globally.

Start routing to any provider in minutes

500 Neurons/month free. No credit card. Streaming, BYOK, and batch inference included.