LLM Routing

Switch Providers by Changing a String

Same API call. Same response format. Same streaming interface. Change the model field to route to any of 10 providers — Claude, GPT, Gemini, Llama, DeepSeek, and more. Always 10% below OpenRouter prices.

How routing works

The detectProvider(model) function routes by model ID prefix — no configuration needed.

Model ID
claude-sonnet-4-6, gpt-4o, @wai/llama-3.3-70b, gemini-2.5-flash, deepseek-r1
detectProvider()
claude-* → Anthropic
gpt-*/o1/o3/o4 → OpenAI
@wai/* / @cf/* → Workers AI (free)
gemini-* → Google
meta-llm → default model
Response
OpenAI-compatible format
SSE streaming (all providers)
Consistent error handling
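The prefix rules above can be sketched as a small lookup. This is a hypothetical reimplementation for illustration only: the real detectProvider() is not shown on this page, the provider labels are made up, the deepseek- rule is taken from the provider reference rather than the diagram, and the fallback for unrecognized IDs is an assumption.

```typescript
// Hypothetical sketch of prefix-based routing; not the actual detectProvider().
type Provider =
  | 'anthropic' | 'openai' | 'workers-ai' | 'google' | 'deepseek' | 'default';

// Ordered rules: the first matching prefix wins, so '@wai/deepseek-r1-32b'
// goes to Workers AI rather than DeepSeek.
const PREFIX_RULES: Array<[RegExp, Provider]> = [
  [/^claude-/, 'anthropic'],
  [/^(gpt-|o1|o3|o4)/, 'openai'],
  [/^@(wai|cf)\//, 'workers-ai'],
  [/^gemini-/, 'google'],
  [/^deepseek-/, 'deepseek'], // prefix from the provider reference
];

function detectProvider(model: string): Provider {
  for (const [pattern, provider] of PREFIX_RULES) {
    if (pattern.test(model)) return provider;
  }
  return 'default'; // assumed fallback for IDs that match no prefix
}
```

Because the rules are plain data, adding a provider in a sketch like this means adding one row, which matches the "no configuration needed" claim.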
Switch providers — change one field
const models = {
  free:      '@wai/llama-3.3-70b',  // $0 — Workers AI
  efficient: 'claude-haiku-4-5',    // $0.27/1M tokens
  balanced:  'claude-sonnet-4-6',   // $2.70/1M tokens
  reasoning: 'deepseek-r1',         // $0.135/1M tokens
  premium:   'gpt-4o',              // $4.50/1M tokens
  vision:    'gemini-2.5-pro',      // $10/1M tokens
};

const response = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json', // the body is JSON
    },
    body: JSON.stringify({
      model: models.balanced,  // ← change this
      messages: [{ role: 'user', content: prompt }],
      stream: false,
    }),
  }
);
const { text, inputTokens, outputTokens, costUsd } = await response.json();
SSE streaming — all providers
const stream = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json', // the body is JSON
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',  // or any model
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  }
);

const reader = stream.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk: data: { "delta": "...", "done": false }
  // stream: true keeps multi-byte characters intact across chunk boundaries
  process.stdout.write(decoder.decode(value, { stream: true }));
}
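The loop above prints raw SSE text, and a network chunk can end in the middle of an event. A minimal sketch of a buffering parser follows. It assumes every event is a single `data: {...}` line carrying the `delta` and `done` fields shown in the comment; `createSseParser` and `ChatChunk` are hypothetical names, not part of the API.

```typescript
// Shape assumed from the "data: { delta, done }" comment in the loop above.
interface ChatChunk {
  delta: string;
  done: boolean;
}

// Returns a stateful feed() function: pass it decoded text, get back the
// events that are complete so far. Partial lines wait for the next call.
function createSseParser(): (text: string) => ChatChunk[] {
  let buffer = '';
  return function feed(text: string): ChatChunk[] {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // last piece may be an incomplete line
    const events: ChatChunk[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // blank lines, comments
      try {
        events.push(JSON.parse(trimmed.slice(5)) as ChatChunk);
      } catch {
        // Non-JSON payloads are skipped in this sketch.
      }
    }
    return events;
  };
}
```

Inside the read loop, the raw write could then become `for (const e of feed(decoder.decode(value, { stream: true }))) process.stdout.write(e.delta);`, with `feed` created once before the loop.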
BYOK — bring your own keys (Scale plan+)
// Store your OpenAI key (AES-GCM encrypted per tenant)
await fetch(
  'https://app.neureus.ai/ai/providers/openai',
  {
    method: 'PUT',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json', // the body is JSON
    },
    body: JSON.stringify({ apiKey: 'sk-your-key' }),
  }
);

// List configured providers
const providers = await fetch(
  'https://app.neureus.ai/ai/providers',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());
// → [{ provider: 'openai', hasKey: true, rotatedAt: '...' }]
TypeScript SDK
import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({
  apiKey: process.env.NEUREUS_API_KEY!,
});

// Non-streaming
const result = await client.ai.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }],
});
console.log(result.text, result.costUsd);

// Streaming
const stream = client.ai.stream({
  model: 'claude-haiku-4-5',
  messages: [{ role: 'user', content: 'Write a haiku' }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk);
}

Provider reference

All token prices are 10% below OpenRouter list. Workers AI models are always free.

Cloudflare Workers AI
prefix: @wai/
Free
Cost-sensitive, edge latency, no quota
@wai/llama-3.3-70b, @wai/deepseek-r1-32b, @wai/qwen-coder-32b, @wai/nemotron-120b
Anthropic
prefix: claude-
$0.27–$15/1M
Instruction following, long context, safety
claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5
OpenAI
prefix: gpt-/o*
$0.15–$15/1M
General purpose, function calling, vision
gpt-4o, gpt-4o-mini, o3, o4-mini
Google
prefix: gemini-
$0.075–$10/1M
Long context (2M tokens), multimodal
gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
DeepSeek
prefix: deepseek-
$0.135–$0.55/1M
Reasoning (R1), code (V3), cost-efficient
deepseek-r1, deepseek-v3
Meta Llama
prefix: llama-
Free (Workers AI)
Open weights, strong 70B benchmark
llama-3.3-70b (via @wai/), llama-3.1-8b (via @wai/)

The prompt preprocessor

Four passes run before every /ai/chat request; the fourth is opt-in. They cut token count by 10–30%, so your provider bills for fewer tokens on every call.

Pass 1
Normalize

Trim whitespace, normalize CRLF→LF, collapse repeated blank lines, remove zero-width characters.
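A minimal sketch of what a normalize pass like this might look like. The function name and the exact set of zero-width characters are assumptions; the service's real implementation is not shown here.

```typescript
// Hypothetical normalize pass: the four steps described above, in one chain.
function normalizePrompt(text: string): string {
  return text
    .replace(/\r\n/g, '\n') // CRLF to LF
    // Zero-width space, non-joiner, joiner, and BOM. Caveat: stripping the
    // joiner (U+200D) also splits emoji sequences that rely on it.
    .replace(/[\u200B-\u200D\uFEFF]/g, '')
    .replace(/\n{3,}/g, '\n\n') // runs of blank lines become one blank line
    .trim(); // leading and trailing whitespace
}
```

Zero-width characters are removed before blank lines are collapsed, so a line that held only invisible characters is treated as blank.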

Pass 2
Structure

Move system messages to index 0 — providers require system messages first. Fixes ordering issues silently.
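The reordering can be sketched as a stable partition of the message list. This is an illustrative guess, not the service's code: `hoistSystemMessages` is a hypothetical name, and keeping multiple system messages separate (rather than merging them) is an assumption.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface Message {
  role: Role;
  content: string;
}

// Moves every system message to the front while preserving the relative
// order within both groups. Returns a new array; the input is not mutated.
function hoistSystemMessages(messages: Message[]): Message[] {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest];
}
```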

Pass 3
Trim

When token estimate exceeds 6K: keep system message + last 4 non-system turns. Replace middle with a compact summary message.
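A rough sketch of the trimming rule. Several details are assumptions: the 4-characters-per-token estimate, reading "6K" as 6000, the summary message's role, and the placeholder summary text (a real pass would presumably produce an actual summary of the dropped turns).

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Crude heuristic (assumption): roughly 4 characters per token.
const estimateTokens = (messages: Message[]): number =>
  Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);

function trimHistory(
  messages: Message[],
  maxTokens = 6000,
  keepTurns = 4,
): Message[] {
  if (estimateTokens(messages) <= maxTokens) return messages;

  const system = messages.filter(m => m.role === 'system');
  const turns = messages.filter(m => m.role !== 'system');
  if (turns.length <= keepTurns) return messages; // nothing in the middle

  const dropped = turns.slice(0, turns.length - keepTurns);
  const summary: Message = {
    role: 'system', // assumed role for the summary message
    content: `[${dropped.length} earlier messages omitted]`, // placeholder
  };
  return [...system, summary, ...turns.slice(-keepTurns)];
}
```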

Pass 4
Compress (opt-in)

LLM compression via Llama 3.1 8B (200ms timeout, falls back to original on timeout). Send x-neureus-options: compress=true to activate.
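The timeout-with-fallback behavior can be sketched with Promise.race. Here `compress` stands in for the model call and is supplied by the caller; `compressWithFallback` is a hypothetical name, and falling back on errors (not only on timeouts) is an assumption.

```typescript
// Resolves with compress(text), or with the original text if compression
// rejects or takes longer than timeoutMs.
async function compressWithFallback(
  text: string,
  compress: (t: string) => Promise<string>,
  timeoutMs = 200,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<string>(resolve => {
    timer = setTimeout(() => resolve(text), timeoutMs);
  });
  try {
    // Whichever settles first wins: the compressed text or the original.
    return await Promise.race([compress(text), timeout]);
  } catch {
    return text; // assumption: failures also fall back to the original
  } finally {
    clearTimeout(timer); // avoid a dangling timer when compress wins
  }
}
```

Note that the race does not cancel the slow compression call; it only stops waiting for it. A real implementation might pass an AbortSignal to the underlying request.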

Debug mode: Add x-neureus-debug: true header to see preprocessing stats in the response body (_preprocessing.tokensSaved, _preprocessing.passesRun).
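A small sketch of how a client might set the two headers and read the stats back. The header names and the `_preprocessing` field come from the text above; the helper names, the option shape, and the field types (in particular whether `passesRun` is a count or a list) are assumptions.

```typescript
interface PreprocessingStats {
  tokensSaved: number; // assumed numeric
  passesRun: unknown;  // type not documented here
}

// Builds request headers, adding the opt-in flags only when requested.
function buildHeaders(
  apiKey: string,
  opts: { compress?: boolean; debug?: boolean } = {},
): Record<string, string> {
  const headers: Record<string, string> = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
  if (opts.compress) headers['x-neureus-options'] = 'compress=true';
  if (opts.debug) headers['x-neureus-debug'] = 'true';
  return headers;
}

// Pulls _preprocessing out of a parsed response body, if present.
function readPreprocessingStats(body: unknown): PreprocessingStats | undefined {
  if (typeof body !== 'object' || body === null) return undefined;
  return (body as { _preprocessing?: PreprocessingStats })._preprocessing;
}
```

The headers would be passed as the `headers` option of the /ai/chat fetch call, and `readPreprocessingStats(await response.json())` would return undefined whenever debug mode is off.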

Route to any provider today

500 Neurons/month free. No credit card. OpenAI-compatible response format from every provider.