LLM Routing
Same API call. Same response format. Same streaming interface. Change the model field to route to any of 10 providers — Claude, GPT, Gemini, Llama, DeepSeek, and more. Always 10% below OpenRouter prices.
The detectProvider(model) function routes by model ID prefix — no configuration needed.
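A minimal sketch of prefix-based routing, assuming the prefix rules listed in this section. The platform's actual detectProvider implementation is not shown on this page, so the provider labels returned here and the fallback for unmatched IDs are assumptions made for illustration.

```typescript
// Illustrative only: provider labels and fallback behavior are assumptions.
type Provider = 'anthropic' | 'openai' | 'workers-ai' | 'google' | 'default';

function detectProvider(model: string): Provider {
  if (model.startsWith('claude-')) return 'anthropic';
  // gpt-* plus the o1 / o3 / o4 families (for example "o3" or "o4-mini")
  if (model.startsWith('gpt-') || /^o[134](-|$)/.test(model)) return 'openai';
  if (model.startsWith('@wai/') || model.startsWith('@cf/')) return 'workers-ai';
  if (model.startsWith('gemini-')) return 'google';
  // IDs with no rule listed on this page fall through to 'default' in this sketch only.
  return 'default';
}

for (const id of ['claude-sonnet-4-6', 'gpt-4o', 'o4-mini', '@wai/llama-3.3-70b', 'gemini-2.5-flash']) {
  console.log(`${id} -> ${detectProvider(id)}`);
}
```

Checking prefixes in a fixed order keeps the routing stateless, which matches the "no configuration needed" claim: the model ID alone decides the destination.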
claude-sonnet-4-6 gpt-4o @wai/llama-3.3-70b gemini-2.5-flash deepseek-r1

claude-* → Anthropic
gpt-* / o1 / o3 / o4 → OpenAI
@wai/* / @cf/* → Workers AI (free)
gemini-* → Google
meta-llm → default model

const models = {
  free: '@wai/llama-3.3-70b',    // $0 (Workers AI)
  efficient: 'claude-haiku-4-5', // $0.27/1M tokens
  balanced: 'claude-sonnet-4-6', // $2.70/1M tokens
  reasoning: 'deepseek-r1',      // $0.135/1M tokens
  premium: 'gpt-4o',             // $4.50/1M tokens
  vision: 'gemini-2.5-pro',      // $10/1M tokens
};

const response = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` },
    body: JSON.stringify({
      model: models.balanced, // ← change this
      messages: [{ role: 'user', content: prompt }],
      stream: false,
    }),
  }
);
const { text, inputTokens, outputTokens, costUsd } = await response.json();

const stream = await fetch(
  'https://app.neureus.ai/ai/chat',
  {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6', // or any model
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  }
);
const reader = stream.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each chunk: data: { "delta": "...", "done": false }
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  process.stdout.write(decoder.decode(value, { stream: true }));
}

// Store your OpenAI key (AES-GCM encrypted per tenant)
await fetch(
  'https://app.neureus.ai/ai/providers/openai',
  {
    method: 'PUT',
    headers: { 'Authorization': `Bearer ${API_KEY}` },
    body: JSON.stringify({ apiKey: 'sk-your-key' }),
  }
);

// List configured providers
const providers = await fetch(
  'https://app.neureus.ai/ai/providers',
  { headers: { 'Authorization': `Bearer ${API_KEY}` } }
).then(r => r.json());
// → [{ provider: 'openai', hasKey: true, rotatedAt: '...' }]

import { NeureuClient } from '@neureus/sdk';
const client = new NeureuClient({
  apiKey: process.env.NEUREUS_API_KEY!,
});

// Non-streaming
const result = await client.ai.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }],
});
console.log(result.text, result.costUsd);

// Streaming
const stream = client.ai.stream({
  model: 'claude-haiku-4-5',
  messages: [{ role: 'user', content: 'Write a haiku' }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk);
}

All token prices are 10% below OpenRouter list. Workers AI models are always free.
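As a worked example of the pricing rule above, the sketch below derives a discounted per-million-token price from a list price and estimates the cost of one request. The helper names and the single blended price for input and output tokens are assumptions for illustration. The $3.00 list price is only the figure implied by the $2.70/1M price shown for claude-sonnet-4-6 on this page; the costUsd field returned by the API remains the authoritative number.

```typescript
// Hypothetical helpers; not part of the Neureus API or SDK.
const DISCOUNT = 0.10; // "10% below OpenRouter list"

/** Discounted price per 1M tokens, given a list price per 1M tokens. */
function discountedPricePerMillion(listPricePerMillion: number): number {
  return listPricePerMillion * (1 - DISCOUNT);
}

/** Estimated cost in USD, assuming one blended price for input and output tokens. */
function estimateCostUsd(inputTokens: number, outputTokens: number, pricePerMillion: number): number {
  return ((inputTokens + outputTokens) / 1_000_000) * pricePerMillion;
}

const sonnetPrice = discountedPricePerMillion(3.0); // assumed list price of $3.00/1M
const cost = estimateCostUsd(1_200, 800, sonnetPrice);
console.log(sonnetPrice.toFixed(2), cost.toFixed(4)); // prints "2.70 0.0054"
```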
@wai/llama-3.3-70b, @wai/deepseek-r1-32b, @wai/qwen-coder-32b, @wai/nemotron-120b
claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5
gpt-4o, gpt-4o-mini, o3, o4-mini
gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
deepseek-r1, deepseek-v3
llama-3.3-70b (via @wai/), llama-3.1-8b (via @wai/)

Four preprocessing passes run before every /ai/chat request. They cut the token count by 10–30%, so your provider bills for fewer tokens on every call.
Trim whitespace, normalize CRLF→LF, collapse repeated blank lines, remove zero-width characters.
Move system messages to index 0 — providers require system messages first. Fixes ordering issues silently.
When the token estimate exceeds 6K, keep the system message plus the last 4 non-system turns and replace the middle of the conversation with a compact summary message.
LLM compression via Llama 3.1 8B (200ms timeout, falls back to original on timeout). Send x-neureus-options: compress=true to activate.
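The first three passes described above can be sketched as plain functions. This is an illustrative reconstruction, not the service's code: the function names, the 4-characters-per-token estimate, the reading of one turn as one message, and the placeholder summary text are all assumptions. The fourth pass (LLM compression) is left out because it depends on a model call.

```typescript
type Role = 'system' | 'user' | 'assistant';
interface Message { role: Role; content: string; }

// Pass 1: whitespace normalization.
function normalizeText(text: string): string {
  return text
    .replace(/\r\n/g, '\n')                 // CRLF -> LF
    .replace(/[\u200B-\u200D\uFEFF]/g, '')  // remove zero-width characters
    .replace(/[ \t]+\n/g, '\n')             // strip trailing spaces on each line
    .replace(/\n{3,}/g, '\n\n')             // collapse repeated blank lines
    .trim();
}

// Pass 2: move system messages to the front, keeping relative order otherwise.
function systemFirst(messages: Message[]): Message[] {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest];
}

// Rough estimate: about 4 characters per token (an assumption, not the service's estimator).
function estimateTokens(messages: Message[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Pass 3: above the limit, keep system messages plus the last `keep` non-system messages.
function truncateHistory(messages: Message[], limit = 6000, keep = 4): Message[] {
  if (estimateTokens(messages) <= limit) return messages;
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  if (rest.length <= keep) return messages;
  const dropped = rest.length - keep;
  // Placeholder summary; a real summary would condense the dropped content.
  const summary: Message = { role: 'system', content: `[${dropped} earlier messages omitted]` };
  return [...system, summary, ...rest.slice(-keep)];
}

function preprocess(messages: Message[]): Message[] {
  const normalized = messages.map(m => ({ ...m, content: normalizeText(m.content) }));
  return truncateHistory(systemFirst(normalized));
}
```

Running the passes in this order means the token estimate is taken after whitespace cleanup, so truncation only triggers when the cleaned conversation is still too long.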
Send the x-neureus-debug: true header to see preprocessing stats in the response body (_preprocessing.tokensSaved, _preprocessing.passesRun).
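A short sketch of a request that opts in to both headers mentioned above. The helper names buildChatRequest and chatWithStats are hypothetical, the Content-Type header is a conventional addition not shown on this page, and the types of the _preprocessing fields are not documented here, so they are left as unknown.

```typescript
// Hypothetical helpers; only the URL, header names, and _preprocessing fields come from this page.
interface ChatOptions {
  compress?: boolean; // opt in to the LLM compression pass
  debug?: boolean;    // ask for preprocessing stats in the response body
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(apiKey: string, model: string, prompt: string, options: ChatOptions = {}): ChatRequest {
  const headers: Record<string, string> = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json', // conventional for JSON bodies; an assumption here
  };
  if (options.compress) headers['x-neureus-options'] = 'compress=true';
  if (options.debug) headers['x-neureus-debug'] = 'true';
  return {
    url: 'https://app.neureus.ai/ai/chat',
    init: {
      method: 'POST',
      headers,
      body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }], stream: false }),
    },
  };
}

// Usage: sends the request and logs the stats fields if the response includes them.
async function chatWithStats(apiKey: string, prompt: string): Promise<void> {
  const { url, init } = buildChatRequest(apiKey, 'claude-sonnet-4-6', prompt, { compress: true, debug: true });
  const response = await fetch(url, init);
  const data = (await response.json()) as {
    _preprocessing?: { tokensSaved?: unknown; passesRun?: unknown };
  };
  console.log(data._preprocessing?.tokensSaved, data._preprocessing?.passesRun);
}
```

Keeping the header logic in a pure function makes it easy to check which options are being sent without making a network call.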
500 Neurons/month free. No credit card. OpenAI-compatible response format from every provider.