Most tutorials on building multi-model AI apps start with LangChain. That made sense in 2023, when few other tools offered a single interface to multiple LLM providers. In 2026, you have a better option: a managed API that handles the routing, retries, and response normalization for you.
This post shows you how to build a multi-model AI app — routing across Anthropic, OpenAI, and free Workers AI models — without installing LangChain, provisioning a vector DB, or writing a single line of orchestration code.
Why not LangChain?
LangChain is a Python/TypeScript library that you run in your own environment. To use it in production, you typically need:
- A server to run it on ($50–200/mo depending on load)
- A vector DB if you want RAG (Pinecone starts at $70/mo)
- LangSmith for observability ($39/mo per seat)
- Ongoing maintenance as LangChain releases breaking changes
That’s roughly $160–310/mo with a single LangSmith seat (more as the team grows) before you’ve written a feature. And it’s infrastructure you maintain.
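To make the arithmetic explicit, here is a minimal sketch that sums the list prices quoted above. The `monthlyCost` helper and its `seats` parameter are illustrative names, not part of any library, and the prices are simply the figures from the list.

```typescript
// Monthly price ranges in USD, copied from the list above.
interface Range { min: number; max: number }

const SERVER: Range = { min: 50, max: 200 }; // server, depending on load
const VECTOR_DB = 70;                        // Pinecone starting price
const LANGSMITH_PER_SEAT = 39;               // LangSmith, per seat

// Hypothetical helper: total monthly range for a given number of seats.
function monthlyCost(seats: number): Range {
  const fixed = VECTOR_DB + LANGSMITH_PER_SEAT * seats;
  return { min: SERVER.min + fixed, max: SERVER.max + fixed };
}

const oneSeat = monthlyCost(1);
console.log(`$${oneSeat.min}-${oneSeat.max}/mo with one seat`); // $159-309/mo with one seat
```

Each additional LangSmith seat adds $39 to both ends of the range, so the total depends heavily on team size.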
The alternative: call a REST API that already implements the routing, retries, and response normalization.
The architecture
Here’s what we’re building:
- A function that routes messages to the cheapest model that can handle the task
- Fallback logic: start free (Workers AI), escalate to paid (Claude) if quality is insufficient
- Streaming responses for real-time UI updates
The entire thing is ~50 lines of TypeScript.
Step 1: Get an API key
Sign up at app.neureus.ai/onboard. The free tier includes 500 Neurons/month and access to Workers AI models at no cost.
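The snippets in this post read the key from an environment variable named `NEUREUS_API_KEY`. A minimal sketch of failing fast when that variable is missing, rather than relying on a non-null assertion, might look like this. `requireEnv` is a hypothetical helper introduced here, not an SDK function.

```typescript
// Hypothetical helper: return a required environment variable or throw
// with a clear message, instead of passing `undefined` to the client.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In Node.js you would call: requireEnv('NEUREUS_API_KEY', process.env)
```

Passing the environment in as an argument keeps the helper easy to test and avoids a hard dependency on Node's global `process`.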
Step 2: Install the SDK
npm install @neureus/sdk
Step 3: Route by task type
The key insight: different tasks need different models. Free edge models handle simple classification and summarization well. Premium models handle complex reasoning and long-form writing.
import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({
  apiKey: process.env.NEUREUS_API_KEY!,
});

type TaskType = 'simple' | 'complex' | 'reasoning';

function selectModel(task: TaskType): string {
  switch (task) {
    case 'simple':
      // Free: Llama 3.3 70B via Cloudflare Workers AI
      return '@wai/llama-3.3-70b';
    case 'complex':
      // $2.70/1M tokens, 10% below OpenRouter
      return 'claude-sonnet-4-6';
    case 'reasoning':
      // $0.135/1M tokens: DeepSeek R1 for math/code
      return 'deepseek-r1';
  }
}

async function chat(prompt: string, task: TaskType = 'simple') {
  const result = await client.ai.chat({
    model: selectModel(task),
    messages: [{ role: 'user', content: prompt }],
  });
  return {
    text: result.text,
    model: result.model,
    costUsd: result.costUsd,
    tokens: result.inputTokens + result.outputTokens,
  };
}
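The `chat` function leaves it to the caller to pick a `TaskType`. One way to automate that choice is a keyword heuristic. The sketch below is an assumption for illustration only: the `classifyTask` name, the keyword lists, and the length threshold are all made up here, and a production router would likely need something more robust.

```typescript
type TaskType = 'simple' | 'complex' | 'reasoning';

// Illustrative keyword lists; tune these for your own traffic.
const REASONING_HINTS = /\b(prove|calculate|solve|debug|algorithm|equation)\b/i;
const COMPLEX_HINTS = /\b(essay|draft|analy[sz]e|compare|strategy|report)\b/i;

// Hypothetical heuristic: math/code keywords go to the reasoning model,
// long or writing-heavy prompts go to the premium model, the rest stay free.
function classifyTask(prompt: string): TaskType {
  if (REASONING_HINTS.test(prompt)) return 'reasoning';
  if (COMPLEX_HINTS.test(prompt) || prompt.length > 2000) return 'complex';
  return 'simple';
}
```

With this in place, a call such as `chat(prompt, classifyTask(prompt))` would route each prompt without the caller naming a task type.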
Step 4: Add streaming for real-time responses
For chat UIs, streaming is essential. Every model returns the same SSE format:
async function streamChat(
  prompt: string,
  task: TaskType = 'simple',
  onChunk: (text: string) => void
) {
  const model = selectModel(task);
  const response = await fetch('https://app.neureus.ai/ai/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.NEUREUS_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  // Fail early on HTTP errors instead of parsing an error body as SSE
  if (!response.ok || !response.body) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    // A network chunk can end mid-line, so keep the unfinished tail for the next read
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice('data: '.length).trim();
      if (data === '[DONE]') continue;
      try {
        const parsed = JSON.parse(data);
        const text = parsed.choices?.[0]?.delta?.content ?? '';
        if (text) onChunk(text);
      } catch {
        // Skip lines that are not valid JSON
      }
    }
  }
}
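The line-parsing step is easier to unit-test when it is pulled out of the network loop. The sketch below isolates it as a pure function that returns the parsed text plus any unfinished trailing line, which matters because a network read can end in the middle of an SSE line. `drainSse` and `SseDrain` are hypothetical names, and the chunk shape (`choices[0].delta.content`) is assumed from the parsing code above.

```typescript
// Result of draining complete lines from the buffer.
interface SseDrain {
  texts: string[]; // content deltas found in complete `data:` lines
  rest: string;    // trailing partial line to carry into the next read
  done: boolean;   // true once the `[DONE]` sentinel was seen
}

// Hypothetical pure helper: parse every complete line in `buffer`.
// Assumes OpenAI-style chunks: { choices: [{ delta: { content } }] }.
function drainSse(buffer: string): SseDrain {
  const lines = buffer.split('\n');
  const rest = lines.pop() ?? ''; // last element is an incomplete line (or '')
  const texts: string[] = [];
  let done = false;

  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith('data:')) continue; // skip blank lines and comments
    const data = line.slice('data:'.length).trim();
    if (data === '[DONE]') {
      done = true;
      break;
    }
    try {
      const text = JSON.parse(data)?.choices?.[0]?.delta?.content;
      if (typeof text === 'string' && text) texts.push(text);
    } catch {
      // Ignore malformed JSON lines rather than aborting the stream.
    }
  }
  return { texts, rest, done };
}
```

A caller would append each decoded chunk to `rest` from the previous call, pass the result to `drainSse`, and forward `texts` to the UI callback.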
Step 5: Cost-aware fallback
Route cheap first, escalate only when needed:
async function intelligentChat(prompt: string, budget: 'low' | 'high' = 'low') {
  if (budget === 'low') {
    // Try the free model first
    const result = await chat(prompt, 'simple');
    // If the response is too short or seems incomplete, escalate
    if (result.text.length < 50 || result.text.endsWith('...')) {
      return chat(prompt, 'complex');
    }
    return result;
  }
  return chat(prompt, 'complex');
}
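The length check above is one possible quality gate. As a sketch of how the same idea generalizes to more than two tiers, the helper below walks an ordered list of models and also escalates when a call throws. `chatWithEscalation`, `looksIncomplete`, and the injected `chatFn` are hypothetical names introduced here; `chatFn` stands in for whatever function actually calls the API.

```typescript
interface ChatResult { text: string; model: string }
type ChatFn = (prompt: string, model: string) => Promise<ChatResult>;

// Hypothetical quality gate; mirrors the length/ellipsis check above.
function looksIncomplete(text: string): boolean {
  const trimmed = text.trim();
  return trimmed.length < 50 || trimmed.endsWith('...');
}

// Try each model in order (cheapest first). Move on when a call throws or
// the answer fails the gate; if every answer fails, return the last one.
async function chatWithEscalation(
  prompt: string,
  models: string[],
  chatFn: ChatFn,
  isBad: (text: string) => boolean = looksIncomplete,
): Promise<ChatResult> {
  let last: ChatResult | undefined;
  let lastError: unknown;
  for (const model of models) {
    try {
      const result = await chatFn(prompt, model);
      if (!isBad(result.text)) return result;
      last = result;
    } catch (err) {
      lastError = err; // e.g. rate limit or outage: try the next tier
    }
  }
  if (last) return last;
  throw lastError ?? new Error('No models were provided');
}
```

Injecting `chatFn` keeps the escalation logic independent of any particular client, so it can be exercised with a stub before it touches a paid model.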
Step 6: Add RAG with two more API calls
The same API handles document ingestion and semantic search. No separate vector DB setup:
// Ingest once
await client.rag.ingest({ url: 'https://your-docs.com/guide' });

// Ask questions against your documents
async function ragChat(question: string) {
  const result = await client.rag.query({
    query: question,
    model: '@wai/llama-3.3-70b', // free
  });
  return result.answer; // includes source attribution
}
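Real documentation sets usually span more than one URL. The sketch below ingests a list of URLs one at a time and collects failures instead of stopping at the first bad link. `ingestAll` and `IngestReport` are hypothetical names, and the ingest call is injected as a parameter because its exact signature beyond the `{ url }` argument shown above is an assumption.

```typescript
// Any function that accepts { url } and returns a promise will work here,
// for example: (args) => client.rag.ingest(args)
type IngestFn = (args: { url: string }) => Promise<unknown>;

interface IngestReport {
  ok: string[];
  failed: Array<{ url: string; reason: string }>;
}

// Hypothetical helper: ingest URLs sequentially and record which ones failed.
async function ingestAll(urls: string[], ingest: IngestFn): Promise<IngestReport> {
  const report: IngestReport = { ok: [], failed: [] };
  for (const url of urls) {
    try {
      await ingest({ url });
      report.ok.push(url);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      report.failed.push({ url, reason });
    }
  }
  return report;
}
```

Running the calls sequentially is slower than firing them all at once, but it is gentler on rate limits and keeps the failure report in input order.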
What you didn’t need
Compared to a LangChain setup, you skipped:
- Pinecone or Chroma for vector storage
- OpenAI Embeddings for chunking
- LangSmith for observability
- Redis for session memory
- Server deployment and maintenance
- LangChain version management
Total saved: roughly $160–310/mo in infrastructure at the list prices above (with a single LangSmith seat), plus engineering time.
Full example with model switching UI
Here’s a complete example with a model selector:
import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

// Options for the model selector UI
const MODELS = [
  { id: '@wai/llama-3.3-70b', label: 'Llama 3.3 70B', cost: 'Free' },
  { id: 'claude-haiku-4-5', label: 'Claude Haiku', cost: '$0.27/1M' },
  { id: 'claude-sonnet-4-6', label: 'Claude Sonnet', cost: '$2.70/1M' },
  { id: 'gpt-4o', label: 'GPT-4o', cost: '$4.50/1M' },
  { id: 'deepseek-r1', label: 'DeepSeek R1', cost: '$0.135/1M' },
] as const;

async function multiModelChat(
  messages: Array<{ role: 'user' | 'assistant'; content: string }>,
  modelId: string
) {
  return client.ai.chat({ model: modelId, messages });
}

// Usage
const result = await multiModelChat(
  [{ role: 'user', content: 'Explain quantum entanglement in one paragraph' }],
  'deepseek-r1'
);
console.log(`${result.model}: ${result.text}`);
// Print a placeholder rather than "$undefined" when no cost is reported
console.log(`Cost: $${result.costUsd?.toFixed(6) ?? 'n/a'}`);
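Because `multiModelChat` accepts any string as a model ID, it can help to validate the ID and estimate spend before sending a request. The sketch below is a rough illustration: `isKnownModel` and `estimateCostUsd` are hypothetical helpers, the prices are copied from the selector list above, and treating each price as one flat rate for both input and output tokens is an assumption.

```typescript
// Prices in USD per 1M tokens, copied from the selector above (0 = free).
const PRICE_PER_MILLION: Record<string, number> = {
  '@wai/llama-3.3-70b': 0,
  'claude-haiku-4-5': 0.27,
  'claude-sonnet-4-6': 2.7,
  'gpt-4o': 4.5,
  'deepseek-r1': 0.135,
};

// Hypothetical guard: true only for model IDs present in the table.
function isKnownModel(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(PRICE_PER_MILLION, id);
}

// Rough estimate only: assumes one flat rate for input and output tokens.
function estimateCostUsd(modelId: string, tokens: number): number {
  if (!isKnownModel(modelId)) throw new Error(`Unknown model: ${modelId}`);
  return (tokens / 1e6) * PRICE_PER_MILLION[modelId];
}
```

The `costUsd` field returned by the API remains the authoritative figure; an estimate like this is only useful for budgeting before a call is made.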
Summary
Multi-model AI routing doesn’t require a framework. It requires:
- An API key
- A model selection function
- One fetch call (or SDK method)
The same endpoint, response format, and streaming interface work across every provider. Switch models by changing a string. Add RAG with two more API calls. No infrastructure to maintain.
Start free at app.neureus.ai/onboard — 500 Neurons/month, Workers AI models always free.