How to Build a Multi-Model AI App Without LangChain

Most tutorials on building multi-model AI apps start with LangChain. That made sense in 2023, when there were few other ways to talk to multiple LLM providers through a single interface. In 2026, you have a better option: a managed API that handles routing, retries, and response normalization for you.

This post shows you how to build a multi-model AI app that routes across Anthropic, OpenAI, and free Workers AI models, without installing LangChain, provisioning a vector DB, or writing orchestration code beyond a short model-selection function.

Why not LangChain?

LangChain is a Python/TypeScript library that you run in your own environment. To use it in production, you need:

A server to run it on ($50–200/mo depending on load)
A vector DB if you want RAG (Pinecone starts at $70/mo)
LangSmith for observability ($39/mo per seat)
Ongoing maintenance as LangChain releases breaking changes

That’s roughly $160–310/mo with a single LangSmith seat, and more as you add seats, before you’ve written a feature. And it’s infrastructure you have to maintain.
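
The total depends mostly on server size and on how many LangSmith seats you need. As a quick sanity check, the sketch below sums the line items listed above. The seat count is an assumption, and the vector DB figure is a starting price, so real bills may be higher.

```typescript
// Monthly list prices quoted in the list above, in USD.
const server = { low: 50, high: 200 };
const vectorDb = 70;         // "starts at" price, so treat it as a floor
const langsmithPerSeat = 39;

// Sums the fixed items for a given number of LangSmith seats (an assumption).
function monthlyCost(seats: number): { low: number; high: number } {
  const fixed = vectorDb + langsmithPerSeat * seats;
  return { low: server.low + fixed, high: server.high + fixed };
}

console.log(monthlyCost(1)); // { low: 159, high: 309 }
console.log(monthlyCost(3)); // { low: 237, high: 387 }
```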

The alternative: call a REST API that already implements everything.

The architecture

Here’s what we’re building:

A function that routes messages to the cheapest model that can handle the task
Fallback logic: start free (Workers AI), escalate to paid (Claude) if quality is insufficient
Streaming responses for real-time UI updates

The entire thing is under 100 lines of TypeScript.

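Step 1: Get an API key

Sign up at app.neureus.ai/onboard to get a key, then expose it to your app as the NEUREUS_API_KEY environment variable, which the examples below read.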

Step 2: Install the SDK

npm install @neureus/sdk

Step 3: Route by task type

The key insight: different tasks need different models. Free edge models handle simple classification and summarization well. Premium models handle complex reasoning and long-form writing.

import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({
  apiKey: process.env.NEUREUS_API_KEY!,
});

type TaskType = 'simple' | 'complex' | 'reasoning';

function selectModel(task: TaskType): string {
  switch (task) {
    case 'simple':
      // Free — Llama 3.3 70B via Cloudflare Workers AI
      return '@wai/llama-3.3-70b';
    case 'complex':
      // $2.70/1M tokens — 10% below OpenRouter
      return 'claude-sonnet-4-6';
    case 'reasoning':
      // $0.135/1M tokens — DeepSeek R1 for math/code
      return 'deepseek-r1';
  }
}

async function chat(prompt: string, task: TaskType = 'simple') {
  const result = await client.ai.chat({
    model: selectModel(task),
    messages: [{ role: 'user', content: prompt }],
  });

  return {
    text: result.text,
    model: result.model,
    costUsd: result.costUsd,
    tokens: result.inputTokens + result.outputTokens,
  };
}
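
The chat function above leaves it to the caller to pick a TaskType. One possible way to derive it from the prompt is a keyword heuristic. The sketch below is only an illustration: classifyTask, the keyword lists, and the 2000-character threshold are hypothetical and should be tuned against real traffic.

```typescript
type TaskType = 'simple' | 'complex' | 'reasoning';

// Hypothetical keyword lists, not part of any SDK.
const REASONING_HINTS = /\b(prove|derive|calculate|debug|algorithm|equation)\b/i;
const COMPLEX_HINTS = /\b(essay|report|analy[sz]e|compare|draft|rewrite)\b/i;

function classifyTask(prompt: string): TaskType {
  // Math and code style requests go to the reasoning model.
  if (REASONING_HINTS.test(prompt)) return 'reasoning';
  // Writing-style requests and very long prompts go to the premium tier.
  if (COMPLEX_HINTS.test(prompt) || prompt.length > 2000) return 'complex';
  // Everything else stays on the free model.
  return 'simple';
}

console.log(classifyTask('Is this review positive or negative?')); // simple
console.log(classifyTask('Draft a report on Q3 sales'));           // complex
console.log(classifyTask('Prove that sqrt(2) is irrational'));     // reasoning
```

A call site could then read chat(prompt, classifyTask(prompt)). Keyword matching is crude, so expect misroutes and log which model actually served each request.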

Step 4: Add streaming for real-time responses

For chat UIs, streaming is essential. Every model returns the same SSE format:

async function streamChat(
  prompt: string,
  task: TaskType = 'simple',
  onChunk: (text: string) => void
) {
  const model = selectModel(task);

  const response = await fetch('https://app.neureus.ai/ai/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.NEUREUS_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });

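  if (!response.ok || !response.body) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });

    // A network chunk can end mid-line, so keep the unfinished tail for the next read
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice('data: '.length).trim();
      if (data === '[DONE]') continue;
      try {
        const parsed = JSON.parse(data);
        const text = parsed.choices?.[0]?.delta?.content ?? '';
        if (text) onChunk(text);
      } catch {
        // Skip payloads that are not valid JSON
      }
    }
  }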
}
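
The parsing step can also be pulled out into a pure function, which makes it testable without a network call. This is a sketch that assumes the same choices[0].delta.content payload shape the loop above reads. extractDeltas is a hypothetical helper, not part of the SDK.

```typescript
interface ParseResult {
  texts: string[]; // content deltas found in complete lines
  rest: string;    // trailing partial line to prepend to the next chunk
}

function extractDeltas(buffered: string): ParseResult {
  const lines = buffered.split('\n');
  // The last element is either '' or an unfinished line.
  const rest = lines.pop() ?? '';
  const texts: string[] = [];
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice('data: '.length).trim();
    if (data === '[DONE]') continue;
    try {
      const text = JSON.parse(data).choices?.[0]?.delta?.content;
      if (typeof text === 'string' && text) texts.push(text);
    } catch {
      // Ignore lines that are not valid JSON
    }
  }
  return { texts, rest };
}

// Simulate one event split across two network chunks.
const first = extractDeltas('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"cho');
const second = extractDeltas(first.rest + 'ices":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(first.texts.concat(second.texts).join('')); // Hello
```

Carrying rest forward is what prevents an event that straddles two chunks from being dropped.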

Step 5: Cost-aware fallback

Route cheap first, escalate only when needed:

async function intelligentChat(prompt: string, budget: 'low' | 'high' = 'low') {
  if (budget === 'low') {
    // Try free model first
    const result = await chat(prompt, 'simple');
    // If response is too short or seems incomplete, escalate
    if (result.text.length < 50 || result.text.endsWith('...')) {
      return chat(prompt, 'complex');
    }
    return result;
  }
  return chat(prompt, 'complex');
}
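
The escalation rule is easier to tune and test when it lives in its own function. The sketch below restates the same heuristic and also escalates when the cheap call throws, which the version above does not do. needsEscalation, withFallback, and the Reply type are hypothetical names introduced for illustration.

```typescript
type Reply = { text: string; model: string };

// Hypothetical quality gate: the 50-character floor and the trailing-ellipsis
// check mirror the heuristic above and are not a real quality measure.
function needsEscalation(text: string, minLength = 50): boolean {
  const trimmed = text.trim();
  if (trimmed.length < minLength) return true;
  return trimmed.endsWith('...') || trimmed.endsWith('\u2026');
}

// Runs the cheap call first and falls back to the premium call when the
// reply fails the gate or the cheap call throws.
async function withFallback(
  cheap: () => Promise<Reply>,
  premium: () => Promise<Reply>,
): Promise<Reply> {
  try {
    const reply = await cheap();
    if (!needsEscalation(reply.text)) return reply;
  } catch {
    // Treat a failed cheap call like a low-quality reply and escalate.
  }
  return premium();
}
```

With the functions from this post, cheap would be () => chat(prompt, 'simple') and premium would be () => chat(prompt, 'complex'). Note that a correct one-word answer, such as a classification label, is shorter than 50 characters and would escalate, so lower minLength for tasks with short outputs.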

Step 6: Add RAG in 3 more lines

The same API handles document ingestion and semantic search. No separate vector DB setup:

// Ingest once
await client.rag.ingest({ url: 'https://your-docs.com/guide' });

// Ask questions against your documents
async function ragChat(question: string) {
  const result = await client.rag.query({
    query: question,
    model: '@wai/llama-3.3-70b',  // free
  });
  return result.answer;  // includes source attribution
}

What you didn’t need

Compared to a LangChain setup, you skipped:

~~Pinecone or Chroma for vector storage~~
~~OpenAI Embeddings for vectorizing document chunks~~
~~LangSmith for observability~~
~~Redis for session memory~~
~~Server deployment and maintenance~~
~~LangChain version management~~

Total saved: roughly $160–310/mo in infrastructure at the prices listed earlier, plus engineering time.

Full example with model switching UI

Here’s a complete example with a model list that a selector UI can render:

import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

const MODELS = [
  { id: '@wai/llama-3.3-70b', label: 'Llama 3.3 70B', cost: 'Free' },
  { id: 'claude-haiku-4-5', label: 'Claude Haiku', cost: '$0.27/1M' },
  { id: 'claude-sonnet-4-6', label: 'Claude Sonnet', cost: '$2.70/1M' },
  { id: 'gpt-4o', label: 'GPT-4o', cost: '$4.50/1M' },
  { id: 'deepseek-r1', label: 'DeepSeek R1', cost: '$0.135/1M' },
] as const;

async function multiModelChat(
  messages: Array<{ role: 'user' | 'assistant'; content: string }>,
  modelId: string
) {
  return client.ai.chat({ model: modelId, messages });
}

// Usage
const result = await multiModelChat(
  [{ role: 'user', content: 'Explain quantum entanglement in one paragraph' }],
  'deepseek-r1'
);
console.log(`${result.model}: ${result.text}`);
// costUsd may be missing, so avoid printing "$undefined"
console.log(`Cost: ${result.costUsd != null ? `$${result.costUsd.toFixed(6)}` : 'n/a'}`);
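
The MODELS list above is declared but never read. A selector UI would typically map it to option entries and validate whatever value comes back from the form. The sketch below is framework-agnostic and uses a shortened copy of the list. toOptions, isModelId, and the ModelId type are hypothetical helpers, not SDK exports.

```typescript
const MODELS = [
  { id: '@wai/llama-3.3-70b', label: 'Llama 3.3 70B', cost: 'Free' },
  { id: 'claude-sonnet-4-6', label: 'Claude Sonnet', cost: '$2.70/1M' },
  { id: 'deepseek-r1', label: 'DeepSeek R1', cost: '$0.135/1M' },
] as const;

// Union of the literal ids, derived from the list so the two cannot drift apart.
type ModelId = (typeof MODELS)[number]['id'];

// Option entries for a <select> or any similar picker.
function toOptions(): Array<{ value: ModelId; label: string }> {
  return MODELS.map((m) => ({ value: m.id, label: `${m.label} (${m.cost})` }));
}

// Type guard for untrusted input, such as a form value or query parameter.
function isModelId(value: string): value is ModelId {
  return MODELS.some((m) => m.id === value);
}

console.log(toOptions()[0].label);   // Llama 3.3 70B (Free)
console.log(isModelId('deepseek-r1')); // true
console.log(isModelId('not-a-model')); // false
```

Checking the id before calling multiModelChat keeps a typo in the UI from turning into a failed API request.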

Summary

Multi-model AI routing doesn’t require a framework. It requires:

An API key
A model selection function
One fetch call (or SDK method)

The same endpoint, response format, and streaming interface work across every provider. Switch models by changing a string. Add RAG with two more API calls. No infrastructure to maintain.

Start free at app.neureus.ai/onboard — 500 Neurons/month, Workers AI models always free.