A production-quality customer support bot needs four things: a knowledge base it can search, a fast LLM to generate responses, streaming so users don’t wait, and a way to escalate when it doesn’t know the answer. Here’s all four in under 100 lines.
What we’re building
- Ingest your documentation into a vector index (one-time setup)
- Accept user questions via API
- Retrieve relevant chunks from the docs
- Generate a grounded, streamed response
- Escalate to a human when confidence is low
Setup
npm install @neureus/sdk
NEUREUS_API_KEY=nr_your_key_here
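The snippets in this post read the key with a non-null assertion (`process.env.NEUREUS_API_KEY!`), which satisfies the type checker but only fails later, at the first API call, if the variable is missing. A minimal sketch of a fail-fast check follows. `requireEnv` is a hypothetical helper, not part of `@neureus/sdk`.

```typescript
// Hypothetical helper (not part of @neureus/sdk): fail fast on missing configuration.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in Node: const apiKey = requireEnv('NEUREUS_API_KEY', process.env);
```

Passing the environment in as an argument keeps the helper easy to test and independent of the runtime.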
Step 1: Ingest your documentation (run once)
// scripts/ingest-docs.ts
import { NeureuAI } from '@neureus/sdk';
const client = new NeureuAI({ apiKey: process.env.NEUREUS_API_KEY! });
const docs = [
'https://yourproduct.com/docs/getting-started',
'https://yourproduct.com/docs/billing',
'https://yourproduct.com/docs/integrations',
'https://yourproduct.com/docs/troubleshooting',
];
for (const url of docs) {
const { documentId, chunks } = await client.rag.ingest({ url, chunkSize: 400, overlap: 50 });
console.log(`Ingested ${url}: ${chunks} chunks (id: ${documentId})`);
}
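Run once with npx ts-node scripts/ingest-docs.ts, and re-run whenever the docs change. The script uses top-level await, so it has to run as an ES module (for example with ts-node's ESM mode or a runner such as tsx).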
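To make the `chunkSize` and `overlap` parameters concrete, here is a minimal sketch of sliding-window chunking. It is an illustration only: this post does not say how `client.rag.ingest` splits text, or whether its sizes are measured in tokens, words, or characters. The sketch assumes whitespace-separated words, and `chunkWords` is a hypothetical helper.

```typescript
// Hypothetical helper, not part of @neureus/sdk. Assumes sizes are measured in words.
function chunkWords(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

Under these assumptions, `chunkSize: 400` with `overlap: 50` advances the window 350 words at a time, so a 1,000-word page produces three chunks, and each chunk repeats the last 50 words of the previous one. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.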
Step 2: The support bot endpoint (~70 lines)
// api/support.ts
import { NeureuAI } from '@neureus/sdk';
const client = new NeureuAI({ apiKey: process.env.NEUREUS_API_KEY! });
const SYSTEM_PROMPT = `You are a helpful customer support assistant for YourProduct.
Rules:
1. Answer only from the provided documentation context. Do not make up features or policies.
2. If you cannot find the answer in the context, say "I don't have specific information about that" and suggest contacting support.
3. Be concise. Users are frustrated when they have to read long answers.
4. Include a direct answer in the first sentence.
If the user's question falls into any of these categories, say "let me connect you with a human agent":
- Billing disputes or refund requests
- Account security issues
- Complaints about specific team members
- Legal questions`;
interface SupportRequest {
question: string;
conversationHistory?: Array<{ role: 'user' | 'assistant'; content: string }>;
stream?: boolean;
}
export async function handleSupportRequest(req: SupportRequest) {
const { question, conversationHistory = [], stream = true } = req;
// 1. Retrieve relevant documentation
const { results } = await client.rag.query({ query: question, topK: 5 });
const context = results.length > 0
? results.map(r => `[Source: ${r.metadata?.url ?? 'docs'}]\n${r.content}`).join('\n\n---\n\n')
: 'No specific documentation found for this question.';
// 2. Build the message array
const messages = [
...conversationHistory.slice(-6), // keep last 3 turns
{ role: 'user' as const, content: question },
];
const systemWithContext = `${SYSTEM_PROMPT}\n\n## Relevant documentation\n\n${context}`;
// 3. Check if escalation is needed before generating
const needsHuman = /refund|billing dispute|charge|fraud|security breach|legal|lawsuit/i.test(question);
if (needsHuman) {
return {
text: "I'd like to connect you with a human agent for this. Please hold on — someone will be with you shortly.",
escalated: true,
sources: [],
};
}
// 4. Generate response (streaming or not)
if (stream) {
const responseStream = await client.ai.stream({
model: 'claude-haiku-4-5', // fast + cheap for support
system: systemWithContext,
messages,
});
return { stream: responseStream, sources: results.map(r => r.metadata?.url).filter(Boolean) };
}
const { text } = await client.ai.chat({
model: 'claude-haiku-4-5',
system: systemWithContext,
messages,
});
// 5. Detect low-confidence answers (non-streaming path only; the streamed text is not available here)
const lowConfidence = text.includes("I don't have specific information") ||
text.includes("I'm not sure") ||
results.length === 0;
return {
text,
escalated: false,
lowConfidence,
sources: results.map(r => r.metadata?.url).filter(Boolean),
};
}
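The keyword check in step 3 matches substrings, so a question such as "How do I recharge my balance?" or "Is the legalese in the ToS final?" would be escalated by mistake. One possible refinement, sketched below, anchors each term at word boundaries and reports which rule fired. The rule names, the patterns, and the `checkEscalation` helper are illustrative assumptions to adapt to your own support policy.

```typescript
// Illustrative rules; names and patterns are assumptions, not a vetted policy.
const ESCALATION_RULES: Array<{ reason: string; pattern: RegExp }> = [
  { reason: 'billing', pattern: /\b(refund(s|ed)?|chargeback|billing dispute|overcharged?|charged twice)\b/i },
  { reason: 'security', pattern: /\b(fraud|security breach|hacked)\b/i },
  { reason: 'legal', pattern: /\b(legal|lawsuit|lawyer|attorney)\b/i },
];

// Returns the first matching reason, or null when the bot should answer itself.
function checkEscalation(question: string): string | null {
  for (const { reason, pattern } of ESCALATION_RULES) {
    if (pattern.test(question)) return reason;
  }
  return null;
}
```

Running a check like this before `client.rag.query` would also skip the retrieval call for questions that go straight to a human, and the returned reason can be attached to the handoff.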
Step 3: Wire it up as an API route
Next.js App Router:
// app/api/support/route.ts
import { NextRequest } from 'next/server';
import { handleSupportRequest } from '@/api/support';
export const runtime = 'edge';
export async function POST(req: NextRequest) {
const body = await req.json();
const result = await handleSupportRequest({ ...body, stream: true });
if ('stream' in result) {
return new Response(result.stream, {
headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
});
}
return Response.json(result);
}
Hono (an Express handler would use req.body and res.json instead):
app.post('/api/support', async (c) => {
const body = await c.req.json();
const result = await handleSupportRequest({ ...body, stream: false });
return c.json(result);
});
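Both routes pass the parsed JSON body straight into `handleSupportRequest`. A small validation step before that call keeps malformed or oversized input, and client-supplied turns with unexpected roles, out of the prompt. The sketch below is one way to do it; `parseSupportRequest` and the 2,000-character limit are assumptions for illustration.

```typescript
type ChatTurn = { role: 'user' | 'assistant'; content: string };
type ParsedRequest = { question: string; conversationHistory: ChatTurn[] };

const MAX_QUESTION_LENGTH = 2000; // assumed limit; pick one that fits your product

function isChatTurn(value: unknown): value is ChatTurn {
  if (typeof value !== 'object' || value === null) return false;
  const turn = value as Record<string, unknown>;
  return (turn.role === 'user' || turn.role === 'assistant') && typeof turn.content === 'string';
}

// Returns a clean request, or an error message suitable for a 400 response.
function parseSupportRequest(body: unknown): ParsedRequest | { error: string } {
  if (typeof body !== 'object' || body === null) return { error: 'Body must be a JSON object' };
  const { question, conversationHistory } = body as Record<string, unknown>;
  if (typeof question !== 'string' || question.trim() === '') {
    return { error: 'question must be a non-empty string' };
  }
  if (question.length > MAX_QUESTION_LENGTH) return { error: 'question is too long' };
  // Drop turns with unknown roles (for example a client-supplied "system" message).
  const history = Array.isArray(conversationHistory) ? conversationHistory.filter(isChatTurn) : [];
  return { question: question.trim(), conversationHistory: history };
}
```

In a route, a result containing `error` would map to a 400 response, and anything else can be passed on to `handleSupportRequest`.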
Step 4: A minimal frontend widget
<!-- Embeddable support widget — 30 lines -->
<div id="support-widget">
<div id="messages"></div>
<form id="support-form">
<input id="question" placeholder="How can I help you?" />
<button type="submit">Send</button>
</form>
</div>
<script>
const form = document.getElementById('support-form');
const messages = document.getElementById('messages');
const history = [];
form.addEventListener('submit', async (e) => {
e.preventDefault();
const question = document.getElementById('question').value;
document.getElementById('question').value = '';
addMessage('user', question);
const assistantEl = addMessage('assistant', '');
const res = await fetch('/api/support', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ question, conversationHistory: history }),
});
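let fullText = '';
if ((res.headers.get('Content-Type') || '').includes('application/json')) {
// Escalations (and non-streaming servers) return plain JSON rather than an event stream
fullText = (await res.json()).text || '';
assistantEl.textContent = fullText;
} else {
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
// { stream: true } keeps multi-byte characters intact across chunk boundaries
for (const line of decoder.decode(value, { stream: true }).split('\n')) {
if (line.startsWith('data: ')) {
try { const d = JSON.parse(line.slice(6)); if (d.text) { fullText += d.text; assistantEl.textContent = fullText; } } catch {}
}
}
}
}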
history.push({ role: 'user', content: question }, { role: 'assistant', content: fullText });
});
function addMessage(role, text) {
const el = document.createElement('div');
el.className = `message ${role}`;
el.textContent = text;
messages.appendChild(el);
return el;
}
</script>
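Network chunks do not necessarily align with line boundaries, so a `data:` line can arrive split across two reads and be dropped by a parser that treats each chunk in isolation. A common way to handle this is to buffer the trailing partial line until the next chunk arrives. The sketch below assumes the same wire format the widget reads (`data: ` followed by JSON with a `text` field); `createSseTextParser` is a hypothetical helper name.

```typescript
// Hypothetical helper: feed it decoded chunks, get back text from complete `data:` lines.
function createSseTextParser(): (chunk: string) => string {
  let buffer = '';
  return (chunk: string): string => {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the trailing partial line for the next chunk
    let text = '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      try {
        const payload: unknown = JSON.parse(line.slice(6));
        if (typeof payload === 'object' && payload !== null) {
          const value = (payload as { text?: unknown }).text;
          if (typeof value === 'string') text += value;
        }
      } catch {
        // Ignore lines whose payload is not valid JSON.
      }
    }
    return text;
  };
}
```

In the widget, each read would become `fullText += parse(decoder.decode(value, { stream: true }))`, with `parse` created once per request.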
What you get
- RAG-grounded answers: responses are generated from retrieved chunks of your actual documentation, which reduces (but does not eliminate) hallucination
- Streaming: users see the response appear word by word
- Conversation history: the last 3 turns are included for context
- Automatic escalation: billing/legal/security queries route to human agents
- Source attribution: sources[] in the response shows which docs were used
- Low-confidence detection: the bot flags when it’s uncertain so you can trigger escalation
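In the handler above, the low-confidence check only runs on the non-streaming path, because the streamed text is not yet known when the function returns. One option is to run the same check once the stream has been read to the end, on the server or in the widget. The sketch below factors the check into a function; `isLowConfidence` is a hypothetical helper, and the case-insensitive, apostrophe-normalizing match is an added assumption about how the model may phrase its fallback.

```typescript
// Phrases taken from the handler's low-confidence check.
const LOW_CONFIDENCE_PHRASES = ["I don't have specific information", "I'm not sure"];

// Hypothetical helper: true when nothing was retrieved or the answer contains a fallback phrase.
function isLowConfidence(text: string, retrievedCount: number): boolean {
  if (retrievedCount === 0) return true;
  // Normalize typographic apostrophes and case so "I’m not sure" also matches.
  const normalized = text.replace(/\u2019/g, "'").toLowerCase();
  return LOW_CONFIDENCE_PHRASES.some((phrase) => normalized.includes(phrase.toLowerCase()));
}
```

For a streamed answer, calling `isLowConfidence(fullText, sources.length)` after the final chunk gives the same signal as the non-streaming branch.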
Extending it
Add handoff to Intercom/Zendesk:
if (result.escalated || result.lowConfidence) {
await createZendeskTicket({ question, conversation: history });
}
Use a smarter model for complex questions:
const model = isComplexTechnicalQuestion(question)
? 'claude-sonnet-4-6'
: 'claude-haiku-4-5';
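`isComplexTechnicalQuestion` is left undefined above. A simple heuristic is often enough to start with. The sketch below is one hypothetical version that sends long questions, or ones containing inline code, stack traces, or numeric error codes, to the larger model. The thresholds and patterns are assumptions to tune against real traffic.

```typescript
// Hypothetical heuristic; thresholds and patterns are assumptions, not measured values.
function isComplexTechnicalQuestion(question: string): boolean {
  const wordCount = question.trim().split(/\s+/).filter(Boolean).length;
  const hasCode = /`[^`]+`/.test(question); // backtick-delimited code
  const hasStackTrace = /^\s+at\s+\S+/m.test(question) || /\b(Exception|Traceback)\b/.test(question);
  const hasErrorCode = /\b(error|status|code)\s*:?\s*\d{3,}\b/i.test(question);
  return wordCount > 60 || hasCode || hasStackTrace || hasErrorCode;
}

// Model names are the ones used elsewhere in this post.
function pickModel(question: string): string {
  return isComplexTechnicalQuestion(question) ? 'claude-sonnet-4-6' : 'claude-haiku-4-5';
}
```

A rule-based check costs nothing per request. If it misroutes too often, a cheap classification call to the smaller model is a possible next step.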
Add feedback tracking:
// After the user rates the response:
await client.analytics.track({ event: 'support_rating', rating, question, wasEscalated: result.escalated });
Total line count
- ingest-docs.ts: 12 lines
- support.ts: 65 lines
- API route: 15 lines
- HTML widget: 40 lines
- Total: ~132 lines (close enough for a blog post)
The key insight: the hard parts (embedding, retrieval, streaming, generation) sit behind one API. The code you write is the product logic: escalation rules, conversation management, UI.
Try it yourself: the free tier includes 50 document ingestions and 500 Neurons/month. The support bot above uses 2–5 Neurons per question depending on context size, so the free allowance covers roughly 100–250 questions a month if it is spent only on answering.