Submit thousands of LLM requests in one API call. OpenAI and Anthropic process them asynchronously at 50% off their realtime rate. Neureus passes most of that saving to you — 40% below realtime pricing.
POST an array of requests (up to 50,000) to /ai/batch. Returns a batchId immediately.
Every 5 minutes, Neureus collects jobs and submits them to the provider Batch API (OpenAI or Anthropic based on the model). You don't pay while it waits.
GET /ai/batch/{id} to poll status. Or set a webhookUrl and Neureus POSTs results when the job completes.
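The submit and collect steps can be sketched client-side as follows. This is a minimal illustration, not Neureus SDK code: `chunkRequests`, `pollUntilDone`, and the injected `getStatus` callback are hypothetical names, and the polling interval and attempt budget are arbitrary assumptions. Only the 50,000-request cap and the status names (taken from the SDK example's comments) come from this page. Treating `completed` and `failed` as the only terminal states is also an assumption.

```typescript
// Status names as listed in the SDK example on this page.
type BatchStatus = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

// Per-call cap stated in the submit step.
const MAX_REQUESTS_PER_BATCH = 50_000;

// Split a large request list into chunks that each fit in one POST /ai/batch call.
function chunkRequests<T>(requests: T[], size: number = MAX_REQUESTS_PER_BATCH): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < requests.length; i += size) {
    chunks.push(requests.slice(i, i + size));
  }
  return chunks;
}

// Poll until the job reaches a state assumed to be terminal.
// getStatus is injected (hypothetical) so the sketch does not assume an HTTP client;
// in practice it would wrap GET /ai/batch/{id}.
async function pollUntilDone(
  getStatus: () => Promise<BatchStatus>,
  { intervalMs = 60_000, maxAttempts = 120 } = {},
): Promise<BatchStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'completed' || status === 'failed') return status;
    // Wait before asking again; jobs are only collected every 5 minutes.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('batch did not finish within the polling budget');
}
```

Each chunk would be submitted as its own batch and tracked by its own batchId. If you set a webhookUrl, the polling helper is unnecessary.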
curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer $NEUREUS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "webhookUrl": "https://your-app.com/webhooks/batch",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The product arrived on time and works great!" }
        ]
      },
      {
        "customId": "doc-002",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The packaging was damaged and the item stopped working after a day." }
        ]
      }
    ]
  }'
# The "requests" array accepts up to 50,000 entries.
# Response:
# { "batchId": "batch_abc123", "pollUrl": "/ai/batch/batch_abc123", "status": "collecting" }

import { NeureuAI } from '@neureus/sdk';
const client = new NeureuAI({ apiKey: process.env.NEUREUS_KEY });

// Load your documents
const documents = await loadDocuments(); // your data

// Submit batch job
const { batchId } = await client.ai.batchCreate({
  model: 'claude-haiku-4-5',
  webhookUrl: 'https://your-app.com/webhooks/batch',
  requests: documents.map(doc => ({
    customId: doc.id,
    messages: [
      { role: 'system', content: 'Extract the key topics from this document as a JSON array.' },
      { role: 'user', content: doc.content },
    ],
  })),
});
console.log('Submitted:', batchId);

// Poll for completion (or handle via webhook)
const result = await client.ai.batchPoll(batchId);
// result.status: 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed'
// result.results[]: { customId, response: { text, inputTokens, outputTokens } }

Neureus batch pricing is 40% below realtime. Provider discount is 50%; we pass most of it through.
| Model | Realtime (input/1M) | Batch (input/1M) | Savings |
|---|---|---|---|
| GPT-4o | $4.50 | $3.00 | −33% |
| GPT-4o Mini | $0.135 | $0.09 | −33% |
| Claude Sonnet 4.6 | $2.70 | $1.80 | −33% |
| Claude Haiku 4.5 | $0.225 | $0.15 | −33% |
Output tokens are billed at the same discount. The free tier (500 Neurons/month, roughly 50k tokens) applies to batch jobs too.
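To estimate what a finished job cost on the input side, you can total `inputTokens` across the results and multiply by the per-million rates in the table above. The sketch below assumes the result shape shown in the SDK example's comment. `BatchResult`, `totalInputTokens`, and `estimateInputCost` are hypothetical names, not part of the SDK. Output-token rates are not listed on this page, so output cost is left out, and the free-tier allowance is ignored.

```typescript
// Result shape as described in the SDK example's comment (assumed to be complete).
interface BatchResult {
  customId: string;
  response: { text: string; inputTokens: number; outputTokens: number };
}

// Input rates in USD per 1M tokens, copied from the pricing table.
const INPUT_RATES_PER_MILLION = {
  'GPT-4o': { realtime: 4.5, batch: 3.0 },
  'GPT-4o Mini': { realtime: 0.135, batch: 0.09 },
  'Claude Sonnet 4.6': { realtime: 2.7, batch: 1.8 },
  'Claude Haiku 4.5': { realtime: 0.225, batch: 0.15 },
} as const;

type ModelName = keyof typeof INPUT_RATES_PER_MILLION;

// Sum the input tokens reported for every request in the batch.
function totalInputTokens(results: BatchResult[]): number {
  return results.reduce((sum, r) => sum + r.response.inputTokens, 0);
}

// Price a token count at both the realtime and the batch input rate.
function estimateInputCost(model: ModelName, inputTokens: number): { realtime: number; batch: number } {
  const rates = INPUT_RATES_PER_MILLION[model];
  const millions = inputTokens / 1_000_000;
  return { realtime: millions * rates.realtime, batch: millions * rates.batch };
}
```

For example, `estimateInputCost('Claude Haiku 4.5', totalInputTokens(result.results))` would price the job from the SDK example at both rates, so the two figures can be compared directly.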
Classify, summarize, or extract from thousands of PDFs overnight. Submit at midnight, have results by morning.
Sentiment analysis, category tagging, spam detection at scale. 40% off means more budget for the hard cases.
Translate product catalogs, legal documents, or customer reviews. Batch handles the volume; you handle the review.
Enrich CRM records, product listings, or user profiles with AI-generated fields. Fire and forget.
Generate gold-standard eval sets, synthetic training data, or LLM judge scores at bulk pricing.
Generate personalized email subjects, product descriptions, or recommendations for millions of users.
Batch inference is included on all Neureus plans. Start with the free tier.