If you’re running high-volume AI workloads — product descriptions, document classification, embedding generation, data enrichment — you’re probably paying full realtime rates when you don’t need to.
OpenAI’s Batch API processes requests asynchronously at 50% of realtime pricing. Neureus routes through that API and prices batch requests at 60% of OpenAI’s public rate: 40% below the usual price, though 20% above OpenAI’s own direct batch rate. For the right workloads, that’s real money.
What qualifies for batch inference?
Batch inference is right for workloads that are:
- Non-interactive: No user is waiting for the response in real time
- High volume: Hundreds to thousands of requests
- Tolerant of latency: Results can arrive within minutes to hours
Classic examples:
- Product catalog enrichment (generate descriptions for 10,000 SKUs)
- Document classification (label 50,000 support tickets by category)
- Sentiment analysis on historical records
- Nightly data enrichment pipelines
- Embedding generation for large document sets
- Summarization of long articles for a content pipeline
Batch vs. realtime: the math
For GPT-4o processing 1 million tokens:
| Mode | Rate | Cost |
|---|---|---|
| OpenAI realtime | $5.00/1M | $5.00 |
| OpenAI Batch API | $2.50/1M | $2.50 |
| Neureus realtime | $4.50/1M | $4.50 |
| Neureus batch | $3.00/1M | $3.00 |
For Claude Sonnet 4.6 processing 1 million tokens:
| Mode | Rate | Cost |
|---|---|---|
| Anthropic realtime | $3.00/1M | $3.00 |
| Anthropic Batch API | $1.50/1M | $1.50 |
| Neureus realtime | $2.70/1M | $2.70 |
| Neureus batch | $1.80/1M | $1.80 |
At 100M tokens/month on GPT-4o, switching from Neureus realtime to Neureus batch saves $150/month. At 1B tokens, that’s $1,500/month.
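The savings figures follow directly from the per-million rates in the GPT-4o table. As a minimal sketch (the names `GPT4O_RATES`, `monthlyCost`, and `monthlySavings` are illustrative and not part of any SDK, and the table's single blended rate per mode is taken at face value), the arithmetic looks like this:

```typescript
// Rates in USD per 1M tokens, copied from the GPT-4o table above.
const GPT4O_RATES = {
  openaiRealtime: 5.0,
  openaiBatch: 2.5,
  neureusRealtime: 4.5,
  neureusBatch: 3.0,
} as const;

type Mode = keyof typeof GPT4O_RATES;

// Cost in USD for a given monthly token volume in one mode.
function monthlyCost(tokens: number, mode: Mode): number {
  return (tokens / 1_000_000) * GPT4O_RATES[mode];
}

// Savings in USD from moving the same volume from one mode to another.
function monthlySavings(tokens: number, from: Mode, to: Mode): number {
  return monthlyCost(tokens, from) - monthlyCost(tokens, to);
}

console.log(monthlySavings(100_000_000, 'neureusRealtime', 'neureusBatch')); // 150
console.log(monthlySavings(1_000_000_000, 'neureusRealtime', 'neureusBatch')); // 1500
```

Swapping in your own monthly token count shows whether the switch is worth the added latency.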
How Neureus batch inference works
The flow:
- POST /ai/batch submits your batch job with its requests
- Neureus queues it, waits for the collection window (default: 5 minutes or 1,000 requests), then submits to the OpenAI/Anthropic Batch API
- GET /ai/batch/:id polls for status (collecting, submitted, processing, completed, failed)
- On completion: results are delivered to your webhook URL, or you can poll the endpoint directly
The cron schedule: Neureus checks for ready-to-submit jobs every 5 minutes and polls provider status every 10 minutes.
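The collection window can be pictured as a simple readiness check. The sketch below is not Neureus's actual implementation: the `QueuedJob` shape and `isReadyToSubmit` are hypothetical, and it assumes "5 minutes or 1,000 requests" means whichever threshold is reached first.

```typescript
// Hypothetical shape of a queued job; field names are illustrative only.
interface QueuedJob {
  createdAtMs: number;   // when the job started collecting
  requestCount: number;  // requests collected so far
}

const WINDOW_MS = 5 * 60 * 1000; // default collection window: 5 minutes
const MAX_REQUESTS = 1_000;      // default size threshold

// Assumption: a job is ready once EITHER threshold is reached.
function isReadyToSubmit(job: QueuedJob, nowMs: number): boolean {
  const windowElapsed = nowMs - job.createdAtMs >= WINDOW_MS;
  const thresholdReached = job.requestCount >= MAX_REQUESTS;
  return windowElapsed || thresholdReached;
}
```

If readiness is only evaluated on the 5-minute cron tick, a job that becomes ready just after a tick would presumably wait for the next one, so the delay before submission could approach 10 minutes.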
Implementation
Submit a batch job
import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

// Option 1: SDK
// `products` is your own array of catalog records (id, name, features).
const job = await client.batch.create({
  model: 'gpt-4o-mini', // GPT-4o-mini at $0.075/1M realtime → ~$0.045/1M batch
  requests: products.map((product) => ({
    // customId is echoed back with each result so you can match it to its input
    customId: `product-${product.id}`,
    messages: [
      {
        role: 'system',
        content: 'Write a compelling 2-sentence product description. Focus on key features and benefits.',
      },
      {
        role: 'user',
        content: `Product: ${product.name}\nFeatures: ${product.features.join(', ')}`,
      },
    ],
  })),
  webhookUrl: 'https://your-app.com/webhooks/batch-complete',
});

console.log(`Batch job submitted: ${job.batchId}`);
console.log(`Poll at: ${job.pollUrl}`);
Option 2: Direct API (cURL)
curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer nr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "user", "content": "Classify this support ticket: My order never arrived..." }
        ]
      }
    ],
    "webhookUrl": "https://your-app.com/webhook"
  }'
Poll for status
// Poll until complete
async function waitForBatch(batchId: string): Promise<BatchResult[]> {
  while (true) {
    const status = await client.batch.get(batchId);

    if (status.state === 'completed') {
      return status.results;
    }
    if (status.state === 'failed') {
      throw new Error(`Batch failed: ${status.error}`);
    }

    // Poll every 2 minutes; batch jobs take minutes, not seconds
    await new Promise(resolve => setTimeout(resolve, 120_000));
  }
}
Webhook handler (Next.js API route)
// app/api/webhooks/batch-complete/route.ts
import { NextRequest } from 'next/server';

// `db` stands for your application's own database client; import it from wherever you define it.
export async function POST(req: NextRequest) {
  const { batchId, results, failCount } = await req.json();

  for (const result of results) {
    const { customId, text, inputTokens, outputTokens } = result;

    // customId is whatever you passed in the request (e.g., "product-123")
    const productId = customId.replace('product-', '');

    await db.products.update({
      where: { id: productId },
      data: { description: text },
    });
  }

  if (failCount > 0) {
    console.warn(`Batch ${batchId}: ${failCount} requests failed`);
  }

  return Response.json({ ok: true });
}
Force-submit a batch
By default, Neureus collects requests for 5 minutes before submitting to the provider API. For testing or urgent batches, force-submit immediately:
await client.batch.flush(batchId);
Or via API:
curl -X POST https://app.neureus.ai/ai/batch/${BATCH_ID}/flush \
-H "Authorization: Bearer nr_your_key"
What to watch for
Failure handling: Neureus polls for provider status every 10 minutes. After 10 consecutive failed polls, the job is marked failed. Check failCount in the batch status response.
Result ordering: Batch results may arrive out of order relative to your input requests. Use customId to match results to inputs — don’t rely on array position.
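One way to make the customId matching concrete is to index results in a map and look each input up by key. This is a sketch only: `matchResults`, `Product`, and `Matched` are hypothetical names, the `customId` and `text` fields follow the webhook payload shown earlier, and the `product-` prefix mirrors the submission example.

```typescript
// Field names follow the webhook payload shown earlier (customId, text).
interface BatchResult {
  customId: string;
  text: string;
}

// Hypothetical input record; only `id` matters for matching.
interface Product {
  id: string;
  name: string;
}

interface Matched {
  product: Product;
  text: string;
}

// Join results back to inputs by customId, never by array index.
function matchResults(
  products: readonly Product[],
  results: readonly BatchResult[],
): { matched: Matched[]; missing: string[] } {
  const byCustomId = new Map<string, BatchResult>();
  for (const result of results) {
    byCustomId.set(result.customId, result);
  }

  const matched: Matched[] = [];
  const missing: string[] = [];
  for (const product of products) {
    const result = byCustomId.get(`product-${product.id}`);
    if (result) {
      matched.push({ product, text: result.text });
    } else {
      missing.push(product.id); // failed or not returned; a candidate for retry
    }
  }
  return { matched, missing };
}
```

The `missing` list doubles as a retry queue: any input without a result can be resubmitted in a follow-up job.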
Rate limits still apply: Batch API has its own rate limits at the provider level. For very large batches (>100K requests), split across multiple jobs.
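Splitting a large workload across jobs comes down to chunking the request array. The helper below is a generic sketch; `chunk` and `MAX_REQUESTS_PER_JOB` are illustrative names, and the 100,000 cap is simply the threshold mentioned above rather than a documented hard limit.

```typescript
// Assumed per-job cap, taken from the ">100K requests" guidance above.
const MAX_REQUESTS_PER_JOB = 100_000;

// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Example: 250,000 request ids become three jobs (100K, 100K, 50K).
const requestIds = Array.from({ length: 250_000 }, (_, i) => `doc-${i}`);
const jobs = chunk(requestIds, MAX_REQUESTS_PER_JOB);
console.log(jobs.map((job) => job.length)); // [ 100000, 100000, 50000 ]
```

Each slice would then be submitted as its own batch job. Because customId values stay unique across slices, results from all jobs can still be merged afterwards.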
Model availability: Batch inference is available for OpenAI (GPT models) and Anthropic (Claude models). Workers AI models don’t support async batch — use them for realtime at $0/call instead.
The 10-minute migration
If you have an existing pipeline calling OpenAI directly, the migration is:
- Change your endpoint from https://api.openai.com/v1/chat/completions to https://app.neureus.ai/ai/batch
- Wrap your messages array in a requests array with a customId per item
- Add a webhookUrl, or switch your polling to the Neureus batch status endpoint
- Remove your OpenAI API key from the request and use your Neureus API key in the Authorization header
That’s it. Your provider key, model selection, and message format stay the same.
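The wrapping step can be done mechanically. The following is a minimal sketch, assuming the batch body has the shape shown in the cURL example (a top-level `model`, a `requests` array of `{ customId, messages }`, and an optional `webhookUrl`); `toBatchBody` and the type names are hypothetical, and the one-model-per-job check is inferred from that body shape rather than from documented behavior.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// The parts of an existing chat-completions payload that carry over.
interface ChatCompletionPayload {
  model: string;
  messages: ChatMessage[];
}

// Batch body shape as it appears in the cURL example above.
interface BatchRequestBody {
  model: string;
  requests: { customId: string; messages: ChatMessage[] }[];
  webhookUrl?: string;
}

// Wrap existing payloads into one batch body, assigning a customId per item.
function toBatchBody(
  payloads: readonly ChatCompletionPayload[],
  idPrefix: string,
  webhookUrl?: string,
): BatchRequestBody {
  const first = payloads[0];
  if (!first) {
    throw new Error('at least one payload is required');
  }
  // Assumption: one job targets one model, since `model` sits at the top level.
  if (payloads.some((p) => p.model !== first.model)) {
    throw new Error('all payloads in one batch job must use the same model');
  }
  return {
    model: first.model,
    requests: payloads.map((p, i) => ({
      customId: `${idPrefix}-${i}`,
      messages: p.messages,
    })),
    ...(webhookUrl ? { webhookUrl } : {}),
  };
}
```

The resulting object can be serialized with `JSON.stringify` and sent as the POST body to the batch endpoint. In a real pipeline you would likely derive `customId` from a stable record id rather than the array index.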
The free tier doesn’t include batch inference — it’s available on Builder ($29/mo) and above. Start at app.neureus.ai/onboard.