Batch Inference

LLM at scale.
40% off realtime.

Submit thousands of LLM requests in one API call. OpenAI and Anthropic process them asynchronously at 50% off their realtime rate. Neureus passes most of that saving on to you: batch pricing is 40% below realtime.

How it works

Submit a batch job

POST an array of up to 50,000 requests to /ai/batch. The call returns a batchId immediately.
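
If a workload exceeds the 50,000-request limit, it has to be split across several jobs. A minimal sketch of that split follows. The BatchRequest interface and the chunkRequests helper are illustrative names, not part of the Neureus SDK, and the request shape simply mirrors the fields used in the quick-start examples.

```typescript
// Hypothetical request shape, mirroring the fields shown in the quick start.
interface BatchRequest {
  customId: string;
  messages: { role: 'system' | 'user'; content: string }[];
}

// Per-job limit as stated in the step above.
const MAX_REQUESTS_PER_BATCH = 50000;

// Split a list into consecutive chunks of at most `size` items.
function chunkRequests<T>(items: T[], size: number = MAX_REQUESTS_PER_BATCH): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Example: 120,000 requests become three jobs.
const allRequests: BatchRequest[] = Array.from(
  { length: 120000 },
  (_, i): BatchRequest => ({
    customId: `doc-${i}`,
    messages: [{ role: 'user', content: `text ${i}` }],
  }),
);
const jobs = chunkRequests(allRequests);
console.log(jobs.map(job => job.length)); // [ 50000, 50000, 20000 ]
```

Each chunk can then be submitted as its own batch job, and the returned batchIds tracked separately.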

Neureus collects + submits

Every 5 minutes, Neureus collects pending jobs and submits them to the provider's Batch API (OpenAI or Anthropic, depending on the model). You don't pay while a job waits.

Poll or receive webhook

Poll status with GET /ai/batch/{id}, or set a webhookUrl and Neureus will POST the results to it when the job completes.
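
For the polling route, a loop with capped exponential backoff keeps request volume low while a job moves through its states. The sketch below is one possible shape, not the SDK's own implementation. The status values are the ones listed in the SDK quick start on this page; pollUntilDone, its options, and the default delays are assumptions chosen for illustration. The status lookup is injected as a function, so it can wrap a GET to /ai/batch/{id} or an SDK call.

```typescript
// Status values as listed in the SDK quick start on this page.
type BatchStatus = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

interface BatchState {
  status: BatchStatus;
}

// Hypothetical options; the defaults are illustrative, not documented values.
interface PollOptions {
  initialDelayMs?: number; // wait before the second poll
  maxDelayMs?: number;     // upper bound for the backoff
  maxAttempts?: number;    // give up after this many polls
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

async function pollUntilDone<T extends BatchState>(
  getState: () => Promise<T>,
  options: PollOptions = {},
): Promise<T> {
  const {
    initialDelayMs = 30000,
    maxDelayMs = 300000,
    maxAttempts = 200,
    sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)),
  } = options;

  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await getState();
    // 'completed' and 'failed' are the terminal states.
    if (state.status === 'completed' || state.status === 'failed') {
      return state;
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // double the wait, up to the cap
  }
  throw new Error(`Batch still not finished after ${maxAttempts} polls`);
}

// Demo with a fake status sequence and no real waiting.
const fakeStates: BatchStatus[] = ['collecting', 'submitted', 'processing', 'completed'];
let pollCalls = 0;
const finalState = pollUntilDone(
  async (): Promise<BatchState> => {
    pollCalls++;
    return { status: fakeStates.shift() ?? 'completed' };
  },
  { sleep: async () => {} },
);
finalState.then(state => console.log(state.status, 'after', pollCalls, 'polls'));
// completed after 4 polls
```

Because jobs are collected on a 5-minute cycle, polling more often than every few tens of seconds is unlikely to surface results sooner.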

Quick start

Submit a batch job

curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer $NEUREUS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "webhookUrl": "https://your-app.com/webhooks/batch",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The product arrived on time and works great!" }
        ]
      },
      {
        "customId": "doc-002",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The packaging was damaged and the item stopped working after a day." }
        ]
      }
    ]
  }'

# Response:
# { "batchId": "batch_abc123", "pollUrl": "/ai/batch/batch_abc123", "status": "collecting" }

TypeScript SDK (recommended)

import { NeureuAI } from '@neureus/sdk';
const client = new NeureuAI({ apiKey: process.env.NEUREUS_KEY });

// Load your documents. loadDocuments() is your own function;
// it should return objects with `id` and `content` fields.
const documents = await loadDocuments();

// Submit batch job
const { batchId } = await client.ai.batchCreate({
  model: 'claude-haiku-4-5',
  webhookUrl: 'https://your-app.com/webhooks/batch',
  requests: documents.map(doc => ({
    customId: doc.id,
    messages: [
      { role: 'system', content: 'Extract the key topics from this document as a JSON array.' },
      { role: 'user', content: doc.content },
    ],
  })),
});

console.log('Submitted:', batchId);

// Poll for completion (or handle via webhook)
const result = await client.ai.batchPoll(batchId);
// result.status: 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed'
// result.results[]: { customId, response: { text, inputTokens, outputTokens } }
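
Once a job completes, each result has to be matched back to its source document through customId. The sketch below shows one way to do that. The result shape follows the comment in the SDK example above; SourceDoc, joinResults, and the handling of missing results are assumptions for illustration, since this page does not say whether results arrive in submission order or whether every request is guaranteed a result.

```typescript
// Result shape as described in the SDK example's comments.
interface BatchResultItem {
  customId: string;
  response: { text: string; inputTokens: number; outputTokens: number };
}

// Hypothetical document shape: `id` was used as the customId on submit.
interface SourceDoc {
  id: string;
  content: string;
}

// Match results to documents by customId and total the token usage.
function joinResults(docs: SourceDoc[], results: BatchResultItem[]) {
  const byId = new Map(results.map(r => [r.customId, r] as const));
  const matched: { doc: SourceDoc; text: string }[] = [];
  const missing: SourceDoc[] = [];
  let inputTokens = 0;
  let outputTokens = 0;

  for (const doc of docs) {
    const hit = byId.get(doc.id);
    if (hit) {
      matched.push({ doc, text: hit.response.text });
      inputTokens += hit.response.inputTokens;
      outputTokens += hit.response.outputTokens;
    } else {
      missing.push(doc); // no result came back for this document
    }
  }
  return { matched, missing, inputTokens, outputTokens };
}

// Example with made-up data: results out of order, one document unanswered.
const sampleDocs: SourceDoc[] = [
  { id: 'doc-001', content: 'first' },
  { id: 'doc-002', content: 'second' },
  { id: 'doc-003', content: 'third' },
];
const sampleResults: BatchResultItem[] = [
  { customId: 'doc-002', response: { text: '["b"]', inputTokens: 12, outputTokens: 3 } },
  { customId: 'doc-001', response: { text: '["a"]', inputTokens: 10, outputTokens: 2 } },
];
const joined = joinResults(sampleDocs, sampleResults);
console.log(joined.matched.length, joined.missing.map(d => d.id), joined.inputTokens, joined.outputTokens);
// 2 [ 'doc-003' ] 22 5
```

Documents that end up in missing can be resubmitted in a follow-up batch.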

Batch pricing

Neureus batch pricing is 40% below realtime. The provider discount is 50%; we pass most of it through.

Model	Realtime (input, USD per 1M tokens)	Batch (input, USD per 1M tokens)	Savings
GPT-4o	$4.50	$3.00	−33%
GPT-4o Mini	$0.135	$0.09	−33%
Claude Sonnet 4.6	$2.70	$1.80	−33%
Claude Haiku 4.5	$0.225	$0.15	−33%

Output tokens are billed at the same discount. The free tier of 500 Neurons/month (~50k tokens) applies to batch jobs too.
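
To turn the table into a budget figure, multiply the total input tokens by the per-1M price. The sketch below does that for input tokens only, because output prices are not listed on this page. The prices are copied from the table above; estimateInputCost and the workload figures (10,000 requests averaging 800 input tokens each) are hypothetical.

```typescript
// Input prices in USD per 1M tokens, copied from the table above.
const INPUT_PRICE_PER_MILLION = {
  'GPT-4o': { realtime: 4.5, batch: 3.0 },
  'GPT-4o Mini': { realtime: 0.135, batch: 0.09 },
  'Claude Sonnet 4.6': { realtime: 2.7, batch: 1.8 },
  'Claude Haiku 4.5': { realtime: 0.225, batch: 0.15 },
} as const;

type PricedModel = keyof typeof INPUT_PRICE_PER_MILLION;

// Estimate input-token cost for a workload at realtime and batch rates.
function estimateInputCost(model: PricedModel, requestCount: number, avgInputTokens: number) {
  const millionsOfTokens = (requestCount * avgInputTokens) / 1000000;
  const price = INPUT_PRICE_PER_MILLION[model];
  const realtime = millionsOfTokens * price.realtime;
  const batch = millionsOfTokens * price.batch;
  return { realtime, batch, saved: realtime - batch };
}

// Hypothetical workload: 10,000 requests of about 800 input tokens each (8M tokens).
const estimate = estimateInputCost('GPT-4o Mini', 10000, 800);
console.log(
  estimate.realtime.toFixed(2),
  estimate.batch.toFixed(2),
  estimate.saved.toFixed(2),
);
// 1.08 0.72 0.36
```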

What teams use batch inference for

📄

Document processing

Classify, summarize, or extract from thousands of PDFs overnight. Submit at midnight, have results by morning.

🏷️

Content labeling

Sentiment analysis, category tagging, spam detection at scale. 40% off means more budget for the hard cases.

🌐

Translation pipelines

Translate product catalogs, legal documents, or customer reviews. Batch handles the volume; you handle the review.

📊

Data enrichment

Enrich CRM records, product listings, or user profiles with AI-generated fields. Fire and forget.

🧪

Evaluation datasets

Generate gold-standard eval sets, synthetic training data, or LLM judge scores at bulk pricing.

📧

Personalized outreach

Generate personalized email subjects, product descriptions, or recommendations for millions of users.

Process at scale. Pay 40% less.
