Batch Inference

LLM at scale.
40% off realtime.

Submit thousands of LLM requests in one API call. OpenAI and Anthropic process them asynchronously at 50% off their realtime rate. Neureus passes most of that saving on to you: batch requests are priced 40% below realtime.

GPT-4o batch: $3.00/1M tokens (realtime $5.00/1M tokens)
Claude Sonnet batch: $1.80/1M tokens (realtime $3.00/1M tokens)
GPT-4o Mini batch: $0.12/1M tokens (realtime $0.20/1M tokens)

How it works

1. Submit a batch job

POST an array of requests (up to 50,000) to /ai/batch. The call returns a batchId immediately.

2. Neureus collects and submits

Every 5 minutes, Neureus collects pending jobs and submits them to the provider's Batch API (OpenAI or Anthropic, depending on the model). You don't pay while a job waits.

3. Poll or receive a webhook

GET /ai/batch/{id} to poll status. Or set a webhookUrl and Neureus POSTs results when the job completes.
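
For the polling path in step 3, a minimal sketch of a poll loop with exponential backoff might look like the following. The status values come from the SDK example on this page; the helper names (`isTerminal`, `backoffMs`, `pollUntilDone`), the 5-second base delay, and the 5-minute cap are illustrative assumptions, not part of the Neureus API.

```typescript
// Status values as listed in the SDK example on this page.
type BatchStatus = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

interface BatchState {
  status: BatchStatus;
}

// A job is finished once it has either completed or failed.
function isTerminal(status: BatchStatus): boolean {
  return status === 'completed' || status === 'failed';
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
// The defaults (5 s base, 5 min cap) are assumptions chosen for illustration.
function backoffMs(attempt: number, baseMs = 5_000, maxMs = 300_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Calls fetchState repeatedly until the job reaches a terminal status
// or maxAttempts polls have been made. fetchState is supplied by the caller,
// so this sketch makes no assumption about the HTTP client in use.
async function pollUntilDone<T extends BatchState>(
  fetchState: () => Promise<T>,
  maxAttempts = 50,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchState();
    if (isTerminal(state.status)) return state;
    await sleep(backoffMs(attempt));
  }
  throw new Error(`Batch still running after ${maxAttempts} polls`);
}
```

Here `fetchState` would typically wrap a GET to /ai/batch/{id} with your API key in the Authorization header. Because jobs are only collected every 5 minutes, polling more often than the backoff cap is unlikely to help.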

Quick start

Submit a batch job
curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer $NEUREUS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "webhookUrl": "https://your-app.com/webhooks/batch",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The product arrived on time and works great!" }
        ]
      },
      {
        "customId": "doc-002",
        "messages": [
          { "role": "system", "content": "Classify the sentiment of this text as positive, neutral, or negative. Reply with one word only." },
          { "role": "user", "content": "The packaging was damaged and the item stopped working after a day." }
        ]
      }
    ]
  }'

# The requests array accepts up to 50,000 entries.
# Response:
# { "batchId": "batch_abc123", "pollUrl": "/ai/batch/batch_abc123", "status": "collecting" }

TypeScript SDK (recommended)
import { NeureuAI } from '@neureus/sdk';
const client = new NeureuAI({ apiKey: process.env.NEUREUS_KEY });

// Load your documents
const documents = await loadDocuments(); // your data

// Submit batch job
const { batchId } = await client.ai.batchCreate({
  model: 'claude-haiku-4-5',
  webhookUrl: 'https://your-app.com/webhooks/batch',
  requests: documents.map(doc => ({
    customId: doc.id,
    messages: [
      { role: 'system', content: 'Extract the key topics from this document as a JSON array.' },
      { role: 'user', content: doc.content },
    ],
  })),
});

console.log('Submitted:', batchId);

// Poll for completion (or handle via webhook)
const result = await client.ai.batchPoll(batchId);
// result.status: 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed'
// result.results[]: { customId, response: { text, inputTokens, outputTokens } }
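
If you have more documents than fit in one job, one possible approach is to split them into chunks of at most 50,000 and then look results up by customId once each job finishes. The sketch below assumes the result shape shown in the comment above; `chunk`, `indexByCustomId`, and `MAX_REQUESTS_PER_BATCH` are hypothetical helper names, not SDK exports.

```typescript
// Per-job request limit stated on this page.
const MAX_REQUESTS_PER_BATCH = 50_000;

// Splits items into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number = MAX_REQUESTS_PER_BATCH): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Result shape assumed from the comment in the SDK example above.
interface BatchResult {
  customId: string;
  response: { text: string; inputTokens: number; outputTokens: number };
}

// Builds a lookup table so each result can be matched to its source document.
function indexByCustomId(results: readonly BatchResult[]): Map<string, BatchResult> {
  return new Map<string, BatchResult>(results.map((r) => [r.customId, r] as const));
}
```

Each chunk can then be passed to its own `client.ai.batchCreate` call, and `indexByCustomId(result.results)` gives direct access to the response for a given `doc.id`.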

Batch pricing

Neureus batch pricing is 40% below realtime. Provider discount is 50%; we pass most of it through.

Model               Realtime (input/1M)   Batch (input/1M)   Savings
GPT-4o              $4.50                 $3.00              −33%
GPT-4o Mini         $0.135                $0.09              −33%
Claude Sonnet 4.6   $2.70                 $1.80              −33%
Claude Haiku 4.5    $0.225                $0.15              −33%

Output tokens are billed at the same discount. The free tier of 500 Neurons/month (~50k tokens) applies to batch jobs too.
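
To estimate input cost for a workload, the table rates can be plugged into a small calculator. This is a rough sketch: the rates are copied from the table above, the keys are display names rather than API model IDs, and output-token cost is left out because output rates are not listed here.

```typescript
// Input rates in USD per 1M tokens, copied from the pricing table above.
// Keys are display names used for illustration, not API model identifiers.
const INPUT_RATES = {
  'GPT-4o': { realtime: 4.5, batch: 3.0 },
  'GPT-4o Mini': { realtime: 0.135, batch: 0.09 },
  'Claude Sonnet 4.6': { realtime: 2.7, batch: 1.8 },
  'Claude Haiku 4.5': { realtime: 0.225, batch: 0.15 },
} as const;

type Model = keyof typeof INPUT_RATES;
type Mode = 'realtime' | 'batch';

// Input-token cost in USD for a given model and pricing mode.
function inputCostUsd(model: Model, inputTokens: number, mode: Mode): number {
  return (inputTokens / 1_000_000) * INPUT_RATES[model][mode];
}

// Percentage saved on input tokens by using batch instead of realtime.
function savingsPercent(model: Model): number {
  const { realtime, batch } = INPUT_RATES[model];
  return (1 - batch / realtime) * 100;
}
```

For example, 10M input tokens on GPT-4o come to about $30 in batch mode versus about $45 at the realtime rate listed in the table.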

What teams use batch inference for

📄 Document processing

Classify, summarize, or extract from thousands of PDFs overnight. Submit at midnight, have results by morning.

🏷️ Content labeling

Sentiment analysis, category tagging, spam detection at scale. 40% off means more budget for the hard cases.

🌐 Translation pipelines

Translate product catalogs, legal documents, or customer reviews. Batch handles the volume; you handle the review.

📊 Data enrichment

Enrich CRM records, product listings, or user profiles with AI-generated fields. Fire and forget.

🧪 Evaluation datasets

Generate gold-standard eval sets, synthetic training data, or LLM judge scores at bulk pricing.

📧 Personalized outreach

Generate personalized email subjects, product descriptions, or recommendations for millions of users.

Process at scale. Pay 40% less.

Batch inference is included on all Neureus plans. Start with the free tier.