Cut Your OpenAI Bill 40% with Batch Inference

If you’re running high-volume AI workloads — product descriptions, document classification, embedding generation, data enrichment — you’re probably paying full realtime rates when you don’t need to.

OpenAI’s Batch API processes requests asynchronously at 50% of realtime pricing. Neureus routes through that API and prices batch at 60% of OpenAI’s public realtime rate: 40% below the usual price, though 20% above what OpenAI charges for batch directly. For the right workloads, that’s real money.

What qualifies for batch inference?

Batch inference is right for workloads that are:

Non-interactive: No user is waiting for the response in real time
High volume: Hundreds to thousands of requests
Tolerant of latency: Results can arrive within minutes to hours

Classic examples:

Product catalog enrichment (generate descriptions for 10,000 SKUs)
Document classification (label 50,000 support tickets by category)
Sentiment analysis on historical records
Nightly data enrichment pipelines
Embedding generation for large document sets
Summarization of long articles for a content pipeline

Batch vs. realtime: the math

For GPT-4o processing 1 million tokens:

Mode	Rate	Cost
OpenAI realtime	$5.00/1M	$5.00
OpenAI Batch API	$2.50/1M	$2.50
Neureus realtime	$4.50/1M	$4.50
Neureus batch	$3.00/1M	$3.00

For Claude Sonnet 4.6 processing 1 million tokens:

Mode	Rate	Cost
Anthropic realtime	$3.00/1M	$3.00
Anthropic Batch API	$1.50/1M	$1.50
Neureus realtime	$2.70/1M	$2.70
Neureus batch	$1.80/1M	$1.80

At 100M tokens/month on GPT-4o, switching from Neureus realtime to Neureus batch saves $150/month. At 1B tokens, that’s $1,500/month.
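The arithmetic behind these figures is easy to check with a short sketch. The rates are the GPT-4o numbers from the table above; the constant and function names are illustrative only and not part of any Neureus SDK.

```typescript
// Per-million-token rates for GPT-4o in USD, copied from the table above.
const GPT4O_RATES_PER_MILLION = {
  openaiRealtime: 5.0,
  openaiBatch: 2.5,
  neureusRealtime: 4.5,
  neureusBatch: 3.0,
} as const;

type PricingMode = keyof typeof GPT4O_RATES_PER_MILLION;

// Cost in USD for a monthly token volume under one pricing mode.
function monthlyCost(tokens: number, mode: PricingMode): number {
  return (tokens / 1_000_000) * GPT4O_RATES_PER_MILLION[mode];
}

// Savings in USD from moving the same volume from one mode to another.
function monthlySavings(tokens: number, fromMode: PricingMode, toMode: PricingMode): number {
  return monthlyCost(tokens, fromMode) - monthlyCost(tokens, toMode);
}

console.log(monthlySavings(100_000_000, 'neureusRealtime', 'neureusBatch'));   // 150
console.log(monthlySavings(1_000_000_000, 'neureusRealtime', 'neureusBatch')); // 1500
```

Swapping the first mode for 'openaiRealtime' gives the saving against OpenAI's list price instead: $200 on the same 100M tokens.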

How Neureus batch inference works

The flow:

POST /ai/batch — submit your batch job with requests
Neureus queues it, waits for the collection window to close (default: 5 minutes or 1,000 requests, whichever comes first), then submits to the OpenAI/Anthropic Batch API
GET /ai/batch/:id — poll for status (collecting, submitted, processing, completed, failed)
On completion: results delivered to your webhook URL, or poll the endpoint directly

The cron schedule: Neureus checks for ready-to-submit jobs every 5 minutes, polls provider status every 10 minutes.
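To make the collection window concrete, here is a rough model of the rule in step 2: a job leaves the collecting state once it is 5 minutes old or holds 1,000 requests. Only the state names and the default thresholds come from the description above. The QueuedJob shape and the isReadyToSubmit helper are hypothetical, written to illustrate the rule rather than to mirror Neureus internals.

```typescript
// Job states as listed for GET /ai/batch/:id.
type BatchState = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

// Hypothetical shape of a queued job; the field names are assumptions.
interface QueuedJob {
  state: BatchState;
  createdAtMs: number;  // when the job started collecting
  requestCount: number; // requests collected so far
}

// Default collection window from the text: 5 minutes or 1,000 requests.
const WINDOW_MS = 5 * 60 * 1000;
const WINDOW_MAX_REQUESTS = 1_000;

// True when a collecting job has hit either threshold and can be submitted.
function isReadyToSubmit(job: QueuedJob, nowMs: number): boolean {
  if (job.state !== 'collecting') return false;
  const windowElapsed = nowMs - job.createdAtMs >= WINDOW_MS;
  const windowFull = job.requestCount >= WINDOW_MAX_REQUESTS;
  return windowElapsed || windowFull;
}
```

Because the readiness check itself runs on a 5-minute schedule, a job may sit somewhat longer than its window before it actually goes to the provider.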

Implementation

Submit a batch job

Option 1: SDK

import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

// `products` is your own array of catalog records ({ id, name, features })
const job = await client.batch.create({
  model: 'gpt-4o-mini',  // GPT-4o-mini at $0.075/1M realtime → ~$0.045/1M batch
  requests: products.map((product) => ({
    customId: `product-${product.id}`,
    messages: [
      {
        role: 'system',
        content: 'Write a compelling 2-sentence product description. Focus on key features and benefits.',
      },
      {
        role: 'user',
        content: `Product: ${product.name}\nFeatures: ${product.features.join(', ')}`,
      },
    ],
  })),
  webhookUrl: 'https://your-app.com/webhooks/batch-complete',
});

console.log(`Batch job submitted: ${job.batchId}`);
console.log(`Poll at: ${job.pollUrl}`);

Option 2: Direct API (cURL)

curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer nr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "user", "content": "Classify this support ticket: My order never arrived..." }
        ]
      }
    ],
    "webhookUrl": "https://your-app.com/webhook"
  }'

Poll for status

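// Poll until the batch completes, fails, or the deadline passes
async function waitForBatch(
  batchId: string,
  maxWaitMs = 24 * 60 * 60 * 1000, // give up after 24 hours by default
): Promise<BatchResult[]> {
  const deadline = Date.now() + maxWaitMs;

  while (Date.now() < deadline) {
    const status = await client.batch.get(batchId);

    if (status.state === 'completed') {
      return status.results;
    }

    if (status.state === 'failed') {
      throw new Error(`Batch failed: ${status.error}`);
    }

    // Poll every 2 minutes: batch jobs take minutes, not seconds
    await new Promise(resolve => setTimeout(resolve, 120_000));
  }

  throw new Error(`Batch ${batchId} did not finish within ${maxWaitMs} ms`);
}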

Webhook handler (Next.js API route)

// app/api/webhooks/batch-complete/route.ts
import { NextRequest } from 'next/server';
import { db } from '@/lib/db'; // your own database client; this import path is a placeholder

export async function POST(req: NextRequest) {
  const { batchId, results, failCount } = await req.json();

  for (const result of results) {
    const { customId, text, inputTokens, outputTokens } = result;

    // customId is whatever you passed in the request (e.g., "product-123")
    const productId = customId.replace('product-', '');

    await db.products.update({
      where: { id: productId },
      data: { description: text },
    });
  }

  if (failCount > 0) {
    console.warn(`Batch ${batchId}: ${failCount} requests failed`);
  }

  return Response.json({ ok: true });
}

Force-submit a batch

By default, Neureus collects requests for up to 5 minutes (or until 1,000 requests accumulate) before submitting to the provider API. For testing or urgent batches, force-submit immediately:

await client.batch.flush(batchId);

Or via API:

curl -X POST https://app.neureus.ai/ai/batch/${BATCH_ID}/flush \
  -H "Authorization: Bearer nr_your_key"

What to watch for

Failure handling: Neureus polls for provider status every 10 minutes. After 10 consecutive failed polls (roughly 100 minutes), the job is marked failed. Separately, individual requests inside a completed batch can fail; check failCount in the batch status response.
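The "10 consecutive failed polls" rule can be modeled in a few lines. This is an illustrative sketch, not Neureus's implementation: the function name is made up, and it assumes a successful poll resets the counter, which is what "consecutive" implies.

```typescript
// Thresholds from the text: 10 consecutive failed polls, one poll every 10 minutes.
const MAX_CONSECUTIVE_FAILED_POLLS = 10;
const POLL_INTERVAL_MINUTES = 10;

// Given a sequence of poll outcomes (true = poll succeeded), return the
// 1-based poll number at which the job would be marked failed, or null.
function failingPollNumber(pollSucceeded: boolean[]): number | null {
  let consecutiveFailures = 0;
  let pollNumber = 0;
  for (const ok of pollSucceeded) {
    pollNumber += 1;
    // Assumption: any successful poll resets the streak.
    consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
    if (consecutiveFailures >= MAX_CONSECUTIVE_FAILED_POLLS) return pollNumber;
  }
  return null;
}

// Ten straight failures trip the limit on the tenth poll.
const tenFailures = new Array<boolean>(10).fill(false);
console.log(failingPollNumber(tenFailures)); // 10
console.log(MAX_CONSECUTIVE_FAILED_POLLS * POLL_INTERVAL_MINUTES); // 100 (minutes)
```

In practice this means a stuck job can take over an hour and a half to surface as failed, so client-side timeouts are worth setting with that delay in mind.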

Result ordering: Batch results may arrive out of order relative to your input requests. Use customId to match results to inputs — don’t rely on array position.
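One way to do that matching is to index results by customId and then walk your inputs in their original order. The sketch below uses only the customId and text fields shown in the webhook handler above; the type and function names are illustrative, and the real result objects likely carry more fields.

```typescript
// Minimal result shape, limited to fields shown in the webhook handler.
interface BatchResultItem {
  customId: string;
  text: string;
}

interface InputItem {
  customId: string;
}

// Join results to inputs by customId, preserving input order and
// reporting any inputs that came back without a result.
function matchResultsToInputs<T extends InputItem>(
  inputs: T[],
  results: BatchResultItem[],
): { matched: Array<{ input: T; result: BatchResultItem }>; missing: T[] } {
  const byId = new Map<string, BatchResultItem>();
  for (const r of results) byId.set(r.customId, r);

  const matched: Array<{ input: T; result: BatchResultItem }> = [];
  const missing: T[] = [];
  for (const input of inputs) {
    const result = byId.get(input.customId);
    if (result) matched.push({ input, result });
    else missing.push(input);
  }
  return { matched, missing };
}
```

The missing list doubles as a retry queue: anything in it can be resubmitted in a follow-up batch.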

Rate limits still apply: Batch API has its own rate limits at the provider level. For very large batches (>100K requests), split across multiple jobs.
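Splitting a large request list is a plain chunking problem. Here is a minimal sketch; the helper name is illustrative, and the 100,000 figure in the usage note is simply the threshold mentioned above, not a documented hard limit.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunkRequests<T>(requests: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < requests.length; start += size) {
    chunks.push(requests.slice(start, start + size));
  }
  return chunks;
}
```

With this in place, chunkRequests(allRequests, 100_000) yields one array per job, and each can go through client.batch.create separately. Keeping customId values unique across chunks makes it easier to merge the results afterward.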

Model availability: Batch inference is available for OpenAI (GPT models) and Anthropic (Claude models). Workers AI models don’t support async batch — use them for realtime at $0/call instead.

The 10-minute migration

If you have an existing pipeline calling OpenAI directly, the migration is:

Change your endpoint from https://api.openai.com/v1/chat/completions to https://app.neureus.ai/ai/batch
Wrap your messages array in a requests array with customId per item
Add a webhookUrl or switch your polling to the Neureus batch status endpoint
Remove your OpenAI API key from the request — use your Neureus API key in the Authorization header

That’s it. Your model selection and message format stay the same; only the endpoint, the request wrapper, and the key in the Authorization header change.
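Step 2 is the only part that touches your data, so here is a sketch of it. It takes the request bodies you already build for Chat Completions and wraps them in the batch payload shape from the cURL example earlier. The type names and toBatchPayload are illustrative, and the one-model-per-batch check is an assumption based on model being a top-level field in the examples.

```typescript
// Fields of an existing Chat Completions request body that are used here.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}
interface ChatCompletionBody {
  model: string;
  messages: ChatMessage[];
}

// Batch payload shape, following the cURL example earlier in this post.
interface BatchPayload {
  model: string;
  requests: Array<{ customId: string; messages: ChatMessage[] }>;
  webhookUrl?: string;
}

// Wrap existing request bodies in a single batch payload.
function toBatchPayload(
  bodies: ChatCompletionBody[],
  idFor: (body: ChatCompletionBody, index: number) => string,
  webhookUrl?: string,
): BatchPayload {
  const [first] = bodies;
  if (!first) throw new Error('nothing to batch');
  // Assumption: one model per batch, since `model` is top-level in the examples.
  if (bodies.some((b) => b.model !== first.model)) {
    throw new Error('all requests in one batch must share a model');
  }
  const payload: BatchPayload = {
    model: first.model,
    requests: bodies.map((body, index) => ({
      customId: idFor(body, index),
      messages: body.messages,
    })),
  };
  if (webhookUrl !== undefined) payload.webhookUrl = webhookUrl;
  return payload;
}
```

The returned object is what you would send as the JSON body of POST /ai/batch. Pick an idFor that encodes something you can look up later, such as a database ID, since customId is how results find their way back to inputs.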

The free tier doesn’t include batch inference — it’s available on Builder ($29/mo) and above. Start at app.neureus.ai/onboard.