
Cut Your OpenAI Bill 40% with Batch Inference

OpenAI's Batch API charges 50% of realtime rates. Neureus routes through it and prices batch jobs at 40% off the public realtime rate. Here's how to identify workloads that are safe to batch and implement it in under 30 minutes.

If you’re running high-volume AI workloads — product descriptions, document classification, embedding generation, data enrichment — you’re probably paying full realtime rates when you don’t need to.

OpenAI’s Batch API processes requests asynchronously at 50% of realtime pricing. Neureus routes through that API and prices batch jobs at 60% of OpenAI’s public realtime rate. That is 40% below list price and about 33% below Neureus’s own realtime rate, though still 20% above OpenAI’s direct batch rate, as the tables below show. For the right workloads, that’s real money.

What qualifies for batch inference?

Batch inference is right for workloads that are:

  • Non-interactive: No user is waiting for the response in real time
  • High volume: Hundreds to thousands of requests
  • Tolerant of latency: Results can arrive within minutes to hours

Classic examples:

  • Product catalog enrichment (generate descriptions for 10,000 SKUs)
  • Document classification (label 50,000 support tickets by category)
  • Sentiment analysis on historical records
  • Nightly data enrichment pipelines
  • Embedding generation for large document sets
  • Summarization of long articles for a content pipeline
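The three criteria above can be sketched as a simple predicate. This is only an illustration: the `Workload` shape and both thresholds (100 requests for "hundreds", 60 minutes for "minutes to hours") are assumptions chosen for the example, not Neureus rules.

```typescript
// Hypothetical description of a workload you are considering for batch.
interface Workload {
  name: string;
  userIsWaiting: boolean;     // is a person blocked on the response?
  requestCount: number;       // requests per run
  maxLatencyMinutes: number;  // how long results may take to arrive
}

// Assumed thresholds, not documented limits.
const MIN_REQUESTS = 100;
const MIN_TOLERATED_LATENCY_MINUTES = 60;

// True when the workload is non-interactive, high volume, and latency tolerant.
function isBatchCandidate(w: Workload): boolean {
  return (
    !w.userIsWaiting &&
    w.requestCount >= MIN_REQUESTS &&
    w.maxLatencyMinutes >= MIN_TOLERATED_LATENCY_MINUTES
  );
}

const ticketLabeling: Workload = {
  name: 'label 50,000 support tickets',
  userIsWaiting: false,
  requestCount: 50_000,
  maxLatencyMinutes: 12 * 60,
};

const chatReply: Workload = {
  name: 'live chat reply',
  userIsWaiting: true,
  requestCount: 1,
  maxLatencyMinutes: 0,
};

console.log(isBatchCandidate(ticketLabeling)); // true
console.log(isBatchCandidate(chatReply));      // false
```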

Batch vs. realtime: the math

For GPT-4o processing 1 million tokens:

Mode | Rate | Cost
OpenAI realtime | $5.00/1M | $5.00
OpenAI Batch API | $2.50/1M | $2.50
Neureus realtime | $4.50/1M | $4.50
Neureus batch | $3.00/1M | $3.00

For Claude Sonnet 4.6 processing 1 million tokens:

Mode | Rate | Cost
Anthropic realtime | $3.00/1M | $3.00
Anthropic Batch API | $1.50/1M | $1.50
Neureus realtime | $2.70/1M | $2.70
Neureus batch | $1.80/1M | $1.80

At 100M tokens/month on GPT-4o, switching from Neureus realtime to Neureus batch saves $150/month. At 1B tokens, that’s $1,500/month.
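To run the same arithmetic for your own volume, here is a small sketch. The rates are copied from the GPT-4o table above; the function and constant names are made up for this example.

```typescript
// Per-million-token rates in USD, copied from the GPT-4o table.
const GPT4O_RATES = {
  openaiRealtime: 5.0,
  openaiBatch: 2.5,
  neureusRealtime: 4.5,
  neureusBatch: 3.0,
} as const;

type Mode = keyof typeof GPT4O_RATES;

// Cost in USD for a given monthly token volume in one mode.
function monthlyCost(tokens: number, mode: Mode): number {
  return (tokens / 1_000_000) * GPT4O_RATES[mode];
}

// Savings in USD from moving the same volume from one mode to another.
function monthlySavings(tokens: number, from: Mode, to: Mode): number {
  return monthlyCost(tokens, from) - monthlyCost(tokens, to);
}

console.log(monthlySavings(100_000_000, 'neureusRealtime', 'neureusBatch'));   // 150
console.log(monthlySavings(1_000_000_000, 'neureusRealtime', 'neureusBatch')); // 1500
```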

How Neureus batch inference works

The flow:

  1. POST /ai/batch — submit your batch job with requests
  2. Neureus queues it, waits for the collection window (default: 5 minutes or 1,000 requests), then submits it to the OpenAI/Anthropic Batch API
  3. GET /ai/batch/:id — poll for status (collecting, submitted, processing, completed, failed)
  4. On completion: results are delivered to your webhook URL, or you can poll the endpoint directly

The cron schedule: Neureus checks for ready-to-submit jobs every 5 minutes and polls provider status every 10 minutes.
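One way to model this in your own code is a union type for the five states plus a rough estimate of the delay the scheduling adds. The helper names are hypothetical, and the way the three intervals are summed is an assumption about how the phases line up in the worst case. It excludes the provider's own processing time.

```typescript
// The five states listed in step 3 of the flow.
type BatchState = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

// A job stops changing once it has completed or failed.
function isTerminal(state: BatchState): boolean {
  return state === 'completed' || state === 'failed';
}

// Scheduling intervals from the text, in minutes.
const COLLECTION_WINDOW_MINUTES = 5;
const SUBMIT_CHECK_MINUTES = 5;
const PROVIDER_POLL_MINUTES = 10;

// Assumed worst case: a full collection window, then a full wait for the next
// submit check, then a full wait for the next provider poll after the job finishes.
function maxSchedulingOverheadMinutes(): number {
  return COLLECTION_WINDOW_MINUTES + SUBMIT_CHECK_MINUTES + PROVIDER_POLL_MINUTES;
}

console.log(isTerminal('processing'));       // false
console.log(maxSchedulingOverheadMinutes()); // 20
```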

Implementation

Submit a batch job

import { NeureuClient } from '@neureus/sdk';

const client = new NeureuClient({ apiKey: process.env.NEUREUS_API_KEY! });

// Option 1: SDK
const job = await client.batch.create({
  model: 'gpt-4o-mini',  // GPT-4o-mini at $0.075/1M realtime → ~$0.045/1M batch
  requests: products.map((product) => ({
    customId: `product-${product.id}`,
    messages: [
      {
        role: 'system',
        content: 'Write a compelling 2-sentence product description. Focus on key features and benefits.',
      },
      {
        role: 'user',
        content: `Product: ${product.name}\nFeatures: ${product.features.join(', ')}`,
      },
    ],
  })),
  webhookUrl: 'https://your-app.com/webhooks/batch-complete',
});

console.log(`Batch job submitted: ${job.batchId}`);
console.log(`Poll at: ${job.pollUrl}`);

Option 2: Direct API (cURL)

curl -X POST https://app.neureus.ai/ai/batch \
  -H "Authorization: Bearer nr_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "requests": [
      {
        "customId": "doc-001",
        "messages": [
          { "role": "user", "content": "Classify this support ticket: My order never arrived..." }
        ]
      }
    ],
    "webhookUrl": "https://your-app.com/webhook"
  }'

Poll for status

// Poll until complete
async function waitForBatch(batchId: string): Promise<BatchResult[]> {
  while (true) {
    const status = await client.batch.get(batchId);

    if (status.state === 'completed') {
      return status.results;
    }

    if (status.state === 'failed') {
      throw new Error(`Batch failed: ${status.error}`);
    }

    // Poll every 2 minutes — batch jobs take minutes, not seconds
    await new Promise(resolve => setTimeout(resolve, 120_000));
  }
}
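The loop above runs until the job reaches a terminal state, so a job that never leaves processing would keep it polling forever. A hedged variant with a deadline might look like the sketch below. The status fetcher is passed in so the example stays self-contained, and the `BatchStatus` shape is an assumption modeled on the fields used above (`state`, `results`, `error`).

```typescript
type BatchState = 'collecting' | 'submitted' | 'processing' | 'completed' | 'failed';

// Assumed status shape, based on the fields the polling example reads.
interface BatchStatus<R> {
  state: BatchState;
  results?: R[];
  error?: string;
}

// Polls fetchStatus every intervalMs until the job completes, fails,
// or the next poll would land after the deadline.
async function waitWithDeadline<R>(
  fetchStatus: () => Promise<BatchStatus<R>>,
  intervalMs: number,
  timeoutMs: number,
): Promise<R[]> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (status.state === 'completed') {
      return status.results ?? [];
    }
    if (status.state === 'failed') {
      throw new Error(`Batch failed: ${status.error ?? 'unknown error'}`);
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for batch`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the SDK from the earlier example, a call might look like `waitWithDeadline(() => client.batch.get(batchId), 120_000, 6 * 60 * 60 * 1000)`, assuming `client.batch.get` resolves to an object of this shape.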

Webhook handler (Next.js API route)

// app/api/webhooks/batch-complete/route.ts
import { NextRequest } from 'next/server';

export async function POST(req: NextRequest) {
  const { batchId, results, failCount } = await req.json();

  for (const result of results) {
    const { customId, text, inputTokens, outputTokens } = result;

    // customId is whatever you passed in the request (e.g., "product-123")
    const productId = customId.replace('product-', '');

    await db.products.update({
      where: { id: productId },
      data: { description: text },
    });
  }

  if (failCount > 0) {
    console.warn(`Batch ${batchId}: ${failCount} requests failed`);
  }

  return Response.json({ ok: true });
}
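The handler above trusts whatever JSON arrives. Before writing to your database, you may want to check the body's shape first. Here is a minimal sketch of a type guard: the field names come from the handler above, but their types (strings and numbers) are assumptions, so adjust them to what Neureus actually sends.

```typescript
// Assumed payload shape, inferred from the fields the handler destructures.
interface BatchResultItem {
  customId: string;
  text: string;
  inputTokens: number;
  outputTokens: number;
}

interface BatchWebhookPayload {
  batchId: string;
  results: BatchResultItem[];
  failCount: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function isBatchResultItem(value: unknown): value is BatchResultItem {
  return (
    isRecord(value) &&
    typeof value['customId'] === 'string' &&
    typeof value['text'] === 'string' &&
    typeof value['inputTokens'] === 'number' &&
    typeof value['outputTokens'] === 'number'
  );
}

// Returns true only when every expected field is present with the assumed type.
function isBatchWebhookPayload(value: unknown): value is BatchWebhookPayload {
  if (!isRecord(value)) return false;
  const results = value['results'];
  return (
    typeof value['batchId'] === 'string' &&
    typeof value['failCount'] === 'number' &&
    Array.isArray(results) &&
    results.every(isBatchResultItem)
  );
}
```

In the route, you could parse with `const body: unknown = await req.json()` and return a 400 response when `isBatchWebhookPayload(body)` is false.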

Force-submit a batch

By default, Neureus collects requests for 5 minutes before submitting to the provider API. For testing or urgent batches, force-submit immediately:

await client.batch.flush(batchId);

Or via API:

curl -X POST https://app.neureus.ai/ai/batch/${BATCH_ID}/flush \
  -H "Authorization: Bearer nr_your_key"

What to watch for

Failure handling: Neureus polls for provider status every 10 minutes. After 10 consecutive failed polls, the job is marked failed. Check failCount in the batch status response.

Result ordering: Batch results may arrive out of order relative to your input requests. Use customId to match results to inputs — don’t rely on array position.
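A minimal sketch of matching by customId: index the results in a Map, then walk the inputs. It also returns inputs that have no result, which may help you spot failed requests. The function name and return shape are made up for this example.

```typescript
interface HasCustomId {
  customId: string;
}

// Pairs each input with its result by customId, regardless of result order.
// Inputs with no matching result are returned separately.
function matchByCustomId<I extends HasCustomId, R extends HasCustomId>(
  inputs: readonly I[],
  results: readonly R[],
): { matched: Array<{ input: I; result: R }>; missing: I[] } {
  const byId = new Map<string, R>();
  for (const result of results) {
    byId.set(result.customId, result);
  }

  const matched: Array<{ input: I; result: R }> = [];
  const missing: I[] = [];
  for (const input of inputs) {
    const result = byId.get(input.customId);
    if (result !== undefined) {
      matched.push({ input, result });
    } else {
      missing.push(input);
    }
  }
  return { matched, missing };
}

const inputs = [{ customId: 'doc-001' }, { customId: 'doc-002' }, { customId: 'doc-003' }];
const results = [
  { customId: 'doc-003', text: 'billing' },
  { customId: 'doc-001', text: 'shipping' },
];

const { matched, missing } = matchByCustomId(inputs, results);
console.log(matched.map((m) => `${m.input.customId}=${m.result.text}`)); // [ 'doc-001=shipping', 'doc-003=billing' ]
console.log(missing.map((m) => m.customId));                             // [ 'doc-002' ]
```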

Rate limits still apply: The provider Batch APIs have their own rate limits. For very large batches (>100K requests), split the work across multiple jobs.
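Splitting a large request list into several jobs can be done with a small chunking helper. The 100,000 figure below simply mirrors the threshold mentioned above. It is not a documented hard limit, so treat it as a starting point.

```typescript
// Threshold taken from the guidance above; adjust to your provider's limits.
const MAX_REQUESTS_PER_JOB = 100_000;

// Splits items into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Example: 250,001 placeholder requests become three jobs.
const allRequests = Array.from({ length: 250_001 }, (_, i) => ({ customId: `doc-${i}` }));
const jobs = chunk(allRequests, MAX_REQUESTS_PER_JOB);
console.log(jobs.map((job) => job.length)); // [ 100000, 100000, 50001 ]
```

Each chunk can then be passed as the `requests` array of its own batch job.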

Model availability: Batch inference is available for OpenAI (GPT models) and Anthropic (Claude models). Workers AI models don’t support async batch — use them for realtime at $0/call instead.

The 10-minute migration

If you have an existing pipeline calling OpenAI directly, the migration is:

  1. Change your endpoint from https://api.openai.com/v1/chat/completions to https://app.neureus.ai/ai/batch
  2. Wrap your messages array in a requests array with customId per item
  3. Add a webhookUrl or switch your polling to the Neureus batch status endpoint
  4. Remove your OpenAI API key from the request — use your Neureus API key in the Authorization header

That’s it. Your model selection and message format stay the same; only the endpoint, the request wrapper, and the API key change.
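Step 2 can be sketched as a small transform from existing chat completion bodies to the batch payload shown earlier. The type names and the `toBatchPayload` helper are hypothetical. Because the payload carries a single top-level `model`, this sketch assumes every request in one job uses the same model and throws otherwise.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// The part of an existing chat completions request body this sketch cares about.
interface ChatCompletionBody {
  model: string;
  messages: ChatMessage[];
}

// Payload shape taken from the cURL example earlier in the post.
interface NeureusBatchPayload {
  model: string;
  requests: Array<{ customId: string; messages: ChatMessage[] }>;
  webhookUrl?: string;
}

// Wraps each messages array in a request entry with a caller-supplied customId.
function toBatchPayload(
  bodies: readonly ChatCompletionBody[],
  idFor: (body: ChatCompletionBody, index: number) => string,
  webhookUrl?: string,
): NeureusBatchPayload {
  const first = bodies[0];
  if (first === undefined) {
    throw new Error('Need at least one request');
  }
  if (bodies.some((body) => body.model !== first.model)) {
    throw new Error('All requests in one job must use the same model (assumption of this sketch)');
  }
  const payload: NeureusBatchPayload = {
    model: first.model,
    requests: bodies.map((body, index) => ({
      customId: idFor(body, index),
      messages: body.messages,
    })),
  };
  if (webhookUrl !== undefined) {
    payload.webhookUrl = webhookUrl;
  }
  return payload;
}

const existing: ChatCompletionBody[] = [
  { model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Classify ticket A' }] },
  { model: 'gpt-4o-mini', messages: [{ role: 'user', content: 'Classify ticket B' }] },
];

const payload = toBatchPayload(existing, (_body, i) => `doc-${i + 1}`, 'https://your-app.com/webhook');
console.log(payload.requests.map((r) => r.customId)); // [ 'doc-1', 'doc-2' ]
```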


The free tier doesn’t include batch inference — it’s available on Builder ($29/mo) and above. Start at app.neureus.ai/onboard.
