Claude Batch API Guide: Save 50% on Large-Scale Processing

The Claude Batch API lets you submit large collections of requests for asynchronous processing at half the cost of standard API calls. If your workload does not require real-time responses, the Batch API is the most cost-effective way to use Claude at scale. This guide covers setup, implementation, and optimization strategies.

What Is the Batch API?

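The Batch API accepts a set of Messages API requests as a single batch and processes them asynchronously. Many batches finish in well under the maximum, but processing can take up to 24 hours, and any request still unprocessed after that window expires. Results become available once the whole batch has ended. In exchange for the longer turnaround, you receive a 50% discount on both input and output token costs.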

Pricing Comparison
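The discount is simple arithmetic: for the same input and output tokens, batch cost is half the standard cost. The sketch below computes both for a hypothetical workload of 10M input and 2M output tokens. The $3 input / $15 output per-million-token Sonnet prices are an assumption based on list prices at the time of writing, so substitute current figures from the pricing page. `token_cost` is a hypothetical helper, not part of any SDK.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_mtok: float, output_price_per_mtok: float,
               batch: bool = False) -> float:
    """USD cost for a workload; batch=True applies the 50% Batch API discount."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    total = input_cost + output_cost
    return total * 0.5 if batch else total


# Assumed list prices (USD per million tokens: input, output); verify before use.
SONNET_PRICES = (3.00, 15.00)

standard = token_cost(10_000_000, 2_000_000, *SONNET_PRICES)
batched = token_cost(10_000_000, 2_000_000, *SONNET_PRICES, batch=True)
print(f"Standard: ${standard:.2f}  Batch: ${batched:.2f}")
# Standard: $60.00  Batch: $30.00
```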

Ideal Use Cases

Python Implementation

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Step 1: Create batch requests
requests = []
documents = ["doc1.txt", "doc2.txt", "doc3.txt"]  # Your data

for i, doc_path in enumerate(documents):
    with open(doc_path, "r", encoding="utf-8") as f:
        content = f.read()

    requests.append({
        "custom_id": f"doc-{i}",  # must be unique within the batch
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": f"Summarize this document in 3 bullet points:\n\n{content}"
            }]
        }
    })

# Step 2: Submit the batch (batch endpoints live under client.messages.batches)
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
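Invalid or duplicate custom_id values cause the API to reject the request, so a quick pre-flight check is cheap insurance before uploading a large batch. The rule encoded below (1 to 64 letters, digits, hyphens, or underscores, unique per batch) reflects Anthropic's API reference at the time of writing and should be treated as an assumption to re-check. Under that rule a file name such as doc1.txt would not qualify because of the dot, which makes index-based IDs like doc-0 the safer choice. `custom_id_problems` is a hypothetical helper.

```python
import re

# Assumed rule (verify against the current API reference): 1-64 characters
# drawn from letters, digits, hyphens, and underscores, unique within one batch.
CUSTOM_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def custom_id_problems(requests: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every ID looks valid."""
    problems: list[str] = []
    seen: set[str] = set()
    for req in requests:
        cid = req["custom_id"]
        if not CUSTOM_ID_RE.fullmatch(cid):
            problems.append(f"invalid custom_id: {cid!r}")
        elif cid in seen:
            problems.append(f"duplicate custom_id: {cid!r}")
        seen.add(cid)
    return problems


# Run this before client.messages.batches.create(requests=requests).
print(custom_id_problems([
    {"custom_id": "doc-0"},
    {"custom_id": "doc1.txt"},
    {"custom_id": "doc-0"},
]))
# ["invalid custom_id: 'doc1.txt'", "duplicate custom_id: 'doc-0'"]
```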

Checking Batch Status

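# Poll for completion
import time

while True:
    batch = client.messages.batches.retrieve(batch.id)
    counts = batch.request_counts
    # A request is finished once it has succeeded, errored, been canceled, or expired
    finished = counts.succeeded + counts.errored + counts.canceled + counts.expired
    total = finished + counts.processing
    print(f"Status: {batch.processing_status}")
    print(f"Finished: {finished}/{total} ({counts.succeeded} succeeded, {counts.errored} errored)")

    if batch.processing_status == "ended":
        break

    time.sleep(60)  # Check every minute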

Retrieving Results

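# Iterate over all results; each entry carries the custom_id you assigned
results = client.messages.batches.results(batch.id)

for result in results:
    print(f"Request {result.custom_id}:")
    if result.result.type == "succeeded":
        message = result.result.message
        print(f"  Response: {message.content[0].text[:200]}...")
        print(f"  Tokens: {message.usage.input_tokens} in, {message.usage.output_tokens} out")
    elif result.result.type == "errored":
        print(f"  Error: {result.result.error}")
    else:
        # "canceled" or "expired": no response or error payload is attached
        print(f"  Not processed: {result.result.type}")

Tip: Use the custom_id field to map batch results back to your original data. Results may not arrive in the order you submitted them, so custom_id is the reliable way to correlate responses with input documents, database records, or processing pipeline stages.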
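As a sketch of that mapping, assume you kept a dict from custom_id to the original document path (doc-0 to doc1.txt in the earlier example). The hypothetical `collect_results` helper below joins results back to those inputs and separates successes from failures. The demo feeds it stand-in objects shaped like the SDK's result entries (custom_id plus a result with a type) rather than a live batch, so the error value is a placeholder string.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_results(results: Iterable[Any],
                    inputs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return ({input: response text}, {input: failure reason}) keyed by original input."""
    outputs: dict[str, str] = {}
    failures: dict[str, str] = {}
    for item in results:
        source = inputs[item.custom_id]  # join on custom_id, not on position
        kind = item.result.type
        if kind == "succeeded":
            outputs[source] = item.result.message.content[0].text
        elif kind == "errored":
            failures[source] = f"errored: {item.result.error}"
        else:  # "canceled" or "expired" carry no message or error payload
            failures[source] = kind
    return outputs, failures


# Stand-in results (not real API output), deliberately out of submission order.
fake = [
    SimpleNamespace(custom_id="doc-0", result=SimpleNamespace(
        type="succeeded",
        message=SimpleNamespace(content=[SimpleNamespace(text="- point one")]))),
    SimpleNamespace(custom_id="doc-2", result=SimpleNamespace(type="errored", error="overloaded")),
    SimpleNamespace(custom_id="doc-1", result=SimpleNamespace(type="expired")),
]
ok, failed = collect_results(fake, {"doc-0": "doc1.txt", "doc-1": "doc2.txt", "doc-2": "doc3.txt"})
print(ok)      # {'doc1.txt': '- point one'}
print(failed)  # {'doc3.txt': 'errored: overloaded', 'doc2.txt': 'expired'}
```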

Node.js Implementation

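// Run as an ES module (e.g. a .mjs file) so top-level await is available
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Example input; replace with your own data
const items = [
  { text: 'Arrived quickly and works perfectly.' },
  { text: 'Stopped charging after a week.' },
];

// Create batch
const batch = await client.messages.batches.create({
  requests: items.map((item, i) => ({
    custom_id: `item-${i}`,
    params: {
      model: 'claude-sonnet-4-20250514',
      max_tokens: 50, // a one-word label needs few output tokens
      messages: [
        { role: 'user', content: `Classify this review as positive, negative, or neutral: "${item.text}"` }
      ]
    }
  }))
});

console.log(`Batch ${batch.id} submitted with ${items.length} requests`);

// Poll for completion
let status = batch;
while (status.processing_status !== 'ended') {
  await new Promise(r => setTimeout(r, 30000));
  status = await client.messages.batches.retrieve(batch.id);
  const c = status.request_counts;
  console.log(`Progress: ${c.succeeded + c.errored + c.canceled + c.expired} finished, ${c.processing} processing`);
}

// Retrieve results
const results = await client.messages.batches.results(batch.id);
for await (const result of results) {
  if (result.result.type === 'succeeded') {
    console.log(`${result.custom_id}: ${result.result.message.content[0].text}`);
  } else {
    console.log(`${result.custom_id}: ${result.result.type}`);
  }
}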

Batch API Limits and Constraints
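Each batch has a size cap; at the time of writing Anthropic documents a limit of 100,000 requests or 256 MB per batch, whichever is reached first, and those figures should be treated as assumptions to verify. For larger jobs, split the request list and submit each part as its own batch. The hypothetical `chunk_requests` helper below estimates each request's size from its JSON encoding, which approximates but may not exactly match how the API measures batch size.

```python
import json
from typing import Iterator

# Assumed per-batch limits (verify against current documentation).
MAX_REQUESTS_PER_BATCH = 100_000
MAX_BATCH_BYTES = 256 * 1024 * 1024


def chunk_requests(requests: list[dict],
                   max_requests: int = MAX_REQUESTS_PER_BATCH,
                   max_bytes: int = MAX_BATCH_BYTES) -> Iterator[list[dict]]:
    """Yield lists of requests that respect both the count and estimated size limits."""
    chunk: list[dict] = []
    size = 0
    for req in requests:
        req_size = len(json.dumps(req).encode("utf-8"))  # rough wire-size estimate
        if chunk and (len(chunk) >= max_requests or size + req_size > max_bytes):
            yield chunk
            chunk, size = [], 0
        # A single oversized request still gets its own chunk; the API would reject it.
        chunk.append(req)
        size += req_size
    if chunk:
        yield chunk


# Real usage (not executed here):
# for part in chunk_requests(requests):
#     client.messages.batches.create(requests=part)

demo = [{"custom_id": f"r-{i}"} for i in range(5)]
print([len(c) for c in chunk_requests(demo, max_requests=2)])  # [2, 2, 1]
```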

Cost Optimization Strategies

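  1. Combine Batch API with Haiku: For simple classification tasks, batch pricing for Claude 3.5 Haiku is $0.40 per million input tokens and $2.00 per million output tokens. Classifying a million short reviews at roughly 200 input and 5 output tokens each would cost about $90 ($80 input + $10 output).
  2. Use prompt caching within batches: If all batch requests share a common system prompt, marking it with cache_control reduces input costs further. Because batch requests are processed concurrently, cache hits are best-effort rather than guaranteed.
  3. Right-size max_tokens: You are billed for the tokens actually generated, not for the max_tokens ceiling, so a tight limit mainly guards against unexpectedly long responses. For classification tasks, 50-100 tokens is usually sufficient.
  4. Batch during off-peak hours: Submitting batches during low-demand periods may result in faster processing.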
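Strategies 2 and 3 can be applied together when building each request. The sketch below attaches `cache_control` to a shared system prompt so the identical prefix can be served from the cache, and caps output with a small `max_tokens`. The model ID `claude-3-5-haiku-20241022`, the prompt text, and `build_classification_request` are illustrative assumptions. Caching only applies once the system prompt exceeds the model's minimum cacheable length, so a real shared prompt would be much longer than this placeholder.

```python
# Placeholder instructions; a real shared prompt must be long enough to be cacheable.
SHARED_SYSTEM_PROMPT = (
    "You are a sentiment classifier. Reply with exactly one word: "
    "positive, negative, or neutral."
)


def build_classification_request(custom_id: str, text: str,
                                 model: str = "claude-3-5-haiku-20241022") -> dict:
    """Build one batch entry that reuses the same cacheable system prompt."""
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": 50,  # a one-word label needs only a few output tokens
            "system": [{
                "type": "text",
                "text": SHARED_SYSTEM_PROMPT,
                # Identical block in every request, so later requests can hit the cache.
                "cache_control": {"type": "ephemeral"},
            }],
            "messages": [{"role": "user", "content": text}],
        },
    }


reviews = ["Arrived quickly and works perfectly.", "Stopped charging after a week."]
batch_requests = [build_classification_request(f"review-{i}", r) for i, r in enumerate(reviews)]
print(len(batch_requests), batch_requests[0]["params"]["max_tokens"])  # 2 50
```

The resulting list can be passed straight to `client.messages.batches.create(requests=batch_requests)`.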

Warning: The Batch API does not support streaming. Each request in the batch returns a complete response. If you need real-time token-by-token output, use the standard Messages API instead.

Using Batch API Through a Relay

Relay services like claude4u.com can handle batch requests, providing the same 50% discount with additional benefits like per-key cost tracking and usage analytics across all your batch jobs:

export ANTHROPIC_BASE_URL="https://claude4u.com/antigravity"
export ANTHROPIC_API_KEY="cr_your_key_here"

# Batch API works through the relay with the same syntax
# The relay tracks batch costs per API key

The Batch API is one of the most underutilized cost-saving tools in the Claude ecosystem. If even 30% of your spend comes from requests that can tolerate asynchronous processing, moving them to batch cuts your total bill by about 15% (30% × 50%).
