Claude Batch API Guide: Save 50% on Large-Scale Processing
The Claude Batch API lets you submit large collections of requests for asynchronous processing at half the cost of standard API calls. If your workload does not require real-time responses, the Batch API is the most cost-effective way to use Claude at scale. This guide covers setup, implementation, and optimization strategies.
What Is the Batch API?
The Batch API accepts a set of message requests as a single batch, processes them asynchronously within a 24-hour window, and returns all results when complete. In exchange for the longer processing time, you receive a 50% discount on all token costs.
Pricing Comparison
- Standard Sonnet: $3.00 / $15.00 per million tokens (input/output)
- Batch Sonnet: $1.50 / $7.50 per million tokens (50% off)
- Standard Opus: $15.00 / $75.00 per million tokens
- Batch Opus: $7.50 / $37.50 per million tokens (50% off)
- Standard Haiku: $0.80 / $4.00 per million tokens
- Batch Haiku: $0.40 / $2.00 per million tokens (50% off)
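To make the discount concrete, here is a rough sketch that estimates the cost of a job at standard and batch rates. The prices are the Sonnet figures from the list above; the function name `estimate_cost` and the workload size are illustrative assumptions, not measurements.

```python
# Illustrative sketch: prices are the per-million-token Sonnet rates listed above.
STANDARD_SONNET = {"input": 3.00, "output": 15.00}  # USD per million tokens
BATCH_DISCOUNT = 0.5                                 # batch requests are billed at 50%


def estimate_cost(num_requests, input_tokens, output_tokens, prices, batch=False):
    """Estimate total USD cost for num_requests requests of a given average size."""
    total_in = num_requests * input_tokens
    total_out = num_requests * output_tokens
    cost = (total_in * prices["input"] + total_out * prices["output"]) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


# Hypothetical workload: 10,000 documents, ~2,000 input and ~300 output tokens each.
standard = estimate_cost(10_000, 2_000, 300, STANDARD_SONNET)
batched = estimate_cost(10_000, 2_000, 300, STANDARD_SONNET, batch=True)
print(f"Standard: ${standard:.2f}, Batch: ${batched:.2f}")
# Standard: $105.00, Batch: $52.50
```

Swapping in the Opus or Haiku rates only requires a different `prices` dictionary.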
Ideal Use Cases
- Data classification — Categorize thousands of support tickets, reviews, or documents.
- Content generation — Generate product descriptions, summaries, or translations in bulk.
- Data extraction — Extract structured information from large document collections.
- Evaluation — Run LLM-as-judge evaluations across large test datasets.
- Code analysis — Analyze an entire codebase file-by-file for security vulnerabilities or style issues.
Python Implementation
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Step 1: Create batch requests
requests = []
documents = ["doc1.txt", "doc2.txt", "doc3.txt"]  # Your data

for i, doc_path in enumerate(documents):
    with open(doc_path, "r", encoding="utf-8") as f:
        content = f.read()
    requests.append({
        "custom_id": f"doc-{i}",  # must be unique within the batch
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": f"Summarize this document in 3 bullet points:\n\n{content}"
            }]
        }
    })

# Step 2: Submit the batch (batch methods live under client.messages.batches)
batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
Checking Batch Status
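# Poll for completion
import time

while True:
    batch = client.messages.batches.retrieve(batch.id)
    counts = batch.request_counts
    # Requests that have finished in any state, successful or not
    finished = counts.succeeded + counts.errored + counts.canceled + counts.expired
    total = finished + counts.processing
    print(f"Status: {batch.processing_status}")
    print(f"Finished: {finished}/{total} ({counts.succeeded} succeeded)")
    if batch.processing_status == "ended":
        break
    time.sleep(60)  # Check every minute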
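A fixed one-minute loop like the one above keeps running if a batch never reaches the ended state. One possible refinement is a poller with a growing delay and an overall deadline. This is only a sketch: `wait_for_batch`, its parameters, and the injected `fetch_status` callable are hypothetical names. In practice `fetch_status` would wrap the retrieve call shown above and return its `processing_status` string.

```python
import time


def wait_for_batch(fetch_status, initial_delay=30.0, max_delay=600.0,
                   timeout=24 * 60 * 60, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns "ended" or the timeout passes.

    fetch_status: zero-argument callable returning the batch's processing status.
    Returns True if the batch ended, False if the deadline was reached first.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if fetch_status() == "ended":
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))       # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
```

Passing `sleep` and `clock` as parameters is a design choice that lets the loop be exercised in tests without real waiting.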
Retrieving Results
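# Get all results (order may differ from submission order)
results = client.messages.batches.results(batch.id)
for result in results:
    print(f"Request {result.custom_id}:")
    if result.result.type == "succeeded":
        message = result.result.message
        print(f"  Response: {message.content[0].text[:200]}...")
        print(f"  Tokens: {message.usage.input_tokens} in, {message.usage.output_tokens} out")
    elif result.result.type == "errored":
        print(f"  Error: {result.result.error}")
    else:
        # "canceled" or "expired": no message or error payload is attached
        print(f"  Not processed: {result.result.type}")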
Use the custom_id field to map batch results back to your original data. Results are not guaranteed to come back in submission order, so custom_id is the reliable way to correlate responses with input documents, database records, or processing pipeline stages.
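As a sketch of that correlation step, the helper below joins results to their source records by custom_id and collects the ones that need another look. The name `join_results`, the sample records, and the simplified result dictionaries are all assumptions for illustration; real SDK result objects expose the same information as attributes rather than dictionary keys.

```python
def join_results(records, results):
    """Attach each batch result to its source record using custom_id.

    records: dict mapping custom_id -> original input (path, row, etc.)
    results: iterable of dicts with "custom_id", "type", and optional "text"
             (a simplified stand-in for the SDK's result objects)
    Returns (succeeded, failed): succeeded maps custom_id -> (record, text);
    failed lists the custom_ids to inspect or resubmit.
    """
    succeeded, failed = {}, []
    for result in results:
        cid = result["custom_id"]
        if result["type"] == "succeeded":
            succeeded[cid] = (records[cid], result["text"])
        else:
            failed.append(cid)
    return succeeded, failed


# Hypothetical data, deliberately out of order to show that position is not used.
records = {"doc-0": "doc1.txt", "doc-1": "doc2.txt", "doc-2": "doc3.txt"}
results = [
    {"custom_id": "doc-2", "type": "succeeded", "text": "Summary C"},
    {"custom_id": "doc-0", "type": "succeeded", "text": "Summary A"},
    {"custom_id": "doc-1", "type": "expired"},
]
succeeded, failed = join_results(records, results)
print(succeeded["doc-0"])  # ('doc1.txt', 'Summary A')
print(failed)              # ['doc-1']
```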
Node.js Implementation
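import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Sample input; replace with your own data
const items = [{ text: 'Great product!' }, { text: 'Arrived broken.' }];

// Create batch (batch methods live under client.messages.batches)
const batch = await client.messages.batches.create({
  requests: items.map((item, i) => ({
    custom_id: `item-${i}`,
    params: {
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [
        { role: 'user', content: `Classify this review as positive, negative, or neutral: "${item.text}"` }
      ]
    }
  }))
});
console.log(`Batch ${batch.id} submitted with ${items.length} requests`);

// Poll for completion
let status = batch;
while (status.processing_status !== 'ended') {
  await new Promise(r => setTimeout(r, 30000)); // wait 30 seconds
  status = await client.messages.batches.retrieve(batch.id);
  console.log(`Progress: ${status.request_counts.succeeded} succeeded, ${status.request_counts.processing} processing`);
}

// Retrieve results
const results = await client.messages.batches.results(batch.id);
for await (const result of results) {
  if (result.result.type === 'succeeded') {
    const block = result.result.message.content[0];
    console.log(`${result.custom_id}: ${block.type === 'text' ? block.text : ''}`);
  } else {
    // errored, canceled, or expired
    console.log(`${result.custom_id}: ${result.result.type}`);
  }
}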
Batch API Limits and Constraints
- Maximum batch size: 100,000 requests or 256 MB per batch, whichever is reached first
- Processing time: Up to 24 hours (most batches complete much faster); requests still unprocessed after that window expire
- Request format: Same as the standard Messages API. Tool use, vision, and system prompts are supported; streaming is not, since results are only delivered after processing
- Result availability: Results are available for 29 days after batch creation
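When a job has more requests than fit in one batch, it has to be split before submission. The following is a minimal sketch of that split; `chunk_requests` and `max_requests` are hypothetical names, and the cap should be set to whatever per-batch limit applies to your account. It does not account for the byte-size limit of a batch.

```python
def chunk_requests(requests, max_requests):
    """Yield consecutive slices of requests, each at most max_requests long."""
    if max_requests < 1:
        raise ValueError("max_requests must be at least 1")
    for start in range(0, len(requests), max_requests):
        yield requests[start:start + max_requests]


# Hypothetical example: 25 requests with a cap of 10 per batch.
requests = [{"custom_id": f"item-{i}", "params": {}} for i in range(25)]
sizes = [len(chunk) for chunk in chunk_requests(requests, max_requests=10)]
print(sizes)  # [10, 10, 5]
```

Each chunk would then be submitted as its own batch. Keeping custom_id values unique across all chunks makes it straightforward to merge the results afterwards.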
Cost Optimization Strategies
- Combine Batch API with Haiku: For simple classification tasks, Batch Haiku costs just $0.40/$2.00 per million tokens. At roughly 200 input and 10 output tokens per item, one million classifications cost about $100.
- Use prompt caching within batches: If all batch requests share a common system prompt, caching reduces input costs further. Cache hits inside a batch are best-effort, so treat the extra saving as likely rather than guaranteed.
- Right-size max_tokens: Set max_tokens to the minimum needed for each task. For classification tasks, 50-100 tokens is sufficient.
- Batch during off-peak hours: Submit batches during low-demand periods for potentially faster processing.
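The caching and max_tokens suggestions can be combined when building requests. The sketch below attaches one shared system prompt, marked with the Messages API `cache_control` field, to every request and keeps the output cap small. The function name `build_classification_requests`, the prompt text, and the sample tickets are assumptions for illustration; a real system prompt would need to be far longer than this placeholder to meet the minimum cacheable length.

```python
SYSTEM_PROMPT = "You are a support-ticket classifier. Reply with one label only."  # placeholder


def build_classification_requests(tickets, model, max_tokens=50):
    """Build batch request dicts that share one cacheable system prompt."""
    shared_system = [{
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # mark the shared prefix as cacheable
    }]
    return [
        {
            "custom_id": f"ticket-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,  # small cap: a label needs few tokens
                "system": shared_system,
                "messages": [{"role": "user", "content": ticket}],
            },
        }
        for i, ticket in enumerate(tickets)
    ]


requests = build_classification_requests(
    ["My invoice is wrong.", "The app crashes on login."],
    model="claude-sonnet-4-20250514",
)
print(len(requests), requests[0]["params"]["max_tokens"])
```

The resulting list has the same shape as the `requests` argument used when creating a batch.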
Using Batch API Through a Relay
Relay services like claude4u.com can handle batch requests, providing the same 50% discount with additional benefits like per-key cost tracking and usage analytics across all your batch jobs:
export ANTHROPIC_BASE_URL="https://claude4u.com/antigravity"
export ANTHROPIC_API_KEY="cr_your_key_here"
# Batch API works through the relay with the same syntax
# The relay tracks batch costs per API key
The Batch API is one of the most underutilized cost-saving tools in the Claude ecosystem. If even 30% of your workload can tolerate asynchronous processing, switching those requests to batch halves their cost. Assuming they account for a proportional share of your spend, that is roughly a 15% reduction in your monthly API bill.