Claude Vision API: Image Analysis and Understanding Tutorial
Claude's vision capability lets you send images alongside text prompts, which enables use cases such as document analysis, screenshot understanding, chart interpretation, and visual question answering. This tutorial covers supported formats, base64 and URL image input, multi-image requests, practical use cases, token costs, and relay setup.
Supported Image Formats
- JPEG — Standard photos and screenshots
- PNG — Screenshots, diagrams, and images with transparency
- GIF — Static GIF images (first frame of animated GIFs)
- WebP — Modern web image format
Each image can be roughly 5 MB at most. Images whose dimensions exceed the model's processing limits are resized automatically before analysis.
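A pre-flight check can catch unsupported formats and oversized files before a request is sent. The following is a minimal sketch, not part of the SDK: `check_image`, `SUPPORTED_MEDIA_TYPES`, and `MAX_IMAGE_BYTES` are hypothetical names, the media type is guessed from the file extension only, and the 5 MB cap is taken from the approximate figure above.

```python
import os

# Media types for the four formats listed above, keyed by file extension.
SUPPORTED_MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

# Assumed limit: the tutorial gives the cap as approximately 5 MB.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def check_image(path):
    """Return the media type for `path`, or raise ValueError if it looks unsendable."""
    ext = os.path.splitext(path)[1].lower()
    media_type = SUPPORTED_MEDIA_TYPES.get(ext)
    if media_type is None:
        raise ValueError(f"unsupported image extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {size} bytes, over the {MAX_IMAGE_BYTES}-byte limit")
    return media_type
```

The returned string can be used directly as the `media_type` value in the request examples that follow.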
Sending Images via Base64
Python Example
import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this image in detail."
                }
            ]
        }
    ]
)

print(message.content[0].text)
Sending Images via URL
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Analyze this chart and summarize the key trends."
                }
            ]
        }
    ]
)

print(message.content[0].text)
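Since the two input methods differ only in the `source` dictionary, a small helper can choose between them based on what it is given. This is a sketch under stated assumptions: `image_source` is a hypothetical helper, not an SDK function, and it treats anything that does not start with `http://` or `https://` as a local file path.

```python
import base64


def image_source(ref, media_type="image/png"):
    """Return a `source` dict: URL form for http(s) references, base64 for local files."""
    if ref.startswith(("http://", "https://")):
        return {"type": "url", "url": ref}
    # Local file: read the bytes and base64-encode them, as in the Base64 example.
    with open(ref, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {"type": "base64", "media_type": media_type, "data": data}
```

The result slots into an image block as `{"type": "image", "source": image_source(ref)}`.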
Node.js Example
import Anthropic from '@anthropic-ai/sdk';
import fs from 'fs';

const client = new Anthropic();

// Read the image and encode it as base64
const imageData = fs.readFileSync('diagram.png').toString('base64');

const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: imageData,
          },
        },
        {
          type: 'text',
          text: 'Explain this architecture diagram.',
        },
      ],
    },
  ],
});

console.log(message.content[0].text);
Multiple Images in One Request
You can send multiple images in a single message for comparison or comprehensive analysis:
# before_image and after_image are base64-encoded PNG strings,
# prepared the same way as image_data in the Base64 example.
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": before_image}},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": after_image}},
                {"type": "text", "text": "Compare these two UI screenshots and list all visual differences."}
            ]
        }
    ]
)
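When a request carries several images, a short text label before each one can make it easier for the prompt to refer to a specific image. The helper below is an illustrative sketch: `labeled_image_content`, its parameters, and the "Image N:" wording are assumptions rather than anything the API requires.

```python
import base64


def labeled_image_content(images, question):
    """Build a `content` list with an "Image N:" label before each image.

    `images` is a list of (media_type, raw_bytes) pairs and `question` is the
    final text prompt. Both names are illustrative, not part of the SDK.
    """
    content = []
    for index, (media_type, raw) in enumerate(images, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": base64.standard_b64encode(raw).decode("utf-8"),
            },
        })
    # The question goes last, after all images.
    content.append({"type": "text", "text": question})
    return content
```

The returned list can be passed as the `content` value of the user message, and the prompt can then say, for example, "list what changed between Image 1 and Image 2".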
Practical Use Cases
Document OCR and Extraction
# Extract text from a scanned document.
# doc_image is a base64-encoded JPEG string.
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": doc_image}},
            {"type": "text", "text": "Extract all text from this document. Preserve the original formatting and structure."}
        ]
    }]
)
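Dense documents can produce more text than `max_tokens` allows, in which case the extraction stops partway through. A minimal sketch for collecting the reply and detecting that case follows; `extract_text` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in shaped like an SDK response (a `content` list of blocks plus a `stop_reason`).

```python
from types import SimpleNamespace


def extract_text(message):
    """Join all text blocks and report whether the reply hit max_tokens."""
    text = "".join(
        block.text for block in message.content
        if getattr(block, "type", None) == "text"
    )
    truncated = getattr(message, "stop_reason", None) == "max_tokens"
    return text, truncated


# Stand-in for a real response, for illustration only.
fake_message = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="partial text")],
    stop_reason="max_tokens",
)
text, truncated = extract_text(fake_message)
if truncated:
    print("Output was cut off; consider raising max_tokens or splitting the document.")
```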
UI Screenshot Analysis
# Analyze a UI screenshot for accessibility issues.
# ui_screenshot is a base64-encoded PNG string.
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": ui_screenshot}},
            {"type": "text", "text": "Review this UI for accessibility issues. Check contrast, text size, button spacing, and label clarity."}
        ]
    }]
)
Chart and Data Visualization Analysis
# Interpret a data visualization.
# chart_image is a base64-encoded PNG string.
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": chart_image}},
            {"type": "text", "text": "Analyze this chart. What are the key trends, outliers, and notable data points?"}
        ]
    }]
)
Token Cost for Images
Image tokens are counted as input tokens and billed at the model's input rate. Approximate costs per image with Claude Sonnet:
- Small image (up to 384x384): ~200 tokens (~$0.0006)
- Medium image (768x768): ~800 tokens (~$0.0024)
- Large image (1500x1500): ~2,500 tokens (~$0.0075)
- Very large image (4K): ~5,000 tokens (~$0.015)
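The dollar figures above follow from multiplying the token count by the input rate. The sketch below reproduces that arithmetic; the rate of $3 per million input tokens is an assumption inferred from the listed figures (200 tokens costing about $0.0006), and `image_cost_usd` is a hypothetical helper, so check current pricing before relying on it.

```python
# Assumed rate, inferred from the figures above (200 tokens -> ~$0.0006).
USD_PER_MILLION_INPUT_TOKENS = 3.00


def image_cost_usd(tokens, usd_per_million=USD_PER_MILLION_INPUT_TOKENS):
    """Convert an image's input-token count into an approximate dollar cost."""
    return tokens * usd_per_million / 1_000_000


# Token counts taken from the list above.
for label, tokens in [("small", 200), ("medium", 800), ("large", 2500), ("very large", 5000)]:
    print(f"{label}: ~{tokens} tokens, ~${image_cost_usd(tokens):.4f}")
```

For the actual count on a given request, the response reports it in `message.usage.input_tokens`, which covers the whole input (images plus text).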
Vision Through a Relay Service
Vision requests also work through relay services such as claude4u.com. The relay forwards the base64 image data to the upstream API unchanged, so only the base URL and API key need to be set:
# Set up relay connection
export ANTHROPIC_BASE_URL="https://claude4u.com/antigravity"
export ANTHROPIC_API_KEY="cr_your_relay_key_here"
# Your vision code works without any changes
# The relay proxies image data to Anthropic automatically
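If you would rather pass the relay settings explicitly than depend on the SDK reading the environment, the same two variables can be collected and handed to the client constructor. This is a sketch: `client_options` is a hypothetical helper, while `base_url` and `api_key` are keyword arguments accepted by `anthropic.Anthropic`.

```python
import os


def client_options(env=os.environ):
    """Collect keyword arguments for anthropic.Anthropic() from the environment."""
    options = {}
    base_url = env.get("ANTHROPIC_BASE_URL")
    if base_url:
        options["base_url"] = base_url
    api_key = env.get("ANTHROPIC_API_KEY")
    if api_key:
        options["api_key"] = api_key
    return options


# With the SDK installed:
#   import anthropic
#   client = anthropic.Anthropic(**client_options())
```

When neither variable is set, the helper returns an empty dict and the client falls back to its defaults.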
Claude's vision capabilities open up powerful automation possibilities from document processing to visual QA. Combined with tool use, you can build systems that see, understand, and act on visual information autonomously.