Claude Vision API: Image Analysis and Understanding Tutorial

Claude's vision capability lets you send images along with text prompts, which supports use cases such as document analysis, screenshot understanding, chart interpretation, and visual question answering. This tutorial covers supported formats, sending images as base64 or by URL, multi-image requests, practical use cases, and token costs.

Supported Image Formats

JPEG — Standard photos and screenshots
PNG — Screenshots, diagrams, and images with transparency
GIF — Static GIF images (first frame of animated GIFs)
WebP — Modern web image format

Each image can be up to approximately 5 MB. Images that exceed the model's processing limits are resized automatically, so sending oversized images adds upload time without adding detail.
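
The format and size rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of any SDK: the helper name `check_image`, the extension table, and the exact byte threshold are assumptions made for illustration, since the limit is only stated approximately.

```python
import os

# Media types for the four formats listed above, keyed by file extension.
SUPPORTED_MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

# Assumed threshold: the limit is only described as "approximately 5MB".
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def check_image(path, size_bytes=None):
    """Return the media type for path, or raise ValueError if it is unsupported or too large.

    size_bytes may be passed explicitly; otherwise the size is read from disk.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {ext or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_IMAGE_BYTES:
        raise ValueError(f"Image is {size_bytes} bytes; limit is {MAX_IMAGE_BYTES}")
    return SUPPORTED_MEDIA_TYPES[ext]
```

The returned string can be used as the `media_type` value in the request examples that follow.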

Sending Images via Base64

Python Example

import anthropic
import base64

client = anthropic.Anthropic()

# Read and encode the image
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this image in detail."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Sending Images via URL

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Analyze this chart and summarize the key trends."
                }
            ]
        }
    ]
)
print(message.content[0].text)

Node.js Example

import Anthropic from '@anthropic-ai/sdk';
import fs from 'fs';

const client = new Anthropic();

const imageData = fs.readFileSync('diagram.png').toString('base64');

const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: imageData,
          },
        },
        {
          type: 'text',
          text: 'Explain this architecture diagram.',
        },
      ],
    },
  ],
});

console.log(message.content[0].text);

Multiple Images in One Request

You can send multiple images in a single message for comparison or comprehensive analysis:

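# before_image and after_image are base64-encoded PNG strings,
# prepared the same way as image_data in the Base64 example
message = client.messages.create(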
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": before_image}},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": after_image}},
                {"type": "text", "text": "Compare these two UI screenshots and list all visual differences."}
            ]
        }
    ]
)
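
When a request carries several images, it can help to precede each one with a short text label so the prompt can refer to images by number. The sketch below assembles such a content list. `build_labeled_content` is a hypothetical helper, and the "Image N:" wording is an assumption for illustration, not something the API requires.

```python
def build_labeled_content(images, prompt):
    """Build a message content list with a text label before each image.

    images: list of (media_type, base64_data) tuples.
    prompt: the question to ask about the images.
    """
    content = []
    for index, (media_type, data) in enumerate(images, start=1):
        # Label first, then the image block it refers to.
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    # The actual question goes last, after all images.
    content.append({"type": "text", "text": prompt})
    return content
```

The returned list can be passed as the `"content"` value of a user message in place of the hand-written list.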

Tip: Each image in a request consumes input tokens from your context window, from a few hundred tokens for a small image to several thousand for a very large one, depending on resolution. Monitor usage to avoid unexpected costs.
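
One simple way to monitor usage is to read the token counts that each response reports in its `usage` field. The helper below is a minimal sketch: `describe_usage` is a hypothetical name, and it assumes the response object exposes `usage.input_tokens` and `usage.output_tokens`, as the `message` objects in the examples above do.

```python
def describe_usage(message):
    """Return a one-line summary of the token counts reported on a response."""
    usage = message.usage
    # Image tokens are included in input_tokens.
    total = usage.input_tokens + usage.output_tokens
    return f"input={usage.input_tokens} output={usage.output_tokens} total={total}"
```

Calling `print(describe_usage(message))` after each request makes it easy to spot images that cost more than expected.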

Practical Use Cases

Document OCR and Extraction

# Extract text from a scanned document
# (doc_image is a base64-encoded JPEG string, prepared as in the Base64 example)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": doc_image}},
            {"type": "text", "text": "Extract all text from this document. Preserve the original formatting and structure."}
        ]
    }]
)

UI Screenshot Analysis

# Analyze a UI screenshot for accessibility issues
# (ui_screenshot is a base64-encoded PNG string, prepared as in the Base64 example)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": ui_screenshot}},
            {"type": "text", "text": "Review this UI for accessibility issues. Check contrast, text size, button spacing, and label clarity."}
        ]
    }]
)

Chart and Data Visualization Analysis

# Interpret a data visualization
# (chart_image is a base64-encoded PNG string, prepared as in the Base64 example)
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": chart_image}},
            {"type": "text", "text": "Analyze this chart. What are the key trends, outliers, and notable data points?"}
        ]
    }]
)

Token Cost for Images

Image tokens are counted as input tokens and billed at the model's input rate. Approximate costs per image with Claude Sonnet (the dollar figures correspond to $3 per million input tokens):

Small image (up to 384x384): ~200 tokens (~$0.0006)
Medium image (768x768): ~800 tokens (~$0.0024)
Large image (1500x1500): ~2,500 tokens (~$0.0075)
Very large image (4K): ~5,000 tokens (~$0.015)
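
The dollar figures in this list follow from multiplying the token count by the input rate. The sketch below reproduces that arithmetic. The rate of $3 per million input tokens is an assumption inferred from the list itself (about 200 tokens for about $0.0006), and `image_cost_usd` is a hypothetical helper name, so check current pricing before relying on it.

```python
# Assumed rate, inferred from the list above: ~200 tokens costs ~$0.0006.
INPUT_USD_PER_MILLION_TOKENS = 3.00


def image_cost_usd(image_tokens, usd_per_million=INPUT_USD_PER_MILLION_TOKENS):
    """Estimate the input cost in USD of an image that uses image_tokens tokens."""
    return image_tokens * usd_per_million / 1_000_000


# Recompute the cost for each size tier from the list above.
for label, tokens in [("small", 200), ("medium", 800), ("large", 2500), ("very large", 5000)]:
    print(f"{label}: ~{tokens} tokens, ~${image_cost_usd(tokens):.4f}")
```

Passing a different `usd_per_million` value adapts the estimate to another model's input rate.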

Warning: Resize images to the minimum resolution needed for your task before sending them to the API. Sending a 4K screenshot when 720p would suffice wastes tokens and increases costs significantly.
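
To follow this advice you need a target size that preserves the aspect ratio. The function below is a small sketch that only computes the new dimensions. `fit_within` is a hypothetical helper, and the choice of maximum edge (for example 1280 pixels for a 720p-class screenshot) is an assumption that depends on your task.

```python
def fit_within(width, height, max_edge):
    """Scale (width, height) down so the longer edge is at most max_edge.

    The aspect ratio is preserved, and images that are already small
    enough are returned unchanged (this never upscales).
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Round to whole pixels and never return a zero-sized edge.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 3840x2160 screenshot, `fit_within(3840, 2160, 1280)` gives `(1280, 720)`. The resulting size can then be handed to whichever image library you use for the actual resize, for example Pillow's `Image.resize`.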

Vision Through a Relay Service

Vision API requests work seamlessly through relay services like claude4u.com. The base64 image data is forwarded transparently to the upstream API:

# Set up relay connection
export ANTHROPIC_BASE_URL="https://claude4u.com/antigravity"
export ANTHROPIC_API_KEY="cr_your_relay_key_here"

# Your vision code works without any changes
# The relay proxies image data to Anthropic automatically
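
If exporting shell variables is inconvenient, the same two settings can be passed to the client constructor explicitly. The sketch below gathers them from the environment. `client_options` is a hypothetical helper, and it assumes the SDK client accepts `base_url` and `api_key` keyword arguments.

```python
import os


def client_options(env=None):
    """Collect base_url and api_key for an SDK client from environment variables."""
    env = os.environ if env is None else env
    options = {}
    # Only include settings that are actually present and non-empty.
    if env.get("ANTHROPIC_BASE_URL"):
        options["base_url"] = env["ANTHROPIC_BASE_URL"]
    if env.get("ANTHROPIC_API_KEY"):
        options["api_key"] = env["ANTHROPIC_API_KEY"]
    return options


# Usage (requires the anthropic package):
# import anthropic
# client = anthropic.Anthropic(**client_options())
```

When neither variable is set, the function returns an empty dictionary and the client falls back to its defaults.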

Claude's vision capabilities open up powerful automation possibilities from document processing to visual QA. Combined with tool use, you can build systems that see, understand, and act on visual information autonomously.

