GPT-4o Multimodal Model: Complete Guide
GPT-4o is OpenAI's flagship multimodal model. It accepts text, image, audio, and video inputs and generates text and audio outputs. The "o" stands for "omni," reflecting its ability to handle multiple modalities natively. This guide covers text generation, image understanding, JSON mode, and choosing between GPT-4o and GPT-4o-mini, with examples that use the Chat Completions API.
What Makes GPT-4o Special
- Native multimodal — Processes images and text in a single model, not separate pipelines
- Faster than GPT-4 Turbo — 2x faster response times on average
- Cheaper than GPT-4 Turbo — 50% lower cost per token
- 128K context window — Process up to ~96,000 words in a single request
- Improved multilingual — Better performance on non-English languages
- Structured outputs — Native JSON mode and function calling support
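The ~96,000-word figure follows from a common rule of thumb of roughly 0.75 English words per token. A minimal sketch of that arithmetic, assuming that ratio (it varies by language and content; the function name `approx_words` is illustrative, not part of any SDK):

```python
# Rule-of-thumb ratio; an assumption, not an exact tokenizer property.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


print(approx_words(128_000))  # 96000
```

The context window covers the prompt and the response together, so the usable input budget is smaller than the full 128K.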
Text Generation with GPT-4o
from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(base_url="https://claude4u.com/v1")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert software engineer."},
        {"role": "user", "content": "Explain the difference between REST and GraphQL."}
    ],
    temperature=0.7,
    max_tokens=1000
)
print(response.choices[0].message.content)
Image Understanding (Vision)
GPT-4o can analyze images sent as URLs or base64-encoded data:
from openai import OpenAI
import base64

client = OpenAI(base_url="https://claude4u.com/v1")

# Method 1: Image URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image? Describe it in detail."},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/photo.jpg"}
            }
        ]
    }]
)
print(response.choices[0].message.content)

# Method 2: Base64 encoded image, sent as a data URL
with open("screenshot.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this UI screenshot for accessibility issues."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_b64}"}
            }
        ]
    }]
)
print(response.choices[0].message.content)
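The base64 method hardcodes `image/png` in the data URL, which is wrong for a JPEG or WebP file. A small sketch of a helper that picks the MIME type from the file extension using the standard library; the name `image_to_data_url` is hypothetical and not part of the OpenAI SDK:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be passed directly as the `url` value, for example `{"type": "image_url", "image_url": {"url": image_to_data_url("screenshot.png")}}`.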
Multiple Images in One Request
# Reuses the client created in the previous example.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two product designs and list the differences."},
            {"type": "image_url", "image_url": {"url": "https://example.com/design-a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/design-b.png"}}
        ]
    }]
)
Image Detail Levels
Control how much detail the model uses when processing images, which affects both quality and cost:
content = [
    {"type": "text", "text": "Read all the text in this document."},
    {
        "type": "image_url",
        "image_url": {
            "url": "https://example.com/document.png",
            "detail": "high"  # Options: "auto", "low", "high"
        }
    }
]
- low: 85 tokens per image, regardless of size. Fast and cheap, good for simple questions.
- high: 85 base tokens plus 170 tokens per 512x512 tile after resizing. For example, a 1024x1024 image costs 765 tokens and a 2048x4096 image costs 1,105 tokens. Best for detailed analysis.
- auto: The model decides based on the image size (default).
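To budget for high-detail requests, the token cost can be estimated from the image dimensions. The sketch below follows the tiling rules as OpenAI's vision documentation describes them for GPT-4o: fit the image within 2048x2048, scale so the shortest side is 768 px, then charge 170 tokens per 512 px tile plus 85 base tokens. It assumes images are only ever scaled down, and `high_detail_tokens` is an illustrative name, so treat the result as an estimate and check it against the `usage` field of real responses.

```python
import math


def high_detail_tokens(width: int, height: int) -> int:
    """Estimate the input tokens for one image sent with detail="high"."""
    # Step 1: shrink to fit inside a 2048 x 2048 square, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2: shrink so the shortest side is at most 768 px (assumed: no upscaling).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: 170 tokens per 512 px tile, plus 85 base tokens.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(high_detail_tokens(1024, 1024))  # 765
print(high_detail_tokens(2048, 4096))  # 1105
```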
Node.js Vision Example
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({ baseURL: 'https://claude4u.com/v1' });

// Read the image and encode it as base64 for a data URL.
const imageBuffer = fs.readFileSync('chart.png');
const base64Image = imageBuffer.toString('base64');

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Extract all data points from this chart and format as JSON.' },
      {
        type: 'image_url',
        image_url: { url: `data:image/png;base64,${base64Image}` }
      }
    ]
  }],
  response_format: { type: 'json_object' }
});

const data = JSON.parse(response.choices[0].message.content);
console.log(data);
GPT-4o vs GPT-4o-mini
Choose the right variant for your use case:
- GPT-4o — Best for complex reasoning, detailed analysis, creative writing, and code generation. $2.50 input / $10.00 output per 1M tokens.
- GPT-4o-mini — Best for simple tasks, classification, extraction, and high-volume processing. $0.15 input / $0.60 output per 1M tokens. 16x cheaper than GPT-4o.
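Tip: Use GPT-4o-mini for preprocessing and filtering, then route only complex cases to GPT-4o. At the prices above, if every request passes through GPT-4o-mini and about 10% are escalated to GPT-4o, the total cost is roughly 16% of an all-GPT-4o setup (assuming similar token counts per request), a reduction of more than 80% while keeping GPT-4o quality where it matters.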
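The savings from hybrid routing depend on how many requests end up escalated. A back-of-the-envelope sketch using the per-million-token prices listed above; the function names and the example traffic figures are assumptions for illustration, and it assumes escalated requests use the same token counts as the rest:

```python
# USD per 1M tokens (input, output), taken from the comparison above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload on a single model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def hybrid_savings(input_tokens: int, output_tokens: int, escalation_rate: float) -> float:
    """Fraction saved when all traffic hits GPT-4o-mini and a share is re-run on GPT-4o."""
    baseline = cost_usd("gpt-4o", input_tokens, output_tokens)
    mini_pass = cost_usd("gpt-4o-mini", input_tokens, output_tokens)
    hybrid = mini_pass + escalation_rate * baseline
    return 1 - hybrid / baseline


# Hypothetical workload: 1M input tokens, 200K output tokens, 10% escalated.
print(f"{hybrid_savings(1_000_000, 200_000, 0.10):.0%}")  # 84%
```

Because GPT-4o-mini costs 6% of GPT-4o for both input and output, the hybrid bill is about 6% plus the escalation rate, so savings stay above 80% only while fewer than roughly 14% of requests are escalated.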
Structured Outputs (JSON Mode)
import json

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        # JSON mode requires the word "JSON" somewhere in the messages.
        "content": "List 5 programming languages with their year of creation. Respond in JSON."
    }],
    response_format={"type": "json_object"}
)
data = json.loads(response.choices[0].message.content)
print(data)
Warning: When using JSON mode, you must include the word "JSON" in your prompt (system or user message). Otherwise the API will return an error.
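One way to catch this before the request is sent is a local pre-flight check on the messages. The helper below is a hypothetical sketch, not part of the OpenAI SDK, and it assumes a case-insensitive match on "json" is sufficient; it handles both plain string content and the list-of-parts format used for images:

```python
def mentions_json(messages: list[dict]) -> bool:
    """Return True if any text in the messages contains the word "json"."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            texts = [content]
        elif isinstance(content, list):
            # Multimodal content: only the text parts can carry the instruction.
            texts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
        else:
            texts = []
        if any("json" in text.lower() for text in texts):
            return True
    return False


messages = [{"role": "user", "content": "List 5 programming languages."}]
if not mentions_json(messages):
    # Add the instruction as a system message instead of failing at the API.
    messages.insert(0, {"role": "system", "content": "Respond in JSON."})
```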
Best Use Cases for GPT-4o
- Document analysis — Extract data from scanned documents, receipts, invoices
- Code review — Analyze screenshots of code for bugs and improvements
- Product analysis — Compare product images, read labels, identify defects
- Accessibility auditing — Evaluate UI screenshots for accessibility compliance
- Chart and graph interpretation — Extract data from visualizations
- Content moderation — Analyze images for policy compliance
Tip: Access GPT-4o and all other OpenAI models through claude4u.com with a single API key. The relay service handles load balancing, failover, and provides access to Claude and Gemini models through the same OpenAI-compatible interface.