GPT-4o Multimodal Model: Complete Guide

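GPT-4o is OpenAI's flagship multimodal model. The "o" stands for "omni": rather than chaining separate models, a single network was trained end to end across text, images, and audio. Through the Chat Completions API used in this guide, gpt-4o accepts text and image inputs and returns text. Audio input and output are offered through separate audio variants such as gpt-4o-audio-preview, and video can be analyzed by sending sampled frames as images. This guide covers text generation, vision, JSON mode, and choosing between GPT-4o and GPT-4o-mini.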

What Makes GPT-4o Special

Text Generation with GPT-4o

from openai import OpenAI

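# Point the SDK at the relay endpoint; the API key is read from the
# OPENAI_API_KEY environment variable.
client = OpenAI(base_url="https://claude4u.com/v1")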

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an expert software engineer."},
        {"role": "user", "content": "Explain the difference between REST and GraphQL."}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
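Because max_tokens=1000 caps the reply, a long answer can be cut off mid-sentence. When that happens, the choice's finish_reason is "length" instead of "stop". A minimal sketch of a helper that surfaces truncation rather than silently returning partial text (the name reply_text is hypothetical):

```python
def reply_text(response) -> str:
    """Return the first choice's text, failing loudly if it was truncated."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # The model hit max_tokens before finishing its answer.
        raise RuntimeError(
            "response truncated by max_tokens; raise the limit or shorten the request"
        )
    # content can be None (e.g. for tool calls), so fall back to an empty string.
    return choice.message.content or ""
```

Usage: replace the final print with print(reply_text(response)).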

Image Understanding (Vision)

GPT-4o can analyze images supplied either as URLs the API can fetch or as base64-encoded data URLs:

from openai import OpenAI
import base64

client = OpenAI(base_url="https://claude4u.com/v1")

# Method 1: Image URL
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image? Describe it in detail."},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/photo.jpg"}
            }
        ]
    }]
)
print(response.choices[0].message.content)

# Method 2: Base64 encoded image
with open("screenshot.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this UI screenshot for accessibility issues."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_b64}"}
            }
        ]
    }]
)

print(response.choices[0].message.content)
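Both methods send the same image_url part; only the URL differs. When images come from local files of mixed formats, hard-coding image/png in the data URL mislabels JPEGs and other types. A sketch of a helper (the name image_part is hypothetical) that guesses the MIME type from the file extension using the standard-library mimetypes module:

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str, detail: str = "auto") -> dict:
    """Build an image_url content part from a local file as a base64 data URL."""
    # Guess the MIME type from the extension; refuse non-image files
    # instead of silently labelling them as PNG.
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
    }
```

Usage: content=[{"type": "text", "text": "Analyze this UI screenshot."}, image_part("screenshot.png")].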

Multiple Images in One Request

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two product designs and list the differences."},
            {"type": "image_url", "image_url": {"url": "https://example.com/design-a.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/design-b.png"}}
        ]
    }]
)
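Content parts are read in order, so with several images it can help to put a short text label before each one, letting the prompt and the answer refer to "Image 1" and "Image 2" unambiguously. This labeling is a prompting convention, not an API requirement, and labeled_images_message below is a hypothetical helper:

```python
def labeled_images_message(prompt: str, urls: list, detail: str = "auto") -> dict:
    """Return one user message: the prompt, then a text label before each image."""
    content = [{"type": "text", "text": prompt}]
    for i, url in enumerate(urls, start=1):
        # The label lets the model (and you) refer to each image by number.
        content.append({"type": "text", "text": f"Image {i}:"})
        content.append({"type": "image_url", "image_url": {"url": url, "detail": detail}})
    return {"role": "user", "content": content}
```

Usage: messages=[labeled_images_message("Compare these two product designs.", ["https://example.com/design-a.png", "https://example.com/design-b.png"])].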

Image Detail Levels

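The optional "detail" field controls the resolution at which the model sees an image, which affects both accuracy on fine detail (such as small text) and token cost: "low" sends a downscaled 512x512 version at a small fixed cost, "high" additionally lets the model examine the image in 512px tiles at a higher cost, and "auto" (the default) lets the model choose based on the image size: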

content = [
    {"type": "text", "text": "Read all the text in this document."},
    {
        "type": "image_url",
        "image_url": {
            "url": "https://example.com/document.png",
            "detail": "high"  # Options: "auto", "low", "high"
        }
    }
]
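To budget for the detail setting, OpenAI's vision documentation has described GPT-4o image cost roughly as follows: a "low" image costs a flat 85 tokens; a "high" image is downscaled to fit within 2048x2048, then downscaled so its shortest side is 768px, and costs 85 tokens plus 170 per 512px tile. The sketch below encodes that rule under two assumptions: the constants are the ones documented for GPT-4o at the time of writing (check the current pricing page, since other models use different ones), and "auto" is treated as "high" to give an upper bound.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Rough GPT-4o input-token estimate for one image (assumed constants)."""
    if detail == "low":
        return 85  # flat cost regardless of image size
    # "auto" and "high" are both treated as "high" here (upper bound).
    # 1. Downscale to fit within a 2048x2048 square; never upscale.
    scale = min(1.0, 2048 / width, 2048 / height)
    w, h = width * scale, height * scale
    # 2. Downscale so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3. Count 512x512 tiles: 170 tokens each, plus a base of 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

For example, estimate_image_tokens(1024, 1024) returns 765: the image is scaled to 768x768, which needs four tiles.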

Node.js Vision Example

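// Top-level await requires an ES module (an .mjs file or "type": "module" in package.json).
// The client reads the API key from the OPENAI_API_KEY environment variable.
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({ baseURL: 'https://claude4u.com/v1' });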

const imageBuffer = fs.readFileSync('chart.png');
const base64Image = imageBuffer.toString('base64');

const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
        role: 'user',
        content: [
            { type: 'text', text: 'Extract all data points from this chart and format as JSON.' },
            {
                type: 'image_url',
                image_url: { url: `data:image/png;base64,${base64Image}` }
            }
        ]
    }],
    response_format: { type: 'json_object' }
});

const data = JSON.parse(response.choices[0].message.content);
console.log(data);

GPT-4o vs GPT-4o-mini

Choose the right variant for your use case:

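Tip: Use GPT-4o-mini for preprocessing and filtering, then route only complex cases to GPT-4o. Because GPT-4o-mini costs a small fraction of GPT-4o per token, this hybrid approach can cut costs substantially while keeping GPT-4o quality where it matters; the actual savings depend on what share of requests you escalate.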
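The routing tip can be sketched as a two-tier router: gpt-4o-mini first classifies each request, and only requests it labels complex are sent to gpt-4o. The triage prompt, the SIMPLE/COMPLEX labels, and the function names are hypothetical; client is any OpenAI-compatible client, such as one created with OpenAI(base_url="https://claude4u.com/v1").

```python
# Hypothetical triage prompt; tune the wording for your own traffic.
TRIAGE_PROMPT = (
    "Classify the following request as SIMPLE or COMPLEX. "
    "Reply with exactly one word.\n\nRequest: {request}"
)


def needs_escalation(client, request: str) -> bool:
    """Ask the cheaper gpt-4o-mini whether a request should go to gpt-4o."""
    triage = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": TRIAGE_PROMPT.format(request=request)}],
        temperature=0,  # make the classification as repeatable as possible
        max_tokens=5,   # a one-word verdict is all we need
    )
    verdict = (triage.choices[0].message.content or "").strip().upper()
    return verdict.startswith("COMPLEX")


def route(client, request: str) -> tuple:
    """Answer with gpt-4o only when triage says so; return (model, text)."""
    model = "gpt-4o" if needs_escalation(client, request) else "gpt-4o-mini"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": request}],
    )
    return model, response.choices[0].message.content
```

The triage call adds latency and a small cost to every request, so this pays off only when most traffic stays on gpt-4o-mini.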

JSON Mode

import json

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        # JSON mode requires the word "JSON" to appear in the messages
        "content": 'List 5 programming languages with their year of creation. '
                   'Respond in JSON with a "languages" array of objects '
                   'that have "name" and "year" fields.'
    }],
    response_format={"type": "json_object"}
)

data = json.loads(response.choices[0].message.content)
print(data)

Warning: When using JSON mode, you must include the word "JSON" in your prompt (system or user message). Otherwise the API will return an error.
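JSON mode is designed to make the reply parse as valid JSON (though a reply truncated by max_tokens can still be incomplete), but it does not enforce the fields you asked for, so it is worth validating the shape before using it. A minimal sketch, assuming a hypothetical schema of the form {"languages": [{"name": ..., "year": ...}]}; parse_languages is an illustrative name:

```python
import json


def parse_languages(raw: str) -> list:
    """Parse a JSON-mode reply and check it has the assumed shape
    {"languages": [{"name": <str>, "year": <int>}, ...]}."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    languages = data.get("languages") if isinstance(data, dict) else None
    if not isinstance(languages, list):
        raise ValueError("expected a top-level 'languages' array")
    for item in languages:
        valid = (
            isinstance(item, dict)
            and isinstance(item.get("name"), str)
            and isinstance(item.get("year"), int)
        )
        if not valid:
            raise ValueError(f"malformed entry: {item!r}")
    return languages
```

Usage: languages = parse_languages(response.choices[0].message.content).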

Best Use Cases for GPT-4o

Tip: Access GPT-4o and all other OpenAI models through claude4u.com with a single API key. The relay service handles load balancing, failover, and provides access to Claude and Gemini models through the same OpenAI-compatible interface.

Get Started with 轻舟 AI

Stable, fast AI API relay — supports Claude, OpenAI, Gemini and more
