OpenAI Embedding API: Complete Guide for Vector Search and RAG

OpenAI's Embedding API converts text into dense vector representations that capture semantic meaning. These vectors power similarity search, recommendation systems, clustering, and Retrieval-Augmented Generation (RAG). This guide covers model selection, implementation, and practical applications.

What Are Embeddings?

An embedding is a list of floating-point numbers (a vector) that represents the meaning of a piece of text. Similar texts produce similar vectors, which can be compared using cosine similarity or dot product. OpenAI's latest embedding models produce vectors with up to 3,072 dimensions.
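
To make the comparison concrete, here is a minimal sketch that computes cosine similarity and dot product with the standard library only. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

# Made-up toy vectors, not real model output.
cat = [0.9, 0.1, 0.2]
kitten = [0.8, 0.2, 0.3]
invoice = [0.1, 0.9, 0.1]

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)
print(f"cat vs kitten:  {sim_close:.3f}")
print(f"cat vs invoice: {sim_far:.3f}")

# For unit-length vectors the dot product and cosine similarity coincide.
unit_dot = dot(normalize(cat), normalize(kitten))
print(f"dot of normalized vectors: {unit_dot:.3f}")
```

The pair pointing in nearly the same direction scores higher, and once vectors are normalized the cheaper dot product gives the same ranking as cosine similarity.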

Available Embedding Models

Tip: Use text-embedding-3-small for most applications. It is about 5x cheaper than text-embedding-ada-002 and performs better. Upgrade to text-embedding-3-large only if you need maximum retrieval accuracy.

Basic Usage (Python)

from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(base_url="https://claude4u.com/v1")

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning is a subset of artificial intelligence."
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Batch Embedding

Pass a list of strings to embed multiple texts in a single API call, which avoids one round trip per text:

texts = [
    "How to train a neural network",
    "Best practices for deep learning",
    "Introduction to natural language processing",
    "Computer vision fundamentals",
    "Reinforcement learning explained"
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")
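
For collections too large to send in one request, the list can be split into batches. The sketch below is one way to do that. The names `batched` and `embed_all` and the default batch size of 100 are assumptions chosen for illustration, not documented limits. The embedding call is passed in as a function so the example runs without network access.

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,  # assumed value; tune it for your provider
) -> List[List[float]]:
    """Embed every text, one request per batch, preserving input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings

# With the real client, embed_batch could look like this:
# def embed_batch(batch):
#     response = client.embeddings.create(
#         model="text-embedding-3-small", input=list(batch)
#     )
#     return [d.embedding for d in sorted(response.data, key=lambda d: d.index)]

# Stand-in embedder for demonstration: one number per text (its length).
def fake_embed_batch(batch: Sequence[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]

texts = [
    "How to train a neural network",
    "Best practices for deep learning",
    "Introduction to natural language processing",
    "Computer vision fundamentals",
    "Reinforcement learning explained",
]
vectors = embed_all(texts, fake_embed_batch, batch_size=2)
print(f"Generated {len(vectors)} embeddings in batches of 2")
```

Sorting by each item's `index` field in the commented wrapper keeps the embeddings aligned with the input texts.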

Node.js Implementation

import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'https://claude4u.com/v1' });

const response = await client.embeddings.create({
    model: 'text-embedding-3-small',
    input: ['Hello world', 'How are you?', 'Goodbye']
});

const embeddings = response.data.map((item) => item.embedding);
console.log(`Generated ${embeddings.length} embeddings`);
console.log(`Dimensions: ${embeddings[0].length}`);

Cosine Similarity Search

import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://claude4u.com/v1")

def get_embedding(text):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Build a knowledge base
documents = [
    "Python is a high-level programming language.",
    "JavaScript runs in web browsers.",
    "SQL is used for database queries.",
    "Docker containers package applications.",
    "Kubernetes orchestrates container deployments."
]

doc_embeddings = [get_embedding(doc) for doc in documents]

# Search
query = "How do I query a database?"
query_embedding = get_embedding(query)

similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# Rank results
results = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
for doc, score in results[:3]:
    print(f"{score:.4f}: {doc}")
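
The ranking step above sorts every score, which is fine for five documents. When only the best few matches are needed from a larger collection, `heapq.nlargest` avoids a full sort. This is a sketch with made-up two-dimensional vectors; `top_k` is a hypothetical helper name.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = ((i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy vectors standing in for real embeddings.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.1]

best = top_k(query_vec, doc_vecs, k=2)
for index, score in best:
    print(f"doc {index}: {score:.4f}")
```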

Reducing Dimensions

The text-embedding-3 models accept a dimensions parameter that shortens the returned vector, which saves storage and speeds up similarity search:

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256  # Reduce from 3072 to 256
)

embedding = response.data[0].embedding
print(f"Reduced dimensions: {len(embedding)}")  # 256
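
If full-size vectors are already stored, a manual alternative is to truncate them and re-normalize to unit length. The sketch below shows that operation together with a rough storage estimate. Whether the result matches what the API returns for the same `dimensions` value is an assumption to verify on your own data. Both function names and the 4-bytes-per-value figure (32-bit floats) are illustrative.

```python
import math
from typing import List, Sequence

def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first dims values and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    cut = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in cut))
    if norm == 0:
        return cut  # nothing to scale
    return [x / norm for x in cut]

def storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, ignoring index overhead."""
    return num_vectors * dims * bytes_per_value

full = [0.5, 0.5, 0.5, 0.5]  # toy unit-length vector
short = truncate_and_normalize(full, 2)
print(f"Truncated to {len(short)} dimensions")

gb_full = storage_bytes(1_000_000, 3072) / 1e9
gb_small = storage_bytes(1_000_000, 256) / 1e9
print(f"1M vectors: {gb_full:.2f} GB at 3072 dims, {gb_small:.2f} GB at 256 dims")
```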

Building a RAG System

Retrieval-Augmented Generation uses embeddings to find the documents most relevant to a question, then passes them to a language model as context:

from openai import OpenAI

client = OpenAI(base_url="https://claude4u.com/v1")

def rag_query(question, knowledge_base, kb_embeddings):
    # Step 1: Embed the question
    q_response = client.embeddings.create(
        model="text-embedding-3-small",
        input=question
    )
    q_embedding = q_response.data[0].embedding

    # Step 2: Find most relevant documents.
    # OpenAI embeddings are unit length, so the dot product equals cosine similarity.
    similarities = []
    for i, kb_emb in enumerate(kb_embeddings):
        sim = sum(a * b for a, b in zip(q_embedding, kb_emb))
        similarities.append((i, sim))

    top_docs = sorted(similarities, key=lambda x: x[1], reverse=True)[:3]
    context = "\n".join([knowledge_base[i] for i, _ in top_docs])

    # Step 3: Generate answer with context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
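
The function above expects `kb_embeddings` to be computed ahead of time. One way to build that list with a single batched call is sketched here. `build_kb_embeddings` is a hypothetical helper, and the fake client exists only so the example runs offline; in real use you would pass the `OpenAI` client instead.

```python
from types import SimpleNamespace
from typing import List, Sequence

def build_kb_embeddings(
    client,
    knowledge_base: Sequence[str],
    model: str = "text-embedding-3-small",
) -> List[List[float]]:
    """Embed every knowledge-base entry in one request, in input order."""
    response = client.embeddings.create(model=model, input=list(knowledge_base))
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

class _FakeEmbeddings:
    """Offline stand-in that mimics the shape of the embeddings response."""

    def create(self, model, input):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text)), 1.0])
            for i, text in enumerate(input)
        ]
        # Return items out of order to show that sorting by index restores it.
        return SimpleNamespace(data=list(reversed(data)))

fake_client = SimpleNamespace(embeddings=_FakeEmbeddings())

kb = [
    "Python is a high-level programming language.",
    "SQL is used for database queries.",
]
kb_embeddings = build_kb_embeddings(fake_client, kb)
print(f"Embedded {len(kb_embeddings)} knowledge-base entries")
```

With a real client, the result can be passed straight to `rag_query(question, kb, kb_embeddings)`.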

Using with Vector Databases

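For production systems, store embeddings in a vector database rather than in Python lists, so that similarity search is indexed and the vectors persist across runs.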
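
As a rough illustration of the store-then-query pattern such databases provide, here is an in-memory stand-in. The class `InMemoryVectorStore` and its `upsert` and `query` methods are hypothetical and do not correspond to any specific product's API. A real vector database would add persistence and approximate nearest-neighbor indexing.

```python
import math
from typing import Dict, List, Sequence, Tuple

def _unit(vector: Sequence[float]) -> List[float]:
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("zero vectors cannot be normalized")
    return [x / norm for x in vector]

class InMemoryVectorStore:
    """Toy store: keeps normalized vectors by id and searches by brute force."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}
        self._texts: Dict[str, str] = {}

    def upsert(self, doc_id: str, vector: Sequence[float], text: str) -> None:
        """Insert a document, or overwrite it if the id already exists."""
        self._vectors[doc_id] = _unit(vector)
        self._texts[doc_id] = text

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, float, str]]:
        """Return (id, cosine similarity, text) for the top_k closest documents."""
        q = _unit(vector)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, v)), self._texts[doc_id])
            for doc_id, v in self._vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]

# Made-up 3-dimensional vectors standing in for real embeddings.
store = InMemoryVectorStore()
store.upsert("sql", [0.9, 0.1, 0.0], "SQL is used for database queries.")
store.upsert("docker", [0.0, 0.2, 0.9], "Docker containers package applications.")

hits = store.query([1.0, 0.0, 0.0], top_k=1)
print(hits[0][0], hits[0][2])
```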

Warning: Embedding models have a maximum input length of 8,191 tokens. For longer documents, split them into overlapping chunks of 500-1,000 tokens for best retrieval performance.

Text Chunking Best Practices

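def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks by word count."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # this chunk reached the end; skip a redundant tail chunk
    return chunks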
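
The function above counts words, while the model limit is expressed in tokens. A sketch of the same sliding window applied to tokens follows. The tokenizer is passed in as `encode` and `decode` callables, which are assumptions of this example. Whitespace splitting is used as a stand-in so the code runs without extra packages.

```python
from typing import Callable, List, Sequence

def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    chunk_size: int = 500,
    overlap: int = 50,
) -> List[str]:
    """Split text into overlapping chunks measured in tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = encode(text)
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the window reached the end of the text
    return chunks

# Stand-in tokenizer: one token per whitespace-separated word.
sample = " ".join(f"w{i}" for i in range(12))
token_chunks = chunk_by_tokens(sample, str.split, " ".join, chunk_size=5, overlap=2)
for chunk in token_chunks:
    print(chunk)
```

If the tiktoken package is installed (an assumption, since the original does not use it), the `encode` and `decode` methods of `tiktoken.get_encoding("cl100k_base")` can be passed in place of the stand-ins. Check that the encoding matches your embedding model.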

Tip: claude4u.com exposes the OpenAI Embedding API through a compatible interface, so the code in this guide works against that endpoint unchanged. It also provides unified billing and rate-limit management across all your AI API usage.
