OpenAI Embedding API: Complete Guide for Vector Search and RAG
OpenAI's Embedding API converts text into dense vector representations that capture semantic meaning. These vectors power similarity search, recommendation systems, clustering, and Retrieval-Augmented Generation (RAG). This guide covers model selection, implementation, and practical applications.
What Are Embeddings?
An embedding is a list of floating-point numbers (a vector) that represents the meaning of a piece of text. Texts with similar meanings produce vectors that point in similar directions, and you can measure that with cosine similarity or the dot product. OpenAI embeddings are normalized to unit length, so the two measures give the same result. OpenAI's latest embedding models produce vectors with up to 3,072 dimensions.
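Both measures fit in a few lines of plain Python. The sketch below uses made-up 3-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions). It shows that once two vectors are scaled to unit length, their dot product equals their cosine similarity.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Toy vectors, not real embeddings
a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 0.0]

print(cosine_similarity(a, b))          # 0.96
print(dot(normalize(a), normalize(b)))  # same value, up to floating-point rounding
```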
Available Embedding Models
- text-embedding-3-large — 3,072 dimensions, best accuracy. $0.13 / 1M tokens.
- text-embedding-3-small — 1,536 dimensions, great balance of cost and quality. $0.02 / 1M tokens.
- text-embedding-ada-002 — 1,536 dimensions, legacy model. $0.10 / 1M tokens.
Recommendation: use text-embedding-3-small for most applications. It is 5x cheaper than ada-002 ($0.02 vs. $0.10 per 1M tokens) and scores higher on OpenAI's published benchmarks. Upgrade to text-embedding-3-large only if you need maximum retrieval accuracy.
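The price gap is easy to quantify. This sketch hard-codes the per-million-token prices from the model list above. Prices change, so check current pricing before relying on them. `PRICE_PER_MILLION` and `embedding_cost` are illustrative names, not part of the OpenAI SDK.

```python
# USD per 1M input tokens, copied from the model list above (may be outdated)
PRICE_PER_MILLION = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost(num_tokens: int, model: str) -> float:
    """Estimated cost in USD of embedding num_tokens tokens with the given model."""
    return num_tokens / 1_000_000 * PRICE_PER_MILLION[model]

# Example: a corpus of 50 million tokens
tokens = 50_000_000
for model in PRICE_PER_MILLION:
    print(f"{model}: ${embedding_cost(tokens, model):.2f}")
```

For 50 million tokens this prints $6.50, $1.00 and $5.00. That matches the 5x gap between text-embedding-3-small and ada-002.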
Basic Usage (Python)
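```python
from openai import OpenAI

# The API key is read from the OPENAI_API_KEY environment variable by default;
# base_url sends requests to the claude4u.com relay instead of api.openai.com.
client = OpenAI(base_url="https://claude4u.com/v1")

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning is a subset of artificial intelligence.",
)

# response.data holds one item per input; .embedding is a list of floats
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536 for text-embedding-3-small
print(f"First 5 values: {embedding[:5]}")
```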
Batch Embedding
Pass a list of strings to embed several texts in one request. This avoids a separate network round trip for every text:
```python
# Reuses the `client` created in the previous example
texts = [
    "How to train a neural network",
    "Best practices for deep learning",
    "Introduction to natural language processing",
    "Computer vision fundamentals",
    "Reinforcement learning explained",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

# One item per input; item.index gives its position in `texts`
embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")
```
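A single request accepts at most 2,048 inputs, and each input must fit the model's token limit. Larger collections therefore have to be split across several calls. Here is a minimal sketch, assuming `client` is the `OpenAI` instance created earlier. `batched` and `embed_all` are hypothetical helper names, and the default batch size of 2,048 reflects OpenAI's documented per-request limit at the time of writing.

```python
from typing import Iterator, List

def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_all(client, texts: List[str], model: str = "text-embedding-3-small",
              batch_size: int = 2048) -> List[List[float]]:
    """Embed any number of texts with one API request per batch, keeping input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so the vectors line up with the order of `batch`
        ordered = sorted(response.data, key=lambda item: item.index)
        embeddings.extend(item.embedding for item in ordered)
    return embeddings
```

Usage: `vectors = embed_all(client, texts)` returns one vector per text, in the same order as `texts`.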
Node.js Implementation
```javascript
import OpenAI from 'openai';

// Reads the API key from OPENAI_API_KEY; baseURL points at the relay.
// Top-level await requires an ES module (a .mjs file or "type": "module").
const client = new OpenAI({ baseURL: 'https://claude4u.com/v1' });

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: ['Hello world', 'How are you?', 'Goodbye'],
});

const embeddings = response.data.map((item) => item.embedding);
console.log(`Generated ${embeddings.length} embeddings`);
console.log(`Dimensions: ${embeddings[0].length}`);
```
Cosine Similarity Search
```python
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://claude4u.com/v1")

def get_embeddings(texts, model="text-embedding-3-small"):
    """Embed a list of texts in one request and return them as NumPy arrays."""
    response = client.embeddings.create(model=model, input=texts)
    return [np.array(item.embedding) for item in response.data]

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Build a small knowledge base
documents = [
    "Python is a high-level programming language.",
    "JavaScript runs in web browsers.",
    "SQL is used for database queries.",
    "Docker containers package applications.",
    "Kubernetes orchestrates container deployments.",
]
doc_embeddings = get_embeddings(documents)  # one API call for all documents

# Search: embed the query with the same model used for the documents
query = "How do I query a database?"
query_embedding = get_embeddings([query])[0]
similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# Rank results from most to least similar and show the top 3
results = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
for doc, score in results[:3]:
    print(f"{score:.4f}: {doc}")
```
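Looping over documents in Python works for a handful of texts. With thousands of stored embeddings, you can stack them into one matrix and let NumPy score every document with a single matrix-vector product. This is a minimal sketch, assuming all vectors are unit-length (true for OpenAI embeddings), so the dot product stands in for cosine similarity. `top_k` is a hypothetical helper, and the toy vectors are made up.

```python
import numpy as np

def top_k(query: np.ndarray, doc_matrix: np.ndarray, k: int = 3) -> list:
    """Return (row index, score) pairs for the k rows most similar to query.

    Assumes the query and every row of doc_matrix are unit-length, so the
    dot product equals cosine similarity.
    """
    scores = doc_matrix @ query          # one similarity score per document
    k = min(k, len(scores))
    best = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in best]

# Toy unit-length vectors standing in for real embeddings
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])
query = np.array([0.0, 1.0, 0.0])
print(top_k(query, docs, k=2))  # [(1, 1.0), (2, 0.8)]
```

With the search example above, `top_k(query_embedding, np.vstack(doc_embeddings))` returns indices into `documents`.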
Reducing Dimensions
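The text-embedding-3 models (not ada-002) accept a `dimensions` parameter that returns a shortened embedding. Fewer dimensions mean less storage and faster similarity search, at some cost in accuracy. OpenAI reports that text-embedding-3-large shortened to 256 dimensions still outperforms the full 1,536-dimension text-embedding-ada-002 on the MTEB benchmark: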
```python
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256,  # Reduce from 3072 to 256
)
embedding = response.data[0].embedding
print(f"Reduced dimensions: {len(embedding)}")  # 256
```
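If you have already stored full-length text-embedding-3 vectors, OpenAI's documentation describes shortening them on the client instead. Keep the first N values, then rescale the result to unit length, because a truncated vector is no longer normalized. Here is a minimal pure-Python sketch. `shorten_embedding` is a hypothetical helper name, and the sample vector is made up.

```python
import math
from typing import List

def shorten_embedding(embedding: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale them to unit (L2) length."""
    if not 0 < dims <= len(embedding):
        raise ValueError("dims must be between 1 and len(embedding)")
    head = embedding[:dims]
    length = math.sqrt(sum(x * x for x in head))
    if length == 0:
        return head  # all zeros: nothing to rescale
    return [x / length for x in head]

vec = [0.3, 0.4, 0.5, 0.1]  # toy values, not a real embedding
short = shorten_embedding(vec, 2)
print(short)  # approximately [0.6, 0.8]
```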
Building a RAG System
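Retrieval-Augmented Generation (RAG) combines embeddings with a chat model. You embed the question, retrieve the most similar documents, and pass them to the model as context for its answer: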
```python
from openai import OpenAI

client = OpenAI(base_url="https://claude4u.com/v1")

def rag_query(question, knowledge_base, kb_embeddings, top_k=3):
    """Answer `question` using the top_k most relevant entries of knowledge_base.

    kb_embeddings[i] must be the embedding of knowledge_base[i], created with
    the same model used for the question below.
    """
    # Step 1: Embed the question
    q_response = client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    )
    q_embedding = q_response.data[0].embedding

    # Step 2: Find the most relevant documents. OpenAI embeddings are
    # unit-length, so the dot product equals cosine similarity.
    similarities = []
    for i, kb_emb in enumerate(kb_embeddings):
        sim = sum(a * b for a, b in zip(q_embedding, kb_emb))
        similarities.append((i, sim))
    top_docs = sorted(similarities, key=lambda x: x[1], reverse=True)[:top_k]
    context = "\n".join(knowledge_base[i] for i, _ in top_docs)

    # Step 3: Generate an answer grounded in the retrieved context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```
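`rag_query` joins the top documents unconditionally. Long documents can overflow the chat model's context window, so one common refinement is to cap how much retrieved text goes into the prompt. This is a minimal sketch that uses a simple character budget as a stand-in for a real token limit. `build_context` and `max_chars` are hypothetical names, and the 4,000-character default is arbitrary.

```python
from typing import List, Sequence, Tuple

def build_context(ranked: Sequence[Tuple[int, float]], knowledge_base: List[str],
                  max_chars: int = 4000) -> str:
    """Join retrieved documents in rank order until the character budget is spent.

    ranked: (document index, score) pairs, best first.
    Each document gets a bracketed number so the model can refer to it.
    """
    parts: List[str] = []
    used = 0
    for n, (i, _score) in enumerate(ranked, start=1):
        entry = f"[{n}] {knowledge_base[i]}"
        extra = len(entry) + (1 if parts else 0)  # +1 for the joining newline
        if used + extra > max_chars:
            break  # stop at the first document that would exceed the budget
        parts.append(entry)
        used += extra
    return "\n".join(parts)

kb = ["Python is a language.", "SQL queries databases.", "Docker packages apps."]
print(build_context([(1, 0.9), (0, 0.5)], kb))
```

To use it, replace the `context = ...` line in `rag_query` with `context = build_context(top_docs, knowledge_base)`.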
Using with Vector Databases
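For production systems, store embeddings in a vector database. These systems index vectors for fast (usually approximate) nearest-neighbor search, so a query does not have to be compared against every stored vector: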
- Pinecone — Managed vector database with filtering and metadata
- Weaviate — Open-source with hybrid search capabilities
- Qdrant — High-performance open-source vector database
- ChromaDB — Lightweight, perfect for prototyping
- pgvector — PostgreSQL extension for vector similarity search
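At their core, these systems store vectors alongside metadata and return the nearest neighbors that pass a filter. The toy in-memory class below sketches that idea with exact (brute-force) cosine search. It is a hypothetical illustration, not the API of any database listed above. Those databases rely on approximate indexes and persistent storage to scale.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

@dataclass
class InMemoryVectorStore:
    """Toy vector store: exact cosine search with equality filters on metadata."""
    records: List[Tuple[str, List[float], Dict[str, str]]] = field(default_factory=list)

    def add(self, doc_id: str, vector: List[float], metadata: Dict[str, str]) -> None:
        self.records.append((doc_id, vector, metadata))

    def query(self, vector: List[float], k: int = 3,
              where: Optional[Dict[str, str]] = None) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) pairs whose metadata matches every key in `where`."""
        where = where or {}
        hits = [
            (doc_id, cosine(vector, vec))
            for doc_id, vec, meta in self.records
            if all(meta.get(key) == value for key, value in where.items())
        ]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:k]

store = InMemoryVectorStore()
store.add("sql-intro", [0.9, 0.1], {"topic": "databases"})
store.add("docker-intro", [0.1, 0.9], {"topic": "devops"})
store.add("pg-tuning", [0.8, 0.3], {"topic": "databases"})
print(store.query([1.0, 0.0], k=2, where={"topic": "databases"}))
```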
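Text Chunking Best Practices
Embedding models cap the length of each input (roughly 8,000 tokens for the models above). Shorter chunks also make retrieval more precise, so long documents are usually split before embedding. Overlapping consecutive chunks keeps context that straddles a boundary available in both. The splitter below counts words, which only approximates the token counts the API actually limits:
```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks by word count.

    Consecutive chunks share `overlap` words, so context cut at a boundary
    is repeated at the start of the next chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end of the text; otherwise the next
        # iteration would emit a tail chunk already covered by this one.
        if start + chunk_size >= len(words):
            break
    return chunks
```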
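Fixed word windows can split a sentence in half. A common variant packs whole sentences into each chunk instead. The sketch below assumes sentences end with ".", "!" or "?" followed by whitespace. That crude rule will mis-split abbreviations such as "e.g.". `chunk_by_sentences` and `max_words` are hypothetical names.

```python
import re
from typing import List

def chunk_by_sentences(text: str, max_words: int = 200) -> List[str]:
    """Group consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own (oversized) chunk.
    """
    # Split after ., ! or ? when followed by whitespace (crude sentence boundary)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # flush the full chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Embeddings map text to vectors. Similar texts land close together. Chunk long documents first!"
print(chunk_by_sentences(text, max_words=10))
```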