Build an AI-Powered Search Engine with Embeddings
Traditional keyword search fails when users do not know the exact terms to search for. AI-powered semantic search uses embeddings (dense vector representations of text) to find results based on meaning rather than exact word matches. This guide shows you how to build a semantic search engine that uses an embedding API for retrieval and a large language model to re-rank and summarize the results.
How Semantic Search Works
Semantic search compares meaning rather than matching literal terms. The pipeline has four stages:
- Indexing — Convert every document in your corpus into a vector embedding using an embedding model.
- Query — Convert the user's search query into a vector using the same model.
- Retrieval — Find the documents whose vectors are closest to the query vector (cosine similarity).
- Re-ranking — Optionally use an LLM to re-rank results by relevance and generate summaries.
This means a search for "how to fix a slow application" will match documents about "performance optimization techniques" even if they share no words in common.
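The retrieval stage can be sketched in a few lines for a small in-memory corpus. This is a minimal illustration rather than a production approach: `cosineSimilarity` and `topK` are hypothetical helper names, and each document is assumed to be an object with an `embedding` array produced by the same model as the query vector.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force retrieval: score every document, then keep the k best.
function topK(queryEmbedding, indexedDocs, k = 10) {
  return indexedDocs
    .map((doc) => ({
      ...doc,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding)
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

Scoring every document is fine for a few thousand entries; beyond that, a vector database with an approximate index is the usual choice.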
Step 1: Generate Embeddings
Use an embedding API to convert your content into vectors. OpenAI's text-embedding-3-small and Cohere's Embed v3 models (for example, embed-english-v3.0) are popular choices:
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_KEY,
  baseURL: 'https://claude4u.com/openai' // Relay endpoint
});

async function generateEmbedding(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding; // 1536-dimensional vector
}

// Index your documents
async function indexDocuments(documents) {
  const indexed = [];
  for (const doc of documents) {
    const embedding = await generateEmbedding(doc.content);
    indexed.push({
      id: doc.id,
      title: doc.title,
      content: doc.content,
      embedding: embedding
    });
  }
  return indexed;
}
Step 2: Store Vectors in a Database
For production use, store embeddings in a vector database optimized for similarity search:
- pgvector (PostgreSQL extension) — Best for teams already using PostgreSQL. Easy to deploy, good enough for millions of documents.
- Pinecone — Fully managed vector database with fast query times. Great for large-scale applications.
- Weaviate — Open-source, supports hybrid search (keyword + semantic), built-in vectorization.
- Qdrant — High-performance, open-source, with excellent filtering capabilities.
- ChromaDB — Lightweight, developer-friendly, ideal for prototyping and small-scale applications.
-- pgvector: Create table and index
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536)
);

-- Create an IVFFlat index for fast similarity search.
-- Build it after loading data; pgvector suggests lists of about rows / 1000 for up to 1M rows.
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Semantic search query ($1 is the query embedding; <=> is cosine distance)
SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
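To run the search query above from JavaScript, the query embedding has to be passed as the `$1` parameter in pgvector's text form, such as `[0.1,0.2,0.3]`. The sketch below shows one way to prepare that. `toVectorLiteral` and `buildSearchQuery` are hypothetical helper names; the `{ text, values }` shape is what node-postgres accepts as a query config, but the choice of client is an assumption.

```javascript
// Serialize a JS number array into pgvector's text format, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding) {
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error('Embedding must be a non-empty array');
  }
  for (const value of embedding) {
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error('Embedding values must be finite numbers');
    }
  }
  return `[${embedding.join(',')}]`;
}

// Build the parameterized search query. The SQL mirrors the pgvector
// example, with the result limit moved into a second parameter.
function buildSearchQuery(queryEmbedding, limit = 10) {
  return {
    text: `SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT $2`,
    values: [toVectorLiteral(queryEmbedding), limit]
  };
}
```

With node-postgres, the returned object could be passed straight to `pool.query(...)`. Keeping the vector as a bound parameter avoids building SQL by string concatenation.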
Step 3: AI-Enhanced Results with LLM Re-ranking
Raw vector similarity often returns good candidates but in suboptimal order. Use an LLM to re-rank results and generate answer summaries:
import Anthropic from '@anthropic-ai/sdk';

const claude = new Anthropic({
  apiKey: process.env.CLAUDE_KEY,
  baseURL: 'https://claude4u.com'
});

async function searchWithAI(query, vectorResults) {
  const context = vectorResults
    .map((r, i) => `[${i + 1}] ${r.title}\n${r.content}`)
    .join('\n\n---\n\n');

  const response = await claude.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: `You are a search assistant. Given the user's query and search results,
provide a direct answer citing the most relevant sources. Format:
1. A concise answer (2-3 sentences)
2. Most relevant results ranked by relevance with brief explanations`,
    messages: [{
      role: 'user',
      content: `Query: ${query}\n\nSearch Results:\n${context}`
    }]
  });

  return response.content[0].text;
}
Pro Tip: Chunk your documents into 200-500 token segments before embedding. Smaller chunks produce more precise matches because the embedding captures a focused topic rather than an average of many topics in a long document. Overlap chunks by 50-100 tokens to preserve context at boundaries.
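The chunking advice in the tip can be sketched as a sliding window with overlap. This version makes a loud simplifying assumption: it counts whitespace-separated words instead of model tokens, so the sizes are only approximate. `chunkText` is a hypothetical helper name; swap in a real tokenizer if you need exact token counts.

```javascript
// Split text into overlapping chunks.
// ASSUMPTION: one whitespace-separated word stands in for one token.
function chunkText(text, chunkSize = 300, overlap = 50) {
  if (chunkSize <= 0) {
    throw new Error('chunkSize must be positive');
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error('overlap must be between 0 and chunkSize - 1');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Each chunk repeats the last `overlap` words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.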
Hybrid Search: Combining Keyword and Semantic
The best search systems combine both approaches. Use keyword search for exact matches (product IDs, error codes, names) and semantic search for conceptual queries:
- Run both keyword (BM25) and semantic search in parallel.
- Use Reciprocal Rank Fusion (RRF) to merge and re-rank results from both methods.
- Weight keyword results higher for short, specific queries and semantic results higher for natural language questions.
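Reciprocal Rank Fusion can be sketched as follows. Each document scores `1 / (k + rank)` in every list where it appears, and the scores are summed. The constant `k = 60` is the commonly used default; the optional per-list `weights` are an illustrative extension for the weighting idea above, not part of standard RRF. `reciprocalRankFusion` is a hypothetical helper name.

```javascript
// Merge several ranked lists of document ids with Reciprocal Rank Fusion.
// rankings: array of arrays of ids, each ordered best-first.
// weights:  optional per-list multipliers (assumption, not standard RRF).
function reciprocalRankFusion(rankings, k = 60, weights = null) {
  const scores = new Map();
  rankings.forEach((ranking, listIndex) => {
    const weight = weights ? weights[listIndex] : 1;
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + weight / (k + rank));
    });
  });
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Usage: fuse a keyword (BM25) ranking with a semantic ranking.
const keywordRanking = ['a', 'b', 'c'];
const semanticRanking = ['b', 'c', 'd'];
const fused = reciprocalRankFusion([keywordRanking, semanticRanking]);
console.log(fused.map((r) => r.id));
```

Because RRF uses only rank positions, it needs no score normalization between BM25 and cosine similarity, which live on different scales.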
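Warning: Embedding models have input token limits, and the limit varies by model: OpenAI's text-embedding-3 models accept roughly 8K tokens per input, while some others (such as Cohere's Embed v3 models) accept as few as 512. Documents exceeding the limit must be chunked before embedding. Embedding quality also degrades for very short texts (under 20 tokens), so for short records combine the metadata fields into a single text block before embedding.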
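Combining metadata into one text block might look like the sketch below. The field names `category` and `tags` are hypothetical examples, and `buildEmbeddingText` is an illustrative helper, so adapt the field list to whatever your documents actually carry.

```javascript
// Join selected fields of a record into a single labeled text block
// suitable for embedding. Missing or empty fields are skipped.
function buildEmbeddingText(doc, fields = ['title', 'category', 'tags', 'content']) {
  const lines = [];
  for (const field of fields) {
    const value = doc[field];
    if (value === undefined || value === null) continue;
    const text = (Array.isArray(value) ? value.join(', ') : String(value)).trim();
    if (text !== '') lines.push(`${field}: ${text}`);
  }
  return lines.join('\n');
}

// Usage: a short record becomes a richer input for the embedding model.
const block = buildEmbeddingText({
  title: 'Reset password',
  tags: ['auth', 'account'],
  content: ''
});
console.log(block);
```

Labeling each field gives the model a little context about what the value represents, which a bare concatenation of values would lose.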
Performance Optimization
As your document collection grows, search performance becomes critical:
- Batch embedding — Generate embeddings in batches of 100-1000 to reduce API overhead.
- Approximate nearest neighbors (ANN) — Use ANN indexes (IVFFlat, HNSW) instead of brute-force search for collections over 100K documents.
- Pre-filtering — Apply metadata filters (category, date range, permissions) before vector search to reduce the search space.
- Caching — Cache embeddings for frequent queries and popular search terms.
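Batching and caching from the list above can be sketched without tying the code to one provider. Both embedding functions are injected and hypothetical: `embedBatch` takes an array of strings and resolves to vectors in the same order (OpenAI's embeddings endpoint accepts an array for `input`, so it could wrap one such call), and `embed` handles a single query. The cache key normalization (trim plus lowercase) is also an assumption; drop it if case matters for your content.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed many texts using one request per batch instead of one per text.
async function embedInBatches(texts, embedBatch, batchSize = 100) {
  const embeddings = [];
  for (const batch of toBatches(texts, batchSize)) {
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) {
      throw new Error('embedBatch returned the wrong number of vectors');
    }
    embeddings.push(...vectors);
  }
  return embeddings;
}

// Wrap a single-query embedding function with a bounded in-memory cache.
function withQueryCache(embed, maxEntries = 1000) {
  const cache = new Map(); // Map keeps insertion order, oldest first
  return async function cachedEmbed(query) {
    const key = query.trim().toLowerCase();
    if (cache.has(key)) return cache.get(key);
    const vector = await embed(query);
    if (cache.size >= maxEntries) {
      cache.delete(cache.keys().next().value); // evict the oldest entry
    }
    cache.set(key, vector);
    return vector;
  };
}
```

The cache evicts in insertion order rather than by recency, which keeps the sketch short; a true LRU or a shared store such as Redis would be a natural next step for multi-instance deployments.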
Building an AI-powered search engine changes how users discover information in your application. Combining an embedding API for retrieval with an LLM for re-ranking and summarization, optionally routed through a unified relay service such as claude4u.com, lets search match what users mean rather than only the words they type.