Build an AI Chatbot

Build an AI Chatbot with the Claude and GPT APIs

Building an AI-powered chatbot has never been more accessible. With large language model APIs from Anthropic (Claude) and OpenAI (GPT), developers can create intelligent conversational agents that understand context, handle complex queries, and deliver human-like responses. This guide walks you through the entire process, from choosing the right model to deploying a production-ready chatbot.

Why Build a Chatbot with LLM APIs?

Traditional chatbots relied on rigid decision trees and keyword matching. Modern LLM-powered chatbots offer significant advantages:

Natural language understanding — Users can phrase questions any way they want, and the model still understands intent.
Context retention — Multi-turn conversations feel natural because your backend resends the conversation history with each request; the API itself is stateless.
Flexible deployment — A single API integration powers chatbots across web, mobile, Slack, Discord, and more.
Rapid iteration — Adjust behavior through prompts rather than rewriting code.

Choosing the Right Model

The first decision is selecting which model best fits your use case:

Claude (Anthropic) — Excels at long-form conversation, nuanced reasoning, and following complex instructions. Claude's large context window (up to 200K tokens) makes it ideal for chatbots that need to reference lengthy documents or maintain extended conversations.
GPT-4 / GPT-4o (OpenAI) — Strong general-purpose models with excellent coding and creative writing abilities. Wide ecosystem support.
Claude Haiku / GPT-4o-mini — Faster, cheaper models suitable for high-volume chatbots where response speed matters more than deep reasoning.

Pro Tip: Use a relay service like claude4u.com to access multiple AI models through a single unified API endpoint. This lets you switch between Claude and GPT models without changing your integration code.

Basic Chatbot Architecture

A production chatbot typically consists of three layers: the frontend interface, a backend server that manages conversation state, and the LLM API integration. Here is a minimal Node.js implementation of the backend and API layers, with conversation history held in memory:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.CLAUDE_API_KEY,
  baseURL: 'https://claude4u.com'  // Use relay for reliability
});

// Store conversation history per session
const sessions = new Map();

async function chat(sessionId, userMessage) {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }

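  const history = sessions.get(sessionId);
  history.push({ role: 'user', content: userMessage });

  let response;
  try {
    response = await client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      system: 'You are a helpful customer support assistant.',
      messages: history
    });
  } catch (err) {
    // Drop the unanswered user turn so a retry does not duplicate it
    history.pop();
    throw err;
  }

  // Join the text blocks; the content array can also hold other block types
  const assistantMessage = response.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
  history.push({ role: 'assistant', content: assistantMessage });

  return assistantMessage;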
}
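
The history array in chat() grows with every turn, so long sessions will eventually exceed the context window and cost more per request. The following is a minimal sketch of trimming it before each call. The names estimateTokens and pruneHistory are hypothetical helpers, the four-characters-per-token estimate is a rough rule of thumb rather than an exact count, and the code assumes each message's content is a plain string.

```javascript
// Rough token estimate: about 4 characters per token is a common rule of
// thumb for English text, not an exact count (assumption for illustration).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Return a trimmed copy of `history` that fits within `maxTokens`,
// dropping the oldest messages first. The input array is not modified.
function pruneHistory(history, maxTokens) {
  const kept = [];
  let total = 0;
  // Walk backwards so the most recent messages are kept.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    // Always keep the newest message, even if it alone exceeds the budget.
    if (total + cost > maxTokens && kept.length > 0) break;
    kept.unshift(history[i]);
    total += cost;
  }
  // Keep the conversation starting with a user turn by dropping any
  // leading assistant messages left over after trimming.
  while (kept.length > 0 && kept[0].role !== 'user') kept.shift();
  return kept;
}
```

One way to use it would be to pass `messages: pruneHistory(history, budget)` in the API call, where `budget` is a token limit you choose below the model's context window. Summarizing the dropped turns instead of discarding them is a common alternative when early context matters.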

Adding Streaming Responses

For a better user experience, stream responses token by token instead of waiting for the complete answer. This dramatically improves perceived response time:

async function chatStream(sessionId, userMessage, onToken) {
  // Reuse the session's history, creating it on first use
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  const history = sessions.get(sessionId);
  history.push({ role: 'user', content: userMessage });

  const stream = await client.messages.stream({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: 'You are a helpful assistant.',
    messages: history
  });

  let fullResponse = '';
  for await (const event of stream) {
    // Only text deltas carry a `text` field; skip other delta types
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      fullResponse += event.delta.text;
      onToken(event.delta.text);
    }
  }

  // Record the completed reply so the next turn sees it
  history.push({ role: 'assistant', content: fullResponse });
  return fullResponse;
}
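
To get those tokens to a browser, one common option is Server-Sent Events (SSE). The sketch below shows how the onToken callback could be wired to an HTTP response. The names sseFrame and handleStream, and the `done` and `error` event names, are hypothetical. The `res` argument is assumed to be a Node.js http.ServerResponse or anything with writeHead, write, and end. The streaming function is passed in as a parameter so the handler can be exercised without calling the API.

```javascript
// Format one token as an SSE frame. JSON-encoding the payload keeps
// newlines inside a token from breaking the SSE framing.
function sseFrame(token) {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Stream a reply to an HTTP response. `streamFn` takes the same arguments
// as chatStream(sessionId, userMessage, onToken).
async function handleStream(res, sessionId, userMessage, streamFn) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive'
  });
  try {
    await streamFn(sessionId, userMessage, (token) => res.write(sseFrame(token)));
    // Tell the client the reply is complete.
    res.write('event: done\ndata: {}\n\n');
  } catch (err) {
    // Headers are already sent, so report the failure as an SSE event.
    res.write(`event: error\ndata: ${JSON.stringify({ message: 'stream failed' })}\n\n`);
  } finally {
    res.end();
  }
}
```

In a real route you would call `handleStream(res, sessionId, userMessage, chatStream)` after parsing and validating the request. On the client, the frames can be read with `EventSource` or a streaming `fetch`.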

Production Considerations

Before deploying your chatbot, address these critical areas:

Rate limiting — Protect against abuse by limiting requests per user per minute.
Conversation pruning — Trim older messages when approaching the model's context window limit to control costs.
Error handling — Implement retry logic with exponential backoff for API failures (429, 500, 529 errors).
Content moderation — Filter inputs and outputs to prevent misuse.
Cost monitoring — Track token usage per session to manage API spending.
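
The retry advice above can be sketched as a small wrapper. This is one possible shape, not a definitive implementation: withRetry, its option names, and the default values are hypothetical, and the code assumes thrown errors carry a numeric `status` property. The official SDK is documented to retry some failures on its own (see its `maxRetries` option), so a wrapper like this matters mostly when you call the HTTP API directly or want custom control.

```javascript
// Status codes worth retrying, taken from the list above.
const RETRYABLE = new Set([429, 500, 529]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn`, retrying on retryable errors with a delay that doubles on
// each attempt. `wait` is injectable so tests can skip real delays.
async function withRetry(fn, { retries = 3, baseDelayMs = 500, wait = sleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= retries || !RETRYABLE.has(err.status)) throw err;
      // Exponential backoff with jitter: between 50% and 100% of
      // baseDelayMs * 2^attempt, so clients do not retry in lockstep.
      const delay = baseDelayMs * 2 ** attempt;
      await wait(delay / 2 + Math.random() * (delay / 2));
    }
  }
}
```

A call site might look like `await withRetry(() => client.messages.create(params))`. If the API returns a `retry-after` header on 429 responses, honoring it is usually preferable to a computed delay.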

Warning: Never expose your API key in client-side code. Always route API calls through your backend server. Consider using a relay service to add an extra layer of key management and access control.
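
Because every request passes through your backend, that is also a natural place to enforce the per-user rate limit mentioned above. Here is a minimal fixed-window sketch. The name createRateLimiter and its defaults are hypothetical, and the state lives in one process's memory, so a multi-instance deployment would need a shared store instead.

```javascript
// Fixed-window limiter: at most `limit` requests per user per window.
// `now` is injectable so the clock can be controlled in tests.
function createRateLimiter({ limit = 20, windowMs = 60000, now = Date.now } = {}) {
  const windows = new Map(); // userId -> { start, count }

  // Returns true if the request is allowed, false if the user is over limit.
  return function allow(userId) {
    const t = now();
    const w = windows.get(userId);
    // Start a fresh window on first use or once the old one has expired.
    if (!w || t - w.start >= windowMs) {
      windows.set(userId, { start: t, count: 1 });
      return true;
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  };
}
```

A request handler would call `allow(userId)` before invoking chat() and respond with HTTP 429 when it returns false. Entries for inactive users are never evicted in this sketch, so a long-running server would also want periodic cleanup.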

Enhancing Your Chatbot

Once your basic chatbot works, consider these enhancements:

RAG (Retrieval-Augmented Generation) — Connect a vector database to give your chatbot access to your company's knowledge base.
Tool use / Function calling — Let the chatbot take actions like checking order status, booking appointments, or querying databases.
Multi-model routing — Route simple queries to a cheaper model and complex ones to a more capable model, falling back to an alternative if the primary model is unavailable.
Analytics — Log conversations to identify common questions and improve your system prompt over time.
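
As an illustration of the tool use item above, here is a hedged sketch of the backend half of the loop with the Anthropic Messages API: a tool definition to pass in the request's `tools` parameter, plus a function that executes the `tool_use` blocks found in a response and builds the matching `tool_result` blocks. The tool name get_order_status, its schema, the stubbed handler, and the runTools helper are all hypothetical.

```javascript
// Tool definition in the shape the Messages API accepts in `tools`.
// The tool itself is a made-up example.
const tools = [{
  name: 'get_order_status',
  description: 'Look up the shipping status of an order by its ID.',
  input_schema: {
    type: 'object',
    properties: { order_id: { type: 'string' } },
    required: ['order_id']
  }
}];

// Local implementations keyed by tool name. This one is a stub; a real
// handler would query your order system.
const handlers = new Map([
  ['get_order_status', async ({ order_id }) => ({ order_id, status: 'shipped' })]
]);

// Run every tool the model requested and return `tool_result` blocks
// to send back as the content of the next user message.
async function runTools(contentBlocks) {
  const results = [];
  for (const block of contentBlocks) {
    if (block.type !== 'tool_use') continue; // skip text and other blocks
    try {
      const handler = handlers.get(block.name);
      if (!handler) throw new Error(`Unknown tool: ${block.name}`);
      const output = await handler(block.input);
      results.push({ type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(output) });
    } catch (err) {
      // Report failures to the model instead of crashing the turn.
      results.push({ type: 'tool_result', tool_use_id: block.id, content: err.message, is_error: true });
    }
  }
  return results;
}
```

When a response's `stop_reason` is `'tool_use'`, the usual pattern is to append the assistant's content to the history, append `{ role: 'user', content: await runTools(response.content) }`, and call the API again so the model can compose its final answer.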

Building an AI chatbot with Claude or GPT APIs is a rewarding project that can transform how your users interact with your product. Start with a simple implementation, iterate based on real usage patterns, and scale your architecture as demand grows.

