AI API Best Practices for Developers

Working with AI APIs effectively requires more than just making HTTP requests and parsing responses. From prompt engineering to error handling, streaming to cost management, there are proven patterns that separate production-grade AI integrations from fragile prototypes. This guide covers the essential best practices every developer should follow when building with AI APIs.

1. Design Your Prompts for Consistency

Prompt engineering is the foundation of reliable AI API usage. A well-structured prompt produces consistent, useful outputs while minimizing token waste.

// Bad: vague prompt, inconsistent results
const vagueResponse = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    { role: "user", content: "Check this code" }
  ]
});

// Good: Structured prompt with clear expectations
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    {
      role: "system",
      content: `You are a code reviewer. Analyze code for:
        1. Bugs and logical errors
        2. Security vulnerabilities
        3. Performance issues
        Respond in JSON with fields: issues[], suggestions[], severity`
    },
    { role: "user", content: `Review this function:\n${codeSnippet}` }
  ]
});
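
One way to keep prompts consistent across call sites is to build the message array in a single helper. The sketch below reuses the code-review prompt from the example above; `REVIEW_SYSTEM_PROMPT` and `buildReviewMessages` are illustrative names introduced here, not part of any SDK.

```javascript
// Hypothetical helper: a single definition of the review prompt,
// so every caller sends the same instructions.
const REVIEW_SYSTEM_PROMPT = [
  "You are a code reviewer. Analyze code for:",
  "1. Bugs and logical errors",
  "2. Security vulnerabilities",
  "3. Performance issues",
  "Respond in JSON with fields: issues[], suggestions[], severity",
].join("\n");

function buildReviewMessages(codeSnippet) {
  // Reject empty input early instead of paying for a useless request.
  if (typeof codeSnippet !== "string" || codeSnippet.trim() === "") {
    throw new TypeError("codeSnippet must be a non-empty string");
  }
  return [
    { role: "system", content: REVIEW_SYSTEM_PROMPT },
    { role: "user", content: `Review this function:\n${codeSnippet}` },
  ];
}

// Usage (assumes the same `client` as above):
// client.chat.completions.create({ model, messages: buildReviewMessages(code) });
```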

2. Implement Robust Error Handling

AI APIs fail in ways that traditional APIs rarely do: rate limiting (429), provider overload (529), and transient server errors are all routine. Build resilience into every call:

async function callAIWithRetry(params, maxRetries = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.chat.completions.create(params);
    } catch (error) {
      lastError = error;
      let delay;
      if (error.status === 429) {
        // Rate limited: exponential backoff (2s, 4s, 8s, ...)
        delay = Math.pow(2, attempt) * 1000;
      } else if (error.status === 529) {
        // Overloaded: wait longer
        delay = 10000;
      } else if (error.status >= 500) {
        // Server error: retry with linear backoff
        delay = attempt * 2000;
      } else {
        // Client error (400, 401, 403): don't retry
        throw error;
      }
      // Don't sleep after the final attempt; fall through to the error below
      if (attempt === maxRetries) break;
      console.log(`Request failed with status ${error.status}. Retrying in ${delay}ms...`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  // Keep the last underlying error available for debugging
  throw new Error('Max retries exceeded', { cause: lastError });
}

Using a relay service like claude4u.com handles much of this retry logic for you. The service automatically manages rate limits, retries on transient errors, and routes around overloaded accounts — so your application code stays cleaner.
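
If you do keep retry logic in your own code, two common refinements are a capped delay with random jitter (so many clients do not retry in lockstep) and honoring a `Retry-After` header when the server sends one. The following is a minimal sketch under those assumptions; `backoffDelay` and `retryDelay` are hypothetical helpers, and the plain lowercase-keyed `headers` object is an assumption, since SDK error objects may expose headers differently.

```javascript
// Capped exponential backoff with "full jitter": pick a random delay
// between 0 and min(capMs, baseMs * 2^attempt).
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Prefer the server's Retry-After value (in seconds) when it is a positive
// number; otherwise fall back to jittered backoff. A Retry-After given as an
// HTTP date is not parsed here and also falls back.
function retryDelay(attempt, headers = {}, options = {}) {
  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }
  return backoffDelay(attempt, options);
}
```

Either helper could replace the fixed delays in a retry loop like the one above.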

3. Use Streaming for Better User Experience

For interactive applications, streaming responses via Server-Sent Events (SSE) dramatically reduces perceived latency. Users see the first tokens as soon as the model generates them instead of waiting for the complete response.

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [{ role: "user", content: prompt }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
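
In practice you often need the complete text as well as the incremental output, for example to store the final answer. A minimal sketch of that pattern follows, assuming the same chunk shape as the loop above; `collectStream` and its `onToken` callback are illustrative names.

```javascript
// Consume a streamed completion, forwarding each text fragment to `onToken`
// and returning the concatenated text once the stream ends.
async function collectStream(stream, onToken = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    // Some chunks carry no text (role headers, finish markers); skip them.
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) {
      fullText += content;
      onToken(content);
    }
  }
  return fullText;
}

// Usage (assumes `stream` was created with stream: true as above):
// const text = await collectStream(stream, (t) => process.stdout.write(t));
```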

4. Manage Context Windows Efficiently

Every token in your request costs money and counts against the model's context window. Be intentional about what you include:

Summarize conversation history instead of sending the full chat log
Use system prompts wisely — keep them concise but complete
Leverage prompt caching for repeated system prompts and few-shot examples
Truncate or chunk large inputs rather than sending entire documents
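
As a simpler alternative to summarization, the history can be trimmed to a token budget by keeping the system prompt and the most recent turns. The sketch below assumes a rough heuristic of about four characters per token, which is only an approximation; `estimateTokens` and `trimHistory` are hypothetical helpers, and a real budget should use the provider's tokenizer.

```javascript
// Rough estimate: about 4 characters per token for English text.
// This is a heuristic assumption, not an exact count.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep every system message plus the newest messages that fit the budget.
function trimHistory(messages, maxTokens, countTokens = estimateTokens) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + countTokens(m.content), 0);

  const kept = [];
  // Walk backwards so the most recent turns are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = countTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a reply without the turn that preceded it.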

5. Implement Request Cancellation

When a user navigates away or cancels an operation, stop the API request to avoid wasting tokens:

const controller = new AbortController();

// Cancel on user action
cancelButton.addEventListener('click', () => controller.abort());

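try {
  const response = await fetch('https://claude4u.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(params),
    signal: controller.signal
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  // ...use data here
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Request cancelled by user');
  } else {
    // Don't swallow unrelated failures
    throw error;
  }
}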
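
The same mechanism can enforce a time limit, so a stalled request is abandoned even if the user never cancels. Below is a minimal sketch, assuming a runtime with `AbortController` and a fetch-compatible function; `fetchWithTimeout` and its `fetchImpl` parameter are hypothetical names introduced for illustration and testing.

```javascript
// Abort the request if it runs longer than `timeoutMs`, while still honoring
// an optional caller-supplied signal (for example, one tied to a cancel button).
async function fetchWithTimeout(url, options = {}, timeoutMs = 30000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  // Forward a caller-initiated abort to our own controller.
  const outer = options.signal;
  const onOuterAbort = () => controller.abort();
  if (outer) {
    if (outer.aborted) controller.abort();
    else outer.addEventListener("abort", onOuterAbort, { once: true });
  }

  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Always clean up so the timer and listener do not outlive the request.
    clearTimeout(timer);
    if (outer) outer.removeEventListener("abort", onOuterAbort);
  }
}
```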

6. Use the Right Model for the Task

Not every request needs the most powerful model. Match model capability to task complexity:

Simple classification, extraction, formatting: Claude Haiku / GPT-4o-mini / Gemini Flash
Code generation, analysis, general tasks: Claude Sonnet / GPT-4o / Gemini Pro
Complex reasoning, research, difficult problems: Claude Opus / o3 / Gemini 2.5 Pro
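
This tiering can be encoded as a small lookup so model choice is made in one place rather than scattered through the codebase. The sketch below is illustrative only: the tier names, task labels, and model identifiers are placeholders, and should be replaced with the exact model IDs your provider documents.

```javascript
// Placeholder model identifiers (assumptions, not real model IDs).
const MODEL_TIERS = {
  light: "small-fast-model",
  standard: "general-purpose-model",
  heavy: "advanced-reasoning-model",
};

// Hypothetical task labels mapped to the tiers described above.
const TASK_TIER = new Map([
  ["classification", "light"],
  ["extraction", "light"],
  ["formatting", "light"],
  ["code-generation", "standard"],
  ["analysis", "standard"],
  ["complex-reasoning", "heavy"],
  ["research", "heavy"],
]);

// Unknown task types fall back to the standard tier.
function pickModel(taskType) {
  const tier = TASK_TIER.get(taskType) ?? "standard";
  return MODEL_TIERS[tier];
}
```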

7. Structure Your Output

When you need structured data from an AI model, use JSON mode or explicit schema instructions:

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    {
      role: "user",
      content: `Extract entities from this text and return JSON:
        {"people": [], "places": [], "dates": []}
        Text: ${inputText}`
    }
  ],
  response_format: { type: "json_object" }
});
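
Even with JSON mode enabled, it is worth validating the reply before using it, since the model may omit a field or return a different shape. A minimal sketch of that check follows, assuming the `people`, `places`, and `dates` fields requested in the prompt above; `parseEntities` is a hypothetical helper.

```javascript
// Parse the model's JSON reply and verify the shape requested in the prompt.
// Returns { ok: true, data } or { ok: false, error } instead of throwing.
function parseEntities(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${err.message}` };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "Expected a JSON object" };
  }
  for (const key of ["people", "places", "dates"]) {
    if (!Array.isArray(data[key])) {
      return { ok: false, error: `Missing or non-array field: ${key}` };
    }
  }
  return { ok: true, data };
}

// Usage (assumes an OpenAI-style response object):
// const result = parseEntities(response.choices[0].message.content);
```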

8. Log and Monitor Everything

Production AI integrations need comprehensive observability:

Log request/response metadata (model, token counts, latency, status code) for every call
Track token usage and costs per endpoint, user, and model
Set up alerts for error rate spikes and latency increases
Monitor rate limit headroom to anticipate scaling needs

Never log full prompt or response content in production systems. This creates privacy risks, compliance issues, and massive storage costs. Log metadata only — token counts, model name, latency, and status codes.
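
One way to enforce the metadata-only rule is to build log records through a function that never reads message content. The sketch below assumes an OpenAI-style response with a `usage` object containing `prompt_tokens` and `completion_tokens`; other providers may name these fields differently, and `buildLogRecord` and its parameters are illustrative.

```javascript
// Build a metadata-only log record. Prompt and completion text are
// deliberately never read, so they cannot leak into logs.
function buildLogRecord({ endpoint, userId, response, startedAt, finishedAt, status }) {
  const usage = response?.usage ?? {};
  return {
    endpoint,
    userId,
    model: response?.model ?? null,
    promptTokens: usage.prompt_tokens ?? null,
    completionTokens: usage.completion_tokens ?? null,
    latencyMs: finishedAt - startedAt,
    status,
  };
}

// Usage: take Date.now() before and after the call, then
// console.log(JSON.stringify(buildLogRecord({ ... })));
```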

9. Implement Graceful Degradation

Design your application to function when the AI API is unavailable. Provide cached responses, fall back to simpler models, or disable AI features gracefully instead of showing error pages to users.
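
A fallback chain of this kind can be sketched as an ordered list of strategies, tried until one succeeds. The helper below is an illustration under that assumption; `withFallback`, the strategy names, and the `"disabled"` marker are hypothetical, and each `run` function stands in for whatever primary call, simpler model, or cache lookup your application uses.

```javascript
// Try each strategy in order and return the first one that succeeds.
// Each strategy is { name, run }, where `run` is an async function.
async function withFallback(strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      return { source: name, value: await run() };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  // Nothing worked: tell the caller to hide the AI feature rather than fail.
  return { source: "disabled", value: null, errors };
}

// Usage:
// const result = await withFallback([
//   { name: "primary", run: () => callPrimaryModel(prompt) },
//   { name: "simpler-model", run: () => callSimplerModel(prompt) },
//   { name: "cache", run: () => readCachedAnswer(prompt) },
// ]);
```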

10. Centralize Your AI API Access

Whether you build an internal abstraction layer or use a managed relay service like claude4u.com, centralizing AI API access gives you a single place to manage authentication, implement caching, track costs, and switch between providers. This is especially critical for teams where multiple developers and services are making AI API calls.
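
For the internal-abstraction option, even a very thin wrapper gives you one place to attach usage tracking. The following is a minimal sketch, not a full gateway: `createGateway` is a hypothetical helper, `send` is an injected function (for example, one that wraps `client.chat.completions.create`), and `usage.total_tokens` assumes an OpenAI-style response.

```javascript
// Minimal internal gateway: every caller goes through `complete`, so token
// accounting lives in one place and the provider can be swapped via `send`.
function createGateway(send) {
  const usageByCaller = new Map();

  async function complete(callerId, params) {
    const response = await send(params);
    // Responses without usage data count as zero tokens.
    const tokens = response?.usage?.total_tokens ?? 0;
    usageByCaller.set(callerId, (usageByCaller.get(callerId) ?? 0) + tokens);
    return response;
  }

  function usageFor(callerId) {
    return usageByCaller.get(callerId) ?? 0;
  }

  return { complete, usageFor };
}

// Usage:
// const gateway = createGateway((params) => client.chat.completions.create(params));
// await gateway.complete("billing-service", { model, messages });
```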

Following these best practices will help you build AI integrations that are reliable, cost-effective, and maintainable. The difference between a demo and a production system is not the AI model — it is the engineering around it.
