Gemini API Error Troubleshooting
Gemini API Error Codes Guide: Troubleshooting 400, 403, 429, 500, and 503 Errors
When working with the Gemini API, encountering errors is inevitable. Understanding what each error code means and how to resolve it can save you hours of debugging. This comprehensive guide covers every common Gemini API error with explanations, causes, and solutions.
400 Bad Request — INVALID_ARGUMENT
A 400 error means your request is malformed or contains invalid parameters. Common causes include:
- Invalid model name: Check that you are using a valid model identifier like gemini-2.5-pro or gemini-2.5-flash.
- Exceeding context length: Your input exceeds the model's maximum context window.
- Invalid content format: The request body does not match the expected schema.
- Unsupported parameters: Passing parameters that the chosen model does not support.
# Common fix: validate your request before sending
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Count tokens before sending to avoid context length errors
token_count = client.models.count_tokens(
    model="gemini-2.5-flash",
    contents="Your prompt here",
)
print(f"Token count: {token_count.total_tokens}")
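A token count is only useful once it is compared against a limit. The following is a minimal sketch of that comparison, not part of any SDK: `fits_context` and `ASSUMED_INPUT_LIMIT` are hypothetical names, and the limit value is a placeholder assumption that should be replaced with the documented input limit for your model.

```python
def fits_context(prompt_tokens: int, input_limit: int, reserve: int = 0) -> bool:
    """Return True if the prompt fits the limit with `reserve` tokens of headroom."""
    if prompt_tokens < 0 or reserve < 0 or input_limit <= 0:
        raise ValueError("token counts must be non-negative and the limit positive")
    return prompt_tokens + reserve <= input_limit


# Placeholder assumption for illustration; look up the real limit for your model.
ASSUMED_INPUT_LIMIT = 1_048_576

# A small prompt fits comfortably.
print(fits_context(1_200, ASSUMED_INPUT_LIMIT))
# A prompt close to the limit fails once headroom is reserved.
print(fits_context(1_048_000, ASSUMED_INPUT_LIMIT, reserve=1_000))
```

Passing `token_count.total_tokens` as `prompt_tokens` lets you truncate or split the input before the API rejects it with a 400.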
403 Forbidden — PERMISSION_DENIED
A 403 error means your API key does not have permission to access the requested resource. Troubleshoot by checking:
- API key validity: Ensure your key has not been revoked or expired.
- API enabled: Verify that the "Generative Language API" is enabled in your Google Cloud project.
- Key restrictions: If you have applied IP or referrer restrictions, make sure your current environment matches.
- Regional restrictions: Some Gemini features may not be available in all regions.
- Billing status: Some models or features require billing to be enabled on your project.
# Test your API key
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_API_KEY" | head -20
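The same check can be made from Python without putting the key in the URL, where it can end up in shell history and proxy logs. This is a sketch using only the standard library; `build_models_request` is a hypothetical helper, and it assumes the endpoint accepts the key in the `x-goog-api-key` request header.

```python
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a list-models request with the key in a header."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return urllib.request.Request(BASE_URL, headers={"x-goog-api-key": api_key})


request = build_models_request("YOUR_API_KEY")
print(request.full_url)  # the URL itself carries no key

# To send it: urllib.request.urlopen(request, timeout=10)
# An urllib.error.HTTPError with code 403 points to the causes listed above.
```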
429 Too Many Requests — RESOURCE_EXHAUSTED
A 429 error means you have exceeded your rate limit or quota. The Gemini API enforces limits at multiple levels:
- Requests per minute (RPM): The number of API calls per minute.
- Tokens per minute (TPM): The number of tokens processed per minute. Google's rate-limit documentation counts input tokens toward this limit.
- Requests per day (RPD): Daily request limit, primarily on the free tier.
Solutions for 429 errors:
- Implement rate limiting in your client code to stay within quotas.
- Use exponential backoff when retrying after a 429.
- Enable billing to move from free tier limits to higher pay-as-you-go quotas.
- Request a quota increase through the Google Cloud Console if your pay-as-you-go limits are insufficient.
- Use a relay service that distributes requests across multiple API keys.
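import time

def call_with_rate_limit(client, prompt, rpm_limit=25, max_retries=3):
    """Simple rate limiter for Gemini API calls with bounded retries on 429."""
    min_interval = 60.0 / rpm_limit
    for attempt in range(max_retries + 1):
        try:
            response = client.models.generate_content(
                model="gemini-2.5-flash",
                contents=prompt,
            )
            time.sleep(min_interval)  # Enforce minimum interval before the next call
            return response
        except Exception as e:
            # SDK errors may expose a numeric `code`; fall back to the message text.
            is_rate_limited = getattr(e, "code", None) == 429 or "429" in str(e)
            # Re-raise other errors, and give up on 429 once retries run out.
            if not is_rate_limited or attempt == max_retries:
                raise
            retry_after = 60  # Default wait time
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)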
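The fixed 60-second wait above is simple but slow to recover. Exponential backoff with jitter, as recommended in the solutions list, might look like the sketch below. It is SDK-independent by design: `retry_with_backoff`, `backoff_delay`, and the `get_code` hook (which extracts an HTTP status code from an exception) are hypothetical names, and the base delay, cap, and attempt count are assumptions to tune.

```python
import random
import time

RETRYABLE_CODES = {429, 500, 503}


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def retry_with_backoff(call, get_code, max_attempts=5, sleep=time.sleep):
    """Run `call()`, retrying retryable status codes with jittered backoff.

    `get_code(exc)` is a caller-supplied hook returning the HTTP status code
    carried by an exception, or None if it has none.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            # Non-retryable errors and the final failure propagate to the caller.
            if get_code(exc) not in RETRYABLE_CODES or last_attempt:
                raise
            sleep(backoff_delay(attempt))
```

A call site would wrap the request in a lambda, for example `retry_with_backoff(lambda: client.models.generate_content(model="gemini-2.5-flash", contents=prompt), lambda e: getattr(e, "code", None))`. The jitter spreads retries from many clients over time instead of having them all return at the same instant.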
500 Internal Server Error — INTERNAL
A 500 error indicates an unexpected problem on Google's servers. These are typically transient and resolve on their own. If you encounter frequent 500 errors:
- Retry with backoff: Most 500 errors are temporary. A simple retry often succeeds.
- Simplify your request: Very complex or large requests are more likely to trigger server errors.
- Check service status: Visit the Google Cloud Status Dashboard for ongoing incidents.
- Try a different model: If one model consistently returns 500 errors, another model may be unaffected.
503 Service Unavailable — UNAVAILABLE
A 503 error means the service is temporarily overloaded or unavailable, typically because the model does not have enough server capacity to process your request. Unlike a 429, which reflects your own rate limit, a 503 reflects service-wide load and can affect any user. It is most common during peak demand periods, especially for Gemini 2.5 Pro. To work around 503 errors:
- Retry with exponential backoff and jitter to avoid thundering herd problems.
- Fall back to a lighter model like Gemini 2.5 Flash during capacity crunches.
- Switch regions if using Vertex AI — capacity varies by data center.
- Schedule non-urgent work for off-peak hours.
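Falling back to a lighter model can be expressed as an ordered list of candidates that is walked only when a capacity error occurs. The sketch below is one way to do that under stated assumptions: `generate_with_fallback` is a hypothetical helper, and `generate(model, prompt)` and `is_capacity_error(exc)` are caller-supplied hooks rather than SDK functions.

```python
def generate_with_fallback(generate, prompt, models, is_capacity_error):
    """Try each model in order, moving on only when a capacity error occurs.

    Returns a (model_used, response) tuple so callers can log which model answered.
    """
    if not models:
        raise ValueError("models must contain at least one model name")
    last_error = None
    for model in models:
        try:
            return model, generate(model, prompt)
        except Exception as exc:
            if not is_capacity_error(exc):
                raise  # e.g. a 400 or 403 is not fixed by switching models
            last_error = exc
    # Every candidate was over capacity; surface the most recent error.
    raise last_error
```

With the SDK used earlier, `generate` could be `lambda m, p: client.models.generate_content(model=m, contents=p)`, `models` could be `["gemini-2.5-pro", "gemini-2.5-flash"]`, and `is_capacity_error` could check for a 503 status code on the exception.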
Best Practices for Error Handling
Build resilient applications by following these patterns:
- Always implement retry logic with exponential backoff for 429, 500, and 503 errors.
- Log error details including the error code, message, and request metadata for debugging.
- Set reasonable timeouts to prevent hanging requests from consuming resources.
- Use circuit breakers to stop sending requests when error rates are high.
- Monitor error rates and set up alerts for unusual spikes.
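Of these patterns, the circuit breaker is the least familiar, so here is a minimal sketch of one. It is an illustration rather than a production implementation: the `CircuitBreaker` class, its threshold of consecutive failures, and its cooldown are all assumptions, and it is not thread-safe.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a trial call after `cooldown` seconds."""

    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so the breaker can be tested without waiting
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        """Close the circuit and reset the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        """Count a failure and open (or re-open) the circuit at the threshold."""
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```

Call `allow_request()` before each API call, skip the call when it returns False, and report the outcome with `record_success()` or `record_failure()`. A failed trial request re-opens the circuit for another full cooldown.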
Simplify Error Handling with a Relay Service
A relay service like claude4u.com handles many of these errors transparently. It implements automatic retry logic, rate limit management across multiple API keys, and model failover — so your application receives fewer errors and you write less error handling code. The relay service also provides detailed error logging and analytics to help you identify patterns and optimize your usage.