Gemini API Pricing Guide: Free Tier, Pro, and Flash Model Costs
Understanding Gemini API pricing is essential for budgeting your AI application costs. Google offers a generous free tier for experimentation and development, with pay-as-you-go pricing for production workloads. This guide breaks down the costs for each Gemini model so you can choose the right balance of capability and cost for your use case.
Free Tier: What You Get at Zero Cost
Google AI Studio gives you a Gemini API key with free-tier access to several Gemini models, making it one of the most accessible AI APIs available. Free-tier rate limits at the time of writing:
- Gemini 2.5 Flash: 30 requests per minute, 1,500 requests per day
- Gemini 2.5 Pro: 5 requests per minute, 25 requests per day
- Gemini 2.0 Flash: 15 requests per minute, 1,500 requests per day
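If you build on the free tier, throttling calls on the client side keeps you under these limits instead of running into rate-limit errors. The sketch below is a minimal, hypothetical limiter: `FreeTierLimiter` and `try_acquire` are illustrative names, not part of any Google SDK. It assumes a rolling 60-second window for requests per minute and a rolling 24-hour window for requests per day. The rolling daily window is conservative if the service instead resets daily quotas at a fixed time.

```python
import time
from collections import deque


class FreeTierLimiter:
    """Hypothetical client-side guard for a requests-per-minute (RPM) and
    requests-per-day (RPD) budget, using rolling windows of call timestamps."""

    def __init__(self, rpm: int, rpd: int, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock  # injectable so the example can simulate time
        self.minute_calls: deque = deque()  # timestamps within the last 60 s
        self.day_calls: deque = deque()  # timestamps within the last 24 h

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of each window
        while self.minute_calls and now - self.minute_calls[0] >= 60:
            self.minute_calls.popleft()
        while self.day_calls and now - self.day_calls[0] >= 86_400:
            self.day_calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if both budgets allow it, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.minute_calls) >= self.rpm or len(self.day_calls) >= self.rpd:
            return False
        self.minute_calls.append(now)
        self.day_calls.append(now)
        return True


# Simulate the Gemini 2.5 Pro free-tier figures above with a fake clock
fake_now = [0.0]
limiter = FreeTierLimiter(rpm=5, rpd=25, clock=lambda: fake_now[0])
print([limiter.try_acquire() for _ in range(6)])  # five True values, then False
fake_now[0] += 60  # one minute later the per-minute window is empty again
print(limiter.try_acquire())  # True
```

In a real client you would call `try_acquire()` before each request and wait or queue the call when it returns `False`. The `clock` parameter is there only so the example can simulate the passage of time.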
Pay-As-You-Go Pricing
Once billing is enabled, you pay per token processed. Pricing differs between input tokens (what you send) and output tokens (what the model generates). Here are the published rates for the primary Gemini models at the time of writing:
Gemini 2.5 Pro Pricing
The most capable Gemini model, designed for complex reasoning, code generation, and multi-step tasks:
- Input: $1.25 per million tokens for prompts up to 200K tokens, $2.50 per million tokens for longer prompts
- Output (including thinking tokens): $10.00 per million tokens for prompts up to 200K tokens, $15.00 per million tokens for longer prompts
- Context window: 1 million tokens
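Because the Pro input rate depends on prompt size, cost is not linear across the 200K boundary. This minimal sketch computes input cost only, using the rates above. It assumes the tier is chosen by the prompt's total length, so a prompt just over 200K tokens is billed entirely at the higher rate rather than only on the excess. The constant and function names are illustrative.

```python
# Gemini 2.5 Pro input rates from the list above, in USD per 1M tokens
PRO_INPUT_RATE_SMALL = 1.25  # prompts up to 200K tokens
PRO_INPUT_RATE_LARGE = 2.50  # prompts over 200K tokens
TIER_THRESHOLD = 200_000


def pro_input_cost(prompt_tokens: int) -> float:
    """Input cost in USD for one Gemini 2.5 Pro request.

    Assumption: the tier is picked by total prompt length, so a prompt over
    the threshold is billed entirely at the higher rate (not just the excess).
    """
    if prompt_tokens <= TIER_THRESHOLD:
        rate = PRO_INPUT_RATE_SMALL
    else:
        rate = PRO_INPUT_RATE_LARGE
    return prompt_tokens / 1_000_000 * rate


for tokens in (150_000, 200_000, 250_000):
    print(f"{tokens:>7} prompt tokens -> ${pro_input_cost(tokens):.4f}")
```

Under that assumption, a single 250K-token prompt costs $0.625 in input, while two independent 125K-token prompts cost about $0.31 together. Splitting only helps when the task can be divided without repeating shared context in every request.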
Gemini 2.5 Flash Pricing
The best price-to-performance ratio in the Gemini family, ideal for most production applications:
- Input: $0.15 per million tokens (up to 200K context), $0.30 per million tokens (over 200K)
- Output: $0.60 per million tokens with thinking disabled, $3.50 per million tokens with thinking enabled
- Context window: 1 million tokens
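With 2.5 Flash, enabling thinking raises the output rate and adds thinking tokens to the bill, so the same workload can cost several times more. The sketch below compares both modes using the rates above. It assumes that, with thinking enabled, thinking tokens are billed as output at the thinking rate alongside the visible response. `flash_cost` is a hypothetical helper, and the 300,000 thinking tokens in the example is a made-up figure.

```python
# Gemini 2.5 Flash rates from the list above, in USD per 1M tokens
FLASH_INPUT_RATE = 0.15  # prompts up to 200K tokens
FLASH_OUTPUT_RATE = 0.60  # thinking disabled
FLASH_OUTPUT_RATE_THINKING = 3.50  # thinking enabled


def flash_cost(input_tokens: int, output_tokens: int,
               thinking_tokens: int = 0, thinking: bool = False) -> float:
    """Estimated USD cost of a Gemini 2.5 Flash workload.

    Assumption: with thinking enabled, thinking tokens are billed as output
    at the thinking rate, together with the visible response tokens.
    """
    if thinking_tokens and not thinking:
        raise ValueError("thinking_tokens only apply when thinking=True")
    if thinking:
        billed_output, rate = output_tokens + thinking_tokens, FLASH_OUTPUT_RATE_THINKING
    else:
        billed_output, rate = output_tokens, FLASH_OUTPUT_RATE
    return (input_tokens * FLASH_INPUT_RATE + billed_output * rate) / 1_000_000


# 1,000 requests x (500 input, 200 output); 300,000 thinking tokens is a made-up figure
print(f"thinking off: ${flash_cost(500_000, 200_000):.3f}")
print(f"thinking on:  ${flash_cost(500_000, 200_000, thinking_tokens=300_000, thinking=True):.3f}")
```

For 1,000 requests at 500 input and 200 output tokens each, the thinking-off estimate is about $0.20, while the hypothetical thinking-on case comes to about $1.83.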
Gemini 2.0 Flash Pricing
The previous-generation Flash model, still available and the lowest-cost of the three models listed here:
- Input: $0.10 per million tokens
- Output: $0.40 per million tokens
- Context window: 1 million tokens
Cost Estimation Examples
To help you plan your budget, here are some illustrative scenarios based on the rates above (30-day months, thinking disabled for Flash):
- Customer support chatbot (1,000 conversations/day): Using Gemini 2.5 Flash with an average of 500 input tokens and 200 output tokens per conversation costs approximately $0.20 per day, or around $6 per month.
- Code review assistant (200 reviews/day): Using Gemini 2.5 Pro with an average of 2,000 input tokens and 1,000 output tokens per review costs approximately $2.50 per day, or about $75 per month, not counting thinking tokens, which add to the output bill.
- Document summarization (500 documents/day): Using Gemini 2.5 Flash with an average of 5,000 input tokens and 500 output tokens per document costs approximately $0.53 per day, or around $16 per month.
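The estimates above follow from multiplying daily token volume by the per-million rates. This short script reproduces them from the listed Flash and Pro prices, assuming 30-day months, thinking disabled for Flash, and Pro thinking tokens not counted. `daily_cost` is an illustrative helper, not an SDK function.

```python
# USD per 1M tokens (input, output), from the price lists above
# (prompts under 200K tokens, Flash with thinking disabled)
RATES = {
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
}


def daily_cost(model: str, requests_per_day: int,
               input_tokens: int, output_tokens: int) -> float:
    """Daily USD cost for a fixed number of requests of a given size."""
    input_rate, output_rate = RATES[model]
    tokens_in = requests_per_day * input_tokens
    tokens_out = requests_per_day * output_tokens
    return (tokens_in * input_rate + tokens_out * output_rate) / 1_000_000


SCENARIOS = [
    ("Customer support chatbot", "gemini-2.5-flash", 1_000, 500, 200),
    ("Code review assistant", "gemini-2.5-pro", 200, 2_000, 1_000),
    ("Document summarization", "gemini-2.5-flash", 500, 5_000, 500),
]

for name, model, reqs, tokens_in, tokens_out in SCENARIOS:
    per_day = daily_cost(model, reqs, tokens_in, tokens_out)
    print(f"{name}: ${per_day:.3f}/day, ${per_day * 30:.2f}/month")
```

It prints $0.195/day ($5.85/month), $2.500/day ($75.00/month), and $0.525/day ($15.75/month), which round to the figures quoted above.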
Context Caching: Reduce Costs for Repeated Contexts
If you send the same large context (such as documentation or a code repository) with many requests, context caching can reduce your input costs significantly. Cached tokens are billed at a 75% discount compared to regular input tokens, and you also pay an hourly storage fee based on the number of cached tokens and how long the cache is kept alive (its TTL). The following example uses the google-genai Python SDK:
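```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Load the reference material to cache ("documentation.txt" is a placeholder path)
with open("documentation.txt", encoding="utf-8") as f:
    large_documentation_text = f.read()

# Create a cache with system instructions and reference documents
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        system_instruction="You are a technical support assistant.",
        contents=[large_documentation_text],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Reuse the cache across requests; cached tokens are billed at the discounted rate
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How do I configure authentication?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)

# Delete the cache once you are done with it so it stops accruing storage fees
client.caches.delete(name=cache.name)
```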
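Whether caching pays off depends on how many requests reuse the cache before it expires, since the storage fee accrues per hour. The sketch below compares the cost of the shared context with and without a cache, assuming the 75% discount described above. The storage rate is a parameter because this guide does not list one; the $1.00 per million tokens per hour in the example is a placeholder, and the calculation ignores any one-time cost of creating the cache. All function names are hypothetical.

```python
import math


def context_cost_without_cache(context_tokens: int, requests: int, input_rate: float) -> float:
    """USD cost of resending the same context with every request."""
    return requests * context_tokens * input_rate / 1_000_000


def context_cost_with_cache(context_tokens: int, requests: int, input_rate: float,
                            storage_rate_per_hour: float, hours: float,
                            discount: float = 0.75) -> float:
    """USD cost of reading the context from a cache plus keeping it stored.

    Ignores any one-time charge for creating the cache.
    """
    cached_reads = requests * context_tokens * input_rate * (1 - discount) / 1_000_000
    storage = context_tokens * storage_rate_per_hour * hours / 1_000_000
    return cached_reads + storage


def break_even_requests(context_tokens: int, input_rate: float,
                        storage_rate_per_hour: float, hours: float,
                        discount: float = 0.75) -> int:
    """Smallest request count at which caching costs no more than resending."""
    saving_per_request = context_tokens * input_rate * discount / 1_000_000
    storage = context_tokens * storage_rate_per_hour * hours / 1_000_000
    return math.ceil(storage / saving_per_request)


# 100K-token docs, 2.5 Flash input rate ($0.15/1M), placeholder storage rate ($1.00/1M/hour)
print(f"${context_cost_without_cache(100_000, 200, 0.15):.2f}")
print(f"${context_cost_with_cache(100_000, 200, 0.15, storage_rate_per_hour=1.00, hours=1):.2f}")
print(break_even_requests(100_000, 0.15, storage_rate_per_hour=1.00, hours=1))
```

With these placeholder numbers, 200 requests within one hour cost $0.85 for the shared context instead of $3.00, and the cache breaks even at 9 requests per hour.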
Tips for Reducing Gemini API Costs
- Choose the right model: Use Flash for most tasks and reserve Pro for complex reasoning that actually needs it.
- Optimize prompts: Shorter, well-structured prompts reduce input token costs.
- Set max output tokens: Limit response length to avoid paying for unnecessary output.
- Use context caching: Cache large, repeated contexts instead of sending them with every request.
- Batch requests: Use the Batch API for non-time-sensitive tasks at a 50% discount.
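Two of these tips translate directly into arithmetic: a max output tokens cap bounds the output billed per request, and the Batch API halves the price. The hypothetical `job_cost` helper below shows how they combine, using the 2.5 Flash rates from earlier; the 800-token uncapped response length is an assumed figure. Keep in mind that a cap truncates responses that would have run longer, so it trades completeness for cost.

```python
from typing import Optional

BATCH_DISCOUNT = 0.50  # Batch API: 50% off standard pricing


def job_cost(requests: int, input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float,
             max_output_tokens: Optional[int] = None, batch: bool = False) -> float:
    """Estimated USD cost of a job; rates are USD per 1M tokens.

    output_tokens is the uncapped response length per request. A cap limits
    billed output; responses longer than the cap are truncated.
    """
    if max_output_tokens is not None:
        output_tokens = min(output_tokens, max_output_tokens)
    cost = requests * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# 500 documents on 2.5 Flash ($0.15 in, $0.60 out); 800 uncapped output tokens is assumed
print(f"baseline:      ${job_cost(500, 5_000, 800, 0.15, 0.60):.4f}")
print(f"capped at 500: ${job_cost(500, 5_000, 800, 0.15, 0.60, max_output_tokens=500):.4f}")
print(f"capped+batch:  ${job_cost(500, 5_000, 800, 0.15, 0.60, max_output_tokens=500, batch=True):.4f}")
```

The three lines come to $0.6150, $0.5250, and $0.2625: the cap trims the output share, and batching then halves what remains.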
Using a Relay Service for Cost Management
API relay services such as claude4u.com provide cost tracking and usage analytics across multiple API keys and models, which makes it easier to monitor spending, set budget alerts, and decide which models should handle which types of requests. Some relay services can also route each request automatically to the most cost-effective model that meets your quality requirements.
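Cost-aware routing of this kind can be sketched as picking the cheapest model that clears a minimum capability bar. The sketch below is not how any particular relay service works: `ModelOption`, `route`, and especially the `tier` scores are illustrative assumptions you would replace with your own quality evaluations. Only the per-token rates come from the price lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    input_rate: float  # USD per 1M input tokens
    output_rate: float  # USD per 1M output tokens
    tier: int  # hypothetical capability score: higher means more capable


# Rates from the price lists above; tier scores are illustrative assumptions
MODELS = [
    ModelOption("gemini-2.0-flash", 0.10, 0.40, tier=1),
    ModelOption("gemini-2.5-flash", 0.15, 0.60, tier=2),
    ModelOption("gemini-2.5-pro", 1.25, 10.00, tier=3),
]


def route(min_tier: int, input_tokens: int, output_tokens: int,
          models=MODELS) -> ModelOption:
    """Return the cheapest model whose tier meets min_tier for this request size."""
    eligible = [m for m in models if m.tier >= min_tier]
    if not eligible:
        raise ValueError(f"no model meets tier {min_tier}")
    return min(eligible,
               key=lambda m: input_tokens * m.input_rate + output_tokens * m.output_rate)


print(route(1, 500, 200).name)  # simple FAQ-style request
print(route(3, 2_000, 1_000).name)  # request flagged as needing complex reasoning
```

The hard part in practice is deciding `min_tier` per request (for example from the task type or a classifier); once that is known, choosing the model is a simple cost minimization.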