Gemini API Pricing Guide: Free Tier, Pro, and Flash Model Costs

Understanding Gemini API pricing is essential for budgeting your AI application costs. Google offers a generous free tier for experimentation and development, with pay-as-you-go pricing for production workloads. This guide breaks down the costs for each Gemini model so you can choose the right balance of capability and cost for your use case.

Free Tier: What You Get at Zero Cost

Google AI Studio provides a free tier for the Gemini models covered in this guide, so you can start building at no cost. The free-tier rate limits are:

Gemini 2.5 Flash: 30 requests per minute, 1,500 requests per day
Gemini 2.5 Pro: 5 requests per minute, 25 requests per day
Gemini 2.0 Flash: 15 requests per minute, 1,500 requests per day

The free tier is sufficient for prototyping, personal projects, and low-traffic applications. For production workloads, you will need to enable billing on your Google Cloud project to unlock pay-as-you-go pricing with higher rate limits.
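To stay inside a per-minute limit like the ones listed above, a client can throttle itself before calling the API. The following is a minimal sketch of a sliding-window limiter in plain Python. The class name and its parameters are illustrative and not part of any Google SDK, and it does not track the separate per-day limit.

```python
import time
from collections import deque


class RequestsPerMinuteLimiter:
    """Client-side sliding-window limiter (illustrative, not part of any SDK)."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.max_per_minute = max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests sent in the last 60 seconds

    def acquire(self):
        """Block until one more request fits in the current 60-second window."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= 60.0:
                self._sent.popleft()
            if len(self._sent) < self.max_per_minute:
                self._sent.append(now)
                return
            # Wait until the oldest request leaves the window.
            self._sleep(60.0 - (now - self._sent[0]))


# Example: stay under the 5 requests/minute listed above for Gemini 2.5 Pro.
limiter = RequestsPerMinuteLimiter(max_per_minute=5)
```

Calling limiter.acquire() before each API request makes the client wait rather than receive a rate-limit error. The clock and sleep arguments are injectable only so the class can be tested without real delays.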

Pay-As-You-Go Pricing

Once billing is enabled, you pay per token processed. Pricing differs between input tokens (what you send) and output tokens (what the model generates). Here are the current rates for the primary Gemini models:

Gemini 2.5 Pro Pricing

The most capable Gemini model, designed for complex reasoning, code generation, and multi-step tasks:

Input: $1.25 per million tokens (prompts up to 200K tokens), $2.50 per million tokens (prompts over 200K tokens)
Output: $10.00 per million tokens
Thinking tokens: $3.75 per million tokens
Context window: 1 million tokens

Gemini 2.5 Flash Pricing

The best price-to-performance ratio in the Gemini family, ideal for most production applications:

Input: $0.15 per million tokens (prompts up to 200K tokens), $0.30 per million tokens (prompts over 200K tokens)
Output: $0.60 per million tokens (non-thinking), $3.50 per million tokens (thinking)
Context window: 1 million tokens

Gemini 2.0 Flash Pricing

The previous-generation Flash model, still available and cost-effective:

Input: $0.10 per million tokens
Output: $0.40 per million tokens
Context window: 1 million tokens
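The per-token rates listed above for the three models can be turned into a rough per-request estimate. The sketch below copies those rates into a table and picks the input rate by prompt size. It assumes that a prompt over 200K tokens is billed entirely at the higher rate, which the lists above do not spell out, and it leaves thinking tokens out. The names RATES and request_cost are illustrative.

```python
# Rates in USD per million tokens, copied from the lists above.
# "input" holds (rate for prompts up to 200K tokens, rate above 200K).
RATES = {
    "gemini-2.5-pro": {"input": (1.25, 2.50), "output": 10.00},
    "gemini-2.5-flash": {"input": (0.15, 0.30), "output": 0.60},  # non-thinking output
    "gemini-2.0-flash": {"input": (0.10, 0.10), "output": 0.40},
}
LONG_CONTEXT_THRESHOLD = 200_000


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request.

    Assumption: a prompt over 200K tokens is billed entirely at the higher
    input rate, rather than only the tokens beyond the threshold.
    """
    short_rate, long_rate = RATES[model]["input"]
    input_rate = long_rate if input_tokens > LONG_CONTEXT_THRESHOLD else short_rate
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * RATES[model]["output"]
    return input_cost + output_cost


# Same output size, but the second prompt crosses the 200K threshold.
print(f"{request_cost('gemini-2.5-pro', 100_000, 2_000):.4f}")
print(f"{request_cost('gemini-2.5-pro', 300_000, 2_000):.4f}")
```

Under these assumptions the first request comes to about $0.145 and the second to about $0.77, so crossing the threshold costs more than the extra tokens alone would suggest.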

Cost Estimation Examples

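To help you plan your budget, here are some illustrative usage scenarios. Each estimate applies the rates above at the under-200K input tier, uses the non-thinking output rate for Flash, and assumes a 30-day month: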

Customer support chatbot (1,000 conversations/day): Using Gemini 2.5 Flash with average 500 input tokens and 200 output tokens per conversation costs approximately $0.20 per day, or around $6 per month.
Code review assistant (200 reviews/day): Using Gemini 2.5 Pro with average 2,000 input tokens and 1,000 output tokens per review costs approximately $2.50 per day, or about $75 per month.
Document summarization (500 documents/day): Using Gemini 2.5 Flash with 5,000 input tokens and 500 output tokens per document costs approximately $0.53 per day, or around $16 per month.
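The arithmetic behind these three estimates can be checked with a few lines of Python. This sketch assumes the under-200K input rates, the non-thinking Flash output rate, and a 30-day month. The function name daily_cost is illustrative.

```python
def daily_cost(requests_per_day, input_tokens, output_tokens, input_rate, output_rate):
    """USD per day, with both rates given in USD per million tokens."""
    input_cost = requests_per_day * input_tokens / 1_000_000 * input_rate
    output_cost = requests_per_day * output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


scenarios = {
    "support chatbot (2.5 Flash)": daily_cost(1_000, 500, 200, 0.15, 0.60),
    "code review (2.5 Pro)": daily_cost(200, 2_000, 1_000, 1.25, 10.00),
    "summarization (2.5 Flash)": daily_cost(500, 5_000, 500, 0.15, 0.60),
}

for name, per_day in scenarios.items():
    # A 30-day month is assumed for the monthly figure.
    print(f"{name}: ${per_day:.3f}/day, ${per_day * 30:.2f}/month")
```

The results come to roughly $0.195, $2.50, and $0.525 per day, which match the rounded figures in the scenarios. Output tokens dominate the code review case: $2.00 of the $2.50 daily total comes from output.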

Context Caching: Reduce Costs for Repeated Contexts

If you send the same large context (such as documentation or a code repository) with many requests, context caching can cut your input costs significantly. Cached tokens are priced at a 75% discount compared to regular input tokens. You also pay a small per-hour storage fee for as long as the cache is kept alive, so caching pays off only when the cached context is reused often enough to outweigh that fee.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

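# Placeholder: the large reference text to cache. "docs.txt" is a hypothetical
# file name; caches require a minimum token count, so tiny inputs are rejected.
with open("docs.txt", encoding="utf-8") as f:
    large_documentation_text = f.read()

# Create a cache with system instructions and reference documents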
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        system_instruction="You are a technical support assistant.",
        contents=[large_documentation_text],
        ttl="3600s"
    )
)

# Use the cache for multiple requests
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How do I configure authentication?",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    )
)
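Whether a cache is worth creating depends on how often it is reused within its lifetime. The following is a rough break-even sketch, not an official formula. It applies the 75% discount mentioned above, and it treats the storage fee as a rate per million cached tokens per hour, which is an assumption. The storage rate used in the example is a placeholder, not a published price, and the cost of the initial request that creates the cache is ignored.

```python
def caching_savings(cached_tokens, requests, input_rate, storage_rate_per_hour,
                    hours, cache_discount=0.75):
    """Net USD saved by caching a shared context instead of resending it.

    input_rate: USD per million input tokens.
    storage_rate_per_hour: USD per million cached tokens per hour (assumed
        billing unit; take the real figure from the official pricing page).
    """
    millions = cached_tokens / 1_000_000
    without_cache = requests * millions * input_rate
    with_cache = requests * millions * input_rate * (1 - cache_discount)
    storage = millions * storage_rate_per_hour * hours
    return without_cache - with_cache - storage


# 100K shared tokens on Gemini 2.5 Flash, cache kept for one hour.
# The storage rate of 1.00 is a placeholder value for illustration only.
busy = caching_savings(100_000, requests=200, input_rate=0.15,
                       storage_rate_per_hour=1.00, hours=1)
quiet = caching_savings(100_000, requests=2, input_rate=0.15,
                        storage_rate_per_hour=1.00, hours=1)
print(f"200 requests: {busy:+.4f} USD, 2 requests: {quiet:+.4f} USD")
```

With these placeholder numbers the busy case saves money and the quiet case loses money, because the storage fee exceeds the discount earned on only two requests.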

Saving Money on Gemini API Costs

Choose the right model: Use Flash for most tasks and reserve Pro for complex reasoning that actually needs it.
Optimize prompts: Shorter, well-structured prompts reduce input token costs.
Set max output tokens: Limit response length to avoid paying for unnecessary output.
Use context caching: Cache large, repeated contexts instead of sending them with every request.
Batch requests: Use the Batch API for non-time-sensitive tasks at a 50% discount.
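Two of the tips above have a direct, calculable effect: an output cap bounds the most a single request can cost, and the batch discount halves it. The sketch below shows both using the Gemini 2.5 Pro rates from this guide. It assumes the 50% batch discount applies to input and output tokens alike, and the function name worst_case_cost is illustrative.

```python
def worst_case_cost(input_tokens, max_output_tokens, input_rate, output_rate,
                    batch=False):
    """Upper bound in USD for one request whose output length is capped.

    Rates are USD per million tokens. Assumption: the 50% batch discount
    applies to both input and output tokens.
    """
    cost = (input_tokens * input_rate + max_output_tokens * output_rate) / 1_000_000
    return cost * 0.5 if batch else cost


# Gemini 2.5 Pro, 2,000-token prompt, comparing two output caps and batch mode.
uncapped_ish = worst_case_cost(2_000, 1_000, 1.25, 10.00)
capped = worst_case_cost(2_000, 500, 1.25, 10.00)
capped_batch = worst_case_cost(2_000, 500, 1.25, 10.00, batch=True)
print(f"{uncapped_ish:.5f} {capped:.5f} {capped_batch:.5f}")
```

Here halving the output cap lowers the ceiling from about $0.0125 to $0.0075 per request, and batch mode halves that again to about $0.00375.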

Pricing can change over time. Always check the official Google AI pricing page for the most current rates before making budget commitments.

Using a Relay Service for Cost Management

API relay services like claude4u.com provide cost tracking and usage analytics across multiple API keys and models. This makes it easier to monitor spending, set budget alerts, and decide which models should handle which types of requests. Some relay services can also route requests automatically to the most cost-effective model that meets your quality requirements.
