Gemini API Pricing Guide: Free Tier, Pro, and Flash Model Costs

Understanding Gemini API pricing is essential for budgeting your AI application costs. Google offers a generous free tier for experimentation and development, with pay-as-you-go pricing for production workloads. This guide breaks down the costs for each Gemini model so you can choose the right balance of capability and cost for your use case.

Free Tier: What You Get at Zero Cost

Google AI Studio provides a free tier for all Gemini models, making it one of the most accessible AI APIs available. The free tier includes:

The free tier is sufficient for prototyping, personal projects, and low-traffic applications. For production workloads, you will need to enable billing on your Google Cloud project to unlock pay-as-you-go pricing with higher rate limits.

Pay-As-You-Go Pricing

Once billing is enabled, you pay per token processed. Pricing differs between input tokens (what you send) and output tokens (what the model generates). Here are the current rates for the primary Gemini models:

Gemini 2.5 Pro Pricing

The most capable Gemini model, designed for complex reasoning, code generation, and multi-step tasks:

Gemini 2.5 Flash Pricing

The best price-to-performance ratio in the Gemini family, ideal for most production applications:

Gemini 2.0 Flash Pricing

The previous-generation Flash model, still available and cost-effective:

Cost Estimation Examples

To help you plan your budget, here are some real-world usage scenarios:
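As a rough illustration of how per-token billing turns into a monthly figure, the sketch below multiplies average token counts by per-million-token rates. The `Rates` class, the function names, the traffic scenario, and the rates themselves are all assumptions made for this example. The rates in particular are placeholders chosen for easy arithmetic, not Google's published prices, so substitute the current values from the official pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD, supplied by the caller."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Cost of one request: input and output tokens are billed at separate rates."""
    return (
        input_tokens / 1_000_000 * rates.input_per_million
        + output_tokens / 1_000_000 * rates.output_per_million
    )


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 rates: Rates, days: int = 30) -> float:
    """Project a monthly bill from average per-request token counts."""
    return requests_per_day * days * request_cost(input_tokens, output_tokens, rates)


# PLACEHOLDER rates for illustration only; NOT actual Gemini prices.
example_rates = Rates(input_per_million=1.00, output_per_million=4.00)

# Hypothetical scenario: 1,000 requests/day, 2,000 input and 500 output tokens each.
estimate = monthly_cost(1_000, 2_000, 500, example_rates)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

With these placeholder numbers each request costs $0.004, so 30,000 requests come to $120.00 for the month. Swapping in real rates for Pro and Flash shows how much the model choice moves the total.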

Context Caching: Reduce Costs for Repeated Contexts

If you send the same large context (such as documentation or a code repository) with multiple requests, context caching can reduce your input costs significantly. Cached tokens are billed at a 75% discount compared to regular input tokens. You also pay a small storage fee for every hour the cache is kept alive, so caching pays off only when the context is reused often enough to offset that fee.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Load the large, reusable context to cache.
# "docs.txt" is a placeholder path; substitute your own reference material.
with open("docs.txt", encoding="utf-8") as f:
    large_documentation_text = f.read()

# Create a cache with system instructions and reference documents
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        system_instruction="You are a technical support assistant.",
        contents=[large_documentation_text],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Use the cache for multiple requests
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How do I configure authentication?",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    ),
)
print(response.text)
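
Whether caching saves money depends on how often the cached context is reused while the cache is alive. The sketch below is a simplified break-even estimate. It takes the 75% discount from the text above, assumes storage is billed per million cached tokens per hour, and ignores the one-time cost of creating the cache. The function name and all prices are placeholders for this example, not actual Gemini rates.

```python
def caching_savings(
    cached_tokens: int,
    requests: int,
    hours_alive: float,
    input_per_million: float,
    storage_per_million_per_hour: float,
    cache_discount: float = 0.75,
) -> float:
    """Return USD saved by caching; a negative value means caching costs more.

    Simplified model: every request reuses the full cached context, and the
    cost of creating the cache is not included.
    """
    millions = cached_tokens / 1_000_000
    # Without caching, the shared context is billed at the full input rate each time.
    uncached = requests * millions * input_per_million
    # With caching, the same tokens are billed at the discounted rate...
    cached = uncached * (1 - cache_discount)
    # ...plus a storage fee for as long as the cache is kept alive.
    storage = millions * storage_per_million_per_hour * hours_alive
    return uncached - (cached + storage)


# PLACEHOLDER prices for illustration only; NOT actual Gemini rates.
saved = caching_savings(
    cached_tokens=100_000, requests=200, hours_alive=1.0,
    input_per_million=1.00, storage_per_million_per_hour=1.00,
)
print(f"Savings from caching: ${saved:.2f}")
```

With these placeholder numbers, 200 requests in one hour save roughly $14.90, while a single request in the same hour would cost more with caching than without it, because the storage fee outweighs the discount.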

Saving Money on Gemini API Costs

  1. Choose the right model: Use Flash for most tasks and reserve Pro for complex reasoning that actually needs it.
  2. Optimize prompts: Shorter, well-structured prompts reduce input token costs.
  3. Set max output tokens: Limit response length to avoid paying for unnecessary output.
  4. Use context caching: Cache large, repeated contexts instead of sending them with every request.
  5. Batch requests: Use the Batch API for non-time-sensitive tasks at a 50% discount.

Pricing can change over time. Always check the official Google AI pricing page for the most current rates before making budget commitments.
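
The first tip in the list above can be sketched as a small routing helper. The task categories here are hypothetical labels invented for this example, and the model identifiers are assumed to match the names used elsewhere in this guide. Treat it as one possible starting point rather than a recommended policy.

```python
# Hypothetical task categories that justify the more expensive model.
COMPLEX_TASKS = {"multi_step_reasoning", "code_generation", "long_document_analysis"}

# Assumed model identifiers; confirm them against the current model list.
PRO_MODEL = "gemini-2.5-pro"
FLASH_MODEL = "gemini-2.5-flash"


def pick_model(task_type: str) -> str:
    """Send complex tasks to Pro and default everything else to Flash."""
    if task_type in COMPLEX_TASKS:
        return PRO_MODEL
    return FLASH_MODEL


for task in ("summarization", "code_generation"):
    print(task, "->", pick_model(task))
```

Defaulting to Flash and opting in to Pro keeps unknown or new task types on the cheaper model until someone decides they need more capability.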

Using a Relay Service for Cost Management

API relay services like claude4u.com provide detailed cost tracking and usage analytics across multiple API keys and models. This makes it easier to monitor spending, set budget alerts, and optimize which models handle which types of requests. The relay service can also automatically route requests to the most cost-effective model that meets your quality requirements.
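
To make the idea of cost tracking and budget alerts concrete, here is a minimal in-process sketch. It is not how any particular relay service works. The `UsageTracker` class, its method names, the key labels, and the 80% alert threshold are all assumptions made for illustration.

```python
from collections import defaultdict


class UsageTracker:
    """Accumulates spend per (key label, model) and checks it against a budget."""

    def __init__(self, monthly_budget_usd: float, alert_fraction: float = 0.8):
        self.monthly_budget_usd = monthly_budget_usd
        self.alert_fraction = alert_fraction
        self.spend = defaultdict(float)

    def record(self, key_label: str, model: str, cost_usd: float) -> None:
        """Add the cost of one request to the running total for its key and model."""
        self.spend[(key_label, model)] += cost_usd

    def total(self) -> float:
        """Total spend across all keys and models."""
        return sum(self.spend.values())

    def over_alert_threshold(self) -> bool:
        """True once total spend reaches the alert fraction of the budget."""
        return self.total() >= self.monthly_budget_usd * self.alert_fraction


# Hypothetical usage with made-up key labels and costs.
tracker = UsageTracker(monthly_budget_usd=100.0)
tracker.record("team-a", "gemini-2.5-flash", 30.0)
tracker.record("team-b", "gemini-2.5-pro", 55.0)
print(tracker.total(), tracker.over_alert_threshold())
```

Keying the totals by both API key and model makes it easy to see which team or workload is driving spend before deciding what to move to a cheaper model.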
