Gemini 3.5 Flash Is Now Free: What You Get at Google I/O 2026
Google's fastest frontier model just became the default for every free user — here's what changed, what the limits are, and whether it's worth switching from ChatGPT.
By Free AI News Editorial · 8 min read
Every year, Google uses I/O to reset what "free AI" means. In 2026, that reset was Gemini 3.5 Flash, a model that not only outperforms the previous generation's flagship on benchmarks but does so at zero cost to everyday users. If you opened the Gemini app or used Search AI Mode after May 19, you were already running it, possibly without knowing. For developers, free access via Google AI Studio at 60 requests per minute is one of the most generous no-credit-card API tiers on the market. This article breaks down what changed, what you can do with it, and where the catches are.
Gemini 3.5 Flash was announced alongside more than 100 product updates at Google I/O 2026, but the free tier rollout is arguably the most impactful change for the average user. This is the first time a "frontier-class" Gemini model — one that competes directly with ChatGPT and Claude on benchmark leaderboards — has been freely available to anyone without a Gemini Advanced subscription.
What exactly is Gemini 3.5 Flash and why did Google release it for free?
Gemini 3.5 Flash is the first model in Google's new 3.5 family, which the company describes as combining "frontier intelligence with action" — meaning it was purpose-built for agentic tasks, long-horizon reasoning, and coding, not just one-shot question answering. According to Google's official Gemini 3.5 announcement on blog.google, 3.5 Flash is now the default model for both the Gemini app and AI Mode in Search, globally. That single line is what makes this a free-tier story: the default is free, and the default is now frontier-class.
Why give it away? The same competitive logic that pushed OpenAI to put o3-mini on the free tier earlier this year. Google is in a race for daily active users, and models that only paid subscribers can access don't become the default AI for billions of people. By making 3.5 Flash the free standard, Google ensures that when users ask Google Search a complex question, they get a response from a model that genuinely competes with paid alternatives — reinforcing the habit of staying inside Google's ecosystem rather than switching to ChatGPT or Claude. You can learn more about how these competitive dynamics affect free tier availability on our Free Tier Tracker.
What do you actually get on the Gemini 3.5 Flash free tier?
The free tier is split into two distinct contexts: consumer apps and developer API access. Here's what each one offers:
- Gemini App (free account) — Gemini 3.5 Flash is the default model. You get conversational access at no cost on gemini.google.com, with no publicly stated daily message cap for standard queries, though heavy sustained use may hit undisclosed throttle limits.
- Google Search AI Mode — Gemini 3.5 Flash powers the AI-generated responses in Search's AI Mode for all users, including those without a Google One subscription. Available globally as of the I/O 2026 launch date.
- Google AI Studio (developer free tier) — The API free tier allows 60 requests per minute with no credit card required. This is one of the most accessible developer free tiers available from any major AI provider in 2026, enabling prototyping and light production workloads at zero cost.
- Context window — 2 million tokens on the free tier. This is not a paid-only feature. You can submit very long documents, codebases, or conversation histories without hitting a truncation wall.
- Multimodal input — Images, audio, and text inputs are supported. Video understanding is available via the API but may be rate-limited differently from text-only queries.
- Built-in thinking — According to the Gemini 3.5 Flash model card from Google DeepMind, the model includes configurable thinking via a thinking_level parameter (low / medium / high), with Thought Preservation that retains reasoning across multi-turn conversations. Medium thinking is the default.
- Knowledge cutoff — January 2026. Queries about events after that date require grounding via Google Search, which is supported natively.
What's not included on the free tier: access to Gemini Omni Flash (the video-generation multimodal sibling model) is currently limited to paid Gemini subscribers and select YouTube tools, with API access rolling out in the coming weeks. Gemini 3.5 Pro — the more powerful sibling — is still in testing as of late May 2026 and will be a paid-tier model when it launches.
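To gauge whether a set of documents is likely to fit in the 2-million-token window described above, a rough pre-check can help. The sketch below is illustrative only: the four-characters-per-token ratio is a common rule of thumb for English text, not Gemini's actual tokenizer, and the function names and the output reserve are hypothetical.

```python
# Assumption: ~4 characters per token is a rough heuristic for English text.
# It is NOT the tokenizer Gemini uses; treat the result as an estimate.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 2_000_000  # free-tier figure quoted in this article


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserve_for_output: int = 8_000) -> bool:
    """Return True if the combined estimate leaves room for the reply.

    reserve_for_output is a hypothetical safety margin, not an official limit.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

If the estimate lands close to the limit, it is safer to count tokens with the provider's own tooling than to rely on this heuristic.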
How does Gemini 3.5 Flash perform compared to ChatGPT and Claude?
This is the question that matters most for anyone deciding whether to switch or supplement. Based on benchmarks circulating post-I/O, here's where Gemini 3.5 Flash lands:
| Benchmark | Gemini 3.5 Flash | Claude Sonnet 4.6 | GPT-5.5 |
|---|---|---|---|
| MCP Atlas (agentic) | 83.6% | 79.1% | — |
| Terminal-Bench 2.1 | 76.2% | — | — |
| SWE-Bench Verified (coding) | — | 79.6% | — |
| Speed (tokens/sec) | 289 | — | — |
| API input cost (per 1M tokens) | $1.50 | $3.00 | Higher |
The picture is nuanced. Gemini 3.5 Flash leads on agentic and long-horizon tasks: its MCP Atlas score is meaningful because the benchmark measures multi-step tool use, which is increasingly how people use AI assistants. It also runs at 289 tokens per second, which Google reports is roughly four times faster than comparable frontier models such as GPT-5.5 or Opus-tier models, and that matters when you're building apps or running batch workflows. For pure software engineering, Claude Sonnet 4.6 still holds the SWE-Bench lead. See our full model comparison guide for a deeper breakdown across more categories.
The pricing gap is equally notable. At $1.50 per million input tokens, Gemini 3.5 Flash costs exactly half what Claude Sonnet 4.6 charges. For high-volume API users — think content pipelines, customer service bots, or data processing — that's not a rounding error; it's a budget-halving decision with essentially no performance tradeoff for general workloads. The Verge's I/O 2026 roundup highlighted this pricing pressure as one of the most consequential shifts of the conference.
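The arithmetic behind that claim is easy to check. The sketch below uses only the input prices quoted in the table; the dictionary keys are plain labels rather than official API model IDs, the 500-million-token monthly workload is a made-up example, and output-token pricing is left out because this article does not quote it.

```python
from decimal import Decimal

# Input prices in USD per 1M tokens, as quoted in the table above.
# Keys are illustrative labels, not official API model identifiers.
PRICE_PER_MILLION = {
    "gemini-3.5-flash": Decimal("1.50"),
    "claude-sonnet-4.6": Decimal("3.00"),
}


def input_cost(model: str, input_tokens: int) -> Decimal:
    """Input-token cost in USD for the given model label."""
    return PRICE_PER_MILLION[model] * input_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${input_cost(model, monthly_tokens)}")
```

Because the ratio between the two prices is fixed, the saving scales linearly with volume: whatever the input-token count, the Gemini 3.5 Flash figure is half the Claude Sonnet 4.6 figure.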
Who should actually switch to Gemini 3.5 Flash as their primary free AI tool?
Not everyone. The answer depends heavily on what you're using AI for day-to-day. Here are the clearest use-case fits and non-fits:
Strong fit — switch or add it to your rotation:
- Developers building agentic pipelines — The MCP Atlas performance and 60 req/min free API tier make this the cheapest credible choice for prototyping agents and automations in 2026.
- Researchers with very long documents — The 2M token context window (free) means you can process full books, large codebases, or multi-hour transcripts in one shot. No other major model provides this at zero cost.
- Google Workspace power users — Gemini 3.5 Flash integrates natively across Gmail, Docs, Sheets, and Search. If your workflow already lives inside Google, there's essentially no switching cost.
- Budget-conscious API consumers — At $1.50/M input tokens, it's the cheapest flagship-class API from a major US provider. For workloads under 200K context, it's the obvious default.
Weaker fit — probably stay where you are:
- Deep software engineering tasks — Claude Sonnet 4.6 still leads on SWE-Bench Verified. If you're doing serious code review or complex refactoring, Claude holds an edge.
- Users heavily invested in the OpenAI ecosystem — If you rely on GPT-specific plugins, fine-tunes, or the Assistants API, the friction of switching exceeds the performance gain for most use cases.
- Privacy-sensitive workflows — Google's data use terms for consumer products are broader than some organizations allow. Enterprise users should check their compliance requirements before routing sensitive data through the free consumer apps or the free API tier.
Browse our full directory of free AI tools to compare Gemini 3.5 Flash alongside the other no-cost options available right now.
What are the real limits and catches you need to know about?
Every free tier has a catch. Here's an honest accounting of Gemini 3.5 Flash's:
- Knowledge cutoff: January 2026. Events after that date require Search grounding to be accurate. This is enabled by default in the consumer app but adds latency. For time-sensitive queries, always verify grounding is active.
- Rate limits on the free API. Google AI Studio's free tier allows 60 requests per minute — generous for prototyping, but insufficient for production traffic spikes. You'll need a paid key before scaling.
- Omni Flash is not fully free yet. The video-generation sibling model is available in YouTube Shorts and Create for free users, but API access is still rolling out. If your use case is video-in-video-out, you're not fully there on the free tier.
- Gemini 3.5 Pro is coming and will be paid. The more powerful sibling is in testing and expected in June 2026 as a paid-tier model. If benchmarks hold, some tasks currently fine on 3.5 Flash may feel like they need the upgrade once Pro lands and users start comparing outputs.
- Thinking level defaults to medium. The thinking_level: high mode is available but may consume more quota. Heavy reasoning workloads on the free tier may be throttled differently than standard completions.
- The "free" consumer experience is ad-supported. Google Search AI Mode is free in the sense that you don't pay cash, but responses are adjacent to Google's ad ecosystem. For research or professional use, this context matters.
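For the 60-requests-per-minute cap listed above, a client-side limiter can keep a prototype from tripping the quota. This is a minimal sliding-window sketch, not part of any Google SDK: the class name and its parameters are hypothetical, and how Google counts requests on its side may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period-second window (hypothetical helper)."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls still inside the window

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return waited
            # Window is full: wait until the oldest call expires.
            delay = self.period - (now - self.calls[0])
            self.sleep(delay)
            waited += delay
```

Create one limiter per API key and call `acquire()` immediately before each request. The limiter only smooths your own traffic, so server-side throttling responses still need their own retry handling.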
For real-time updates on when these limits change — or when a new model tier drops — bookmark our AI Free Tier Tracker, which we update whenever a major provider makes a change.
🔑 Key Takeaways
- Gemini 3.5 Flash launched on May 19, 2026 at Google I/O and is now the free default model in the Gemini app and Google Search AI Mode globally — no subscription required.
- The model runs at 289 tokens per second and scores 76.2% on Terminal-Bench 2.1, outperforming Gemini 3.1 Pro on coding and agentic tasks while being four times faster than comparable frontier models.
- Developers get 60 free requests per minute via Google AI Studio with no credit card needed, making it one of the most accessible flagship-class API free tiers from a major provider in 2026.
- Gemini 3.5 Flash leads Claude Sonnet 4.6 on agentic benchmarks (MCP Atlas: 83.6% vs 79.1%) and costs half as much via API ($1.50 vs $3.00 per million input tokens), making it the top cost-efficiency pick for general workloads.
- Key limits to watch: January 2026 knowledge cutoff, 60 req/min API cap on the free tier, Omni Flash video API not yet fully open, and Gemini 3.5 Pro (a paid upgrade) expected in June 2026.
Frequently Asked Questions
Is Gemini 3.5 Flash really free to use?
Yes. As of May 19, 2026, Gemini 3.5 Flash is the default model in the free tier of both the Gemini app and Google Search AI Mode. No subscription is required. Developers also get 60 free requests per minute through Google AI Studio with no credit card needed to get started.
How fast is Gemini 3.5 Flash compared to other models?
Google reports Gemini 3.5 Flash runs at 289 tokens per second and is approximately four times faster than comparable frontier models. It outperforms Gemini 3.1 Pro on coding and agentic benchmarks while running faster and costing less. On Terminal-Bench 2.1, it scores 76.2%, signaling strong capability for tool-use and long-horizon reasoning tasks.
What is the context window for Gemini 3.5 Flash?
Gemini 3.5 Flash supports a 2 million token context window — one of the largest available from any model in 2026. This means you can process entire codebases, long research documents, or hours of transcribed audio in a single prompt without truncation. This 2M context is available on the free tier, not just paid plans.
How does Gemini 3.5 Flash compare to Claude Sonnet 4.6?
Gemini 3.5 Flash leads Claude Sonnet 4.6 on MCP Atlas (83.6% vs 79.1%) and costs roughly half as much via API ($1.50 vs $3.00 per million input tokens). Claude Sonnet 4.6 leads on SWE-Bench Verified software engineering tasks (79.6%). For general agentic work and cost efficiency, Gemini 3.5 Flash wins; for deep coding tasks, Claude holds an edge.
Does Gemini 3.5 Flash replace Gemini 3.1 Pro?
In practical terms, yes. Gemini 3.5 Flash outperforms Gemini 3.1 Pro on coding and long-horizon agentic benchmarks while being faster and cheaper. Google has positioned it as the new performance baseline for the 3.5 family. Gemini 3.5 Pro is still in testing and expected as a paid-tier model in June 2026 for even more demanding tasks.
What is the API pricing for Gemini 3.5 Flash?
Via the Google AI API, Gemini 3.5 Flash is priced at $1.50 per million input tokens — roughly half the cost of Claude Sonnet 4.6 at $3.00 per million. The free tier in Google AI Studio gives developers 60 requests per minute at no cost, making it one of the most accessible options for building and prototyping AI applications in 2026.