Crescendo AI

Google launches Gemini 3.1 Ultra with 2M-token context


What Happened

Google released Gemini 3.1 Ultra on March 20, 2026, with a 2-million-token context window, doubling the capacity of any current competitor. The model supports native multimodal reasoning across text, images, and audio without intermediate conversion or transcription steps. Sandboxed code execution is included natively, positioning the model for agentic and developer-facing workflows.

Our Take

2 million tokens. I had to read that twice. That's on the order of twenty full-length novels' worth of text shoved into a single context window — not chunked, not summarized, just... there.

Here's the thing: this makes our entire RAG setup for document-heavy clients look like unnecessary overengineering. We've spent real hours on chunking strategies and embedding pipelines. With 2M tokens you can just send the whole thing. That's not a small deal.
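Before you "just send the whole thing," check that it actually fits. A quick sketch, assuming the rough ~4-characters-per-token heuristic for English text (real tokenizer counts vary by model and by content like code or non-English text):

```python
# Rough check: does a document set fit in a 2M-token context window?
# Uses a ~4-characters-per-token heuristic -- an approximation only;
# real tokenizers will count differently.

CONTEXT_LIMIT = 2_000_000  # tokens

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // 4

def fits_in_context(docs: list[str], reserve: int = 50_000) -> bool:
    """True if all docs, plus a reserved budget for the prompt and
    the model's answer, fit under the 2M-token limit."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve <= CONTEXT_LIMIT

# Example: 500 documents of ~10,000 characters each (~2,500 tokens apiece)
corpus = ["x" * 10_000] * 500
print(fits_in_context(corpus))  # ~1.25M tokens + reserve -> True
```

Anything that clears this check is a candidate for skipping retrieval entirely; anything that doesn't still needs chunking, 2M window or not.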

The native multimodal piece is what's getting buried under the context headline. No transcription step, no image-to-text preprocessing — it reasons across images, audio, and text natively. That quietly kills a whole class of pipelines we've been bolting together.

Honestly? I'm skeptical of the latency and cost story at 2M tokens. Google hasn't published per-token pricing yet (classic), and inference at that scale is never free. Don't architect anything around this until you've run real benchmarks.
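Until pricing lands, you can at least bound the exposure. A back-of-envelope calculator; the dollar figure below is a placeholder, not a published Gemini rate:

```python
# Hypothetical cost sensitivity for full-context prompts.
# The $2.00 per 1M input tokens below is a made-up placeholder --
# Google has not published Gemini 3.1 Ultra pricing. Swap in the
# real rate once it exists.

def cost_per_query(input_tokens: int, price_per_1m: float) -> float:
    """Input-side cost in dollars for one query."""
    return input_tokens / 1_000_000 * price_per_1m

for ctx in (100_000, 500_000, 2_000_000):
    print(f"{ctx:>9,} tokens -> ${cost_per_query(ctx, 2.00):.2f}/query")
```

The takeaway: at full 2M-token utilization, even a modest per-token rate multiplies fast. A retrieval step that trims the prompt to 100K tokens is a 20x input-cost reduction per query, which is exactly why the pricing curve decides whether the RAG-killer story is real.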

Sandboxed code execution is them going hard at the agentic use case. We're going to test it against our current workflow this week — if it holds up, some of what we've built in the last six months is getting simplified.

What To Do

Pick one RAG pipeline you built in the last year and rerun it as a direct 2M-context prompt on Gemini 3.1 Ultra — measure latency, cost per query, and accuracy against your current chunked retrieval setup before committing to any architectural changes.
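A minimal harness for that comparison. The stub pipelines below are stand-ins — plug in your chunked-RAG path and a full-context call in their place; the harness itself only handles timing and exact-match scoring (swap in whatever accuracy metric you actually use):

```python
import time
from typing import Callable

def benchmark(pipeline: Callable[[str], str],
              queries: list[str],
              expected: list[str]) -> dict:
    """Time each query and score exact-match accuracy.
    `pipeline` is any callable mapping a query string to an answer."""
    latencies, hits = [], 0
    for q, gold in zip(queries, expected):
        t0 = time.perf_counter()
        answer = pipeline(q)
        latencies.append(time.perf_counter() - t0)
        hits += (answer.strip() == gold.strip())
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "accuracy": hits / len(queries),
    }

# Illustration only: both "pipelines" are trivial stubs.
rag = lambda q: "answer-a"
full_context = lambda q: "answer-a"
queries, expected = ["q1", "q2"], ["answer-a", "answer-b"]
print(benchmark(rag, queries, expected))
print(benchmark(full_context, queries, expected))
```

Run both pipelines over the same query set, then put mean latency, cost per query (from your provider's token counts), and accuracy side by side before touching the architecture.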

Builder's Brief

Who

Teams building long-document analysis, legal review, or codebase-scale context applications

What changes

Eliminates chunking and retrieval overhead for workloads under 2M tokens, collapsing RAG pipeline complexity for a specific use-case tier

When

Now

Watch for

Gemini API pricing per 1M tokens at context lengths above 500K — cost curve determines actual adoption

What Skeptics Say

2M-token context is a benchmark trophy; production latency and cost at full-context utilization remain prohibitive for most applications, and competitors will close the gap within two quarters.
