Crescendo AI

Google releases Gemini 3.1 Flash-Lite at $0.25 per million tokens


What Happened

Google launched Gemini 3.1 Flash-Lite on March 4, 2026, priced at $0.25 per million input tokens and running 2.5x faster than its Flash predecessors. That pricing sits significantly below comparable offerings from OpenAI and Anthropic. Google is using aggressive pricing to accelerate developer migration to its platform and put pressure on competitors' margins.

Our Take

$0.25 per million tokens. Let that sink in. We ran a customer support bot last quarter that cost maybe $12/month on GPT-4o Mini — that same workload on Flash-Lite is now closer to $1.50. Google isn't just competing anymore; it's trying to make everyone else's business model stop working.
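The back-of-envelope math is simple enough to sanity-check yourself. A minimal sketch, assuming a hypothetical workload of roughly 200k input tokens/day (the support-bot numbers above are ours; the $0.25/M price is from the announcement):

```python
def monthly_cost(tokens_per_day: int, price_per_million: float, days: int = 30) -> float:
    """Dollars for `days` of inference at a flat per-million-input-token price."""
    return tokens_per_day * days * price_per_million / 1_000_000

# ~200k input tokens/day at Flash-Lite's announced $0.25/M
flash_lite = monthly_cost(200_000, 0.25)
print(f"Flash-Lite: ${flash_lite:.2f}/month")  # → Flash-Lite: $1.50/month
```

Plug your own daily token volume and current model's rate into the same function to see what the swap is worth for your stack.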

Here's the thing though — pricing this aggressive isn't charity. Google wants you locked in before you notice it. Lock-in wrapped in a discount is still lock-in. (We've been through this with AWS credits and we know exactly how it ends.)

Honestly, for commodity inference — summarization, classification, structured extraction — this is the obvious call right now. Flash-Lite is also 2.5x faster than the previous Flash models, so the cost and latency objections disappear at the same time. That's a hard combo to argue against.

What I'd actually watch: how long this pricing holds. OpenAI's been in the same race. When o3-mini dropped, everyone recalculated their stacks. This is the same move, different jersey. Google has the infrastructure margins to sustain a price war longer than most — that's not nothing.

For us? We're swapping Flash-Lite into every low-stakes inference call this week. API surface is compatible enough that it's an afternoon, not a sprint.

What To Do

Swap Gemini 3.1 Flash-Lite into any GPT-4o Mini or Claude Haiku call you're running today: the price difference is 3-5x, the API surface is compatible, and you can test and ship within an afternoon.
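If your stack already speaks the OpenAI chat-completions format, the swap is mostly a base URL and a model id. A hedged sketch using only the standard library — the `gemini-3.1-flash-lite` model id is our guess based on the announcement, so check the published model list before shipping:

```python
import json
import os
import urllib.request

# Gemini's OpenAI-compatible endpoint; the model id below is an assumption.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"
MODEL = "gemini-3.1-flash-lite"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the compat endpoint."""
    body = json.dumps({
        "model": MODEL,  # was e.g. "gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    req = build_request("Classify this ticket: ...", os.environ["GEMINI_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If you're on the `openai` Python client instead, the same swap is just `OpenAI(api_key=..., base_url=BASE_URL + "/")` with the new model name — the request and response shapes don't change.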

Builder's Brief

Who

Teams running high-volume LLM inference pipelines.

What changes

Input token costs drop sharply; budget assumptions for any product doing >1M tokens/day need revision.

When

Now.

Watch for

OpenAI or Anthropic matching the price within 30 days.

What Skeptics Say

Ultra-low pricing may be unsustainable loss-leading; Google's API reliability and enterprise support have historically lagged OpenAI's, which could limit real adoption regardless of the favorable economics. And race-to-zero pricing compresses margins across the ecosystem without guaranteeing Google captures the market share it's sacrificing revenue to win.
