Microsoft takes on AI rivals with three new foundational models

Read the full article, "Microsoft takes on AI rivals with three new foundational models," on TechCrunch.

What Happened

Six months after its formation, Microsoft's MAI group released three models: one that transcribes voice into text, one that generates audio, and one that generates images.

Our Take

Look, Microsoft spun up a new group six months ago and dropped voice-to-text, audio gen, and image gen. That's scattered. They're trying to catch OpenAI and Google but doing it as three separate features instead of one coherent product.

The models might be decent, but the strategy smells defensive. "We also do that" isn't a moat. And honestly, Microsoft has always been slow to ship products; announcing first and shipping later is its default move.

What To Do

Wait to see if these integrate into real products (Copilot, Office) or stay as tech demos.

Builder's Brief

Who

teams building voice transcription and image generation pipelines on Azure

What changes

first-party Microsoft models reduce reliance on the OpenAI API within Azure, potentially unlocking better pricing or SLA terms

When

weeks

Watch for

Azure AI Studio pricing differential between MAI models and OpenAI-hosted equivalents

What Skeptics Say

MAI's models enter a market where distribution determines adoption, not quality alone — and Microsoft's internal OpenAI dependency creates organizational tension that makes sustained model investment politically fraught. Six months is not enough runway to judge differentiation.

Cited By

TechCrunch: Microsoft takes on AI rivals with three new foundational models
