For years, Microsoft has been the company that sells other people’s AI. It built the cloud infrastructure for OpenAI, distributed GPT models through Azure, and packaged everything into Copilot. That chapter isn’t over — but a new one has clearly begun. On Thursday, Microsoft unveiled three foundational AI models built entirely in-house, marking the clearest signal yet that the $3 trillion software giant intends to compete at the frontier, not just sit beneath it.
Three New Models, Three Competitive Battlegrounds
The three models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — are available immediately through Microsoft Foundry and a new MAI Playground. Each targets a different high-value AI capability: speech-to-text transcription, voice generation, and image creation.
MAI-Transcribe-1 is the headliner. Microsoft claims it achieves the lowest average Word Error Rate on the FLEURS benchmark across the top 25 languages by Microsoft product usage, averaging just 3.8% WER. According to Microsoft’s own benchmarks, it outperforms OpenAI’s Whisper-large-v3 across all 25 languages and beats Google’s Gemini 3.1 Flash on 22 of 25. Batch transcription speed is reportedly 2.5 times faster than Microsoft’s existing Azure Fast offering. The model is already being tested inside Copilot’s Voice mode and Microsoft Teams.
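For context on what those benchmark numbers mean: Word Error Rate is the standard transcription metric, defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of the computation (this is the generic metric, not Microsoft's evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a six-word reference ≈ 16.7% WER
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A 3.8% average WER, in other words, means roughly one word wrong per 26 spoken, averaged across those 25 languages.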
MAI-Voice-1 generates 60 seconds of natural-sounding speech in a single second, preserves speaker identity across long-form content, and supports custom voice creation from just a few seconds of sample audio. It’s priced at $22 per million characters. MAI-Image-2 debuted as a top-three model on the Arena.ai leaderboard with at least 2x faster generation than its predecessor, priced at $5 per million input tokens and $33 per million image output tokens. WPP, one of the world’s biggest advertising groups, is already building with it at scale.
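To make those list prices concrete, here is a back-of-envelope cost sketch using the per-unit rates quoted above. The example workload sizes (article length, prompt and image token counts) are illustrative assumptions, not Microsoft figures:

```python
# List prices as quoted at launch
VOICE_PRICE_PER_M_CHARS = 22.00    # MAI-Voice-1: $22 per million characters
IMAGE_PRICE_PER_M_INPUT = 5.00     # MAI-Image-2: $5 per million input tokens
IMAGE_PRICE_PER_M_OUTPUT = 33.00   # MAI-Image-2: $33 per million image output tokens

def voice_cost(characters: int) -> float:
    """Dollar cost to synthesise speech for the given character count."""
    return characters / 1_000_000 * VOICE_PRICE_PER_M_CHARS

def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for one generation at the given token counts."""
    return (input_tokens / 1_000_000 * IMAGE_PRICE_PER_M_INPUT
            + output_tokens / 1_000_000 * IMAGE_PRICE_PER_M_OUTPUT)

# Narrating a ~9,000-character article (assumed length): about $0.20
print(f"${voice_cost(9_000):.3f}")
# One image at an assumed 100-token prompt and 4,000 output tokens: about $0.13
print(f"${image_cost(100, 4_000):.4f}")
```

At these rates, voice generation for long-form content costs cents rather than dollars per piece, which is the economics an advertising group like WPP would need to produce at scale.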
The Contract Change That Made All of This Possible
To understand why this launch matters, you need to know what changed legally. Until October 2025, Microsoft was contractually barred from independently pursuing artificial general intelligence. The original 2019 deal with OpenAI gave Microsoft a licence to its models in exchange for cloud infrastructure — but prohibited Microsoft from going it alone at the frontier.
When OpenAI began expanding its compute relationships beyond Microsoft — striking deals with SoftBank and others — Microsoft renegotiated. The revised agreement freed Microsoft to build its own frontier models while retaining licence rights to everything OpenAI develops through 2032. Mustafa Suleyman, Microsoft’s AI chief, who formed the company’s superintelligence team just six months ago, described the shift plainly: the new terms meant Microsoft could finally pursue its own path to superintelligence independently.
“Nothing’s changing with the OpenAI partnership. We will be in partnership with them at least until 2032 and hopefully a lot longer. They have been a phenomenal partner to us.”
— Mustafa Suleyman, Microsoft AI Chief
Small Teams, Massive Results — How Microsoft Built These Models
Perhaps the most striking detail of Thursday’s launch is how lean the teams behind these models actually were. Suleyman revealed that the audio model was built by just 10 people, and the image team is similarly small. The efficiency gains, he said, came almost entirely from architecture choices and data quality — not headcount.
“My philosophy has always been that we need fewer people who are more empowered. So we operate an extremely flat structure.”
— Mustafa Suleyman, Microsoft AI Chief
This challenges a prevailing assumption in frontier AI — that you need thousands of researchers and enormous headcount costs to compete. Meta has reportedly offered individual researchers compensation packages of $100 million to $200 million. Microsoft is apparently getting comparable results with a fraction of that investment in people. If the benchmarks hold up, the implications for the economics of AI development are significant.
Suleyman also painted a vivid picture of how his team actually works — less like a traditional Microsoft engineering division and more like a startup trading floor. Round tables instead of desks. Laptops instead of large monitors. Teams of 50 to 60 people coding side by side, morning to night. He described it as “vibe coding” at an institutional scale.
Pricing Designed to Undercut Everyone Else in the Market
The models’ pricing is openly aggressive. Suleyman confirmed that the goal is to be cheaper than every major cloud competitor — Amazon, Google, and others — a deliberate decision that makes strategic sense for a company that can spread model development costs across its vast enterprise customer base.
Microsoft’s stock has dropped roughly 17% year-to-date, and investors have been pressing hard for evidence that hundreds of billions in AI infrastructure spending will generate returns. These models offer a two-pronged answer: they reduce Microsoft’s internal costs for running Teams, Copilot, Bing, and PowerPoint, while simultaneously giving enterprise developers a pricing reason to choose Microsoft Foundry over rivals. The goal, as Suleyman put it, is to deliver the cost efficiencies needed to serve AI workloads at the scale the coming years will demand.
A Frontier Language Model Is Coming — Microsoft Plans to Be “Completely Independent”
Transcription, voice, and image generation are just the opening moves. When asked directly whether Microsoft would build a large language model to compete with GPT at the frontier level, Suleyman was unambiguous.
“We absolutely are going to be delivering state of the art models across all modalities. Our mission is to make sure that if Microsoft ever needs it, we will be able to provide state of the art at the best efficiency, the cheapest price, and be completely independent.”
— Mustafa Suleyman, Microsoft AI Chief
Suleyman described a multi-year roadmap being built out with full backing from CEO Satya Nadella, who flew in to meet the superintelligence team in person to lay out the compute roadmap for the next two to four years. Building a competitive frontier LLM is a fundamentally harder challenge than the specialised models launched Thursday — but Microsoft now has the contractual freedom, the organisational mandate, and the early track record to make that ambition credible.
What This Means for the Rest of the AI Industry
Microsoft entering the model development race as a genuine competitor — not just a distributor — reshapes the competitive landscape in meaningful ways. OpenAI loses some of its exclusive grip on Microsoft’s platform strategy. Google faces a well-capitalised rival now matching it on benchmarks for speech and images. Voice AI startups like ElevenLabs face Microsoft’s distribution advantages on top of competitive pricing. And the broader enterprise AI market now has another major player offering first-party models with the compliance, data provenance, and governance assurances that regulated industries require.
Three models. Teams of roughly ten people. A fraction of the competition's headcount. For a company that spent years as the infrastructure layer beneath other people's AI breakthroughs, Thursday's launch is a clear message: Microsoft is done waiting in the wings.
Author
Lucienne Albrecht is Luxe Chronicle’s wealth and lifestyle editor, celebrated for her elegant perspective on finance, legacy, and global luxury culture. With a flair for blending sophistication with insight, she brings a distinctly feminine voice to the world of high society and wealth.