Maia 200: Behind Microsoft’s Custom AI Chip Breakthrough

Microsoft's first in-house AI accelerator focused on inference, Maia 200 (Credit: Microsoft)
Microsoft's custom chip combines 3nm design, fast memory and software integration to advance efficient AI inference across Azure’s global footprint

Microsoft has unveiled Maia 200, its first custom-built AI accelerator for inference, which is now operational in Azure’s US Central data centre region.

Fabricated using TSMC’s 3-nanometre technology and featuring a reengineered memory subsystem, the processor is optimised for large-scale AI workloads, with Microsoft claiming 30% better performance per dollar than the latest hardware in its fleet.

Maia 200 incorporates 216GB of HBM3e memory with 7TB/s of bandwidth, 272MB of on-chip SRAM and data movement engines designed to keep large language models supplied with data and the compute units highly utilised.
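
To put the memory figures in context, the short sketch below works out how long a single full pass over the HBM would take at the quoted bandwidth. It assumes decimal units (1GB = 10^9 bytes) and that the 7TB/s peak is sustained, neither of which the article confirms, and the function names are purely illustrative.

```python
# Back-of-envelope sketch based on the memory figures quoted above.
# Assumptions (not from the article): decimal units, peak bandwidth sustained.

HBM_CAPACITY_BYTES = 216e9         # 216GB of HBM3e
HBM_BANDWIDTH_BYTES_PER_S = 7e12   # 7TB/s


def sweep_time_s(num_bytes: float,
                 bandwidth: float = HBM_BANDWIDTH_BYTES_PER_S) -> float:
    """Minimum time in seconds to stream num_bytes from HBM once."""
    return num_bytes / bandwidth


def max_passes_per_s(num_bytes: float,
                     bandwidth: float = HBM_BANDWIDTH_BYTES_PER_S) -> float:
    """Upper bound on complete passes over num_bytes per second."""
    return bandwidth / num_bytes


if __name__ == "__main__":
    t = sweep_time_s(HBM_CAPACITY_BYTES)
    print(f"Full HBM sweep: {t * 1e3:.1f} ms")
    print(f"Passes per second: {max_passes_per_s(HBM_CAPACITY_BYTES):.1f}")
```

Under these assumptions, one sweep of the full 216GB takes roughly 31 milliseconds. In memory-bound token generation, where each decoding step reads the model weights once, that kind of figure caps how many steps a single chip could complete per second.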

It is purpose-built for inference tasks such as token generation and synthetic data processing, supporting deployments with OpenAI’s GPT‑5.2 models as well as internal development by Microsoft’s Superintelligence division.

Scott Guthrie, Executive Vice President at Microsoft

Writing on LinkedIn, Scott Guthrie, Executive Vice President at Microsoft, says: “As AI workloads get bigger and more complex, we are engineering the full stack from our custom-built silicon all the way to the data centre. Today we launched Maia 200, our next-generation AI accelerator chip.

“Maia 200 is an AI inference powerhouse: the most performant first‑party silicon from any hyperscaler, with three times the FP4 performance of Amazon’s third‑generation Trainium and FP8 performance above Google’s seventh‑generation TPU. It’s also the most efficient inference system we’ve ever deployed, delivering 30% better performance per dollar than the latest hardware in our fleet.

“Already running in our Iowa data centre with impressive throughput, Maia 200 is accelerating today’s multimodal, multicall AI workloads with faster inference and higher output at scale.”

Deployed and integrated at data centre scale

Maia 200 is now live in Microsoft’s US Central data centre region near Des Moines, Iowa, with deployment in the US West 3 region near Phoenix, Arizona set to follow.

Additional regions are set for rollout, with the accelerator fully integrated into Azure’s control plane and services.

It features native support for security, telemetry and diagnostics at both the chip and rack levels.

Each Maia 200 contains more than 140 billion transistors and delivers more than 10 petaFLOPS of 4-bit (FP4) performance and over 5 petaFLOPS at 8-bit (FP8), all within a 750W system‑on‑chip power envelope.
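
The compute and power figures can be combined into a rough efficiency estimate. The sketch below is illustrative only: it treats the "more than" figures as lower bounds and assumes peak throughput is reached at the full 750W envelope, which the article does not state.

```python
# Rough efficiency estimate from the quoted figures.
# Assumption (not from the article): peak compute is delivered at the
# full 750W envelope. The quoted figures are "more than" values, so the
# results are floors on peak efficiency, not measurements.

PEAK_FLOPS = {
    "FP4": 10e15,  # more than 10 petaFLOPS
    "FP8": 5e15,   # over 5 petaFLOPS
}
POWER_ENVELOPE_W = 750.0


def tflops_per_watt(peak_flops: float,
                    power_w: float = POWER_ENVELOPE_W) -> float:
    """Peak teraFLOPS delivered per watt of the power envelope."""
    return peak_flops / power_w / 1e12


if __name__ == "__main__":
    for fmt, flops in PEAK_FLOPS.items():
        print(f"{fmt}: at least {tflops_per_watt(flops):.1f} TFLOPS/W")
```

On these assumptions the chip delivers at least about 13.3 teraFLOPS per watt at FP4 and 6.7 at FP8, a two-to-one ratio between the two precisions.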

These specifications are tuned for low‑precision compute used in contemporary inference models, while maintaining flexibility for scaling to larger architectures.

To address AI performance bottlenecks from data movement, the chip incorporates a sophisticated memory system using narrow‑precision datatypes, dedicated DMA engines and an on‑die network‑on‑chip fabric.

This design keeps data flowing to the compute units, raising token throughput.
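
A simple way to see why narrow-precision datatypes ease the data movement problem is to compare how many bytes a model's weights occupy at each precision. The sketch below is a simplification under stated assumptions: the 200-billion-parameter model is hypothetical, and the calculation ignores activations, key-value caches and any scaling metadata a real low-precision format would carry.

```python
# Weight footprint at different precisions, compared with the 216GB of
# HBM quoted in the article. The model size used below is hypothetical,
# and per-block scale factors, activations and KV cache are ignored.

BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}
HBM_CAPACITY_BYTES = 216e9  # decimal GB assumed


def weight_bytes(num_params: float, fmt: str) -> float:
    """Bytes needed to store num_params weights in the given format."""
    return num_params * BITS_PER_PARAM[fmt] / 8


def fits_in_hbm(num_params: float, fmt: str,
                capacity: float = HBM_CAPACITY_BYTES) -> bool:
    """True if the weights alone fit in one chip's HBM."""
    return weight_bytes(num_params, fmt) <= capacity


if __name__ == "__main__":
    params = 200e9  # hypothetical 200-billion-parameter model
    for fmt in BITS_PER_PARAM:
        size_gb = weight_bytes(params, fmt) / 1e9
        print(f"{fmt}: {size_gb:.0f}GB, fits: {fits_in_hbm(params, fmt)}")
```

For this hypothetical model, the weights need 400GB at FP16 but only 200GB at FP8 and 100GB at FP4, so halving the precision halves both the memory required and the bytes that must be moved for every pass over the weights.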

Maia 200 infographic from Microsoft Azure showing the chip’s capabilities (Credit: Microsoft)

System-level innovation and network design

At the system level, Maia 200 debuts a two‑tier scale‑up network architecture built on standard Ethernet rather than proprietary interconnects.

This strategic design enables broad scalability and cost efficiency, while sustaining high performance and reliability.

Each Maia accelerator provides 2.8 TB/s of dedicated, bidirectional scale‑up bandwidth and supports collective operations across clusters of up to 6,144 accelerators.

Within each tray, four accelerators are directly linked through non‑switched connections to deliver high‑bandwidth, low‑latency local communication.

The same transport protocols apply across trays, racks and full clusters, establishing a unified and programmable fabric optimised for inference workloads.
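
The tray grouping described above can be sketched in a few lines. The numbering scheme here is an assumption made for illustration: the article does not say how Microsoft identifies accelerators, so contiguous integer IDs and the helper names are hypothetical.

```python
# Illustrative model of the tray grouping: four directly linked
# accelerators per tray, up to 6,144 accelerators per cluster.
# Assumption (not from the article): accelerators carry contiguous
# integer IDs starting at 0, assigned tray by tray.

ACCELERATORS_PER_TRAY = 4
MAX_CLUSTER_SIZE = 6144
NUM_TRAYS = MAX_CLUSTER_SIZE // ACCELERATORS_PER_TRAY


def tray_of(accel_id: int) -> int:
    """Index of the tray that holds the given accelerator."""
    if not 0 <= accel_id < MAX_CLUSTER_SIZE:
        raise ValueError(f"accelerator id out of range: {accel_id}")
    return accel_id // ACCELERATORS_PER_TRAY


def tray_peers(accel_id: int) -> list[int]:
    """Other accelerators reachable over the direct in-tray links."""
    first = tray_of(accel_id) * ACCELERATORS_PER_TRAY
    return [i for i in range(first, first + ACCELERATORS_PER_TRAY)
            if i != accel_id]


def same_tray(a: int, b: int) -> bool:
    """True if two accelerators can talk without crossing a switch."""
    return tray_of(a) == tray_of(b)


if __name__ == "__main__":
    print(f"Trays in a full cluster: {NUM_TRAYS}")
    print(f"Direct peers of accelerator 5: {tray_peers(5)}")
```

A full cluster of 6,144 accelerators therefore spans 1,536 trays. Traffic between the four chips in a tray stays on the direct links, while anything further afield crosses the switched Ethernet tiers using the same transport protocols.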

This cohesive networking approach streamlines cluster management, reduces latency and power draw and lowers total cost of ownership across Microsoft’s Azure fleet.

Maia 200 server blade (Credit: Microsoft)

Software stack and data centre readiness

Microsoft is unveiling the Maia software development kit (SDK) in tandem with the hardware deployment.

The SDK features native integration with PyTorch, a Triton compiler, optimised kernel libraries and access to a low‑level programming language purpose‑built for Maia.

With these tools, developers can seamlessly port models between hardware platforms or fine‑tune performance for specialised workloads.

To accelerate deployment, Microsoft validates its silicon and systems ahead of fabrication.

Maia 200 is designed using a pre‑silicon modelling environment that accurately simulates large language model workloads, enabling optimisation across silicon, networking and software long before production.

During this phase, the company also develops core data centre systems, including second‑generation liquid‑cooled heat exchanger units.

As a result, Maia 200 reaches production readiness within days of silicon delivery and is installed in data centres in less than half the time required for previous infrastructure programmes.

This co‑engineered strategy – spanning chip design, system software and data centre integration – enables Microsoft to achieve higher utilisation, lower cost per watt and faster global deployment at scale.
