Maia 200: Behind Microsoft's Custom AI Chip Breakthrough

Microsoft has unveiled Maia 200, its first custom-built AI accelerator for inference, now operational across Azure data centres.
Fabricated using TSMC's 3-nanometre technology and featuring a reengineered memory subsystem, the processor is optimised for large-scale AI workloads, delivering exceptional efficiency and performance per dollar.
Maia 200 incorporates 216GB of HBM3e memory with 7TB/s of bandwidth, 272MB of on-chip SRAM and advanced data movement engines engineered to keep large language models continuously supplied with data.
It is purpose-built for inference tasks such as token generation and synthetic data processing, supporting deployments with OpenAI's GPT-5.2 models as well as internal development by Microsoft's Superintelligence division.
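To put the memory figures in context, the short sketch below works out two back-of-envelope numbers from the quoted 216GB and 7TB/s: how long one full read of the HBM takes at peak bandwidth, and the resulting ceiling on single-stream token generation. The 200-billion-parameter, 4-bit model is a hypothetical chosen for illustration, not something stated by Microsoft, and the estimate assumes every generated token reads all weights once from HBM, ignoring batching, on-chip SRAM reuse and KV-cache traffic.

```python
# Back-of-envelope estimate of how fast HBM can feed a model during decoding.
# Capacity and bandwidth are the article's figures; decimal units (1 TB = 1000 GB).
HBM_CAPACITY_GB = 216
HBM_BANDWIDTH_TBPS = 7.0


def full_sweep_ms(capacity_gb: float, bandwidth_tbps: float) -> float:
    """Milliseconds needed to read the whole HBM once at peak bandwidth."""
    seconds = capacity_gb / (bandwidth_tbps * 1000)
    return seconds * 1000


def decode_ceiling_tokens_per_s(weight_gb: float, bandwidth_tbps: float) -> float:
    """Upper bound on single-stream tokens/s if each token reads all weights once."""
    return bandwidth_tbps * 1000 / weight_gb


# ASSUMPTION (hypothetical model): 200B parameters at 4 bits (0.5 byte) each.
HYPOTHETICAL_WEIGHT_GB = 200e9 * 0.5 / 1e9  # 100 GB of weights

sweep = full_sweep_ms(HBM_CAPACITY_GB, HBM_BANDWIDTH_TBPS)
ceiling = decode_ceiling_tokens_per_s(HYPOTHETICAL_WEIGHT_GB, HBM_BANDWIDTH_TBPS)
print(f"{sweep:.1f} ms per full HBM sweep")
print(f"{ceiling:.0f} tokens/s single-stream ceiling")
```

Under these assumptions a full sweep takes roughly 31 ms and the hypothetical model tops out at about 70 tokens per second per stream, which is why batching and on-chip SRAM matter for inference throughput.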
Writing on LinkedIn, Scott Guthrie, Executive Vice President at Microsoft, says: "As AI workloads get bigger and more complex, we are engineering the full stack from our custom-built silicon all the way to the data centre. Today we launched Maia 200, our next-generation AI accelerator chip.
"Maia 200 is an AI inference powerhouse: the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of Amazon's third-generation Trainium and FP8 performance above Google's seventh-generation TPU. It's also the most efficient inference system we've ever deployed, delivering 30% better performance per dollar than the latest hardware in our fleet.
"Already running in our Iowa data centre with impressive throughput, Maia 200 is accelerating today's multimodal, multicall AI workloads with faster inference and higher output at scale."
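A 30% gain in performance per dollar is not the same as a 30% cost reduction. The minimal sketch below converts the quoted figure into the relative cost of a fixed amount of work, on the assumption that "performance per dollar" means work delivered per dollar spent at comparable utilisation; the function name is illustrative only.

```python
def relative_cost(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline.

    perf_per_dollar_gain is fractional, e.g. 0.30 for "30% better".
    If each dollar buys (1 + gain) times the work, the same work
    costs 1 / (1 + gain) of what it did before.
    """
    return 1.0 / (1.0 + perf_per_dollar_gain)


gain = 0.30  # figure from the quote above
cost = relative_cost(gain)
print(f"same work costs {cost:.1%} of baseline, a {1 - cost:.1%} saving")
```

On that reading, the same inference workload would cost about 76.9% of the baseline, a saving of roughly 23%.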
Deployed and integrated at data centre scale
Maia 200 is now live in Microsoft's US Central data centre region near Des Moines, Iowa, with deployment in the US West 3 region near Phoenix, Arizona set to follow.
Additional regions are set for rollout, with the accelerator fully integrated into Azure's control plane and services.
It features native support for security, telemetry and diagnostics at both the chip and rack levels.
Each Maia 200 contains more than 140 billion transistors and delivers more than 10 petaFLOPS of 4-bit (FP4) performance and over 5 petaFLOPS at 8-bit (FP8), all within a 750W system-on-chip power envelope.
These specifications are tuned for low-precision compute used in contemporary inference models, while maintaining flexibility for scaling to larger architectures.
To address AI performance bottlenecks from data movement, the chip incorporates a sophisticated memory system using narrow-precision datatypes, dedicated DMA engines and an on-die network-on-chip fabric.
This design enhances data flow, accelerating token processing and model input rates.
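The emphasis on data movement follows from a simple roofline calculation. The sketch below combines the quoted peak compute (treated as exactly 10 and 5 petaFLOPS, although the article says "more than"), the 7TB/s HBM bandwidth and the 750W envelope to estimate the arithmetic intensity at which the chip stops being bandwidth-bound, plus peak compute per watt. It is a rough model under those assumptions and deliberately ignores the 272MB of on-chip SRAM, which raises effective bandwidth for data that stays on die.

```python
# Roofline-style estimate from the article's headline figures.
PEAK_FLOPS = {"FP4": 10e15, "FP8": 5e15}  # quoted as "more than" these values
HBM_BYTES_PER_S = 7e12                    # 7 TB/s
SOC_POWER_W = 750


def ridge_point(peak_flops: float, bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) where compute and bandwidth limits meet."""
    return peak_flops / bytes_per_s


def attainable_flops(intensity: float, peak_flops: float, bytes_per_s: float) -> float:
    """Classic roofline: the lower of the compute roof and bandwidth * intensity."""
    return min(peak_flops, bytes_per_s * intensity)


for name, peak in PEAK_FLOPS.items():
    ridge = ridge_point(peak, HBM_BYTES_PER_S)
    tflops_per_watt = peak / SOC_POWER_W / 1e12
    print(f"{name}: ridge at ~{ridge:.0f} FLOP/byte, ~{tflops_per_watt:.1f} TFLOPS/W peak")
```

With these inputs, an FP4 kernel needs on the order of 1,400 floating-point operations per byte fetched from HBM before compute becomes the limit. Token-by-token decoding usually sits far below that, so narrow datatypes, DMA engines and on-chip SRAM that cut HBM traffic translate directly into throughput.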
System-level innovation and network design
At the system level, Maia 200 debuts a two-tier scale-up network architecture built on standard Ethernet rather than proprietary interconnects.
This strategic design enables broad scalability and cost efficiency, while sustaining high performance and reliability.
Each Maia accelerator provides 2.8 TB/s of dedicated, bidirectional scale-up bandwidth and supports collective operations across clusters of up to 6,144 accelerators.
Within each tray, four accelerators are directly linked through non-switched connections to deliver high-bandwidth, low-latency local communication.
The same transport protocols apply across trays, racks and full clusters, establishing a unified and programmable fabric optimised for inference workloads.
This cohesive networking approach streamlines cluster management, reduces latency and power draw and lowers total cost of ownership across Microsoft's Azure fleet.
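For a sense of scale, the sketch below multiplies out the per-accelerator figures for a fully populated 6,144-accelerator cluster. The `ClusterSpec` class and its field names are hypothetical, and the totals are simple sums of peak values; real collective workloads will not reach them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Illustrative container for figures quoted in the article."""
    accelerators: int = 6144   # maximum cluster size for collective operations
    per_tray: int = 4          # directly linked accelerators per tray
    hbm_gb: int = 216          # HBM3e per accelerator
    fp4_pflops: float = 10.0   # quoted as "more than 10 petaFLOPS"

    @property
    def trays(self) -> int:
        return self.accelerators // self.per_tray

    @property
    def total_hbm_pb(self) -> float:
        # Decimal units: 1 PB = 1,000,000 GB
        return self.accelerators * self.hbm_gb / 1e6

    @property
    def total_fp4_exaflops(self) -> float:
        # 1 exaFLOPS = 1,000 petaFLOPS
        return self.accelerators * self.fp4_pflops / 1e3


spec = ClusterSpec()
print(f"{spec.trays} trays, {spec.total_hbm_pb:.2f} PB of HBM, "
      f"{spec.total_fp4_exaflops:.2f} exaFLOPS peak FP4")
```

On these assumptions a full cluster spans 1,536 trays with about 1.33 PB of aggregate HBM and a nominal 61 exaFLOPS of FP4 compute.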
Software stack and data centre readiness
Microsoft is unveiling the Maia software development kit (SDK) in tandem with the hardware deployment.
The SDK features native integration with PyTorch, a Triton compiler, optimised kernel libraries and access to a low-level programming language purpose-built for Maia.
With these tools, developers can seamlessly port models between hardware platforms or fine-tune performance for specialised workloads.
To accelerate deployment, Microsoft validates its silicon and systems ahead of fabrication.
Maia 200 is designed using a pre-silicon modelling environment that accurately simulates large language model workloads, enabling optimisation across silicon, networking and software long before production.
During this phase, the company also develops core data centre systems, including second-generation liquid-cooled heat exchanger units.
As a result, Maia 200 reaches production readiness within days of silicon delivery and is installed in data centres in less than half the time required for previous infrastructure programmes.
This co-engineered strategy, spanning chip design, system software and data centre integration, enables Microsoft to achieve higher utilisation, lower cost per watt and faster global deployment at scale.


