Microsoft: First AI Superfactory Redefines AI Workloads

Rack level direct liquid cooling | Credit: Microsoft
Microsoft unveils the world’s first AI superfactory, called the Atlanta Fairwater data centre, which connects hundreds of thousands of Nvidia Blackwell GPUs

The race to train frontier AI models is pushing data centre infrastructure to its physical limits – and those limits are tightening fast.

At this point, even the speed of light determines how closely processors can sit together, while heat dissipation dictates how much power each rack can handle.

These are the immovable constraints shaping where and how AI systems come to life.

In response, Microsoft has launched its second Fairwater AI data centre in Atlanta, Georgia, linking it to the existing Wisconsin site via a dedicated AI wide area network.

What sets the facility apart is its design: built to host hundreds of thousands of Nvidia GB200 and GB300 graphics processing units (GPUs) under a unified flat network – moving away from the traditional cloud model toward a purpose-engineered platform for modern AI training.

Satya Nadella, CEO at Microsoft

Satya Nadella, CEO at Microsoft, says: “Today we announced our new Fairwater data centre in Atlanta, connected with our first Fairwater site in Wisconsin and our broader Azure footprint to create the world’s first AI superfactory.”

How does the AI superfactory work and what are its benefits?

The Atlanta Fairwater site demonstrates how AI workloads have evolved well beyond just training massive models. 

Satya says: “AI workloads have evolved beyond large-scale pre-training. Today, they encompass fine-tuning, reinforcement learning, synthetic data generation, evaluation pipelines and more.”

To run this mix of workloads at high density, the Fairwater data centres use facility-wide liquid cooling built on a closed-loop design that continuously reuses coolant with minimal water consumption.

Microsoft opens Atlanta Fairwater AI data centre with 140kW racks | Credit: Microsoft

Each rack in Microsoft's Fairwater data centres draws about 140kW of power, with entire rows consuming up to 1,360kW.

After the initial fill, the closed-loop liquid cooling system reuses the same water continuously. That one-time fill is roughly equal to the annual water consumption of 20 homes.

This shift to liquid cooling matters not only for sustainability but also for computing density. At these power levels, air cooling cannot remove heat efficiently, which leaves liquid cooling as the only viable solution.
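A rough heat-balance calculation shows why air struggles at these densities. The sketch below rearranges the steady-state relation Q = ṁ · c_p · ΔT to estimate how much coolant must flow through a single 140kW rack. The 10 K temperature rise, the textbook property values for water and air, and the assumption that the rack's entire electrical draw leaves as heat in one fluid stream are illustrative simplifications, not Microsoft's design figures.

```python
# Steady-state heat balance: Q = m_dot * c_p * delta_T.
# Rearranged to find the coolant flow needed to carry away a given heat load.
# All property values are approximate textbook figures (assumptions).

WATER_CP = 4186.0       # J/(kg*K), specific heat of liquid water
WATER_DENSITY = 1000.0  # kg/m^3
AIR_CP = 1005.0         # J/(kg*K), specific heat of air at constant pressure
AIR_DENSITY = 1.2       # kg/m^3 at roughly room conditions


def mass_flow_kg_s(heat_w: float, cp: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) needed to absorb heat_w watts with a delta_t_k rise."""
    return heat_w / (cp * delta_t_k)


def volume_flow_m3_s(heat_w: float, cp: float, density: float, delta_t_k: float) -> float:
    """Volumetric flow (m^3/s) for the same heat load."""
    return mass_flow_kg_s(heat_w, cp, delta_t_k) / density


RACK_HEAT_W = 140_000.0  # ~140kW per rack, as reported for Fairwater
DELTA_T_K = 10.0         # hypothetical coolant temperature rise

# Convert m^3/s to litres per minute for water.
water_l_per_min = volume_flow_m3_s(RACK_HEAT_W, WATER_CP, WATER_DENSITY, DELTA_T_K) * 1000 * 60
air_m3_per_s = volume_flow_m3_s(RACK_HEAT_W, AIR_CP, AIR_DENSITY, DELTA_T_K)

print(f"water: {water_l_per_min:.0f} L/min")
print(f"air:   {air_m3_per_s:.1f} m^3/s")
```

Under these assumptions, about 200 litres of water per minute does the same job as roughly 11.6 cubic metres of air per second, a volume ratio of several thousand to one.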

Microsoft’s two-storey building design minimises cable lengths between GPUs, reducing latency and allowing a more compact, performance-optimised layout for dense GPU packing.

Two-story networking architecture | Credit: Microsoft

Physical distance between GPUs in a cluster matters significantly when every GPU needs to communicate with every other GPU.

Placing racks in a three-dimensional configuration, as Microsoft does in its Fairwater data centres, shortens the cable runs between GPUs. Shorter runs reduce the time signals spend in transit, which lowers latency and improves effective bandwidth across the cluster.
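The geometric effect can be illustrated with a toy model. The sketch below compares the longest rack-to-rack cable run when the same racks sit on one floor versus two, treating each floor as a square grid and measuring Manhattan distance. The rack count, rack pitch, floor height and the signal speed of roughly two-thirds of the speed of light in fibre are all hypothetical inputs; real cabling follows trays and is longer than this idealised path.

```python
import math

# Assumption: signals in optical fibre travel at roughly 2/3 of c
# (refractive index ~1.5).
SIGNAL_SPEED_M_S = 2.0e8


def worst_case_run_m(racks: int, floors: int, rack_pitch_m: float, floor_height_m: float) -> float:
    """Longest Manhattan-distance cable run between two racks (toy model).

    Racks are split evenly across `floors`; each floor is a square grid.
    """
    per_floor = math.ceil(racks / floors)
    side = math.ceil(math.sqrt(per_floor))       # racks along each edge of a floor
    horizontal = 2 * (side - 1) * rack_pitch_m   # corner to opposite corner (x + y)
    vertical = (floors - 1) * floor_height_m     # climb between the outermost floors
    return horizontal + vertical


def one_way_delay_ns(distance_m: float) -> float:
    """Propagation delay in nanoseconds over distance_m of fibre."""
    return distance_m / SIGNAL_SPEED_M_S * 1e9


RACKS = 400      # hypothetical cluster size
PITCH_M = 1.2    # hypothetical rack-to-rack spacing
FLOOR_H_M = 6.0  # hypothetical floor-to-floor height

for floors in (1, 2):
    run = worst_case_run_m(RACKS, floors, PITCH_M, FLOOR_H_M)
    print(f"{floors} floor(s): {run:.1f} m, {one_way_delay_ns(run):.0f} ns")
```

With these made-up numbers, moving to two floors cuts the worst-case run from about 45.6 m to 39.6 m, roughly 30 ns of one-way delay. Because the horizontal span grows with the square root of the racks per floor while the vertical hop stays fixed, the saving in this model increases with cluster size.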

Satya explains: “Fairwater’s two-story design and liquid cooling system lets us place racks in three dimensions and pack them with GPUs as densely as possible, minimising cable runs and improving latency and effective bandwidth.”

Each rack houses up to 72 Nvidia Blackwell GPUs, all connected through NVLink, Nvidia’s proprietary high-speed interconnect technology.

Densely populated GPU racks with app driven networking | Credit: Microsoft

Furthermore, the Blackwell GPUs support FP4, a four-bit floating-point format that increases operations per second while reducing memory requirements. Each rack delivers 1.8TB/s of GPU-to-GPU bandwidth.
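To see why a four-bit format saves memory, and how coarse it is, the sketch below decodes every positive FP4 code and compares raw weight storage at different precisions. It assumes the common E2M1 layout (1 sign bit, 2 exponent bits with a bias of 1, 1 mantissa bit) used by the OCP Microscaling FP4 element type; the per-block scale factors that practical FP4 schemes add are not modelled, and the one-trillion-parameter model is hypothetical.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (assumed layout: sign | 2-bit exponent | 1-bit mantissa)."""
    if not 0 <= code <= 0xF:
        raise ValueError("FP4 code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1; scale is 2**(1 - bias) = 1.
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1, exponent bias of 1.
        magnitude = (1 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude


def weight_bytes(params: int, bits_per_param: int) -> float:
    """Raw storage for weights, ignoring any per-block scale factors."""
    return params * bits_per_param / 8


# Codes 0-7 have the sign bit clear, so they cover every non-negative value.
positive_values = sorted({decode_e2m1(c) for c in range(8)})
print(positive_values)

PARAMS = 1_000_000_000_000  # hypothetical 1-trillion-parameter model
for name, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
    print(f"{name}: {weight_bytes(PARAMS, bits) / 1e12:.1f} TB")
```

Only eight non-negative magnitudes (0 to 6) are representable, which is why FP4 is normally paired with scaling; in exchange, the same weights take a quarter of the space they would in FP16.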

How Microsoft’s AI superfactory grid power delivers 99.99% availability

The Atlanta location was chosen partly for its access to utility power offering four nines availability – 99.99% uptime – at the cost typically associated with three nines, or 99.9% reliability.
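The gap between three and four nines is easier to judge as downtime. The sketch below converts an availability fraction into expected unavailable time per year; it assumes availability is measured as a simple annual average over a 365.25-day year, which is a simplification of how utilities report reliability.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def annual_downtime_minutes(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, a in (("three nines", 0.999), ("four nines", 0.9999)):
    minutes = annual_downtime_minutes(a)
    print(f"{label} ({a:.2%}): {minutes:.1f} min/yr ({minutes / 60:.2f} h/yr)")
```

On that basis, three nines allows about 8.8 hours of outage a year, while four nines allows about 53 minutes.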

This highly reliable grid connection enables Microsoft to eliminate traditional backup infrastructure, including on-site generation and uninterruptible power supplies, for the GPU fleet.

However, managing power at this scale brings its own challenges, requiring advanced power-management technologies to stabilise grid demand and keep operations continuous and efficient.

Large-scale training jobs create power oscillations that can affect grid stability. To counter this, Microsoft has developed solutions with industry partners on two fronts: software that smooths demand by introducing supplementary workloads during low-activity periods, and hardware in which GPUs enforce power thresholds to stabilise consumption.
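The combined effect of those two mitigations can be pictured with a deliberately simple model. In the sketch below, filler work raises any dip in draw up to a floor (the software side) and a cap limits peaks (the hardware side), and the peak-to-trough swing serves as a crude proxy for grid impact. The function names, the floor and cap values, and the power trace are all hypothetical; Microsoft has not published its control logic in this form.

```python
from typing import List


def smooth_power(draw_kw: List[float], floor_kw: float, cap_kw: float) -> List[float]:
    """Toy model of demand smoothing (hypothetical, not Microsoft's implementation).

    - Software: below floor_kw, supplementary work lifts consumption to the floor.
    - Hardware: a power threshold holds consumption at or below cap_kw.
    """
    if floor_kw > cap_kw:
        raise ValueError("floor must not exceed cap")
    return [min(max(p, floor_kw), cap_kw) for p in draw_kw]


def swing(series: List[float]) -> float:
    """Peak-to-trough range: a rough proxy for how hard the load pulls on the grid."""
    return max(series) - min(series)


# Hypothetical per-rack draw alternating between compute-heavy and lighter phases.
raw = [140.0, 140.0, 60.0, 55.0, 140.0, 138.0, 58.0, 140.0]
smoothed = smooth_power(raw, floor_kw=110.0, cap_kw=130.0)

print("raw swing:", swing(raw), "kW")
print("smoothed swing:", swing(smoothed), "kW")
```

In this trace the swing drops from 85kW to 20kW. The model also makes the trade-offs visible: filler work consumes energy that does no training, and a cap below the natural peak can slow the compute-heavy phases.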

Satya says: “Each Fairwater DC can integrate hundreds of thousands of the latest Nvidia GPUs into a single coherent cluster. 

“This provides flexible infra that can support the full spectrum of workloads and ensure no GPU is left unnecessarily idle.”

Microsoft deployed more than 120,000 miles of new fibre across the US last year to build the dedicated AI Wide Area Network that connects these facilities, enabling near real-time data transfer and collaboration between AI superfactory sites.

Satya says: “Every Fairwater DC will connect through our continent-spanning AI WAN to prior generations of AI supercomputers, forming a truly fungible pool of compute.”

The company is bringing more than 100,000 GB300 GPUs online this quarter for inference across its broader fleet. 

Satya concludes: “For us, it’s all about turning every gigawatt into the maximum number of useful tokens. Not every GW is created equal.”
