Nvidia GB300 NVL72 Ensures Consistent Data Centre AI Power

Nvidia is developing advanced energy management capabilities for its platform to tackle the energy challenges faced by massive AI training workloads.
Running thousands of GPUs in lockstep causes data centres to produce sharp power fluctuations that stress grid infrastructure.
The GB300 NVL72 is being introduced with integrated hardware and software features designed to smooth power spikes, reducing peak grid demand by up to 30%.
The new features in the GB200 NVL72 platform aim to let data centre managers cut back on over-provisioning of power infrastructure, lowering operational costs and raising rack density within the allocated budget.
Representing a substantial leap in performance for AI reasoning and agentic tasks, the Nvidia GB300 NVL72 delivers up to 10x better user responsiveness, 5x higher throughput per watt than the prior-generation Nvidia Hopper architecture and a 50x increase in output for reasoning model inference.
In July, CoreWeave became the inaugural cloud provider to launch the platform.
"CoreWeave is constantly working to push the boundaries of AI development further, deploying the bleeding-edge cloud capabilities required to train the next generation of AI models," said Peter Salanki, Co-Founder and Chief Technology Officer at CoreWeave.
"We're proud to be the first to stand up this transformative platform and help innovators prepare for the next exciting wave of AI."
AI training's impact on grid stability
Traditional data centres run varied workloads across systems, which tends to balance out power demand.
In contrast, AI training involves numerous GPUs executing identical calculations concurrently, leading to abrupt fluctuations between high and low power states.
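The contrast between varied and synchronised workloads can be illustrated with a toy simulation. This is only a sketch: the two power states (200 W and 1,000 W), the fleet size and the function name are hypothetical values chosen for illustration, not figures from Nvidia.

```python
import random

LOW_W, HIGH_W = 200.0, 1000.0  # hypothetical per-device power states

def aggregate_swing(num_devices, steps, synchronised, seed=0):
    """Return (min, max) of total fleet power over time in a toy model."""
    rng = random.Random(seed)
    totals = []
    for _ in range(steps):
        if synchronised:
            # Every device sits in the same state at the same moment.
            total = rng.choice([LOW_W, HIGH_W]) * num_devices
        else:
            # Each device picks its state independently of the others.
            total = sum(rng.choice([LOW_W, HIGH_W]) for _ in range(num_devices))
        totals.append(total)
    return min(totals), max(totals)

sync_min, sync_max = aggregate_swing(1000, 200, synchronised=True)
mixed_min, mixed_max = aggregate_swing(1000, 200, synchronised=False)
print("synchronised swing (W):", sync_max - sync_min)
print("independent swing (W):", mixed_max - mixed_min)
```

In this model the independent fleet's highs and lows largely cancel out, while the synchronised fleet swings between its full minimum and full maximum draw.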
The grid must respond to these rapid load changes, and that response can take up to 90 minutes with traditional generation resources.
Such swift transitions may trigger electrical resonance, destabilise transformers and disturb voltage for other grid operators.
Nvidia’s engineers use heatmaps and time series charts to illustrate this pattern: GPUs ramp up power at job start, vary rapidly during processing and drop off abruptly at the end.
To counter this, Nvidia has introduced a coordinated suite of features aimed at smoothing AI workload power profiles across three phases: ramp-up, steady-state and ramp-down.
Hardware and software for power smoothing
At workload start, Nvidia’s new power cap feature regulates GPU consumption by progressively raising power limits in line with grid ramp tolerances.
This mitigates sudden surges that could unsettle the supply.
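A minimal sketch of such a ramp, assuming the cap rises in fixed time steps and never faster than an operator-chosen rate. The function name, the wattages and the ramp rate are hypothetical, and the code does not reflect Nvidia's actual implementation.

```python
def ramp_up_caps(start_w, target_w, max_ramp_w_per_s, step_s=1.0):
    """Return power caps (W) rising from start_w to target_w,
    never increasing faster than max_ramp_w_per_s."""
    if max_ramp_w_per_s <= 0 or step_s <= 0:
        raise ValueError("ramp rate and step must be positive")
    caps = [start_w]
    while caps[-1] < target_w:
        # Raise the cap by at most one ramp increment, clamped to the target.
        caps.append(min(target_w, caps[-1] + max_ramp_w_per_s * step_s))
    return caps

caps = ramp_up_caps(start_w=200.0, target_w=1000.0, max_ramp_w_per_s=250.0)
print(caps)  # [200.0, 450.0, 700.0, 950.0, 1000.0]
```

A lower ramp rate lengthens the schedule and lowers the rate of change the grid sees.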
When training completes, the GB300 platform uses a GPU burn mechanism to sustain power consumption temporarily, allowing the system to taper off gradually.
This dissipates power within controlled parameters. The burn disengages immediately when a new workload starts; if no new tasks arrive, power is reduced according to preset limits.
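The ramp-down behaviour can be sketched as a power floor that holds for an idle window and then decays at a fixed rate. This is an illustrative model only: the function names, the parameters and every number are assumptions, not Nvidia's implementation.

```python
def burn_floor(last_load_w, idle_w, idle_hold_s, ramp_down_w_per_s, t_since_end_s):
    """Power floor (W) held by the burn mechanism t seconds after a job ends."""
    if t_since_end_s <= idle_hold_s:
        # Hold the previous power level during the idle window.
        return last_load_w
    dropped_w = ramp_down_w_per_s * (t_since_end_s - idle_hold_s)
    # Taper down, but never below the idle power level.
    return max(idle_w, last_load_w - dropped_w)

def power_draw(workload_w, floor_w):
    """Actual draw: the real workload if one is running, else the burn floor."""
    # The burn disengages as soon as a real workload is present.
    return workload_w if workload_w > 0 else floor_w

# Hypothetical values: 1,000 W job, 200 W idle, 5 s hold, 100 W/s ramp-down.
for t in (0, 5, 8, 20):
    print(t, burn_floor(1000.0, 200.0, 5.0, 100.0, t))
```

With these values the floor stays at 1,000 W for five seconds, falls to 700 W at eight seconds and settles at the 200 W idle level.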
For rapid transients during steady state, Nvidia’s updated power shelves incorporate energy storage in the form of electrolytic capacitors, which charge during low demand and discharge during peaks.
This flattens the power curve exposed to the grid, creating a substantially smoother AC power profile than the previous GB200 PSU, according to Nvidia's internal tests.
With identical AI training workloads on GB200 and GB300 racks, the GB200 passes DC-level spikes through to the AC grid input, causing instability, while the GB300 reduces grid-facing power peaks by 30% with identical GPU output.
Ongoing collaboration with LITEON Technology helped Nvidia optimise the GB300 power shelf’s design. Half of its volume now consists of energy storage elements, providing 65 joules of storage per GPU.
A management controller orchestrates real-time power storage and release, safeguarding grid stability.
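The charge and discharge logic can be sketched with a toy buffer model. The 65 joules of storage comes from the figure above; the grid-facing cap, the load pattern, the time step and the function name are hypothetical, and the real controller is certainly more sophisticated.

```python
def smooth(load_w, grid_cap_w, capacity_j, dt_s=0.001):
    """Return grid-facing power (W) per time step for a toy storage buffer."""
    stored_j = 0.0
    grid_w = []
    for load in load_w:
        if load > grid_cap_w:
            # Discharge to cover the excess, limited by the stored energy.
            discharge_w = min(load - grid_cap_w, stored_j / dt_s)
            stored_j = max(0.0, stored_j - discharge_w * dt_s)
            grid_w.append(load - discharge_w)
        else:
            # Charge from spare headroom, limited by the remaining capacity.
            charge_w = min(grid_cap_w - load, (capacity_j - stored_j) / dt_s)
            stored_j = min(capacity_j, stored_j + charge_w * dt_s)
            grid_w.append(load + charge_w)
    return grid_w

# Hypothetical per-GPU load: 10 ms at 600 W, then 10 ms at 1,000 W.
loads = [600.0] * 10 + [1000.0] * 10
grid = smooth(loads, grid_cap_w=800.0, capacity_j=65.0)
print("raw peak (W):", max(loads))
print("grid-facing peak (W):", round(max(grid)))
```

In this example the buffer charges during the low phase and discharges during the burst, so the grid sees a roughly flat 800 W draw instead of a swing between 600 W and 1,000 W.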
Reducing provisioning for AI-scale facilities
Historically, data centres have provisioned power for peak load, sizing infrastructure to support peak GPU draw even though those peaks are transient.
By levelling peaks, the GB300’s smoothing capability lets infrastructure be sized closer to actual usage.
This grants operators options: increasing rack numbers within existing power budgets or reducing overall power allocation for deployments.
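Both options come down to simple arithmetic. The sketch below assumes the full 30% peak reduction is achieved (the article says "up to" 30%) and uses a hypothetical 10 MW budget and 100 kW rack peak; the function names are illustrative.

```python
def racks_within_budget(budget_kw, rack_peak_kw, peak_reduction_pct=0):
    """Number of racks that fit in a power budget when provisioning for peak."""
    smoothed_peak_kw = rack_peak_kw * (100 - peak_reduction_pct) / 100
    return int(budget_kw // smoothed_peak_kw)

def required_allocation_kw(num_racks, rack_peak_kw, peak_reduction_pct=0):
    """Power to provision for a fixed number of racks."""
    return num_racks * rack_peak_kw * (100 - peak_reduction_pct) / 100

# Hypothetical facility: 10 MW budget, 100 kW peak per rack.
print(racks_within_budget(10_000, 100))          # 100
print(racks_within_budget(10_000, 100, 30))      # 142
print(required_allocation_kw(100, 100, 30))      # 7000.0
```

Under these assumptions the same budget hosts 142 racks instead of 100, or the original 100 racks need 7 MW instead of 10 MW.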
These smoothing strategies, executed at shelf and rack levels in both GB200 and GB300 NVL72 systems, employ multiple shelves per rack to balance loads.
Operators can use Nvidia’s SMI tool or the Redfish protocol to configure GPU idle times before ramp-down and target ramp rates.
These features are Nvidia’s response to the energy demands of AI infrastructure: by integrating storage and intelligent power controls within GB300 NVL72, they let data centres adapt to growing model sizes without overwhelming available power capacity.
These systems provide a rack-level, rapid transient power smoothing strategy.


