AWS: Building the Infrastructure Layer for Enterprise AI

David Brown, Vice President of Compute and Machine Learning at AWS. Pic: AWS
Compute and ML Services VP David Brown on why AWS is betting on custom silicon, how power constraints shape strategy and what slows enterprise adoption

Three years ago, AWS made decisions about power procurement that will determine what the company can offer customers in 2027.

Not about which features to build or which markets to enter, but about megawatts: how many to procure, where to locate them and how much computing infrastructure those megawatts could support.

Get it wrong, and AWS turns away customers. Get it right, and the company maintains its lead in the cloud computing market.

“We had to make decisions three years ago about the power we’re going to have in ’26 and ’27,” says David Brown, who runs all compute and machine learning services at AWS.

“Amazon.com is a supply chain company, and we brought a lot of that into our DNA.”

David’s responsibilities span EC2, containers, Lambda, Bedrock and SageMaker – the foundational layer that determines whether AI workloads run efficiently or expensively.

AWS re:Invent 2025

During re:Invent week in Las Vegas, where AWS announced both its Trainium3 accelerator and Graviton5 processor, Technology Magazine spoke with David on chip development timelines, the physics constraints of distributing training clusters and why controlling the full stack from chip design through data centre deployment creates advantages that merchant silicon alone cannot deliver.

From announcement to deployment in ten months

Project Rainier shows how quickly AWS can move custom silicon from announcement to production.

The company revealed the project in December 2024 at re:Invent. By October 2025, the infrastructure was running. The facility now houses more than 500,000 Trainium2 chips and will hit one million by year end.

David says execution speed matters as much as scale.

“This is very different to some of the other announcements you’ve seen where it’s much longer term.”

AWS operates in a market where model architectures evolve faster than traditional hardware procurement cycles.

But speed means solving problems most chip designers never face.

David Brown speaking in a keynote session at AWS re:Invent 2025

“It’s not only about how quickly you can make the silicon – and you have to make it well, because any mistake means you lose a generation,” he says. “But it’s also about taking that silicon, putting it into a server, getting that server into a data centre, setting the network up so you have this incredibly large training cluster.”

AWS designs silicon knowing exactly which servers it will populate, which racks those servers will occupy and how network topology will connect them. That end-to-end ownership compresses timelines that chip manufacturers typically measure in quarters.

Trainium3 delivers a 4x performance improvement over Trainium2 with 40% better performance per watt, which matters when power limits how much computing fits in a facility.

Memory bandwidth, meanwhile, has increased 50%.
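
To see why performance per watt matters in a power-limited facility, here is a minimal sketch of the arithmetic. Only the 40% figure comes from the article; the 100MW power budget and the function name are hypothetical, chosen purely for illustration.

```python
# Illustrative arithmetic only. The 100 MW budget is a made-up figure;
# the 1.40 multiplier reflects the "40% better performance per watt" claim.
PERF_PER_WATT_GAIN = 1.40


def relative_compute(power_budget_mw: float, perf_per_watt: float) -> float:
    """Compute deliverable under a fixed power budget, in arbitrary units."""
    return power_budget_mw * perf_per_watt


baseline = relative_compute(100.0, 1.0)                  # previous generation
upgraded = relative_compute(100.0, PERF_PER_WATT_GAIN)   # newer generation
uplift = upgraded / baseline

print(f"{uplift:.2f}x compute from the same power budget")
```

Under these assumptions, the same megawatts support roughly 1.4 times the compute, regardless of the size of the budget chosen.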

Each generation incorporates feedback from watching how customers use the previous version: which workloads run well, which don’t and what needs fixing.

David says AWS expects to scale Trainium3 faster than Trainium2.

The power problem that reshapes everything

Project Rainier will scale to 2.2GW. That’s more than two nuclear power stations feeding a single facility, but it’s not enough.

David is direct about it. “These large language model providers are going to need more than 2.2 gigawatts, so then you start to look at other sites that are relatively close by.”

AWS has facilities in Indiana and Mississippi, with more locations planned.

The distributed approach introduces latency: training clusters that span multiple locations need different software optimisation than clusters in a single building. Physics, rather than ambition, becomes the limit.

A large-scale AI data centre campus developed by AWS as part of Project Rainier

Power availability now determines where AWS can build data centres. The company uses Local Zones – technology originally built for media customers who needed low-latency access – to position infrastructure where power exists rather than where traditional regional boundaries might suggest.

But the real challenge isn’t finding power today. It’s forecasting what will be needed in three years while accounting for model efficiency improvements that may or may not happen.

If models become twice as efficient, that effectively doubles available capacity without adding more power.

Bet too heavily on efficiency gains and AWS turns customers away. Bet too conservatively and the company wastes money on unused capacity.
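
The trade-off can be sketched as simple arithmetic. This is a rough illustration, not AWS's planning model: the demand figure, the efficiency multipliers and the function name `planning_gap` are all hypothetical, and demand is expressed in gigawatts of power at today's model efficiency.

```python
def planning_gap(demand_gw: float, assumed_eff: float, actual_eff: float) -> float:
    """Return capacity minus demand, in today's-efficiency GW equivalents.

    Negative means a shortfall (customers turned away);
    positive means unused capacity (money spent on idle power).
    """
    # Power procured if the planner trusts the assumed efficiency gain.
    procured_gw = demand_gw / assumed_eff
    # Capacity actually available once the real efficiency gain is known.
    effective_capacity = procured_gw * actual_eff
    return effective_capacity - demand_gw


# Hypothetical demand of 4 GW-equivalents three years out.
over_optimistic = planning_gap(4.0, assumed_eff=2.0, actual_eff=1.0)
over_cautious = planning_gap(4.0, assumed_eff=1.0, actual_eff=2.0)
on_target = planning_gap(4.0, assumed_eff=2.0, actual_eff=2.0)

print(f"Bet on 2x efficiency, got none: {over_optimistic:+.1f} GW-equivalents")
print(f"Bet on no gains, got 2x:        {over_cautious:+.1f} GW-equivalents")
print(f"Forecast matched reality:       {on_target:+.1f} GW-equivalents")
```

In this toy setup, betting on a doubling that never arrives leaves half the demand unserved, while ignoring a doubling that does arrive leaves as much capacity idle as was needed in the first place.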

Graviton5 and watching customers at scale

SAP saw 60% performance gains migrating from Graviton4 to Graviton5. The improvement came from two architectural changes that AWS only understood by deploying the previous generation at scale and watching how customers used it.

Graviton4 achieved 192 cores using two processors connected by an interconnect.

That design worked for many workloads but introduced latency penalties when processes needed memory from the opposite side of the CPU. Graviton5 consolidates those cores into a single die.

“We really liked the 192 cores in Graviton4. We didn’t like the fact that it was on two separate dies,” David says. “It was great for databases and analytics, but it would be amazing if we could bring it together.”

Inside an AWS data centre designed for AI workloads

The cache needed rethinking too. When AWS looked at workload patterns, the data showed that 192 cores were starving for L3 cache. Graviton5 increases L3 cache by 3x, giving each core access to 2.6x more cache than Graviton4 provided.

“When you increase the core count in CPUs, you don’t have enough cache for the cores,” David explains.

The 25% performance improvement over Graviton4 exceeds typical CPU generation advances. “In the CPU space normally, you’re shooting for 10% to 15%, and we’re hitting tens of percentage points better,” he notes.

AWS now runs more than 50% of new CPU capacity on Graviton, with 98% of the top 1,000 EC2 customers using the processors.

That installed base generates the feedback that drives each new generation. The cycle now repeats every 12 to 18 months, with each iteration informed by which workloads run well and which don’t.

Customer choice over exclusivity

When Anthropic recently announced that it would use Google TPUs, questions emerged about AWS’ partnership with the AI company.

“AWS remains their preferred training partner and preferred cloud provider,” David says. “We’re very pleased with where the relationship is and the work they’ve been doing on Trainium2 with Project Rainier.”

The partnership continues because AWS and Anthropic engineering teams work together on optimising large language model training from software down to silicon.

That work doesn’t stop because Anthropic uses other accelerators for specific workloads.

David sees the situation as validation of AWS’ approach to customer choice. The company offers Intel, AMD and Graviton processors, letting customers pick based on workload requirements.

Accelerators will follow the same pattern.

“Nvidia serves the vast majority of workloads today, and customers are looking for choice in that space,” he says.

“It really comes down to whether Trainium can give them better price performance for that workload. We like competition.”

The logic reflects how AWS manages its own infrastructure. The company keeps multiple generations of chips in production at once, matching workloads to hardware based on economics.

Customers who built applications on the M1 Small instance from August 2006 can still run those workloads today.

AWS maintains older Nvidia chips, Trainium1 and Trainium2 in production alongside newer hardware, never forcing deprecation as long as customers need them.

Overcoming the cost barrier

Bedrock has become one of AWS’ fastest-growing services, but enterprise penetration remains shallow.

“If you really look at all the enterprises on AWS, there’s still enormous room for growth from enterprises and start-ups using inference in a meaningful way,” David says.

“Even the enterprises that have used Bedrock today are just scratching the surface.”

The problem isn’t technical capability: models can already handle tasks that seemed impossible two years ago. But economics determine deployment velocity.

If running inference costs too much, companies limit how they use it. If costs drop by an order of magnitude, entirely new use cases become viable.

Software improvements contribute as much as hardware. Models have become dramatically more efficient through new training approaches.

David expects that to continue whilst hardware delivers better price performance through both Nvidia GPUs and custom silicon like Trainium.

“It’s the same cycle we’ve seen with computers over time. You’re going to get more performance for every dollar spent,” he says.

“I think we’re going to see, over and over again, both the ability to train models and the ability to run inference – the cost of that really comes way down.”

AWS has shown it can move from chip design to production deployment in months.

It has scaled custom silicon to millions of units whilst maintaining rapid iteration cycles.

Whether those capabilities translate into the cost reductions that unlock enterprise adoption will determine if AI infrastructure becomes as commoditised as compute capacity.

“When things get cheaper, customers innovate faster,” David says.

“They do more for their end customers, and those wheels spin even faster in terms of innovation and getting things done. Whether it’s price performance on GPUs or price performance on custom silicon like Trainium, you want to get more compute for every dollar spent.

“If I can get that cost down, it’s going to unlock more innovation for our customers and more adoption.”
