How Red Hat and AWS Power OpenShift AI at re:Invent 2025

Share this article
Share this article
Prioritise Us on Google
Red Hat and AWS have a longstanding strategic partnership
At AWS re:Invent 2025, Red Hat AI Inference Server on AWS Trainium and Inferentia delivers 30-40% better price performance for scalable Gen AI

Red Hat is extending its long-standing relationship with Amazon Web Services (AWS) to make it easier and more cost-effective for enterprises to run Gen AI at scale on AWS. 

The new collaboration, announced at AWS re:Invent 2025, brings Red Hat AI, OpenShift and Ansible together with AWS’ Trainium and Inferentia silicon to deliver a full-stack, production-ready path from AI pilots to governed, enterprise-wide deployment.​

Youtube Placeholder

Colin Brace, Vice President of Annapurna Labs at AWS, says: “Enterprises demand solutions that deliver exceptional performance, cost efficiency and operational choice for mission-critical AI workloads. 

“AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective. 

“Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale, combining the flexibility of open source with AWS infrastructure and purpose-built AI accelerators to accelerate time-to-value from pilot to production.”

Colin Brace, Vice President of Annapurna Labs at AWS

Silicon-first gen AI for the enterprise

The surge in Gen AI is forcing CIOs to rethink how they provision compute for inference, where costs can quickly outstrip initial training budgets. 

Analyst firm IDC expects that “by 2027, 40% of organisations will use custom silicon, including ARM processors or AI/ML-specific chips, to meet rising demands for performance optimisation, cost efficiency and specialised computing”.

This emphasises why enterprises are looking beyond general-purpose GPUs for large-scale AI. 

AWS Trainium and Inferentia are designed to address exactly this challenge, with AWS stating that its latest Trainium2-based instances can deliver between 30% and 40% better price-performance than current GPU-based Amazon EC2 instances for Gen AI workloads.​

Against this backdrop, Red Hat is positioning its open hybrid cloud stack as a way to abstract model operations away from specific accelerators while still exploiting the economics of AWS custom silicon. 

The aim? To let IT decision-makers standardise on a common AI operations layer, all while retaining the freedom to mix and match models and hardware as requirements evolve.

Red Hat AI Inference Server meets AWS chips

At the heart of the announcement is Red Hat AI Inference Server, built on the high-performance vLLM inference framework and now being optimised to run on AWS Trainium and Inferentia. 

By creating a common inference layer tuned for AWS’ AI accelerators, the companies say customers can target any supported Gen AI model while benefiting from higher throughput, lower latency and improved price-performance versus comparable GPU instances.​

Red Hat and AWS are also collaborating upstream on an AWS AI chip plugin for vLLM, reinforcing both firms’ open source credentials and ensuring performance improvements flow back to the broader community.

This work is closely tied to llm-d, an open source project for distributed inference at scale that Red Hat has now brought into Red Hat OpenShift AI 3 as a commercially supported capability.​

“By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organisations to deploy and scale AI workloads with enhanced efficiency and flexibility,” says Joe Fernandes, Vice President and General Manager of the AI Business Unit at Red Hat.

Joe Fernandes, Vice President and General Manager of the AI Business Unit at Red Hat

“Building on Red Hat’s open source heritage, this collaboration aims to make generative AI more accessible and cost-effective across hybrid cloud environments.”

OpenShift, Neuron and Ansible automation

For organisations standardising on Kubernetes, Red Hat has worked with AWS to build an AWS Neuron operator for Red Hat OpenShift, Red Hat OpenShift AI and Red Hat OpenShift Service on AWS (ROSA).

This operator gives platform teams a supported, Kubernetes-native path to target AWS accelerators, simplifying lifecycle management and making it easier to align AI deployments with existing cluster operations and policies.​

The collaboration also extends into automation with the amazon.ai Certified Ansible Collection for Red Hat Ansible Automation Platform, which lets teams declare and orchestrate AWS AI services, agents and monitoring as code. 

As Gen AI stacks become more complex, this kind of idempotent, auditable automation helps enterprises keep AI deployments consistent across environments while satisfying governance and compliance requirements.

What are customers and analysts saying?

Real-world adopters are already using Red Hat OpenShift Service on AWS as a foundation for modernising mission-critical applications and embedding AI into production. 

For CAE, a global provider of simulation and training solutions, the managed OpenShift service on AWS has become a key enabler for its digital transformation and AI integration strategy.​

Jean-François Gamache, CIO and VP of Digital Services at CAE, says: “Modernising our critical applications with Red Hat OpenShift Service on AWS marks a significant milestone in our digital transformation. 

Jean-François Gamache, CIO and VP of Digital Services at CAE

“This platform supports our developers in focusing on high-value initiatives – driving product innovation and accelerating AI integration across our solutions.

“Red Hat OpenShift provides the flexibility and scalability that enable us to deliver real impact, from actionable insights through live virtual coaching to significantly reducing cycle times for user-reported issues.”

Industry analysts see the economics of inference as a defining issue for enterprise AI over the next few years. 

“As AI inference costs escalate, enterprises are prioritising efficiency alongside performance,” explains Anurag Agrawal, Founder and Chief Global Analyst at Techaisle.

Anurag Agrawal, Founder and Chief Global Analyst at Techaisle

“This collaboration exemplifies Red Hat’s ‘any model, any hardware’ strategy by combining its open hybrid cloud platform with the distinct economic advantages of AWS Trainium and Inferentia. 

“It empowers CIOs to operationalise generative AI at scale, shifting from cost-intensive experimentation to sustainable, governed production.”

Executives