How Nvidia’s New AI-Optimised GPUs Transform Gen AI on PCs

The development of AI applications across the world has been constrained by the need for substantial computing power, typically provided by cloud data centres.
This centralised approach has created bottlenecks in development and raised concerns about data privacy and processing costs.
Personal computers (PCS) have lacked the processing capability to run complex AI models locally, limiting developers' ability to create and test AI applications without relying on cloud services – and this limitation has particularly affected smaller development teams and individual programmers working on AI projects.
Now, the computing scene is shifting towards local processing of AI workloads, driven by advances in specialised hardware design and more efficient AI models – which could reduce dependency on cloud services and accelerate AI application development.
As a result, Nvidia has introduced new graphics processing units (GPUs) designed to run AI applications on PCs, marking a shift from cloud-based AI processing to local computing.
Nvidia NIM brings enterprise AI to desktop computing
To support developers working with the new hardware, Nvidia has released NIM microservices, a suite of pre-packaged AI models optimised for PCs.
- GeForce RTX 5090 and 5080 GPUs perform 3,352 trillion AI operations per second for local processing
- Models that required 23GB of memory can now run on 10GB, enabling wider GPU compatibility
These tools allow developers to integrate AI capabilities into applications without managing the complexity of model optimisation.
The company has collaborated with Microsoft to enable NIM microservices in Windows Subsystem for Linux and this integration allows developers to use the same AI tools across desktop and data centre environments.
NIM microservices include the TensorRT software development kit and TensorRT-LLM library, tools that enable AI models to utilise the specialised Tensor Core processors efficiently.
The challenge of deploying AI models
Current AI deployment requires significant technical expertise.
Models from repositories such as Hugging Face must be modified and optimised before they can run effectively on PCs – and these modifications include quantisation, the process of reducing model size and integration with existing software tools.
NIM microservices aim to simplify this process by providing pre-optimised models ready for deployment.
Black Forest Labs demonstrates real-world performance gains
The practical impact of these advances is demonstrated by Black Forest Labs' FLUX.1 development model, using previous generation hardware, the model required 23 gigabytes of video memory and took 15 seconds to generate images.
Yet the new RTX 5090 processor reduces this time to five seconds while requiring less than 10 gigabytes of memory and this improvement stems from the FP4 compression technology, which is integrated into Nvidia's NIM microservices.
The technology makes it possible to run AI models on a wider range of graphics cards, expanding access to local AI processing.
Reference designs showcase AI application development
Nvidia has also released AI Blueprints, a collection of reference implementations demonstrating potential applications which include a system that converts PDF documents into podcasts using seven interconnected AI models.
The blueprints aim to accelerate development by providing working examples of complex AI workflows running on PCs rather than cloud services.
Evolution of AI processing hardware
The technology represents a continuation of Nvidia's strategy, which began in 2018 with the introduction of dedicated AI processing cores in consumer graphics cards.
These processors enabled AI-enhanced gaming and content creation applications to run on PCs.
Additionally, the fifth generation of Tensor Cores in the Blackwell architecture can handle multiple AI models simultaneously, supporting applications from real-time rendering to intelligent assistants.
The company plans to release NIM microservices and AI Blueprints with support for the GeForce RTX 50 Series, GeForce RTX 4090 and 4080 and RTX 6000 and 5000 professional graphics processors – support for additional processors will follow.
The new hardware and software tools aim to make advanced AI capabilities accessible to developers and enthusiasts working on PCs, reducing dependency on cloud computing services for AI development and deployment.
Jesse Clayton, Product Manager at Nvidia, summarises in an Nvidia blog: “These GPUs were built to accelerate the latest Gen AI workloads, delivering up to 3,352 AI trillion operations per second (TOPS), enabling incredible experiences for AI enthusiasts, gamers, creators and developers.”
Explore the latest edition of Technology Magazine and be part of the conversation at our global conference series, Tech & AI LIVE.
Discover all our upcoming events and secure your tickets today.
Technology Magazine is a BizClik brand

