In a milestone for personal computing, Nvidia is bringing generative AI processing to Windows PCs through its RTX-based graphics processing units (GPUs).
In the past year, generative AI has emerged as a transformative trend. With its rapid growth and increasing accessibility, consumers now have simplified interfaces and user-friendly tools that harness the power of GPU-optimized AI, machine learning, and high-performance computing (HPC) software.
Nvidia has powered much of this AI revolution in GPU-filled data centers, and now it’s bringing that capability to RTX-based GPUs on more than 100 million Windows PCs worldwide. The integration of AI into major Windows applications has been a five-year journey, with dedicated AI processors called Tensor Cores, found in GeForce RTX and Nvidia RTX GPUs, driving the generative AI capabilities on Windows PCs and workstations.
Jesse Clayton, director of product management and product marketing for Windows AI at Nvidia, said in an interview with GamesBeat that we’re at a big moment.
“AI on PCs, we think, is really one of the most important moments in the history of technology. And I don’t think it’s hyperbole to say that for gamers, creators, video streamers, office workers, students, and really even casual PC users — AI is delivering new experiences. It’s unlocking creativity. And it’s making it easier for folks to get more done. AI is being incorporated into every important app. And it’s going to impact every PC user. It’s really fundamentally changing the way that people use computers.”
Previously announced for data centers, TensorRT-LLM, an open-source library designed to accelerate inference performance for large language models (LLMs), is now making its way to Windows. This library, optimized for Nvidia RTX GPUs, can enhance the performance of the latest LLMs, such as Llama 2 and Code Llama, by up to four times.
Additionally, Nvidia has released tools to assist developers in accelerating their LLMs, including scripts that enable compatibility with TensorRT-LLM, TensorRT-optimized open-source models, and a developer reference project that showcases the speed and quality of LLM responses.
“What many people don’t realize is that AI use cases on PC are actually already firmly established. And Nvidia really started this five years ago in 2018,” Clayton said. “When we launched our first GPUs with Tensor Cores, this was a fundamental change in the GPU architecture because we believed in how important AI was going to be. And so with the launch of the so-called RTX GPUs, we also launched AI technology for gaming.”
Stable Diffusion demo
TensorRT acceleration has also been integrated into Stable Diffusion via the popular Web UI distribution from Automatic1111.
Stable Diffusion takes a text prompt and generates an image from it. Creators use it to produce some stunning works of art, but each image takes time and computing resources, so you have to wait for it to finish. Nvidia’s latest GPUs double Stable Diffusion performance over the previous implementation and run it more than seven times faster than Apple’s latest chips. That means a machine with a GeForce RTX 4090 graphics card can generate 15 images on Stable Diffusion in the time it takes an Apple machine to produce two.
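Those figures are internally consistent, as a quick back-of-envelope check shows. The per-image baseline time below is a hypothetical assumption for illustration; the speedup and image counts come from the claims above:

```python
# Sanity-check the throughput comparison using the article's figures.
# Assumed hypothetical baseline: the Apple machine takes 60 seconds per image.
apple_seconds_per_image = 60.0

# "More than seven times faster" on the RTX 4090 (the article's claim).
rtx_speedup = 7.5
rtx_seconds_per_image = apple_seconds_per_image / rtx_speedup

# In the time the Apple machine renders 2 images...
window = 2 * apple_seconds_per_image

# ...the RTX 4090 renders this many:
rtx_images = int(window / rtx_seconds_per_image)
print(rtx_images)  # 15, matching the 15-vs-2 comparison above
```

A 7.5x per-image speedup is exactly what turns 2 images into 15 in the same window, so the two claims are the same statement in different units.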
DLSS (Deep Learning Super Sampling) grew out of graphics research in which AI takes a low-resolution image and upscales it to high resolution, increasing the frame rate and helping gamers get more value out of their GPUs. It also frees game developers to add more visual artistry to their games. There are now more than 300 DLSS games, and Nvidia just released version 3.5 of the technology.
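DLSS itself uses a trained neural network, but the upscaling step it improves on can be sketched with a naive nearest-neighbor resize. This toy pure-Python function is a crude baseline for illustration, not Nvidia's method:

```python
def upscale_nearest(image, factor):
    """Naive nearest-neighbor upscale of a 2D grid of pixel values.

    This is the crude baseline that AI upscalers like DLSS improve on:
    instead of repeating pixels, DLSS infers plausible high-resolution
    detail with a neural network trained on high-quality frames.
    """
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

low_res = [[1, 2],
           [3, 4]]
high_res = upscale_nearest(low_res, 2)
print(high_res)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The frame-rate win comes from rendering at the low resolution and upscaling: the GPU shades far fewer pixels per frame, and the upscaler fills in the rest.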
“Generative AI has reached a point where it’s unlocking a whole new class of use cases with opportunities to bring PC AI to the mainstream,” Clayton said. “So gamers will enjoy AI-powered avatars. Office workers and students will use large language models, or LLMs, to draft documents and slides and to quickly extract insights from CSV data. Developers are using LLMs to assist with coding and debugging. And every day users will use LLMs to do everything from summarize web content to plan travel, and ultimately to use AI as a digital assistant.”
Video Super Resolution
Moreover, the release of RTX Video Super Resolution (VSR) version 1.5, as part of the Game Ready Driver, further enhances the AI-powered capabilities. VSR improves the quality of streamed video content by reducing compression artifacts, sharpening edges, and enhancing details. The latest version of VSR delivers even better visual quality with updated models, de-artifacting content played in native resolution, and support for both professional RTX and GeForce RTX 20 Series GPUs based on the Turing architecture.
The technology has been integrated into the latest Game Ready Driver and will be included in the upcoming Nvidia Studio Driver, scheduled for release in early November.
The combination of TensorRT-LLM acceleration and LLM capabilities opens up new possibilities in productivity, enabling LLMs to operate up to four times faster on RTX-powered Windows PCs. This acceleration improves the user experience for sophisticated LLM use cases, such as writing and coding assistants that provide multiple unique auto-complete results simultaneously.
Finding Alan Wake 2
The integration of TensorRT-LLM with other technologies, such as retrieval-augmented generation (RAG), allows LLMs to deliver targeted responses based on specific datasets.
For example, when asked about Nvidia technology integrations in Alan Wake 2, the LLaMa 2 model initially responded that the game had not been announced. However, when RAG was applied with recent GeForce news articles, the LLaMa 2 model quickly provided the correct answer, showcasing the speed and proficiency achieved with TensorRT-LLM acceleration.
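Conceptually, RAG retrieves relevant documents and prepends them to the model's prompt, so the answer can draw on data the model never saw during training. A minimal pure-Python sketch follows; the article snippets are illustrative stand-ins, and a real system would rank documents with an embedding model rather than word overlap:

```python
def retrieve(query, documents):
    """Return the document sharing the most words with the query.
    Real RAG systems use vector embeddings; word overlap is a stand-in."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query, documents):
    """Prepend the retrieved context so the LLM can answer from it."""
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

# Illustrative stand-ins for the GeForce news articles mentioned above.
articles = [
    "Alan Wake 2 launches with DLSS 3.5 and full ray tracing on GeForce RTX.",
    "RTX Video Super Resolution 1.5 ships in the latest Game Ready Driver.",
]

prompt = build_prompt("What Nvidia technology is in Alan Wake 2?", articles)
print(prompt)
```

Because the answer now lives in the prompt rather than in the model's weights, the model can respond correctly about a game released after its training cutoff, which is exactly what the Alan Wake 2 demo shows.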
Clayton said that if the data already exists in the cloud and the model has already been trained on that data, it makes sense architecturally to just run it in the cloud.
However, if it’s a personal dataset, or a dataset that only you have access to, or the model wasn’t trained in the cloud, then you have to find some other way to do it, he said.
“Retraining the models is pretty challenging to do from a computation perspective. This enables you to do it without taking that route. I am right now paying $20 a month to be able to use [AI services]. How many of these cloud services am I going to pay if I can do a lot of that work locally with a powerful GPU?”
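Clayton's cost point can be made concrete with back-of-envelope arithmetic. Every number below except the $20 subscription he quotes is an assumption for illustration:

```python
# Back-of-envelope cost comparison: cloud AI subscriptions vs. a local GPU.
subscription_per_month = 20   # one cloud AI service, as quoted above
num_services = 3              # assumed: you'd otherwise pay for three services
gpu_price = 1600              # assumed rough GeForce RTX 4090 street price

monthly_cloud_cost = subscription_per_month * num_services
months_to_break_even = gpu_price / monthly_cloud_cost
print(round(months_to_break_even, 1))  # ~26.7 months under these assumptions
```

Under those assumptions a local GPU pays for itself in a little over two years, and the calculation ignores that the same card also handles gaming and creative workloads.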
Developers interested in leveraging TensorRT-LLM can download it from Nvidia Developer. Additionally, TensorRT-optimized open-source models and a RAG demo trained on GeForce news are available on ngc.nvidia.com and GitHub.com/NVIDIA.
Competitors such as Intel, Advanced Micro Devices, Qualcomm and Apple are using rival technologies to improve AI on PCs as well as on smart devices. Clayton said those solutions will be good for lightweight AI workloads running at low power. They are more like table-stakes AI, and they’re complementary to what Nvidia’s GPUs do, he said.
RTX GPUs have 20 to 100 times the performance of CPUs on AI workloads, he said, and that’s why the tech starts with the GPU. The math at the core of modern AI is matrix multiplication, and at the core of Nvidia’s platform are RTX GPUs with Tensor Cores, which are designed to accelerate matrix multiplication. Today’s GeForce RTX GPUs can compute up to 1,300 trillion Tensor operations per second, which makes them the fastest PC AI accelerators.
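The matrix multiplication at the heart of these workloads is simple to state, which is exactly why it pays to bake it into hardware. A pure-Python reference version makes the multiply-accumulate pattern explicit (Tensor Cores execute these fused multiply-adds in bulk):

```python
def matmul(a, b):
    """Multiply matrices a (m x k) and b (k x n): the core op of neural nets.
    Each output element is a dot product - the multiply-accumulate
    pattern that Tensor Cores are built to execute in parallel."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

# A 2x2 example; a single LLM layer does this with matrices that have
# thousands of rows and columns, millions of times per response.
a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Multiplying two n-by-n matrices takes roughly 2n³ operations, so the operation count explodes with model size, which is why dedicated matrix hardware rather than general-purpose CPU cores dominates AI performance.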
“They also represent the world’s largest install base of dedicated AI hardware with more than 100 million RTX PC GPUs worldwide,” Clayton said. “So they really have the performance and flexibility for taking on not only today’s tasks but tomorrow’s AI use cases.”
Your PC can also turn to the cloud for any AI tasks that are too demanding for your PC’s GPU. Today, there are more than 400 AI-enabled PC applications and games.