Understanding AI performance and how NVIDIA RTX AI PCs take it to another level

NVIDIA RTX AI PC graphic
(Image credit: NVIDIA)

AI is changing which specifications matter when judging how fast a PC is. Sure, things like GHz, Gbps, and MT/s will remain important for general use of the system, but with AI features on board, some new metrics come into play. You’ll hear about TOPS (Trillions of Operations Per Second), tokens, and even image generation speed, and NVIDIA RTX AI PCs are leading the charge to show you what you can do with all those TOPS. Let’s dig into what that means for you.

To take advantage of AI, whether it’s image generation, text summarization, or any of the many new features rolling out at a blistering pace, you need a certain level of performance, and that’s exactly what TOPS measures. Those TOPS are integer math operations processed by the chips in the system, and that throughput can come from a dedicated NPU (Neural Processing Unit), the CPU, the GPU, or a combination of components working together. NVIDIA RTX AI PCs already have a huge leg up here, offering substantial throughput in that department.
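If you want a feel for how a TOPS figure is derived, here’s a minimal sketch in Python. The workload numbers are invented for illustration, not NVIDIA measurements; the point is simply that TOPS is operations completed divided by time, scaled to trillions:

```python
# Hypothetical example: turning a measured operation count into a TOPS figure.
# Both numbers below are made up for illustration.

ops_completed = 8_000_000_000_000  # integer ops the chip finished during the test
elapsed_seconds = 0.2              # how long the measurement window lasted

tops = ops_completed / elapsed_seconds / 1e12  # trillions of operations per second
print(f"Effective throughput: {tops:.0f} TOPS")  # -> 40 TOPS
```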

NVIDIA RTX AI PCs graphic

(Image credit: NVIDIA)

At 40 TOPS, a system meets the threshold to handle the text generation and analysis features of Microsoft’s new Copilot+ PCs. A system with significantly lower TOPS could hamper the user experience with incredibly slow responses from AI assistant tools, which should feel as quick as interacting with another human.

This is where tokens come in. All those trillions of operations go toward AI inference with the goal of producing tokens: the characters, words, and phrases an LLM (large language model) produces. When you get a plain-language answer from a chatbot, what it’s returning is tokens. For a given LLM, the more TOPS your system can achieve, the more tokens you get each second, and the more responsive the LLM will feel. Fast enough, and it’ll feel instantaneous. With chatbots, for example, a high enough tokens-per-second rate will effectively produce text as fast as most people can read it. NVIDIA’s RTX AI PCs are already well into the territory of handling LLMs quickly.
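As a rough sanity check on that claim, here’s a back-of-the-envelope sketch. Both constants are assumptions on my part (a typical adult reading speed of around 250 words per minute, and the common rule of thumb of roughly 0.75 English words per token), not figures from NVIDIA:

```python
# Estimate the token rate needed to keep pace with a human reader.
# Both constants are assumptions, not measured values.

words_per_minute = 250    # assumed typical adult reading speed
words_per_token = 0.75    # common rule of thumb for English LLM tokenization

tokens_per_second = words_per_minute / words_per_token / 60
print(f"~{tokens_per_second:.1f} tokens/s keeps up with a reader")  # ~5.6 tokens/s
```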

While 40 trillion operations per second might sound like a lot, it’s just scratching the surface of what RTX AI PCs can do. There are plenty of AI applications beyond creating text, such as the image and video generation capabilities of Stable Diffusion, object recognition in video software, and even AI upscaling of video content, which gamers have seen for several years with the likes of NVIDIA’s DLSS technology on RTX GPUs.

These tasks can benefit from far more computational power. After all, if it takes even one second to generate an image, you won’t be able to generate fluid video in real time. That’s where the extra power of NVIDIA’s RTX GPUs really shines. Whether it’s text or image generation, AI-assisted content creation, supersampling in PC games, or querying LLMs, NVIDIA’s RTX GPUs provide serious performance. The GeForce RTX 4090, for example, can hit 1,700 TOPS, and even NVIDIA’s most economical mobile GPU achieves 194 TOPS. At this level of performance, NVIDIA RTX AI PCs can not only answer user queries with natural language responses but do so in a virtual environment with a fully animated 3D character voicing that response, as we’ve seen with NVIDIA ACE.
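To put that one-second-per-image constraint in numbers, here’s a quick sketch with an assumed 30 fps target (my assumption; real-time targets vary):

```python
# Per-frame time budget for real-time video at an assumed frame rate.

target_fps = 30                        # assumed real-time target
frame_budget_ms = 1000 / target_fps    # ~33 ms available per frame
image_gen_ms = 1000                    # a generator that needs 1 second per image

print(f"Budget per frame: {frame_budget_ms:.1f} ms")
print(f"Shortfall: {image_gen_ms / frame_budget_ms:.0f}x too slow for real time")
```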

A lot of this performance is coming from the Tensor Cores included on NVIDIA’s RTX GPUs. These are dedicated AI accelerators specifically designed to speed up AI tasks, making them a helpful partner for generative AI applications.
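One common way applications put Tensor Cores to work is through reduced-precision math. Here’s a minimal sketch, assuming a Python environment with PyTorch and an RTX GPU; running a matrix multiply in FP16 makes the work eligible for Tensor Core execution (the matrix sizes are arbitrary):

```python
import torch

# Minimal sketch: an FP16 matrix multiply, which frameworks like PyTorch
# can dispatch to Tensor Cores on RTX-class GPUs. Sizes are arbitrary.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

with torch.no_grad():
    c = a @ b  # eligible for Tensor Core execution on supported hardware

print(c.shape)  # torch.Size([4096, 4096])
```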

Illustration of a PC tapping into TensorRT

(Image credit: NVIDIA)

Performance headroom rises even higher with NVIDIA TensorRT software, which is available to the millions of Windows PCs and workstations using RTX GPUs. TensorRT optimizes AI workflows to leverage NVIDIA’s RTX hardware so you get the most out of your components’ processing power. With Stable Diffusion’s Automatic1111 interface, TensorRT boosts image generation speeds by up to 2x. It also recently gained support in ComfyUI, another popular Stable Diffusion interface, where it boosted image generation speed by up to 60% and image-to-video conversion speed in Stable Video Diffusion by 70%. These advantages are even available in the freshly launched Stable Diffusion 3.
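To see what those multipliers mean in practice, here’s a quick sketch using a hypothetical 10-second baseline generation time (an invented number, not a published benchmark):

```python
# What the quoted speedups imply for a hypothetical 10-second baseline.
# The baseline time is invented for illustration.

baseline_s = 10.0

a1111_s = baseline_s / 2.0   # "up to 2x" faster in Automatic1111
comfy_s = baseline_s / 1.6   # "up to 60%" faster in ComfyUI
svd_s   = baseline_s / 1.7   # "70%" faster image-to-video in Stable Video Diffusion

print(f"Automatic1111: {a1111_s:.1f} s per image")
print(f"ComfyUI: {comfy_s:.1f} s per image")
print(f"Stable Video Diffusion: {svd_s:.1f} s per conversion")
```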

AI illustrations

(Image credit: NVIDIA)

With AI racing along at a staggering pace, keeping up with all the latest developments and new terminology can be daunting. To help you keep pace and more easily approach these complex topics, NVIDIA has launched the AI Decoded blog series. You can follow it to get the latest news and discover helpful tools that let you tap into AI capabilities on your own system.

WC Staff