Data Center / Cloud
Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ
Sep 05, 2024
Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch
As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...
5 MIN READ
Sep 03, 2024
Real-Time Neural Receivers Drive AI-RAN Innovation
Today’s 5G New Radio (5G NR) wireless communication systems rely on highly optimized signal processing algorithms to reconstruct transmitted messages from...
11 MIN READ
Aug 28, 2024
Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs
The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a...
7 MIN READ
Aug 28, 2024
NVIDIA Triton Inference Server Achieves Outstanding Performance in MLPerf Inference 4.1 Benchmarks
Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use...
8 MIN READ
Aug 28, 2024
NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a...
13 MIN READ
Aug 27, 2024
Optimize Large-Scale AI Workloads with NVIDIA Spectrum-X
In today’s rapidly evolving technological landscape, staying ahead of the curve is not just a goal—it's a necessity. The surge of innovations, particularly...
5 MIN READ
Aug 26, 2024
LLM Research Rewrites the Role of AI in Safeguarding Sustainable Systems
Large language models (LLMs) are emerging as a tool for safeguarding critical infrastructure systems, such as renewable energy, healthcare, and transportation,...
3 MIN READ
Aug 26, 2024
NVIDIA AI Workbench Simplifies Using GPUs on Windows
NVIDIA AI Workbench is a free, user-friendly development environment manager that streamlines data science, ML, and AI projects on your system of choice: PC,...
8 MIN READ
Aug 21, 2024
Google Cloud Run Adds Support for NVIDIA L4 GPUs, NVIDIA NIM, and Serverless AI Inference Deployments at Scale
Deploying AI-enabled applications and services presents enterprises with significant challenges: Performance is critical as it directly shapes user...
6 MIN READ
Aug 21, 2024
Mistral-NeMo-Minitron 8B Foundation Model Delivers Unparalleled Accuracy
Last month, NVIDIA and Mistral AI unveiled Mistral NeMo 12B, a state-of-the-art large language model (LLM). Mistral NeMo 12B consistently outperforms...
5 MIN READ
Aug 20, 2024
NVIDIA GH200 Superchip Delivers Breakthrough Energy Efficiency and Node Consolidation for Apache Spark
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise...
8 MIN READ
Aug 14, 2024
Video: Build Live Media Applications for AI-Enabled Infrastructure with NVIDIA Holoscan for Media
NVIDIA Holoscan for Media is a software-defined, AI-enabled platform that enables live video pipelines to run on the same infrastructure as AI. This video...
1 MIN READ
Aug 14, 2024
Just Released: DOCA 2.8 Software Framework
The new release includes support for Spectrum-X 1.1 RA and new features for AI Cloud Data Centers.
1 MIN READ
Aug 14, 2024
How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model
Large language models (LLMs) are now a dominant force in natural language processing and understanding, thanks to their effectiveness and versatility. LLMs such...
12 MIN READ
Aug 12, 2024
NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference
Large language models (LLMs) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements...
8 MIN READ