Nvidia and Oracle team up for Zettascale cluster: Available with up to 131,072 Blackwell GPUs


Oracle on Wednesday introduced new types of clusters set to be available for AI training through Oracle Cloud Infrastructure (OCI). The most powerful cluster will be based on Nvidia's upcoming Blackwell GPUs and will offer up to 2.4 zettaFLOPS of AI performance, making it even more powerful than Elon Musk's recently announced AI clusters.

Oracle's new supercomputer clusters can be configured with Nvidia's Hopper or Blackwell GPUs for AI and HPC, with a choice of networking gear (ultra-low-latency RoCEv2 with ConnectX-7 NICs and ConnectX-8 SuperNICs, or Nvidia's Quantum-2 InfiniBand-based networks) and a choice of HPC storage, depending on performance needs:

  • OCI Superclusters equipped with H100 GPUs can support up to 16,384 GPUs, offering a peak performance of 65 FP8/INT8 exaFLOPS and a combined network throughput of 13 Pb/s (13 petabits per second).
  • H200 GPU-powered OCI Superclusters, launching later this year, will scale up to 65,536 GPUs, delivering up to 260 FP8/INT8 exaFLOPS and 52 Pb/s in network throughput.
  • Finally, OCI Superclusters based on Blackwell B200 GPUs will scale up to 131,072 GPUs and will offer peak performance of up to 2.4 FP4/INT8 zettaFLOPS.
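As a rough sanity check on the figures in the list above, the quoted cluster totals can be divided by the maximum GPU counts to see what per-GPU throughput they imply. The sketch below is a back-of-the-envelope calculation, not Oracle's or Nvidia's methodology; the dictionary layout and function names are illustrative, and the per-GPU results are derived values rather than figures stated by either company.

```python
# Back-of-the-envelope check of the cluster figures quoted in the list above.
# The data layout and helper names are illustrative assumptions.
EXA = 10**18
ZETTA = 10**21

clusters = {
    # name: max GPU count, quoted peak FLOPS, quoted network throughput in Pb/s
    "H100": {"gpus": 16_384, "peak_flops": 65 * EXA, "network_pbps": 13},
    "H200": {"gpus": 65_536, "peak_flops": 260 * EXA, "network_pbps": 52},
    # No network throughput figure is quoted for the B200 configuration.
    "B200": {"gpus": 131_072, "peak_flops": 2.4 * ZETTA, "network_pbps": None},
}


def per_gpu_petaflops(name):
    """Quoted peak cluster performance divided evenly across GPUs, in petaFLOPS."""
    c = clusters[name]
    return c["peak_flops"] / c["gpus"] / 1e15


def per_gpu_gbps(name):
    """Quoted network throughput divided evenly across GPUs, in Gb/s (None if not quoted)."""
    c = clusters[name]
    if c["network_pbps"] is None:
        return None
    return c["network_pbps"] * 1e6 / c["gpus"]  # 1 Pb/s = 1,000,000 Gb/s


for name in clusters:
    print(name, round(per_gpu_petaflops(name), 2), per_gpu_gbps(name))
```

Under these assumptions, the H100 and H200 configurations both work out to just under 4 petaFLOPS and roughly 793 Gb/s per GPU, while the B200 figure implies a little over 18 petaFLOPS per GPU at the lower FP4 precision.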

OCI's upcoming supercomputing clusters far exceed current leading systems in GPU count. The range-topping B200-based OCI Superclusters feature more than three times as many GPUs as the Frontier supercomputer (which uses 37,888 AMD Instinct MI250X GPUs) and six times as many as other hyperscalers' clusters, according to Oracle.
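The Frontier comparison is simple division, sketched below. The variable names are illustrative, and the hyperscaler cluster size is only what Oracle's "six times" claim would imply if taken literally; it is not a figure stated by Oracle.

```python
# GPU counts quoted in the article.
b200_supercluster_gpus = 131_072
frontier_gpus = 37_888

# Ratio of the largest OCI Supercluster to Frontier by GPU count.
frontier_ratio = b200_supercluster_gpus / frontier_gpus

# Assumption: reading Oracle's "six times" claim literally gives the implied
# size of the largest competing hyperscaler cluster (a derived value).
implied_hyperscaler_gpus = b200_supercluster_gpus / 6

print(f"{frontier_ratio:.2f}x Frontier")
print(f"~{implied_hyperscaler_gpus:,.0f} GPUs implied for other hyperscalers")
```

The ratio comes out to about 3.46, consistent with the "more than three times" description.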

"We have one of the broadest AI infrastructure offerings and are supporting customers that are running some of the most demanding AI workloads in the cloud," said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. "With Oracle's distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty."

Several companies are already benefiting from this advanced infrastructure. WideLabs and Zoom are leveraging OCI's high-performance AI infrastructure to accelerate their AI development while maintaining sovereignty controls.

"As businesses, researchers and nations race to innovate using AI, access to powerful computing clusters and AI software is critical," said Ian Buck, vice president of Hyperscale and High Performance Computing at Nvidia. "Nvidia's full-stack AI computing platform on Oracle's broadly distributed cloud will deliver AI compute capabilities at unprecedented scale to advance AI efforts globally and help organizations everywhere accelerate research, development and deployment."

The upcoming OCI Superclusters will use Nvidia's GB200 NVL72 liquid-cooled cabinets with 72 GPUs that communicate with each other at an aggregate bandwidth of 129.6 TB/s in a single NVLink domain. Oracle said that Nvidia's Blackwell GPUs will be available in the first half of 2025 (as availability of Blackwell this year will be limited), though it is unclear when OCI will offer fully loaded Blackwell-powered clusters.
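The NVL72 numbers above can be broken down per GPU and scaled up to the largest configuration. This is a minimal sketch that assumes a maximum-size 131,072-GPU cluster is built entirely from fully populated 72-GPU cabinets; the article does not state the actual cabinet count, and the variable names are illustrative.

```python
import math

# Figures quoted in the article.
gpus_per_cabinet = 72
aggregate_nvlink_tbps = 129.6  # TB/s within a single NVLink domain
max_cluster_gpus = 131_072

# Aggregate NVLink bandwidth split evenly across the GPUs in one cabinet.
nvlink_per_gpu_tbps = aggregate_nvlink_tbps / gpus_per_cabinet

# Assumption: every GPU lives in a fully populated NVL72 cabinet, so the
# cabinet count is the GPU count divided by 72, rounded up.
cabinets_needed = math.ceil(max_cluster_gpus / gpus_per_cabinet)

print(f"{nvlink_per_gpu_tbps:.1f} TB/s of NVLink bandwidth per GPU")
print(f"{cabinets_needed} cabinets for {max_cluster_gpus:,} GPUs")
```

Under that assumption, each GPU gets 1.8 TB/s of NVLink bandwidth, and the largest cluster would need 1,821 cabinets.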

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • Stomx
    All spending tens and tens B dollars on Zettaflops, next will be inevitable oversupply, then the AI bubble will crash and all these GPUs will be found on the city dumps
  • ThomasKinsley
    If I'm not mistaken, Blackwell is $30,000 per chip. Unless they got it on a discount, buying 131,072 chips is close to $4 billion USD.
  • DS426
    ThomasKinsley said:
    If I'm not mistaken, Blackwell is $30,000 per chip. Unless they got it on a discount, buying 131,072 chips is close to $4 billion USD.
    Yep, however, AI is so sacrosanct to these companies that ROI isn't even proven and yet that's a complete non-issue for them; somehow, so many starry-eyed companies are going by a "build it and they will come" strategy, seemingly throwing out all conventional and responsible business sensibility.

    There's definitely going to be an AI bubble crash at some point, and unfortunately it looks to be almost every bit as disastrous as the dot-com bubble crash. Too bad AI isn't being used to predict its own crash, but I guess that would be too antithetical to their own existence and training, lol!

    “Those that fail to learn from history are doomed to repeat it.” - Winston Churchill
  • askyron
    I shudder to think of the power consumption of these systems. Here's hoping they're planning on a solar farm to supplement it. Say half the state of Nevada.