Nvidia unveils next-generation Hopper GPU architecture and more accelerated applications at GTC 2022
Why it matters: Watching the evolution of the computing industry over the last few years has been a fascinating exercise. After decades of focusing almost exclusively on one type of chip — the CPU — and measuring enhancements through refinements to its internal architecture, there has been a dramatic shift to multiple chip types, particularly GPUs (Graphics Processing Units), with performance improvements being enabled by high-speed connectivity between components.
Never has this been made clearer than at Nvidia’s latest GPU Technology Conference (GTC). During the event’s keynote, company CEO Jensen Huang unveiled a host of new advancements, including the latest GPU architecture (named Hopper after computing pioneer Grace Hopper), and numerous forms of high-speed chip-to-chip and device-to-device connectivity options.
Collectively, the company used these key technology advancements to introduce everything from the enormous Eos Supercomputer down to the H100 CNX Converged Accelerator, a PCIe card designed for existing servers, with lots of other options in between.
Nvidia’s focus is being driven by the industry’s relentless pursuit of advancements in AI and Machine Learning. In fact, most of the company’s many chip, hardware, and software announcements from the show have a tie to these critical trends, whether it be supercomputing applications, autonomous driving systems, or embedded robotics applications.
Nvidia also strongly reinforced that it’s more than a chip company, offering software updates for its existing tools and platforms, particularly the Omniverse 3D collaboration and simulation suite. To encourage more use of the tool, Nvidia announced Omniverse Cloud, which lets anyone try Omniverse with nothing more than a browser.
For hyperscalers and large enterprises looking to deploy advanced AI applications, the company also debuted new or updated versions of several cloud-native application services, including Merlin 1.0 for recommender systems, and version 2.0 of both Riva, its speech recognition and text-to-speech service (Riva, sounds familiar?), and AI Enterprise, for a variety of data science and analytics applications.
New to AI Enterprise 2.0 is support for virtualization and the ability to use containers across several platforms, including VMware and Red Hat. Taken as a whole, these offerings reflect the company’s continuing evolution as a software provider. It’s moving from a tools-focused approach to one that offers SaaS-style applications that can be deployed across all the major public clouds, as well as via on-premises server hardware from the likes of Dell Technologies, Hewlett Packard Enterprise, and Lenovo.
Nvidia hasn’t forgotten its roots, however: the star of its latest GTC was the new Hopper GPU architecture and the H100 datacenter GPU.
Boasting a whopping 80 billion transistors, the 4nm H100 supports several important architectural advancements. First, to speed the performance of new Transformer-based AI models (such as the one driving the GPT-3 natural language engine), the H100 includes a Transformer engine that the company claims offers a 6x improvement over the previous Ampere architecture.
It also includes a new set of instructions called DPX that are designed to accelerate dynamic programming, a technique used in fields such as genomics and proteomics for workloads that previously ran on CPUs or FPGAs.
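For readers unfamiliar with dynamic programming, here is a minimal sketch of the kind of algorithm involved: a Smith-Waterman-style local alignment score, a classic genomics example. This is plain CPU Python for illustration only; it does not use DPX or any Nvidia API, and the function name and scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b.

    Each cell depends only on its left, upper, and upper-left neighbors,
    so the table is filled row by row, keeping just the previous row.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or open a gap in one of the two sequences.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a score never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared "ACGT" run scores 4 matches x 2 points each.
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))  # prints 8
```

The inner step is a handful of additions followed by a max, repeated for every cell in the table. That repeated add-and-compare pattern is the sort of work that dedicated instructions can speed up.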
For privacy-focused applications, the H100 is also the first accelerator to support confidential computing (previous implementations only worked with CPUs), allowing models and data to be encrypted and protected via a virtualized trusted execution environment.
The architecture also allows for federated learning while in a confidential computing mode, meaning that multiple companies with private data sets can all train the same model by essentially passing it around among different secure environments. In addition, thanks to a second-generation implementation of multi-instance GPU, or MIG, a single physical GPU can be split into as many as seven separate, isolated instances, each running its own workload, improving the efficiency of the chip in shared environments.
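The idea of passing a model around among parties can be illustrated with a toy sketch. Each party refines a shared weight on its own data and hands only the weight to the next party; the raw data never leaves its owner. This is an assumption-laden illustration, not Nvidia's implementation: the function names, the one-parameter model y = w * x, and the data are all hypothetical, and the encryption and trusted execution environment are not modeled at all.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """One party refines weight w using only its private (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def train_round_robin(parties, rounds=50, w=0.0):
    """Pass the model from party to party; only w is ever shared."""
    for _ in range(rounds):
        for private_data in parties:
            w = local_update(w, private_data)
    return w


# Hypothetical private data sets, all generated from y = 3 * x.
parties = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(0.5, 1.5), (5.0, 15.0)],
]

if __name__ == "__main__":
    w = train_round_robin(parties)
    print(round(w, 6))  # converges toward the shared slope, 3.0
```

In a confidential computing setting, each `local_update` call would presumably run inside a protected environment, so that neither the data nor the intermediate model is exposed to the host.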
Hopper also supports the fourth-gen version of Nvidia’s NVLink, a major leap that Nvidia says offers a 9x increase in bandwidth versus previous technologies, supports connections to up to 256 GPUs, and enables use of NVLink Switch. The latter provides the ability to maintain high-speed connections not only within a single system, but to external systems as well. This, in turn, has enabled a new range of DGX Pods and DGX SuperPods, Nvidia’s own branded supercomputer hardware, as well as the aforementioned Eos Supercomputer.
Speaking of NVLink and physical connectivity, the company also announced a new interconnect technology called Nvidia NVLink-C2C, which is designed for chip-to-chip and die-to-die connections at speeds of up to 900 GB/s between Nvidia components.
The company is opening up the previously proprietary NVLink standard to work with other chip vendors, and notably announced it would also be supporting the newly unveiled UCIe standard (see “The Future of Semiconductors is UCIe” for more).
This gives the company more flexibility in terms of how it can potentially work with others to create heterogeneous parts, as others in the semiconductor industry have started to do as well.
Nvidia chose to leverage its own NVLink-C2C for a new Grace Superchip, which combines two of the company’s Arm-based CPUs, and revealed that the Grace Hopper Superchip, previewed last year, uses the same interconnect technology to provide a high-speed connection between its single Grace CPU and Hopper GPU.
Both “superchips” are targeted at datacenter applications, but their architectures and underlying technologies provide a good sense of where we can likely expect to see PC and other mainstream applications headed.
The NVLink-C2C standard, which supports industry connectivity standards such as Arm’s AMBA CHI protocol and CXL, can also be used to interconnect DPUs (data processing units) to help speed up critical data transfers within and across systems.
In addition to all these datacenter-focused announcements, Nvidia announced updates to, and additional real-world customers for, its Drive Orin platform for assisted and autonomous driving, as well as its Jetson and Isaac Orin platforms for robotics.
All told, it was an impressive launch of numerous technologies, chips, systems, and platforms. What was clear is that the future of demanding AI applications, along with other difficult computing challenges, is going to require multiple different elements working in concert to complete a given task.
As a result, increasing the diversity of chip types and the mechanisms for allowing them to communicate with one another is going to be as important as, if not more important than, advancements within individual categories. To put it more succinctly, we’re clearly headed into a connected, multi-chip world.
Bob O’Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.