This week I had the privilege of attending SC22, the supercomputing conference, in Dallas, TX. Supercomputing centers are early adopters of technologies that will soon be common in most data centers. Thus, the sessions I attended, the attendees I spoke with, and the solutions displayed in the exhibit hall gave me an eye-opening glimpse into tomorrow’s data center.
INCREASING DATA CENTER DENSITY IS DRIVING CHANGE
Data center density is increasing in multiple dimensions, including compute, storage, power, and heat. Mitchell Knight of CoolIT states that average data center power density has quadrupled in the last ten years. HPC is leading the way: 45 kW per rack is becoming common, and the new Frontier HPC system draws 300 kW per rack.
Much of this increase is driven by advances in CPUs and GPUs. For example, Intel’s 3rd Generation Xeon processors have a maximum thermal design power (TDP) of 280 watts per processor, and its 4th Generation Xeon processors can draw up to 350 watts per processor. NVIDIA’s A100 GPU has a maximum TDP of 300 watts, and its new H100 draws up to 700 watts per processor.
You may not need more data center space to keep up with your performance and capacity demands, but you will need more power and cooling capacity delivered to your data center.
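To make the point concrete, here is a rough back-of-the-envelope sketch in Python. The TDP values are the ones quoted above; the rack layout (8 servers, each with 2 CPUs and 4 GPUs) is purely a hypothetical assumption for illustration, and the estimate counts only processor TDP, so real rack draw would be higher.

```python
# TDP figures quoted in the article, in watts.
XEON_GEN3_TDP_W = 280
XEON_GEN4_TDP_W = 350
A100_TDP_W = 300
H100_TDP_W = 700


def rack_processor_power_kw(servers_per_rack, cpus_per_server, cpu_tdp_w,
                            gpus_per_server, gpu_tdp_w):
    """Return the summed processor TDP for one rack, in kilowatts.

    Counts only CPU and GPU TDP. Memory, storage, fans, networking, and
    power-supply losses are ignored, so actual draw will be higher.
    """
    per_server_w = cpus_per_server * cpu_tdp_w + gpus_per_server * gpu_tdp_w
    return servers_per_rack * per_server_w / 1000


# Hypothetical rack layout (an assumption, not from the article):
# 8 servers per rack, each with 2 CPUs and 4 GPUs.
previous_gen_kw = rack_processor_power_kw(8, 2, XEON_GEN3_TDP_W, 4, A100_TDP_W)
current_gen_kw = rack_processor_power_kw(8, 2, XEON_GEN4_TDP_W, 4, H100_TDP_W)

print(f"Previous generation: {previous_gen_kw:.2f} kW per rack")
print(f"Current generation:  {current_gen_kw:.2f} kW per rack")
print(f"Increase: {current_gen_kw / previous_gen_kw:.2f}x in the same rack space")
```

Under these assumptions, the same rack footprint goes from roughly 14 kW to 28 kW of processor power alone, which is why power and cooling, rather than floor space, become the constraint.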
These increases require changes in data center infrastructure, including the technologies we include in the data center and perhaps where we run our workloads.
LIQUID COOLING FOR THE DATA CENTER WAS HOT AT SC22
Advances in processors are driving the need for liquid cooling. Happily, a complete ecosystem appears to be in place to address the increasing heat density in the data center. Many vendors displayed multiple approaches to liquid cooling in the Exhibit Hall. These include immersion, direct-to-chip, and back-of-rack solutions.
Many of these liquid cooling solutions transfer heat to facility water, while others dissipate it into the air. One Stop Systems displayed a self-contained “supercomputer at the edge” that uses full immersion to cool a quad-GPU system.
THE DATA TSUNAMI QUANTIFIED
HPC requires vast amounts of data capacity and velocity. One attendee I spoke with is responsible for managing an infrastructure that generates 2PB of genomics data daily. They process that data down to the 100TB they need to persist each day.
In a meeting, VAST Data executives said their average customer has 10PB of VAST storage in place with 99.999%+ availability. Some of VAST Data’s customers have hundreds of PB of VAST storage in place.
One of these VAST customers is EMBL-EBI. Raphael Griman, Compute Team Lead for EMBL-EBI, said they are currently storing more than 250PB of public science data and expect to cross the exabyte boundary within a year. That means they will add an average of 2PB of data to their environment every day.
Raphael said they have had zero storage downtime in the three years they have been using VAST Data. EMBL-EBI is consolidating from seven file systems to just two, with VAST storage providing the engine for that consolidation.
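The 2PB-per-day figure for EMBL-EBI can be checked with simple arithmetic. The sketch below assumes decimal units (1 exabyte = 1,000PB), a starting point of exactly 250PB, and a 365-day horizon; since EMBL-EBI stores "more than" 250PB, the result is an upper bound on the growth needed.

```python
CURRENT_PB = 250      # "more than 250PB" per the article; treated as exactly 250
EXABYTE_PB = 1000     # assumption: decimal units, 1 EB = 1,000 PB
DAYS_TO_TARGET = 365  # "within a year"


def required_daily_growth_pb(current_pb, target_pb, days):
    """Average PB that must be added per day to reach target_pb in `days`."""
    return (target_pb - current_pb) / days


daily_growth_pb = required_daily_growth_pb(CURRENT_PB, EXABYTE_PB, DAYS_TO_TARGET)
print(f"Required average growth: {daily_growth_pb:.2f} PB per day")
```

The result is a little over 2PB per day, which matches the figure above.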
THE NECESSITY OF INTELLIGENT DATA MANAGEMENT
Massive amounts of data can create massive logistical challenges that start with providing a landing place for the data and the bandwidth required to move the data to storage. Automation and smart data placement are essential to controlling costs and deriving value from data at scale.
At SC22, I spoke with two vendors that are addressing the multi-petabyte data management challenge. Hammerspace creates what it terms a “global data environment” through intelligent metadata management. Hammerspace moves and aggregates metadata from all of an organization’s filers to provide a global view and unified management plane for all the unstructured data stored on those systems.
Arcitecta’s Mediaflux metadata database technology provides policy-driven data management to enable data and business resilience. One application of the Mediaflux technology is the high-speed transfer of data across global networks. This capability recently won the “Most Complete Solution” and “Best Software Architecture” awards in the International Data Mover Challenge (DMC) at Supercomputing Asia 2022 (SCA22).
COMPOSABLE INFRASTRUCTURE ARRIVES
Another kind of Liqid was on display in the Exhibit Hall. Liqid provides a composable infrastructure solution based on a PCIe fabric and its software for pooling and composing PCIe-attached devices. Its technology provides a preview of the advances that CXL is bringing to data center architecture.
UNCONVENTIONAL THINKING ENABLES SUSTAINABLE COMPUTING
Unconventional thinking is expanding the realm of the possible.
Many wind farms and solar energy installations are located far from where power is needed. In many instances, there is more renewable energy production capacity than transmission capacity to carry the power to where it is needed. This is called power congestion and explains why you often see only a portion of a wind farm’s turbines spinning at any given time.
Lancium Compute addresses the power congestion problem by placing its compute in regions with more renewable energy capacity than transmission capacity. Lancium Compute pauses jobs during periods of high demand or low renewable energy production. As a result, it purchases renewable energy at the lowest possible cost. This also enables Lancium to offer a high-performance computing platform as a service (HPC PaaS) in data centers powered by 100% renewable energy.
If sustainability through the use of renewable energy is important to your organization, you may want to identify batch-oriented workloads, which include many scientific workloads, and move that processing to where renewable energy is produced rather than moving more electricity to your data center.
Supercomputing is showing the way into the future for the data center. Increases in data center density mean you may not need more data center space to keep up with your performance and capacity demands. Instead, you will need more power and cooling capacity delivered to your data center.
In addition, you may wish to tune your demand to the power grid by scheduling batch-oriented workloads based on the price of electricity in a given time period or location.
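As a minimal sketch of what price-aware scheduling could look like, the Python below picks the cheapest contiguous block of hours for a batch job from an hourly price forecast. The function name `cheapest_window` and the price list are hypothetical, made up for illustration; a real scheduler would pull prices from a utility or grid operator and would also need to handle deadlines, preemption, and multiple sites.

```python
def cheapest_window(hourly_prices, job_hours):
    """Return (start_hour, summed_price) for the cheapest contiguous window.

    hourly_prices: price per hour (e.g. $/MWh), one entry per hour.
    job_hours: number of consecutive hours the batch job needs.
    Assumes the job draws constant power, so cost is proportional
    to the sum of the hourly prices in the window.
    """
    if job_hours <= 0 or job_hours > len(hourly_prices):
        raise ValueError("job_hours must be between 1 and len(hourly_prices)")

    # Sliding-window sum: start with the first window, then shift one hour
    # at a time, adding the hour that enters and dropping the hour that leaves.
    window_cost = sum(hourly_prices[:job_hours])
    best_start, best_cost = 0, window_cost
    for start in range(1, len(hourly_prices) - job_hours + 1):
        window_cost += hourly_prices[start + job_hours - 1] - hourly_prices[start - 1]
        if window_cost < best_cost:
            best_start, best_cost = start, window_cost
    return best_start, best_cost


# Hypothetical 24-hour price forecast in $/MWh (illustrative values only).
prices = [42, 38, 35, 31, 30, 33, 45, 60, 72, 68, 55, 40,
          28, 22, 20, 25, 48, 80, 95, 90, 75, 62, 50, 44]

start_hour, total = cheapest_window(prices, job_hours=4)
print(f"Start the 4-hour job at hour {start_hour} (summed price {total})")
```

The same function could be run once per candidate location, with the job sent to whichever site offers the lowest-cost window.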