The dirty secret of high performance computing
Date:
Sat, 15 Oct 2022 13:04:24 +0000
Description:
The world's fastest computers are becoming ever more powerful, but consuming vast quantities of energy in the process.
In the decades since Seymour Cray developed what is widely considered the world's first supercomputer, the CDC 6600, an arms race has been waged in the high performance computing (HPC) community. The objective: to enhance performance, by any means, at any cost.
Propelled by advances in the fields of compute, storage, networking and software, the performance of leading systems has increased one trillion-fold since the unveiling of the CDC 6600 in 1964, from the millions of floating point operations per second (megaFLOPS) to the quintillions (exaFLOPS).
The current holder of the crown, a colossal US-based supercomputer called Frontier, is capable of achieving 1.102 exaFLOPS as measured by the High Performance Linpack (HPL) benchmark. But even more powerful machines are suspected to be in operation elsewhere, behind closed doors.
The arrival of so-called exascale supercomputers is expected to benefit practically all sectors - from science to cybersecurity, healthcare to finance - and set the stage for mighty new AI models that would otherwise have taken years to train.

The CDC 6600, widely considered the world's first supercomputer. (Image credit: Computer History Museum)
However, an increase in speeds of this magnitude has come at a cost: energy consumption. At full throttle, Frontier consumes up to 40MW of power, roughly the same as 40 million desktop PCs.
Supercomputing has always been about pushing the boundaries of the possible. But as the need to minimize emissions becomes ever clearer and energy prices continue to soar, the HPC industry will have to re-evaluate whether its original guiding principle is still worth following.

Performance vs. efficiency
One organization operating at the forefront of this issue is the University of Cambridge, which, in partnership with Dell Technologies, has developed multiple supercomputers with power efficiency as a central design goal.
The Wilkes3, for example, is positioned only 100th in the overall performance charts, but sits in third place in the Green500, a ranking of HPC systems based on performance per watt of energy consumed.
In conversation with TechRadar Pro, Dr. Paul Calleja, Director of Research Computing Services at the University of Cambridge, explained that the institution is far more concerned with building highly productive and efficient machines than extremely powerful ones.
"We're not really interested in large systems, because they're highly specific point solutions. But the technologies deployed inside them are much more widely applicable and will enable systems an order of magnitude slower to operate in a much more cost- and energy-efficient way," says Dr. Calleja.
"In doing so, you democratize access to computing for many more people. We're interested in using technologies designed for those big epoch systems to create much more sustainable supercomputers, for a wider audience."

The Wilkes3 supercomputer might not be the world's fastest, but it's among the most power efficient. (Image credit: University of Cambridge)
Dr. Calleja also predicts an increasingly fierce push for power efficiency in the years to come, both in the HPC sector and the wider datacenter community, where energy consumption accounts for upwards of 90% of costs, we're told.
Recent fluctuations in the price of energy related to the war in Ukraine have also made running supercomputers dramatically more expensive, particularly in the context of exascale computing, further illustrating the importance of performance per watt.
In the context of Wilkes3, the university identified a number of optimizations that improved efficiency. For example, by lowering the clock speed at which some components were running, depending on the workload, the team was able to achieve energy consumption reductions in the region of 20-30%.
"Within a particular architectural family, clock speed has a linear relationship with performance, but a squared relationship with power consumption. That's the killer," explained Dr. Calleja.
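That observation can be put into a toy model (the numbers below are illustrative assumptions, not Cambridge's measurements): treat performance as linear in clock frequency, treat power as a static floor plus a quadratic dynamic term, and compare the energy each frequency needs to finish a fixed amount of work.

```python
# Toy model of the clock-speed trade-off: within one architectural
# family, performance scales roughly linearly with frequency f, while
# dynamic power scales roughly with f squared. A node also draws a
# fixed static power even when clocked down, which is what makes
# running too slowly wasteful as well.

P_STATIC = 50.0   # watts of static/idle draw (illustrative figure)
K_DYN = 10.0      # dynamic-power coefficient, watts per GHz^2 (illustrative)
WORK = 1000.0     # fixed amount of work per job, in GHz-seconds

def energy_per_job(f_ghz):
    """Energy (joules) to finish one job at clock frequency f_ghz."""
    time_s = WORK / f_ghz                     # performance linear in f
    power_w = P_STATIC + K_DYN * f_ghz ** 2   # static floor + quadratic dynamic
    return power_w * time_s

# Sweep candidate frequencies and pick the most energy-efficient one.
freqs = [0.5 + 0.1 * i for i in range(40)]    # 0.5 .. 4.4 GHz
best_f = min(freqs, key=energy_per_job)

# Calculus puts the minimum at f* = sqrt(P_STATIC / K_DYN) ~= 2.24 GHz:
# clocking higher finishes each job sooner but costs more energy per job,
# while clocking lower lets static power dominate the longer runtime.
print(f"most energy-efficient clock: {best_f:.1f} GHz")
print(f"extra energy per job at full speed (4.4 GHz): "
      f"{energy_per_job(4.4) / energy_per_job(best_f) - 1:.0%}")
```

Because static power dominates at low clocks and quadratic dynamic power dominates at high clocks, energy per job is minimized at an intermediate frequency: lowering clocks saves energy, but only up to a point.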
"Reducing the clock speed reduces the power draw at a much faster rate than the performance, but also extends the time it takes to complete a job. So what we should be looking at isn't power consumption during a run, but really energy consumed per job. There is a sweet spot."

Software is king
Beyond fine-tuning hardware configurations for specific workloads, there are also a number of optimizations to be made elsewhere, in the context of
storage and networking, and in connected disciplines like cooling and rack design.
However, asked where specifically he would like to see resources allocated in the quest to improve power efficiency, Dr. Calleja explained that the focus should be on software, first and foremost.
"The hardware is not the problem, it's about application efficiency. This is going to be the major bottleneck moving forward," he said. "Today's exascale systems are based on GPU architectures, and the number of applications that can run efficiently at scale on GPU systems is small."
"To really take advantage of today's technology, we need to put a lot of focus into application development. The development lifecycle stretches over decades; software used today was developed 20-30 years ago, and it's difficult when you've got such long-lived code that needs to be rearchitected."
The problem, though, is that the HPC industry has not made a habit of thinking software-first. Historically, much more attention has been paid to the hardware because, in Dr. Calleja's words, "it's easy; you just buy a faster chip. You don't have to think clever."
"While we had Moore's Law, with a doubling of processor performance every eighteen months, you didn't have to do anything [on a software level] to increase performance. But those days are gone. Now if we want advancements, we have to go back and rearchitect the software."

As Moore's Law begins to falter, advances in CPU architecture can no longer be relied upon as a source of performance improvements. (Image credit: Alexander_Safonov / Shutterstock)
Dr. Calleja reserved some praise for Intel in this regard. As the server hardware space becomes more diverse from a vendor perspective (in most respects, a positive development), application compatibility has the potential to become a problem, but Intel is working on a solution.
"One differentiator I see for Intel is that it invests an awful lot [of both funds and time] into the oneAPI ecosystem, for developing code portability across silicon types. It's these kinds of toolchains we need to enable tomorrow's applications to take advantage of emerging silicon," he notes.
Separately, Dr. Calleja called for a tighter focus on scientific need. Too often, things go wrong in translation, creating a misalignment between hardware and software architectures and the actual needs of the end user.
A more energetic approach to cross-industry collaboration, he says, would create a virtuous circle of users, service providers and vendors, translating into benefits from both a performance and an efficiency perspective.

A zettascale future
In typical fashion, with the fall of the symbolic exascale milestone, attention will now turn to the next one: zettascale.
"Zettascale is just the next flag in the ground," said Dr. Calleja, "a totem that highlights the technologies needed to reach the next milestone in computing advances, which today are unobtainable."
"The world's fastest systems are extremely expensive for what you get out of them in terms of scientific output. But they are important, because they demonstrate the art of the possible and they move the industry forwards."

Pembroke College, University of Cambridge, the HQ of the Open Zettascale Lab. (Image credit: University of Cambridge)
Whether systems capable of achieving one zettaFLOPS of performance, one thousand times more powerful than the current crop, can be developed in a way that aligns with sustainability objectives will depend on the industry's capacity for invention.
Performance and power efficiency are not strictly at odds, but a healthy dose of craft will be required in each subdiscipline to deliver the necessary performance increase within an appropriate power envelope.
In theory, there exists a golden ratio of performance to energy consumption, whereby the benefits to society brought about by HPC can be said to justify the expenditure of carbon emissions.
The precise figure will remain elusive in practice, of course, but the pursuit of the idea is itself a step in the right direction.
======================================================================
Link to news story:
https://www.techradar.com/news/the-dirty-secret-of-high-performance-computing/