Three HPC Gurus Ask: Do We Still Need GPUs?


Timothy Prickett Morgan

Co-Editor, Co-Founder, The Next Platform

Published Tue 30 Jun 2026 // 17:13 UTC

Yes, that simple question is heretical in the modern Nvidia world that has come to dominate AI training and, to a certain extent, HPC simulation and modeling. But given that CPUs are in many cases starting to look more like GPUs, with their hybrid vector and matrix math engines, mixed precision support, in some cases stacked HBM high bandwidth memory as well as fatter DRAM main memory, and integrated interconnects, it is also a logical question. And so when Jack Dongarra, of the University of Tennessee and of Oak Ridge National Laboratory for 36 years, Torsten Hoefler of ETH Zurich and chief architect for AI/ML at CSCS, and Satoshi Matsuoka of RIKEN lab and of Tokyo Institute of Technology, ask that question rhetorically and answer it, people listen.

That question is answered in a forthcoming paper called Do We Still Need GPUs? Rethinking AI and Scientific Computing on Matrix-Enhanced CPUs, which will be on arXiv as well as in the flagship publication of the Association for Computing Machinery, and which you can read for yourself at this link until it is published. The question was prompted by the existence of a new all-CPU supercomputer called “LineShine” that is the fastest AI/HPC supercomputer in the world, according to the latest Top500 rankings that came out this month. I did a deep dive on the processors, memory, and interconnects of the LineShine machine and its Chinese-made LX2 Arm server CPU here, which you need to read to get this paper that the three HPC gurus are going to put out there.

You Go To War With The Compute Engines You Have

Here is the thesis, which I will embellish. With enthusiasm.

The paper compares and contrasts the A64FX processor and the architecture of the “Fugaku” supercomputer at RIKEN Lab, which went into full operation in March 2021, with the LineShine machine, which appears to have been fired up last fall. Both AI/HPC supercomputers are all-CPU designs, as was Fugaku’s predecessor at RIKEN, the “Project Keisoku” K supercomputer that went into full production in September 2012. For your reference, I did a deep dive on the K system back at The Register here. Fujitsu moved from Sparc to Arm architecture with the Fugaku machine; the deep dive on the A64FX processor is here and the one on the companion Tofu D interconnect is there.

Here is the funny bit as it relates to the K machine. It was not an all-CPU machine by choice.

Back in 2008, the idea was for Japan to engage its three big supercomputer and system makers to create a hybrid machine that combined CPU compute made by Fujitsu as well as vector accelerators from Hitachi, with NEC working on a multiple dimension mesh/torus interconnect to link all of the CPU and vector nodes to each other to share work. But in May 2009, with the Great Recession roaring and Hitachi and NEC unsure how they could afford to do the K system development and manufacturing, they both pulled out of the deal, leaving Fujitsu to create its very good “Venus” Sparc64-VIIIfx processor, which had big fat vector engines on it. The Tofu interconnect that Fujitsu finished after some initial development by NEC was also very good, and the resulting K machine was not only the fastest supercomputer in the world, but it was also the most efficient machine across many workloads. In fact, even Fugaku cannot beat its computational efficiency, even with the third generation Tofu D 6D mesh/torus interconnect. (It is hard to squeeze all the flops out of progressively larger machines.)

Why single out Fugaku and LineShine for the comparison? Well, both have been used to support trillion parameter GenAI models and both are also supporting traditional ModSim codes as well as mixing AI and HPC workloads to get real stuff done.

The paper’s authors correctly point out that GPUs became compute engines in the first place because CPUs were not delivering enough flops at multiple precisions, and because their memory subsystems did not supply enough bandwidth even when they did have a lot of math embedded into their designs. It was the combination of a lot of vector math and then even more powerful tensor math plus fast GDDR and then HBM stacked memory, which is skinny on capacity but enough to do useful work once solvers were parallelized, that made the GPU indispensable. The CPU makers were happy to sell two things instead of one, and eventually Nvidia was happy to sell CPUs and not just GPUs. But all things being equal, HPC shops would have simply preferred to stick with scale out networks and have CPUs with turbocharged math capabilities.
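The flops-versus-bandwidth argument above can be made concrete with the standard roofline model, which caps a kernel's attainable throughput at the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity. What follows is a minimal sketch, not anything taken from the paper: the function names are hypothetical, and the peak flops, bandwidth, and intensity figures are made-up round numbers chosen for illustration, not specifications of any real CPU or GPU.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound in flops/sec.

    peak_flops           -- peak compute rate, flops/sec
    mem_bandwidth        -- sustained memory bandwidth, bytes/sec
    arithmetic_intensity -- flops performed per byte moved from memory
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity (flops/byte) above which a kernel is compute bound."""
    return peak_flops / mem_bandwidth


# Hypothetical devices with the same math engines but different memory.
# These numbers are illustrative assumptions, not vendor specs.
PEAK = 2e12          # 2 teraflops peak on both devices
DDR_BW = 2e11        # 200 GB/sec, a DRAM-like memory subsystem
HBM_BW = 1e12        # 1 TB/sec, an HBM-like memory subsystem

# A bandwidth-hungry kernel doing 0.25 flops per byte moved (assumed value).
LOW_INTENSITY = 0.25

ddr_rate = attainable_flops(PEAK, DDR_BW, LOW_INTENSITY)
hbm_rate = attainable_flops(PEAK, HBM_BW, LOW_INTENSITY)

if __name__ == "__main__":
    print("DDR-like device:", ddr_rate, "flops/sec")
    print("HBM-like device:", hbm_rate, "flops/sec")
    print("DDR-like ridge point:", ridge_point(PEAK, DDR_BW), "flops/byte")
```

Under these assumed numbers, both devices sit far below their identical peak on the low intensity kernel, and the only thing that moves the result is memory bandwidth. That is the sense in which piling math units onto a CPU does little good unless the memory subsystem can feed them.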

Gradually, ever so slowly, this is happening. The paper’s authors call out the SVE vector extensions that were added by Arm with its Neoverse...
