A Deep Dive on China's "LineShine" All-CPU, Exaflops-Class Supercomputer

Timothy Prickett Morgan

Co-Editor, Co-Founder, The Next Platform

Published Thu 25 Jun 2026 // 16:48 UTC

It has been nine years since a Chinese HPC supercomputer was at the top of the High Performance Linpack performance rankings, but as we all know, China did break through the exascale flops barrier at 64-bit precision before the United States did, and on two different systems. China didn’t brag about it, but let enough information leak out to US experts so the word would get out. To be specific, the Sunway OceanLight machine installed at NSC Qingdao came first; we initially talked about it in February 2021 and then did a deep dive on the system’s architecture in March 2022. The full OceanLight system, based on the Sunway SW26010-Pro CPU and with 41.93 million cores, hit 1.22 exaflops on HPL and is rumored to have been up and running in March 2021.

The Tianhe-3 supercomputer, based on a hybrid of the Phytium 2000 Arm processor and the Matrix 3000 DSP coprocessor, has a peak theoretical performance of around 2.05 exaflops and delivers maybe 1.57 exaflops on HPL. It was fired up initially at NSC Guangzhou at lower performance numbers in October 2021. That was not the full system, which was apparently fully fleshed out in December 2023 at the numbers we cite. But in late 2021, using older Matrix 2000+ DSP coprocessors, the Tianhe-3 prototype still weighed in at 1.3 exaflops on HPL against 1.7 exaflops peak. This beat out Oak Ridge’s “Frontier” system, a hybrid that pairs each AMD “Trento” Epyc CPU with four AMD “Aldebaran” MI250X GPU accelerators, which had just under 8.7 million cores across those compute engines and which was rated at 1.19 exaflops on HPL and 1.68 exaflops peak. The Frontier system was accepted and put into production in May 2022.

China beat this date and this performance by more than a year, and did so using older process technologies, running hot, taking up a lot of space, and presumably at high cost. The compute engines are hanging back on chip manufacturing processes because China has no choice if it is going to use its indigenous Semiconductor Manufacturing International Corp (SMIC) foundry. These machines were expensive, but if you want to design airframes for travel and war, and nuclear weapons, then you can’t wait around for the United States to stop embargoing Nvidia and AMD GPUs or other compute engines and maybe networking. China wants to stand on its own two feet, and it can easily afford to and has the will to do so.

The same approach was taken with the new number one ranked machine on the Top500 supercomputer rankings: the “LineShine” supercomputer installed at NSC Shenzhen in China. But all of the technologies used to make LineShine have advanced by five years, and that is why this system is not only bigger, but is arguably better than its OceanLight and Tianhe-3 predecessors.
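To put the HPL and peak numbers for Tianhe-3 and Frontier side by side, here is a rough back-of-the-envelope sketch that computes HPL efficiency (HPL result divided by peak) from the figures quoted above. The variable and function names are ours, chosen for illustration, and the inputs are the rounded figures from the text, so the results are approximate.

```python
# HPL and peak figures in exaflops, as quoted in the text (rounded values).
# Each entry is (hpl_exaflops, peak_exaflops).
systems = {
    "Tianhe-3 (full, December 2023)": (1.57, 2.05),
    "Tianhe-3 prototype (late 2021)": (1.30, 1.70),
    "Frontier": (1.19, 1.68),
}


def hpl_efficiency(hpl_exaflops, peak_exaflops):
    """Return the fraction of peak theoretical performance achieved on HPL."""
    return hpl_exaflops / peak_exaflops


# Compute efficiency for every system and print it as a percentage.
efficiencies = {
    name: hpl_efficiency(hpl, peak) for name, (hpl, peak) in systems.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of peak on HPL")
```

On these rounded inputs, both Tianhe-3 configurations land in the mid-70s percent range and Frontier comes in at around 71 percent, which is a useful sanity check on how the machines compare beyond raw flops.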

LineShine is based on an Armv9.2 CPU core that has SVE2 vector units and the relatively new SME matrix math units as well as integer processing units, and in this sense it is like Intel P-core Xeon processors, which have integer processing as well as AVX vector units and AMX matrix units. One way of thinking about it is that the LX2 chip at the heart of LineShine and modern Xeon P-core compute engines are a kind of hybrid CPU-GPU complex with the graphics capabilities stripped out. The LX2 chip was designed by NSC Shenzhen in conjunction with Chinese IT giant Huawei (presumably its HiSilicon chip division). The LingKun LX2 CPU design has 304 active cores in a socket, and very likely there are more cores on the chip to increase the yield. The LineShine machine has a proprietary LingQi LQLink interconnect, which I am reasonably sure is based on a variation of InfiniBand technology, but it could be a jacked-up and stripped-down version of Ethernet.

This LX2 CPU delivers enough FP64 oomph with its SVE2 and SME math units that it only takes 13.79 million cores to deliver a peak theoretical performance of 2.74 exaflops (rounding to three significant digits). That is only 32.9 percent as many cores as the OceanLight machine (or 67.1 percent fewer), whose CPU was also a hybrid CPU-vector-matrix design, and those cores deliver 46.7 percent more performance. On the HPL test, LineShine delivers just a tad under 2.2 exaflops of oomph, and that makes it 21.5 percent more powerful than the former top ranked machine, the “El Capitan” supercomputer based on AMD MI300A compute engines located at Lawrence Livermore National Laboratory in the United States.

China no doubt wanted to top El Capitan, but more importantly, it wanted to top the OceanLight and Tianhe-3 machines. Let’s take a deeper dive into this LineShine machine, which we wish had been nicknamed “Sunbeam” because that is what it sounds like.
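The core counts and peak figure above lend themselves to a quick sanity check. The sketch below works out the LineShine-to-OceanLight core ratio, the approximate socket count implied by 304 active cores per socket, and the implied peak FP64 rate per core and per socket. All inputs are the rounded figures quoted in the text, the variable names are ours, and the derived values are estimates rather than published specifications.

```python
# Figures quoted in the text (rounded, so derived values are approximate).
lineshine_cores = 13.79e6       # total active cores in LineShine
oceanlight_cores = 41.93e6      # total cores in OceanLight
cores_per_socket = 304          # active cores per LX2 socket
lineshine_peak_flops = 2.74e18  # peak theoretical FP64, in flops

# LineShine core count as a fraction of OceanLight's core count.
core_ratio = lineshine_cores / oceanlight_cores
fewer_cores = 1.0 - core_ratio

# Approximate number of LX2 sockets implied by the core count.
sockets = lineshine_cores / cores_per_socket

# Implied peak FP64 throughput per core (gigaflops) and per socket (teraflops).
gflops_per_core = lineshine_peak_flops / lineshine_cores / 1e9
tflops_per_socket = gflops_per_core * cores_per_socket / 1e3

print(f"Core ratio vs OceanLight: {core_ratio:.1%} ({fewer_cores:.1%} fewer)")
print(f"Implied LX2 sockets: about {sockets:,.0f}")
print(f"Implied peak per core: about {gflops_per_core:.0f} gigaflops")
print(f"Implied peak per socket: about {tflops_per_socket:.1f} teraflops")
```

By this arithmetic, LineShine has roughly a third as many cores as OceanLight, spread across something like 45,000 sockets, with each core good for a bit under 200 gigaflops of peak FP64.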

NSC Shenzhen put out a...
