The AI era is pulling FP64 hardware away from scientific HPC - Fortran Discourse
szaghi
June 17, 2026, 9:31am
Hi all.
I have one, maybe two, questions for you. This question came out of a webinar series on High Performance Computing (HPC) I took part in (the Italy–Germany HPC webinars organised on the Italian side through CNR). I raised this concern there, and my impression was that the other speakers did not share it to the same degree. The room leaned more optimistic than I am. That is exactly why I want to put it to a wider audience: I may be wrong, and I would like to hear where others land.
The concern is precision. Most scientific HPC needs double precision (FP64). In computational fluid dynamics, which is my field, we resolve physical scales spanning many orders of magnitude, and to do that correctly with very high-order accurate methods we need 64-bit floating point.
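One quick way to see why the format matters is machine epsilon, the smallest relative increment a format can resolve around 1.0. The sketch below is only an illustration. It assumes CPython, whose `struct` module can round a double to IEEE binary16 (`'e'`) and binary32 (`'f'`). FP8 and FP4 have no `struct` code, so the comparison stops at FP16. The helper names `round_to`, `machine_epsilon` and `absorbed` are made up for this example.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (binary64) to the format named by a struct code:
    'e' = binary16 (FP16), 'f' = binary32 (FP32), 'd' = binary64 (FP64)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def machine_epsilon(fmt: str) -> float:
    """Smallest power of two eps such that 1 + eps is distinguishable from 1."""
    eps = 1.0
    while round_to(fmt, 1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


def absorbed(fmt: str, large: float, small: float) -> bool:
    """True if adding `small` to `large` leaves `large` unchanged in this format.
    The sum is formed in binary64 and rounded once, which gives the correctly
    rounded FP16/FP32 result because binary64 has more than twice their precision."""
    a, b = round_to(fmt, large), round_to(fmt, small)
    return round_to(fmt, a + b) == a


for name, fmt in [("FP16", "e"), ("FP32", "f"), ("FP64", "d")]:
    print(f"{name}: eps = {machine_epsilon(fmt)!r}, "
          f"1 + 1e-8 lost? {absorbed(fmt, 1.0, 1e-8)}")
# FP16: eps = 0.0009765625, 1 + 1e-8 lost? True
# FP32: eps = 1.1920928955078125e-07, 1 + 1e-8 lost? True
# FP64: eps = 2.220446049250313e-16, 1 + 1e-8 lost? False
```

In FP32 and FP16, a contribution eight orders of magnitude below the main quantity simply disappears. That is the scale-separation problem in miniature.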
AI computing does not need this. Training and inference work well at 8-bit precision, and now even at 4-bit. So the two workloads call for different hardware: AI needs many low-precision cores, while science requires strong FP64 capability.
The problem is that the vendors follow the AI market because that is where the money is. Comparing on vector FP64 (peak, dense), the recent trend is to hold it flat or lower it, and spend the transistors on low-precision math instead:
NVIDIA H100: 34 TFLOP/s vector FP64, or 67 with the FP64 tensor-core path. The newer B200 does about 40 vector FP64. Blackwell dropped the dedicated FP64 tensor-core path that Hopper had, and gained around 20 PFLOP/s of FP4 for AI. The Rubin roadmap reportedly cuts FP64 further.
AMD MI300X: 81.7 TFLOP/s FP64. The newer MI355X does 78.6, below its own predecessor, with the gains all in FP8/FP4 for AI inference.
Intel has stepped back from a dedicated HPC GPU. Its current HPC silicon, the Max-series (Ponte Vecchio) in Aurora, has no standalone successor. Intel cancelled Falcon Shores as a product in early 2025 and folded its HPC and AI lines into one chip, Jaguar Shores, due around 2026/2027. Intel describes it as serving both AI and HPC, but says it will compete on total cost of ownership rather than peak FLOPS, and has published no FP64 figure.
Consumer silicon makes the direction plainest. NVIDIA’s N1X, the new Blackwell laptop chip, publishes only AI-precision figures (NVFP4, around 1000 TOPS) and quotes no FP64 at all. Double precision is simply not a design goal there.
So across all three vendors the direction looks the same. The new chips are built for AI, and double precision gets quietly de-prioritized along the way.
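If you take the peak figures quoted above at face value, a few lines of Python make the generational trend explicit. The numbers are the ones in this post (dense peaks as quoted, not re-checked against datasheets). Treat the FP4 ratio as an order-of-magnitude indication only.

```python
# Peak dense figures in TFLOP/s, copied from the post above (not independently verified).
FP64_VECTOR = {"H100": 34.0, "B200": 40.0, "MI300X": 81.7, "MI355X": 78.6}
H100_FP64_TENSOR = 67.0     # Hopper's FP64 tensor-core path
B200_FP4_TFLOPS = 20_000.0  # "around 20 PFLOP/s of FP4"


def pct_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


comparisons = [
    ("H100 -> B200, vector FP64", FP64_VECTOR["H100"], FP64_VECTOR["B200"]),
    ("H100 tensor FP64 -> B200 FP64", H100_FP64_TENSOR, FP64_VECTOR["B200"]),
    ("MI300X -> MI355X, FP64", FP64_VECTOR["MI300X"], FP64_VECTOR["MI355X"]),
]
for label, old, new in comparisons:
    print(f"{label}: {pct_change(old, new):+.1f}%")
# H100 -> B200, vector FP64: +17.6%
# H100 tensor FP64 -> B200 FP64: -40.3%
# MI300X -> MI355X, FP64: -3.8%

# Low-precision throughput relative to FP64 on a single B200
print(f"B200 FP4/FP64 ratio: {B200_FP4_TFLOPS / FP64_VECTOR['B200']:.0f}x")
# B200 FP4/FP64 ratio: 500x
```

Measured against Hopper's FP64 tensor-core path, the B200 figure is a step back rather than a plateau.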
There is one strong counter-current. AMD’s MI430X, coming this year, is a deliberate HPC part. AMD claims more than 200 TFLOP/s of FP64, and independent estimates backed out from the Alice Recoque exascale contract put it around 211. That would be the highest of any GPU so far, and it still carries FP4/FP8 for AI. Besides Alice Recoque, the next European exascale machine, it will power the US Discovery and Germany’s Herder. So a dedicated FP64 line still exists, for now.
But it is one product line, from one vendor, against a whole market moving the other way. That is what I cannot resolve: whether a first-class FP64 hardware line survives, or shrinks to a small premium niche while everything else is optimized for AI.
Two questions for you:
Do you share this concern, or do you think I am overstating it?
If you share it, do you already see a way out?
I would be glad to hear how others in the Fortran and HPC community are thinking about this.
Stefano
mhulsen
June 17, 2026, 10:41am
There is also NextSilicon’s Maverick-2 in Sandia’s supercomputer Spectra.
jorgeg
June 17, 2026, 11:04am
Hi Stefano,
This is on my mind constantly. However, I’ve talked to people who work at NVIDIA or AMD, and they’ve all assured me that FP64 is not going to be dropped like a hot potato. Cards are splitting into two lines: the B300, for example, is aimed at AI training and inference, while the B200 is going to be for science and HPC.
What is true is that we won’t see the crazy increases in FP64 performance anymore: the V100 did 7.9 TFLOP/s of FP64, the A100 9.7 (19.5 through its FP64 tensor cores), the H200 about 34, and the B200, as you pointed out, only creeps up to about 40 TFLOP/s.
The point is that most workloads out there do not reach the peak FLOP rate of these chips. Not everything is DGEMM, and only a limited number of applications can really exploit that peak. Most other apps, I feel, are limited by memory bandwidth, which matters to both AI and HPC workloads. AMD however seems to be...