DGX station and "frontier" models, my hunt for answers

DGX Station profile: the local AI box everyone wants benchmarked

Connor Turland·July 02, 2026

NVIDIA says DGX Station can support models up to 1 trillion parameters. That is the sentence that makes everyone stop scrolling. The careful version is harder. Can you put a 1T-class model somewhere in the 748GB coherent memory pool? Sometimes, depending on the model, quant, cache, and runtime. Can you run the full weights of a frontier open model as if the whole box were 748GB of HBM? No. That would be a very expensive mistake.

So we went looking for the real answer. We have been trying to understand the actual machine underneath the pitch. We talked to Cornell researchers who had temporary access. We talked to NVIDIA. We inquired about buying one and were quoted about $100,000 per machine. We searched the NVIDIA forums, Reddit, X, LocalMaxxing, model cards, and conference-floor posts. We asked people at the AI Engineer local table for GLM-5.2 numbers because public numbers were thin.

This is a profile of DGX Station from that reporting. It is not a review. We do not have a full benchmark suite yet, but we do have some useful numbers that point the way.

The local AI interest itself is part of the story. This is not just a few people trying to make old GPUs do one more trick. At AI Engineer, the Local AI room was full of people who knew exactly why the memory split mattered and why the public numbers were still not enough.

Photo credit: Ahmad Osman. Original X post: x.com/TheAhmadOsman/status/2072789682254180776.

The question is simple: does DGX Station let you buy a lot of local model capacity without giving up the memory behavior that makes GPU inference fast?

The machine

NVIDIA's DGX Station page says the machine is powered by GB300 Grace Blackwell Ultra, has 748GB of coherent memory, and supports models up to 1 trillion parameters. It does not give you 748GB of HBM. The split that matters is:

| Memory tier | Capacity | Bandwidth | Why it matters |
| --- | --- | --- | --- |
| HBM3e | 252GB | 7.1TB/s | fast GPU memory tier |
| LPDDR5X | 496GB | 396GB/s | larger CPU-side memory tier |
| Total coherent memory | 748GB | mixed | addressable pool, not all GPU-speed memory |
| NVLink-C2C | n/a | 900GB/s listed | CPU-GPU link, with real workload caveats |

That is the entire debate in one table. The capacity number is real. The memory split is also real. If your model and KV cache stay inside HBM, the machine is easy to understand. If the active workload crosses into LPDDR5X, the question becomes empirical: how much do prefill, decode, long context, and concurrency change?

There is another asterisk on the link between the CPU and GPU. Stas Bekman measured NVLink-C2C on DGX Station and reported that it was not behaving like 900GB/s full duplex end-to-end in his bidirectional test. His point was not that DGX Station is useless. His point was that the marketing number and the workload behavior are not the same thing. That is why we care about actual runs more than spec sheets.

The $100k question

The DGX Station price we got was about $100,000 per machine. At that price, the machine is not competing with a hobby rig. It is competing with other ways to answer the same three questions: what can we run, how does it feel, and how much does it cost? The GLM-5.2-specific version of that comparison is in the GLM 5.2 local hardware requirements post. For DGX Station, the competing buckets are:

| Alternative | What it buys |
| --- | --- |
| multi-GPU RTX PRO 6000 rigs | more conventional VRAM capacity, more assembly and operations work |
| cloud inference | no hardware ownership, but recurring per-token bills |
| Mac Studio-style unified memory | lots of addressable memory, much less GPU-style memory bandwidth |
| DGX Spark clusters | cheaper nodes, but different memory and interconnect tradeoffs |

The buying question is not simply "can I fit a big model?" A lot of machines can fit a heavily quantized big model somewhere in memory. The buying question is: can we serve useful local frontier-ish workloads with enough speed, context, and concurrency to justify a six-figure workstation?

Why people are skeptical
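The memory split reduces to arithmetic, so a rough sketch may help. The Python below is an illustration, not a benchmark: it takes the tier capacities and bandwidths from NVIDIA's listed specs, estimates a weight footprint from parameter count and bits per weight, and computes a bandwidth-only ceiling on decode speed. The function names are ours, and the model is deliberately crude. It assumes a dense model whose weights are all read once per token, fills HBM first, and ignores KV cache, MoE sparsity, batching, compute limits, and any NVLink-C2C shortfall, so real numbers could differ a lot in either direction.

```python
# Tier figures as listed in NVIDIA's DGX Station specs (GB = 1e9 bytes here).
HBM_GB = 252.0          # HBM3e capacity
LPDDR_GB = 496.0        # LPDDR5X capacity
HBM_BW_GBPS = 7100.0    # 7.1 TB/s
LPDDR_BW_GBPS = 396.0   # 396 GB/s


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB, ignoring quantization metadata."""
    return params_billion * bits_per_weight / 8.0


def split_across_tiers(footprint_gb: float) -> tuple[float, float, bool]:
    """Assume HBM fills first; return (GB in HBM, GB spilled to LPDDR5X, fits)."""
    in_hbm = min(footprint_gb, HBM_GB)
    spilled = max(0.0, footprint_gb - HBM_GB)
    return in_hbm, spilled, spilled <= LPDDR_GB


def decode_ceiling_tok_s(read_hbm_gb: float, read_lpddr_gb: float) -> float:
    """Bandwidth-only upper bound on single-stream decode speed.

    Assumes every listed byte is read once per token at the full listed
    bandwidth of its tier, with no overlap between tiers (a simplification).
    """
    seconds_per_token = read_hbm_gb / HBM_BW_GBPS + read_lpddr_gb / LPDDR_BW_GBPS
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical dense 1T-parameter model at a few quantization levels.
    for bits in (2, 4, 8):
        size = weights_gb(1000, bits)
        in_hbm, spilled, fits = split_across_tiers(size)
        if fits:
            ceiling = decode_ceiling_tok_s(in_hbm, spilled)
            print(f"{bits}-bit: {size:.0f}GB total, {spilled:.0f}GB in LPDDR5X, "
                  f"ceiling ~{ceiling:.1f} tok/s")
        else:
            print(f"{bits}-bit: {size:.0f}GB does not fit in the 748GB pool")
```

Under these assumptions a 1T-parameter model at 4 bits per weight needs about 500GB, so roughly half of it sits in LPDDR5X, and the slower tier dominates the per-token read time. Whether real runtimes behave that way is something only measured runs can confirm or refute.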

The local AI crowd is not confused about the headline number. The skepticism is sharper than that.

This r/LocalLLaMA thread about 4x-8x RTX PRO 6000 systems, especially the comment thread starting here, gets at the objection. Someone asks why not buy DGX Station instead of 4-8 RTX PRO 6000s. The reply is basically: DGX Station does not actually have 748GB of VRAM. It has a 252GB HBM tier plus a larger LPDDR5X tier.

The NVIDIA Developer Forums have the same question in more formal language: can a 1T-class model really be served well on this memory layout, and has anyone seen real tokens/sec benchmarks for the GB300 DGX Station when context or weights move past HBM3e?

That is the right standard. Not "does the model load?" The standard is "what happens to prefill, decode, context, and concurrency when the workload crosses the fast memory tier?"

Who has touched one

DGX Station access is still scarce enough that the list of people with...
