Six months on the Strix Halo chip AMD now markets as "first-class ROCm"

AMD Is Selling "First-Class ROCm" on Strix Halo. I've Run the Same Chip for Six Months.

A note on links: some hardware below is linked through affiliate programs, marked (affiliate). If you buy through one, I earn a small commission at no cost to you. It never changes what I recommend or what I run; these are the boxes I'd point you to either way.

On June 8, Micro Center opened pre-orders for the AMD Ryzen AI Halo, a $3,999 developer box built on the Ryzen AI Max+ 395. That is the same Strix Halo silicon sitting in my Bosgame M5. The pitch is local AI without the cloud bill, and the headline is the software: full ROCm support, a pre-configured stack, day-zero model support, and the now-repeated framing that AMD treats this APU as a "first-class ROCm citizen." AMD's own cost-per-token comparison for the box, the one behind the "pays for itself" claim, runs on Qwen 3.6 35B.

I have run that exact chip, in production, since the start of the year. My production model is Qwen3.6-35B-A3B, the same model AMD's math leans on. In six months, ROCm has never loaded a single model on my board under the standard packages.

That reads like a contradiction with everything AMD is selling. It is not quite one, and the reason it is not is the entire point of this post. The chip is fine. The hardware is fine. What I have spent six months mapping is the distance between "ROCm" as a word on a product page and ROCm as a stack you actually have to assemble, on hardware AMD did not build and does not validate. The Halo Box is AMD's answer to that distance. It is also, just by existing, the clearest admission that the distance is real.

Here is the map, because I think it is more useful than another spec comparison.

What "no ROCm" actually means here

My board hits a specific, repeatable wall. Every attempt to load a model into the GPU under ROCm dies with the same HSA fault: "Memory critical error by agent node-0 ... Reason: Memory in use". It fires during the GPU's queue setup, before the model is even resident. It is filed as ROCm issue #6182.
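When a load fails many times in a row, it helps to confirm each failure is this same fault and not a new one. Here is a minimal Python sketch of that check. It assumes you have captured the failing process's output as text, and it matches only the two fragments quoted above, allowing anything in between since the middle of the message is elided here. The function name is mine, not part of any ROCm tooling.

```python
import re

# The two fragments quoted in the post. Whatever sits between them is
# elided there, so the pattern accepts any text in the gap.
HSA_FAULT = re.compile(
    r"Memory critical error by agent node-0.*?Reason: Memory in use",
    re.DOTALL,
)


def is_hsa_memory_fault(log_text: str) -> bool:
    """Return True if the captured output contains the quoted fault signature."""
    return HSA_FAULT.search(log_text) is not None
```

Run it over the stderr of each attempt and you get a yes/no answer per configuration instead of eyeballing logs.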

The detail that matters: this is board-specific, not chip-specific. The same silicon runs ROCm fine on other machines. A user with a Minisforum MS-S1 Max (affiliate), identical gfx1151, has ROCm working. I went through fourteen ROCm configurations on my board and got fourteen failures. So I did the only thing that kept the box useful: I version-locked ROCm at 6.4, froze all 66 packages so nothing could drift, and moved my entire stack to Vulkan through Mesa's RADV driver. Everything I publish runs on Vulkan because ROCm is not an option here.
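Freezing packages only helps if you can tell when something moved anyway. Here is a rough Python sketch of a drift check, assuming you have a saved snapshot of package names and versions and a fresh listing to compare against. How you produce those two listings depends on your distro's package manager, and the package names and versions below are placeholders, not my actual list.

```python
from typing import Optional

Drift = dict[str, tuple[Optional[str], Optional[str]]]


def find_drift(frozen: dict[str, str], installed: dict[str, str]) -> Drift:
    """Map each package whose version changed to (frozen, installed).

    A package missing on one side shows up with None on that side.
    """
    drift: Drift = {}
    for name in sorted(frozen.keys() | installed.keys()):
        before, after = frozen.get(name), installed.get(name)
        if before != after:
            drift[name] = (before, after)
    return drift


# Hypothetical package names and versions, for illustration only.
frozen = {"example-runtime": "6.4.0", "example-compiler": "6.4.0"}
installed = {"example-runtime": "6.4.0", "example-compiler": "6.4.1"}
print(find_drift(frozen, installed))
```

An empty result means the frozen set is intact. Anything else names exactly which package moved, which matters when you are trying to keep every other variable fixed between tests.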

Last week I tested whether firmware was the lever. Bosgame had shipped a newer BIOS, so I flashed it and re-ran the exact thing ROCm dies on. Same fault. A newer BIOS does not fix it, and the firmware came with no changelog to even suggest it might.

That was the state of things a week ago: board-specific bug, no firmware fix, Vulkan-only, six months in. Then a thread on r/StrixHalo changed my understanding of it, and that is the part worth your time.

The same vendor ships ROCm that crashes, ROCm that hangs, and ROCm that almost works

A commenter pointed out that AMD's own Lemonade server, which bundles its own build of llama.cpp on a different ROCm runtime (AMD's TheRock pipeline), runs models on this hardware where the standard ROCm packages crash. So I tested it, isolated in its own directory, with nothing else on the box changed: same board, same kernel, same BIOS, system ROCm untouched.

The result split cleanly, and both halves are findings.

The crash is gone. Under the TheRock 7.13 runtime that Lemonade bundles, a 1.5B model loads fully onto the GPU, all layers, and generates at 243 tokens per second. The HSA fault that the official ROCm 7.2.x packages and the community container both throw on this exact box simply does not happen. That tells you something the issue tracker has been circling for two months: #6182 is a bug in the ROCm userspace build, not in the board. Swap the runtime, keep everything else, and the wall moves.

But it is not a working path. With the whole model on the GPU (-ngl 99), anything over roughly 20 GB loads into memory and then hangs in initialization, before a single layer is assigned. No crash, no error, just stuck. I tested my 35B MoE production model and a 27B dense model of similar size. Both hang at the same spot. I ruled out the obvious explanations one at a time:

Not a locked-memory limit. Running as root with unlimited memlock changes nothing; it hangs identically.

Not MoE-specific. The dense model of the same size hangs the same way.

Not a stale build. The bleeding-edge nightly channel hangs too.
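A crash and a hang look different from the outside: one exits with an error, the other never exits. If you want to sort runs into those buckets without watching a terminal, something like the following Python sketch works. It is an assumption about how you might automate it, not the harness I used, and the commands at the bottom are stand-ins so the sketch runs anywhere; you would substitute your real model-load command.

```python
import subprocess
import sys


def classify_run(cmd: list[str], timeout_s: float) -> str:
    """Run cmd and report 'ok', 'crash', or 'hang'.

    'hang' means the process was still running after timeout_s seconds;
    subprocess.run kills it before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "crash"


# Stand-in commands: a clean exit, a failing exit, and a process that
# outlives the timeout.
print(classify_run([sys.executable, "-c", "pass"], timeout_s=10))
print(classify_run([sys.executable, "-c", "raise SystemExit(1)"], timeout_s=10))
print(classify_run([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1))
```

The one judgment call is the timeout. A timeout cannot tell a true hang from a slow load, so set it well past the longest load you have seen succeed.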

What is left is the large GPU allocation itself. There is one way around it: split the model, keep most of it on the GPU and spill the rest to the CPU (-ngl 20 and similar). That loads, and it generates. At about 3 tokens per second, CPU-bound, which is...
