I Put a Datacenter GPU in My Gaming PC for £200

I Put a Datacenter GPU in My Gaming PC for £200 :: The Tymscar BlogI Put a Datacenter GPU in My Gaming PC for £200 2026-05-30Oscar Molnar13 min read (2710 words) #homelab #gpu #nixos #local-llm #hardware #debugging I already had an RTX 4080. 16GB of VRAM. Good enough for gaming, not good enough for the models I wanted to run locally. The next step up in GPU land is either spend a fortune on a card with more VRAM, or find another way. I found another way. I bought a datacenter GPU that doesn’t even have a normal PCIe connector, stuck it in my gaming PC with an adapter, and now I have 32GB of VRAM across two GPUs running a 27 billion parameter model at 32 tokens per second. The whole thing cost me £200. The GPU#

This is a Tesla V100 SXM2 16GB. It was designed for NVIDIA’s DGX servers and hyperscaler racks. The SXM2 form factor means it does not have a PCIe slot. It does not have display outputs. It does not have a normal power connector. It sits on a proprietary board inside a server rack and communicates over NVLink.

You cannot plug this into a motherboard. Not without help. But here is the thing: this is a Volta GPU with 16GB of HBM2 memory, 5120 CUDA cores, and I picked it up for about £150 on eBay. The compute is still real. The VRAM is still real. And the memory bandwidth is where it gets genuinely surprising. HBM2 is a different class of memory. The V100 has a 4096-bit memory bus delivering 900 GB/s of bandwidth. To put that in perspective, my RTX 4080 with its fancy GDDR6X manages 736 GB/s. The V100 from 2017 has 22% more memory bandwidth than a GPU that launched in 2022. And it is not just NVIDIA’s consumer cards that lose. Apple’s M3 Max does 400 GB/s. The M4 Max does 546 GB/s. The brand new M5 Max, which will set you back over £3,000 for a laptop, manages 614 GB/s. A GPU from 2017 beats every Mac on the market. The closest AMD competition to my 4080 is the RX 7900 XTX, which does 960 GB/s on its 24GB of GDDR6. Technically that edges out the V100, but the 7900 XTX costs £700+ and ROCm support for LLM inference is still rough compared to CUDA. The V100 gives you 94% of that bandwidth for less than a quarter of the price, and it just works with llama.cpp. The only consumer GPU that comfortably beats it is the RTX 5090 at 1,792 GB/s, and that card costs over £2,000. For LLM inference, where memory bandwidth is the bottleneck that determines your tokens per second, this matters more than almost anything else. The only problem is the connector. The adapter# Turns out, someone makes an SXM2-to-PCIe adapter. It is not made by NVIDIA. It is not officially supported by anyone. It is a bare PCB with the SXM2 socket on one side and a PCIe edge connector on the other. I paid about £50 for it. Half of that might just be the copper.

So for about £200 total, I had a 16GB VRAM GPU that could slot into my motherboard alongside my RTX 4080. That is 32GB of total VRAM. A single RTX 5090 with 32GB costs over £2,000. I am not saying this is the same experience. I am saying the VRAM is the same. The fan from hell# Before I could do anything useful with the V100, I had to deal with the fan. The V100 SXM2 was designed to live inside a 2U server with industrial cooling. The fan on the adapter is not subtle. It is not quiet. It is not something you want in a room you also sleep in. I measured it with my Apple Watch:

82 decibels. That is somewhere between a garbage disposal and a lawnmower, well past “loud PC” and into “should I be wearing earplugs in my own house” territory. And the worst part: you cannot control it. I tried nvidia-smi, I tried scanning for it on Linux, I even tried Afterburner on Windows (more on that later, the whole setup barely works on Windows). Nothing. The fan on this adapter is not designed to be controlled. It is designed to run at 100%, forever, inside a server rack where nobody has to hear it. Here is me trying to figure out the fan pinout. I guessed it might be a standard case fan pinout on a weird connector, so I jammed two jumper wires into VCC and ground and prodded a 9V battery against them. It spun. And it was so much quieter than the 12V it normally gets:

Your browser does not support the video tag.That confirmed the pinout and gave me hope that the fan could actually be tamed. Making the fan listen to reason# The 9V battery test told me the pinout was standard case fan territory, just with a weird connector. The next question was whether the fan would actually respond to PWM control if I wired the tachometer and PWM pins to my motherboard. So I shoved some jumper wires into the connector and jammed the other ends into a spare fan header (turn your volume up):

Your browser does not support the video tag.It works. The motherboard can read the RPM and the fan responds to PWM. I keep it at 10%. It never goes above 50C even at full load, and I cannot really hear it. Now I just needed a proper cable instead of jumper wires held in by...

I Put a Datacenter GPU in My Gaming PC for £200

Related Articles

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy

SpaceX not the behemoth everyone thought

Naphtha Shortages Having a Growing Impact in Japan