DeepSeek V4 Flash: Bringing Frontier AI to the Home

Introduction

In a home lab it is now possible to score 88.6% on the Ph.D.-level science question benchmark GPQA Diamond !

The first time a frontier model achieved 88% on GPQA Diamond was GPT-5.1 (high) on 13 Nov 2025:

In other words, you can now run an open-weights model at home that is just 6 months behind SOTA commercial frontier models!

Hardware Setup

My hardware setup:

Two NVIDIA DGX Sparks, bought from Scan Computers International Ltd.

Connected with a QSFP112 cable purchased from DigiKey for GBP 32.00 (ships internationally in just 4 days)

A 400 mm × 300 mm (Circa A3) × 2.0 mm brass sheet bought from a seller called Metaloffcuts on Amazon for GBP 49.48

I bought the brass sheet to act as a heatsink, and I chose brass for its reasonable thermal conductivity and aesthetic match to the beautiful gold Sparks! The Sparks are on top of a couple of drive cages cooled via convection by a large fan blowing air through the assemblies from front to back (the units intake air from the front and exhaust heat from their rear). In the picture you can also see the blue QSFP112 cable, a 0.5 metre Amphenol cable (NJAAKR-0006*). This provides a high speed (25 GB/s) connection between the devices, which will come in useful shortly...

*Strictly speaking, this cable, NJAAKR0006, is a wider gauge (30AWG instead of 32AWG) version of the NJAAKK0006 cable mentioned here. U.S. customers can buy an official cable direct from the NVIDIA marketplace.

Network Setup

First I connected the QSFP112 cable to the Sparks' outermost ports (right-hand side when viewed from the rear).

Then I followed the community Network Setup Guide to create a cluster from the Sparks:

Ensure that each Spark user is a member of the docker group.

Spark 1:

$ sudo tee /etc/netplan/40-cx7.yaml > /dev/null

Spark 2:

$ sudo tee /etc/netplan/40-cx7.yaml > /dev/null

Run on both Sparks:

$ sudo chmod 600 /etc/netplan/40-cx7.yaml $ sudo netplan apply

Then on Spark 1:

$ wget https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/connect-two-sparks/assets/discover-sparks $ chmod +x discover-sparks $ ./discover-sparks

Now the Sparks are connected via RDMA over Converged Ethernet (RoCE), explained here. This is not native InfiniBand, but runs in Ethernet mode.

I tested the bandwidth and latency by running the following:

Bandwidth Test

Spark 2:

$ ib_write_bw -d rocep1s0f1 --report_gbits -q 4 -R --force-link IB

* Waiting for client to connect... *

Spark 1:

$ ib_write_bw 192.168.177.12 -d rocep1s0f1 --report_gbits -q 4 -R --force-link IB RDMA_Write BW Test Dual-port : OFF Device : rocep1s0f1 Number of qps : 4 Transport type : IB Connection type : RC Using SRQ : OFF PCIe relax order: ON ibv_wr* API : ON TX depth : 128 CQ Moderation : 1 Mtu : 4096[B] Link type : IB Max inline data : 0[B] rdma_cm QPs : ON Data ex. method : rdma_cm local address: LID 0000 QPN 0x0279 PSN 0xb8d762 local address: LID 0000 QPN 0x027a PSN 0x552aa4 local address: LID 0000 QPN 0x027b PSN 0x34011e local address: LID 0000 QPN 0x027c PSN 0xf40d15 remote address: LID 0000 QPN 0x0278 PSN 0x9a7284 remote address: LID 0000 QPN 0x0279 PSN 0x933755 remote address: LID 0000 QPN 0x027a PSN 0xbbfb45 remote address: LID 0000 QPN 0x027b PSN 0x162679 #bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps] 65536 20000 111.57 107.34 0.204742

Although only 13.4 GB/s is shown here, this is because only one logical port of the pair is being used in the speed test. For the full story, see this article: https://www.servethehome.com/the-nvidia-gb10-connectx-7-200gbe-networking-is-really-different/

Latency Test

Spark 2:

$ ib_write_lat -d rocep1s0f1 --report_gbits -R --force-link IB

* Waiting for client to connect... *

Spark 1:

$ ib_write_lat 192.168.177.12 -d rocep1s0f1 --report_gbits -R --force-link IB RDMA_Write Latency Test Dual-port : OFF Device : rocep1s0f1 Number of qps : 1 Transport type : IB Connection type : RC Using SRQ : OFF PCIe relax order: OFF ibv_wr* API : ON TX depth : 1 Mtu : 4096[B] Link type : IB Max inline data : 220[B] rdma_cm QPs : ON Data ex. method : rdma_cm local address: LID 0000 QPN 0x027e PSN 0x19406 remote address: LID 0000 QPN 0x027d PSN 0xc06bb7 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec] 2 1000 1.92 2.32 1.97 1.97 0.00 2.08 2.32

Software Setup

Note that these branches may be merged in the future, which will simplify the setup!

I cloned Arthur Drozdov's fork of the DGX Spark community vLLM Docker repository, maintained by eugr (eugr_nv since becoming an NVIDIA employee!), on the DGX Spark / GB10 User Forum. (This forum is an amazing community for DGX Spark owners and an essential resource for discovering the latest developments on the DGX Spark - this software...

DeepSeek V4 Flash: Bringing Frontier AI to the Home

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast