DeepSeek V4 Flash: Bringing Frontier AI to the Home

Introduction

In a home lab it is now possible to score 88.6% on the Ph.D.-level science question benchmark GPQA Diamond!

The first frontier model to reach 88% on GPQA Diamond was GPT-5.1 (high), on 13 Nov 2025:

Diagram from Epoch AI, © 2026, reproduced under the Creative Commons 4.0 licence. No modifications were made.

In other words, you can now run an open-weights model at home that is just 6 months behind SOTA commercial frontier models!

Hardware Setup

My hardware setup:

Two NVIDIA DGX Sparks, bought from Scan Computers International Ltd.

A QSFP112 cable to connect them, purchased from DigiKey for GBP 32.00 (ships internationally in just 4 days)

A 400 mm × 300 mm × 2.0 mm brass sheet (roughly A3 in area), bought from a seller called Metaloffcuts on Amazon for GBP 49.48

I bought the brass sheet to act as a heatsink, and I chose brass for its reasonable thermal conductivity and aesthetic match to the beautiful gold Sparks! The Sparks sit on top of a couple of drive cages, cooled by a large fan blowing air through the assemblies from front to back (the units take in air at the front and exhaust heat from the rear). In the picture you can also see the blue QSFP112 cable, a 0.5 metre Amphenol cable (NJAAKR-0006*). This provides a high-speed (25 GB/s) connection between the devices, which will come in useful shortly...

*Strictly speaking, this cable, NJAAKR0006, is a thicker-wire (30AWG instead of 32AWG) version of the NJAAKK0006 cable mentioned here. U.S. customers can buy an official cable direct from the NVIDIA marketplace.

Network Setup

First I connected the QSFP112 cable to the Sparks' outermost ports (right-hand side when viewed from the rear).

Then I followed the community Network Setup Guide to create a cluster from the Sparks:

Ensure that each Spark user is a member of the docker group.

Spark 1:

$ sudo tee /etc/netplan/40-cx7.yaml > /dev/null

Spark 2:

$ sudo tee /etc/netplan/40-cx7.yaml > /dev/null

Run on both Sparks:

$ sudo chmod 600 /etc/netplan/40-cx7.yaml
$ sudo netplan apply
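The `chmod 600` step restricts the netplan file to owner read/write only, presumably because netplan complains about configuration files that other users can read. As a minimal sketch, the check below confirms that a file has exactly that mode. The helper name `is_owner_only` is hypothetical, and the demonstration runs on a throwaway temporary file rather than on `/etc/netplan/40-cx7.yaml`.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """Return True if the file's permission bits are exactly 0o600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


# Demonstrate on a temporary file instead of touching /etc/netplan.
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o644)            # world-readable, like a default file
    before = is_owner_only(tmp.name)
    os.chmod(tmp.name, 0o600)            # same effect as `chmod 600`
    after = is_owner_only(tmp.name)

print(before, after)
```

On a Spark, the same function could be pointed at `/etc/netplan/40-cx7.yaml` after running the commands above.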

Then on Spark 1:

$ wget https://raw.githubusercontent.com/NVIDIA/dgx-spark-playbooks/refs/heads/main/nvidia/connect-two-sparks/assets/discover-sparks
$ chmod +x discover-sparks
$ ./discover-sparks

Now the Sparks are connected via RDMA over Converged Ethernet (RoCE), explained here. This is not native InfiniBand; the link runs in Ethernet mode.

I tested the bandwidth and latency by running the following:

Bandwidth Test

Spark 2:

$ ib_write_bw -d rocep1s0f1 --report_gbits -q 4 -R --force-link IB

* Waiting for client to connect... *

Spark 1:

$ ib_write_bw 192.168.177.12 -d rocep1s0f1 --report_gbits -q 4 -R --force-link IB
RDMA_Write BW Test
Dual-port       : OFF          Device         : rocep1s0f1
Number of qps   : 4            Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: ON
ibv_wr* API     : ON
TX depth        : 128
CQ Moderation   : 1
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : ON
Data ex. method : rdma_cm
local address: LID 0000 QPN 0x0279 PSN 0xb8d762
local address: LID 0000 QPN 0x027a PSN 0x552aa4
local address: LID 0000 QPN 0x027b PSN 0x34011e
local address: LID 0000 QPN 0x027c PSN 0xf40d15
remote address: LID 0000 QPN 0x0278 PSN 0x9a7284
remote address: LID 0000 QPN 0x0279 PSN 0x933755
remote address: LID 0000 QPN 0x027a PSN 0xbbfb45
remote address: LID 0000 QPN 0x027b PSN 0x162679
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
65536      20000          111.57             107.34               0.204742

The test shows an average of only 107.34 Gb/s (about 13.4 GB/s) because it uses just one logical port of the pair. For the full story, see this article:
https://www.servethehome.com/the-nvidia-gb10-connectx-7-200gbe-networking-is-really-different/
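The unit conversion behind that figure can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the 200 Gb/s nominal rate is an assumption derived from the 25 GB/s quoted for the cable in the hardware section.

```python
def gbits_to_gbytes(gbits_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbits_per_s / 8.0


def link_utilisation(measured_gbits: float, nominal_gbits: float) -> float:
    """Fraction of the nominal link rate that was actually measured."""
    return measured_gbits / nominal_gbits


measured = 107.34   # "BW average[Gb/sec]" from the ib_write_bw run
nominal = 25 * 8    # assumed: the 25 GB/s cable figure expressed in Gb/s

print(f"{gbits_to_gbytes(measured):.1f} GB/s")
print(f"{link_utilisation(measured, nominal):.0%} of nominal")
```

A result of roughly half the nominal rate is consistent with the test exercising only one logical port of the pair.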

Latency Test

Spark 2:

$ ib_write_lat -d rocep1s0f1 --report_gbits -R --force-link IB

* Waiting for client to connect... *

Spark 1:

$ ib_write_lat 192.168.177.12 -d rocep1s0f1 --report_gbits -R --force-link IB
RDMA_Write Latency Test
Dual-port       : OFF          Device         : rocep1s0f1
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: OFF
ibv_wr* API     : ON
TX depth        : 1
Mtu             : 4096[B]
Link type       : IB
Max inline data : 220[B]
rdma_cm QPs     : ON
Data ex. method : rdma_cm
local address: LID 0000 QPN 0x027e PSN 0x19406
remote address: LID 0000 QPN 0x027d PSN 0xc06bb7
#bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
2       1000          1.92           2.32         1.97               1.97           0.00            2.08                   2.32
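To compare runs over time, it can help to turn the final results row into named fields. The sketch below does this for the latency row shown above. The column names in `LATENCY_COLUMNS` and the function `parse_latency_row` are hypothetical labels chosen for this example, not part of the perftest tools. The names are listed explicitly because the printed header contains spaces inside the percentile column titles, so it cannot be split on whitespace reliably.

```python
LATENCY_COLUMNS = (
    "bytes", "iterations", "t_min", "t_max", "t_typical",
    "t_avg", "t_stdev", "p99", "p99_9",
)


def parse_latency_row(row: str) -> dict[str, float]:
    """Map the whitespace-separated values of a results row to column names."""
    values = [float(v) for v in row.split()]
    if len(values) != len(LATENCY_COLUMNS):
        raise ValueError(
            f"expected {len(LATENCY_COLUMNS)} fields, got {len(values)}"
        )
    return dict(zip(LATENCY_COLUMNS, values))


# Results row copied from the ib_write_lat output (times in microseconds).
row = "2 1000 1.92 2.32 1.97 1.97 0.00 2.08 2.32"
result = parse_latency_row(row)

print(f"typical {result['t_typical']} us, p99.9 {result['p99_9']} us")
```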

Software Setup

Note that these branches may be merged in the future, which will simplify the setup!

I cloned Arthur Drozdov's fork of the DGX Spark community vLLM Docker repository, maintained by eugr (eugr_nv since becoming an NVIDIA employee!), on the DGX Spark / GB10 User Forum. (This forum is an amazing community for DGX Spark owners and an essential resource for discovering the latest developments on the DGX Spark - this software...
