MoE Estimator – Simulate decode speed with layer-major prefetch hiding

MoE SSD-Streaming Speed Estimator (Pipelined)

Includes a layer-major double-buffering (prefetch hiding) simulation and model presets.

Model Preset

DeepSeek-V4 Pro
DeepSeek-V4 Flash
GLM 5.2 MoE
Custom...

Hardware Architecture

2 Tiers (RAM -> SSD), e.g., Apple Silicon
3 Tiers (VRAM -> RAM -> SSD), e.g., PC Desktop/Server

Model Parameters

Total Parameters (Billions)

Active Parameters per Token (Billions)

Total Experts

Active Experts

Number of Layers

Quantization (Bytes per Parameter)

FP16 (2 B)
Q8 (1 B)
Q4 (0.5 B)
Q2 (0.25 B)

Hardware Tiers

Tier 1: GPU VRAM

Capacity (GB)

Bandwidth (GB/s)

Tier 2: PC RAM

Capacity (GB)

Bandwidth (GB/s)

Tier 3: SSD

Capacity (GB)

Bandwidth (GB/s)

Reading Strategy

Cache Hit Rate / I/O Efficiency (%)

Enable Pipelining (layer-major double buffering)

PIPELINED

Estimated Decode Speed

0.00 tok/s

Bottleneck: Calculating...

Model Weights Analysis

Total Size: 0 GB

Core (Always-On): 0 GB

Single Expert: 0 GB

Active per 1 Layer: 0 GB
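The four figures above can be derived from the Model Parameters inputs. The page does not show its formulas, so the following is only a minimal sketch under stated assumptions: every layer is taken to hold the same number of equally sized experts, "Total Experts" and "Active Experts" are read as per-layer counts, and everything that is not an expert counts as always-on core weights. The `MoEModel` class and `weights_breakdown` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MoEModel:
    total_params_b: float    # Total Parameters (Billions)
    active_params_b: float   # Active Parameters per Token (Billions)
    total_experts: int       # Total Experts (assumed: per layer)
    active_experts: int      # Active Experts (assumed: per layer, per token)
    num_layers: int          # Number of Layers
    bytes_per_param: float   # 2 = FP16, 1 = Q8, 0.5 = Q4, 0.25 = Q2


def weights_breakdown(m: MoEModel) -> dict:
    """Return sizes in GB (1 billion parameters at 1 byte each = 1 GB).

    Assumed decomposition (not taken from the estimator itself):
        total  = core + layers * total_experts  * expert
        active = core + layers * active_experts * expert
    Subtracting the two equations isolates the size of one expert.
    """
    if m.total_experts <= m.active_experts:
        raise ValueError("total_experts must exceed active_experts")
    inactive_slots = m.num_layers * (m.total_experts - m.active_experts)
    expert_b = (m.total_params_b - m.active_params_b) / inactive_slots
    core_b = m.active_params_b - m.num_layers * m.active_experts * expert_b
    if core_b < 0:
        raise ValueError("inputs are inconsistent with the assumed decomposition")
    return {
        "total_gb": m.total_params_b * m.bytes_per_param,
        "core_gb": core_b * m.bytes_per_param,
        "single_expert_gb": expert_b * m.bytes_per_param,
        "active_per_layer_gb": m.active_experts * expert_b * m.bytes_per_param,
    }


# Hypothetical model, not one of the presets: 100B total, 10B active,
# 64 experts with 4 active per layer, 30 layers, Q4 (0.5 bytes per parameter).
example = weights_breakdown(MoEModel(100, 10, 64, 4, 30, 0.5))
```

With these inputs the sketch yields a 50 GB model, of which about 2 GB is core, each expert is about 0.025 GB, and about 0.1 GB of expert weights is active per layer.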

Model Allocation & Native I/O Times

VRAM: 0 GB (native read/tok: 0 ms)

RAM: 0 GB (native read/tok: 0 ms)

SSD: 0 GB (native read/tok: 0 ms)
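One plausible way to arrive at a per-tier allocation and a "native read/tok" time is sketched below. It is an assumption, not the estimator's confirmed method: weights are placed greedily into the fastest tier first (core before experts), routing is assumed uniform so that the active expert bytes read from a tier are proportional to the share of experts stored there, and the Cache Hit Rate / I/O Efficiency setting is ignored. The function name `allocate_and_time` and the hardware numbers in the usage lines are hypothetical.

```python
def allocate_and_time(core_gb, experts_gb, active_experts_gb, tiers):
    """Greedy placement plus per-token read time for each tier.

    tiers: list of (name, capacity_gb, bandwidth_gb_per_s), fastest first.
    active_experts_gb: expert bytes touched per token across all layers.
    """
    placements = []
    core_left, experts_left = core_gb, experts_gb
    for name, capacity_gb, bandwidth_gbps in tiers:
        # Core weights are read on every token, so they get the fastest space.
        core_here = min(core_left, capacity_gb)
        experts_here = min(experts_left, capacity_gb - core_here)
        core_left -= core_here
        experts_left -= experts_here
        # Assumption: uniform routing, so reads scale with the stored share.
        expert_share = experts_here / experts_gb if experts_gb else 0.0
        read_gb = core_here + active_experts_gb * expert_share
        placements.append({
            "tier": name,
            "stored_gb": core_here + experts_here,
            "read_ms_per_tok": read_gb / bandwidth_gbps * 1000.0,
        })
    if core_left > 1e-9 or experts_left > 1e-9:
        raise ValueError("model does not fit in the given tiers")
    return placements


# Hypothetical 3-tier machine: 24 GB VRAM, 16 GB RAM, 1000 GB SSD.
tiers = [("VRAM", 24, 1000), ("RAM", 16, 50), ("SSD", 1000, 5)]
plan = allocate_and_time(core_gb=2, experts_gb=48, active_experts_gb=3, tiers=tiers)
```

For this made-up configuration the sketch stores 24 GB, 16 GB, and 10 GB in VRAM, RAM, and SSD, with per-token read times of roughly 3.4 ms, 20 ms, and 125 ms, which makes the SSD the dominant cost even though it holds the smallest share.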

Without Pipelining: the per-tier read times are added linearly, so the time per token is the sum of the VRAM, RAM, and SSD read times.
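The difference between the serial and pipelined modes can be illustrated with a simple two-stage model. This is a sketch under assumptions, not the page's actual formula: the slowest tier is treated as the streamed stage, all faster tiers together form the resident stage, work is split evenly across layers, and double buffering lets the stream for layer i+1 overlap the resident work for layer i. The function `token_time_ms` is a hypothetical name.

```python
def token_time_ms(tier_read_ms, num_layers, pipelined):
    """Estimate time per token from per-tier read times (fastest tier first).

    Serial:    every layer waits for its stream, then does its resident work.
    Pipelined: the first layer's stream is exposed, later streams overlap the
               previous layer's resident work, and the last layer's resident
               work has nothing left to overlap with.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    fast_ms = sum(tier_read_ms[:-1]) / num_layers  # resident tiers, per layer
    slow_ms = tier_read_ms[-1] / num_layers        # streamed tier, per layer
    if not pipelined:
        return num_layers * (fast_ms + slow_ms)
    return slow_ms + (num_layers - 1) * max(fast_ms, slow_ms) + fast_ms


# Hypothetical per-token read times in ms for VRAM, RAM, SSD over 10 layers.
serial_ms = token_time_ms([10, 20, 90], num_layers=10, pipelined=False)
piped_ms = token_time_ms([10, 20, 90], num_layers=10, pipelined=True)
serial_tok_s = 1000.0 / serial_ms
piped_tok_s = 1000.0 / piped_ms
```

In this example the serial estimate is 120 ms per token (about 8.3 tok/s) and the pipelined estimate is 93 ms per token (about 10.8 tok/s). Under this model the pipelined time approaches the larger of the two stage totals, which is why prefetching hides the faster stage rather than removing the bottleneck.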
