MoE SSD-Streaming Speed Estimator (Pipelined)
Includes a layer-major double-buffering (prefetch-hiding) simulation and model presets
Model Preset
DeepSeek-V4 Pro
DeepSeek-V4 Flash
GLM 5.2 MoE
Custom...
Hardware Architecture
2 Tiers (RAM -> SSD), e.g., Apple Silicon
3 Tiers (VRAM -> RAM -> SSD), e.g., PC Desktop/Server
Model Parameters
Total Parameters (Billions)
Active Parameters per Token (Billions)
Total Experts
Active Experts
Number of Layers
Quantization (Bytes per Parameter)
FP16 (2 B)
Q8 (1 B)
Q4 (0.5 B)
Q2 (0.25 B)
Hardware Tiers
Tier 1: GPU VRAM
Capacity (GB)
Bandwidth (GB/s)
Tier 2: PC RAM
Capacity (GB)
Bandwidth (GB/s)
Tier 3: SSD
Capacity (GB)
Bandwidth (GB/s)
Reading Strategy
Cache Hit Rate / I/O Efficiency (%)
Enable Pipelining (layer-major double buffering)
PIPELINED
Estimated Decode Speed
0.00 tok/s
Bottleneck: Calculating...
Model Weights Analysis
Total Size
0 GB
Core (Always-On)
0 GB
Single Expert
0 GB
Active per 1 Layer
0 GB
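The four figures above can be derived from the model parameters. The following is a minimal sketch, not the estimator's confirmed formula. It assumes that total parameters equal a shared core plus all experts, and that active parameters equal the same core plus the active experts. It also assumes "Active per 1 Layer" counts routed experts only. The names `MoEConfig` and `weights_analysis`, and the example values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    total_params_b: float    # total parameters, in billions
    active_params_b: float   # parameters touched per token, in billions
    total_experts: int       # routed experts per layer
    active_experts: int      # experts selected per token in each layer
    num_layers: int
    bytes_per_param: float   # 2.0 = FP16, 1.0 = Q8, 0.5 = Q4, 0.25 = Q2


def weights_analysis(cfg: MoEConfig) -> dict:
    if cfg.total_experts <= cfg.active_experts:
        raise ValueError("total_experts must exceed active_experts")
    # Assumed decomposition (not confirmed by the tool):
    #   total  = core + total_experts  * slot
    #   active = core + active_experts * slot
    # where `slot` is one expert index summed over all layers.
    slot_b = (cfg.total_params_b - cfg.active_params_b) / (
        cfg.total_experts - cfg.active_experts
    )
    core_b = cfg.active_params_b - cfg.active_experts * slot_b
    # Billions of parameters times bytes per parameter gives decimal GB.
    single_expert_gb = slot_b / cfg.num_layers * cfg.bytes_per_param
    return {
        "total_gb": cfg.total_params_b * cfg.bytes_per_param,
        "core_gb": core_b * cfg.bytes_per_param,
        "single_expert_gb": single_expert_gb,
        "active_per_layer_gb": cfg.active_experts * single_expert_gb,
    }


# Hypothetical model, not one of the presets:
# 100B total, 10B active, 64 experts, 4 active, 20 layers, Q8.
example = MoEConfig(100.0, 10.0, 64, 4, 20, 1.0)
report = weights_analysis(example)
for name, gigabytes in report.items():
    print(f"{name}: {gigabytes:.3f} GB")
```

Solving the two assumed equations for the core and the per-expert size needs only the six inputs the form already collects, which is why no separate "expert size" field is required under this model.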
Model Allocation & Native I/O Times
VRAM 0 GB
Native read/tok: 0 ms
RAM 0 GB
Native read/tok: 0 ms
SSD 0 GB
Native read/tok: 0 ms
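One way to arrive at a per-tier allocation and a native read time per token is sketched below. It assumes a greedy fill from the fastest tier to the slowest, and that the bytes read per token are spread across tiers in proportion to how much of the model each tier holds. Both are simplifying assumptions, not the estimator's confirmed behavior. The cache hit rate / I/O efficiency setting is left out. The function names and tier figures are hypothetical.

```python
def allocate(total_gb, tiers):
    """Greedy fill. `tiers` is a list of (name, capacity_gb, bandwidth_gb_s),
    ordered fastest first."""
    remaining = total_gb
    placed = {}
    for name, capacity_gb, _bandwidth in tiers:
        take = min(remaining, capacity_gb)
        placed[name] = take
        remaining -= take
    if remaining > 1e-9:
        raise ValueError("model does not fit in the configured tiers")
    return placed


def native_read_ms(active_gb, total_gb, placed, tiers):
    """Milliseconds per token spent reading from each tier, assuming the
    active bytes are distributed like the resident bytes."""
    times = {}
    for name, _capacity, bandwidth_gb_s in tiers:
        share = placed[name] / total_gb
        times[name] = active_gb * share / bandwidth_gb_s * 1000.0
    return times


# Hypothetical 3-tier machine and a 100 GB model with 10 GB active per token.
tiers = [("VRAM", 24.0, 1000.0), ("RAM", 64.0, 50.0), ("SSD", 2000.0, 5.0)]
placed = allocate(100.0, tiers)
read_ms = native_read_ms(10.0, 100.0, placed, tiers)
for name in placed:
    print(f"{name}: {placed[name]:.1f} GB, native read/tok: {read_ms[name]:.1f} ms")
```

A more faithful model would likely pin the always-on core in the fastest tier and spill only experts downward, but the proportional version keeps the arithmetic easy to follow.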
Without Pipelining: the per-tier read times are summed, so no read overlaps another.
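The difference between the two modes can be sketched as follows. This is an idealized model under stated assumptions: without pipelining the per-tier times are summed, and with double buffering the prefetch is assumed to overlap perfectly, so only the slowest tier determines the time per token. The function `decode_speed` and the example timings are hypothetical.

```python
def decode_speed(read_ms, pipelined):
    """Return (tokens_per_second, bottleneck_tier) from per-tier read times
    given in milliseconds per token."""
    if pipelined:
        # Idealized double buffering: reads overlap, the slowest one dominates.
        per_token_ms = max(read_ms.values())
    else:
        # Serial reads: each tier waits for the previous one to finish.
        per_token_ms = sum(read_ms.values())
    bottleneck = max(read_ms, key=read_ms.get)
    return 1000.0 / per_token_ms, bottleneck


# Hypothetical per-tier read times in ms per token.
read_ms = {"VRAM": 2.4, "RAM": 128.0, "SSD": 240.0}
serial_tps, serial_bottleneck = decode_speed(read_ms, pipelined=False)
piped_tps, piped_bottleneck = decode_speed(read_ms, pipelined=True)
print(f"serial:    {serial_tps:.2f} tok/s (bottleneck: {serial_bottleneck})")
print(f"pipelined: {piped_tps:.2f} tok/s (bottleneck: {piped_bottleneck})")
```

Real overlap is rarely perfect, so the pipelined figure is best read as an upper bound under this model.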