MoE Estimator – Simulate decode speed with layer-major prefetch hiding

ConteMascetti711 pts0 comments

MoE SSD-Streaming Speed Estimator (Pipelined)


Includes a layer-major double-buffering (prefetch hiding) simulation and model presets.

Model Preset

DeepSeek-V4 Pro
DeepSeek-V4 Flash
GLM 5.2 MoE
Custom...

Hardware Architecture

2 Tiers (RAM -> SSD), e.g., Apple Silicon
3 Tiers (VRAM -> RAM -> SSD), e.g., PC desktop/server

Model Parameters

Total Parameters (Billions)

Active Parameters per Token (Billions)

Total Experts

Active Experts

Number of Layers

Quantization (bytes per parameter)

FP16 (2 B)
Q8 (1 B)
Q4 (0.5 B)
Q2 (0.25 B)

Hardware Tiers

Tier 1: GPU VRAM

Capacity (GB)

Bandwidth (GB/s)

Tier 2: PC RAM

Capacity (GB)

Bandwidth (GB/s)

Tier 3: SSD

Capacity (GB)

Bandwidth (GB/s)

Reading Strategy

Cache Hit Rate / I/O Efficiency (%)
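One plausible reading of the cache hit rate is the fraction of expert bytes that would have come from the SSD but are instead served from a RAM cache (for example, the OS page cache). The sketch below assumes exactly that; the estimator may instead treat the field as an efficiency factor on SSD bandwidth. `split_ssd_read_ms` and its parameters are hypothetical, and the example numbers are made up.

```python
def split_ssd_read_ms(ssd_read_gb: float, hit_rate_pct: float,
                      ssd_bw_gbps: float, ram_bw_gbps: float) -> tuple[float, float]:
    """Split one token's SSD-bound reads into (miss_ms, hit_ms).

    Assumption: hits are served from RAM at RAM bandwidth, misses from the SSD.
    Sizes are in GB and bandwidths in GB/s, so GB / (GB/s) * 1000 gives ms.
    """
    if not 0.0 <= hit_rate_pct <= 100.0:
        raise ValueError("hit rate must be between 0 and 100")
    hit_fraction = hit_rate_pct / 100.0
    miss_ms = ssd_read_gb * (1.0 - hit_fraction) / ssd_bw_gbps * 1000.0
    hit_ms = ssd_read_gb * hit_fraction / ram_bw_gbps * 1000.0
    return miss_ms, hit_ms


# Hypothetical numbers: 2 GB of expert weights per token, 50% hits,
# a 7 GB/s SSD and 100 GB/s of RAM bandwidth.
miss_ms, hit_ms = split_ssd_read_ms(2.0, 50.0, 7.0, 100.0)
print(f"SSD misses: {miss_ms:.1f} ms, RAM hits: {hit_ms:.1f} ms")
```

Raising the hit rate moves bytes from the slow SSD path to the fast RAM path, which is why even a moderate hit rate can shrink the SSD read time substantially.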

Enable Pipelining (layer-major double buffering)

PIPELINED
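The double-buffering idea behind the pipelining toggle can be illustrated with a per-layer timeline. This is a minimal sketch under stated assumptions, not the estimator's own code: it assumes a single I/O channel, exactly two layer-sized buffers, and caller-supplied per-layer load and compute times in milliseconds. `sequential_ms` and `double_buffered_ms` are hypothetical names.

```python
def sequential_ms(load_ms: list[float], compute_ms: list[float]) -> float:
    """No overlap: each layer is loaded, then computed, one after another."""
    return sum(load_ms) + sum(compute_ms)


def double_buffered_ms(load_ms: list[float], compute_ms: list[float]) -> float:
    """Layer-major double buffering with one I/O channel and two layer buffers.

    While layer i is computed, layer i + 1 is loaded into the other buffer.
    Loading layer i must wait for the previous load (single I/O channel) and
    for layer i - 2 to finish computing (that layer's buffer is reused).
    """
    n = len(load_ms)
    load_end = [0.0] * n
    compute_end = [0.0] * n
    for i in range(n):
        io_free = load_end[i - 1] if i >= 1 else 0.0
        buffer_free = compute_end[i - 2] if i >= 2 else 0.0
        load_end[i] = max(io_free, buffer_free) + load_ms[i]
        prev_compute = compute_end[i - 1] if i >= 1 else 0.0
        compute_end[i] = max(load_end[i], prev_compute) + compute_ms[i]
    return compute_end[-1] if n else 0.0


# Hypothetical I/O-bound case: 4 layers, 10 ms to load and 4 ms to compute each.
loads, computes = [10.0] * 4, [4.0] * 4
print(sequential_ms(loads, computes), double_buffered_ms(loads, computes))
```

When loading dominates, the pipelined time approaches the total load time plus the last layer's compute; when compute dominates, it approaches the first load plus the total compute. In both cases the shorter of the two streams is what gets hidden.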

Estimated Decode Speed

0.00 tok/s

Bottleneck: Calculating...

Model Weights Analysis

Total Size: 0 GB

Core (Always-On): 0 GB

Single Expert: 0 GB

Active per Layer: 0 GB
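The four sizes above can be derived from the model parameters and the quantization width. The sketch assumes that every layer has the same number of equally sized experts and that all non-expert ("core") weights are active on every token; the estimator may use a different split. `weight_breakdown` and `QUANT_BYTES` are hypothetical names, and the example model is made up.

```python
# Bytes per parameter for the quantization options listed in the estimator.
QUANT_BYTES = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5, "Q2": 0.25}


def weight_breakdown(total_b: float, active_b: float, n_experts: int,
                     active_experts: int, n_layers: int,
                     bytes_per_param: float) -> dict[str, float]:
    """Split an MoE model into core and expert weights, in GB.

    Assumes every layer has n_experts identical experts and that all
    non-expert ("core") weights are always active, so
        total  = core + n_layers * n_experts      * expert
        active = core + n_layers * active_experts * expert
    Billions of parameters times bytes per parameter gives GB (1e9 bytes).
    """
    if n_experts <= active_experts:
        raise ValueError("need more total experts than active experts")
    expert_b = (total_b - active_b) / (n_layers * (n_experts - active_experts))
    core_b = active_b - n_layers * active_experts * expert_b
    return {
        "total_gb": total_b * bytes_per_param,
        "core_gb": core_b * bytes_per_param,
        "single_expert_gb": expert_b * bytes_per_param,
        # Assumes active weights are spread evenly across layers.
        "active_per_layer_gb": active_b * bytes_per_param / n_layers,
    }


# Hypothetical model: 100B total, 10B active, 64 experts (4 active), 50 layers, Q4.
sizes = weight_breakdown(100, 10, 64, 4, 50, QUANT_BYTES["Q4"])
for name, gb in sizes.items():
    print(f"{name}: {gb:.3f} GB")
```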

Model Allocation & Native I/O Times

VRAM: 0 GB (native read/tok: 0 ms)

RAM: 0 GB (native read/tok: 0 ms)

SSD: 0 GB (native read/tok: 0 ms)
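A simple way to fill the tiers and derive each tier's native read time per token is a greedy placement from the fastest tier down. This sketch assumes the core weights go first, experts fill the remaining space, and experts are used uniformly, so each tier serves active-expert bytes in proportion to what it stores. `Tier` and `allocate` are hypothetical, and the tier sizes and bandwidths are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gb: float
    bandwidth_gbps: float


def allocate(core_gb: float, experts_gb: float, active_experts_gb: float,
             tiers: list[Tier]) -> list[tuple[str, float, float]]:
    """Greedily place weights from the fastest tier down.

    Returns (tier name, stored GB, native read ms per token) for each tier.
    Assumptions: core weights are placed first and read in full every token;
    experts fill the remaining space, and expert usage is uniform, so each tier
    serves a share of the active expert bytes proportional to what it stores.
    """
    core_left, experts_left = core_gb, experts_gb
    result = []
    for tier in tiers:
        free = tier.capacity_gb
        core_here = min(core_left, free)
        free -= core_here
        core_left -= core_here
        experts_here = min(experts_left, free)
        experts_left -= experts_here
        share = experts_here / experts_gb if experts_gb > 0 else 0.0
        read_gb = core_here + share * active_experts_gb
        result.append((tier.name, core_here + experts_here,
                       read_gb / tier.bandwidth_gbps * 1000.0))
    if core_left > 1e-9 or experts_left > 1e-9:
        raise ValueError("model does not fit in the configured tiers")
    return result


# Hypothetical 3-tier PC: core 2 GB, experts 48 GB, 3 GB of active experts per token.
tiers = [Tier("VRAM", 8, 400), Tier("RAM", 20, 50), Tier("SSD", 1000, 5)]
for name, stored_gb, read_ms in allocate(2.0, 48.0, 3.0, tiers):
    print(f"{name}: {stored_gb:.1f} GB, native read/tok: {read_ms:.2f} ms")
```

For a 2-tier setup such as Apple Silicon, pass only the RAM and SSD tiers.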

Without pipelining, the native read times of the tiers are added linearly.
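The final step turns per-tier read times into a decode speed and a bottleneck label. This sketch assumes compute time is negligible and that pipelining overlaps the reads of all tiers perfectly, so the slowest tier sets the pace; without pipelining the times are summed. `decode_speed` is a hypothetical name and the example times are made up.

```python
def decode_speed(read_ms: dict[str, float], pipelined: bool) -> tuple[float, str]:
    """Turn per-tier native read times (ms/token) into tok/s and a bottleneck.

    Without pipelining the tier times add up; with pipelining (assumed here to
    overlap reads from different tiers completely) the slowest tier sets the pace.
    """
    if not read_ms:
        raise ValueError("need at least one tier")
    per_token_ms = max(read_ms.values()) if pipelined else sum(read_ms.values())
    bottleneck = max(read_ms, key=read_ms.get)
    return 1000.0 / per_token_ms, bottleneck


# Hypothetical per-tier read times in ms per token.
times = {"VRAM": 5.9375, "RAM": 25.0, "SSD": 275.0}
for mode in (False, True):
    tok_s, slowest = decode_speed(times, pipelined=mode)
    print(f"pipelined={mode}: {tok_s:.2f} tok/s, bottleneck: {slowest}")
```

Because the SSD term dominates in this example, perfect overlap only removes the VRAM and RAM times, so pipelining helps most when the tiers' read times are of similar magnitude.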

