Kubescheduler: The Game

by ImJasonH

How to play

You are the Kubernetes scheduler and cluster operator. Pods stream into the Pending queue, and you must bind each one to a node that satisfies all of its constraints. Then keep the cluster cheap, busy, and responsive.

Scheduling a pod

Click a pending pod to select it. Feasible nodes glow green; infeasible nodes are dimmed and show the blocking reason.

Click a green node (or drag the pod onto it) to bind it.

Hit ⚡ on a pod to auto-place just that one on its best node.
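
The game doesn't say how ⚡ picks the "best" node. Here is a minimal Python sketch of one plausible rule. It assumes the candidate nodes have already passed every constraint and prefers the node that would end up most full after placement, which rewards tight packing. That rule is loosely modeled on the MostAllocated scoring strategy of kube-scheduler's NodeResourcesFit plugin (the default strategy, LeastAllocated, spreads pods instead). The `Node` fields, `most_allocated_score`, and `best_node` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node model: total and currently free capacity.
    name: str
    cpu_total: float
    mem_total: float
    cpu_free: float
    mem_free: float


def most_allocated_score(node: Node, cpu_req: float, mem_req: float) -> float:
    """Mean CPU/memory utilization the node would reach after placing the pod.

    Higher means a tighter fit, which packs pods onto fewer nodes.
    """
    cpu_used = node.cpu_total - node.cpu_free + cpu_req
    mem_used = node.mem_total - node.mem_free + mem_req
    return (cpu_used / node.cpu_total + mem_used / node.mem_total) / 2


def best_node(feasible: List[Node], cpu_req: float, mem_req: float) -> Optional[Node]:
    """Return the highest-scoring feasible node (first one wins ties), or None."""
    if not feasible:
        return None
    return max(feasible, key=lambda n: most_allocated_score(n, cpu_req, mem_req))


nodes = [
    Node("a", cpu_total=4, mem_total=16, cpu_free=3, mem_free=12),
    Node("b", cpu_total=4, mem_total=16, cpu_free=1.5, mem_free=6),
]
print(best_node(nodes, cpu_req=1, mem_req=4).name)  # b: 87.5% full vs 50% for a
```

Packing like this suits a score that pays for utilization. A spreading rule would leave more headroom per node but keep more nodes partly idle.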

The constraints (just like real k8s)

Resources — a pod's CPU/memory/GPU requests must fit in the node's free capacity.

nodeSelector — the node must carry the required label (e.g. disktype=ssd).

Taints & tolerations — a node's NoSchedule taint (e.g. nvidia.com/gpu, spot) blocks pods that don't tolerate it.

Pod anti-affinity — two replicas of the same app can't share a node.

Cordon — cordoned/booting nodes won't accept new pods.
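
The five constraints above act as a filter: a node is feasible only if it passes every check, and the first failing check is the "blocking reason" shown on dimmed nodes. The following Python sketch implements that filter under simplifying assumptions. Tolerations are reduced to a set of tolerated taint keys, whereas real tolerations also match value, effect, and operator. Anti-affinity is a per-pod flag meaning "no other replica of my app on this node." The check order is also an assumption. All class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Pod:
    # Hypothetical pod model mirroring the constraints listed above.
    name: str
    app: str
    cpu: float
    mem: float
    gpu: int = 0
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: Set[str] = field(default_factory=set)  # tolerated taint keys (simplified)
    anti_affinity: bool = False  # refuse nodes already running this app


@dataclass
class Node:
    name: str
    cpu_free: float
    mem_free: float
    gpu_free: int = 0
    labels: Dict[str, str] = field(default_factory=dict)
    taints: Set[str] = field(default_factory=set)  # NoSchedule taint keys
    schedulable: bool = True  # False while cordoned or still booting
    apps: List[str] = field(default_factory=list)  # apps of pods bound here


def blocking_reason(pod: Pod, node: Node) -> Optional[str]:
    """Return why `pod` cannot bind to `node`, or None if the node is feasible."""
    if not node.schedulable:
        return "node is cordoned or booting"
    if pod.cpu > node.cpu_free or pod.mem > node.mem_free or pod.gpu > node.gpu_free:
        return "insufficient resources"
    for key, value in pod.node_selector.items():
        if node.labels.get(key) != value:
            return f"missing label {key}={value}"
    untolerated = node.taints - pod.tolerations
    if untolerated:
        return f"untolerated taint {sorted(untolerated)[0]}"
    if pod.anti_affinity and pod.app in node.apps:
        return f"anti-affinity: {pod.app} already on node"
    return None


gpu_node = Node("gpu-1", cpu_free=8, mem_free=32, gpu_free=1, taints={"nvidia.com/gpu"})
web = Pod("web-1", app="web", cpu=1, mem=2)
trainer = Pod("train-1", app="train", cpu=4, mem=16, gpu=1, tolerations={"nvidia.com/gpu"})
print(blocking_reason(web, gpu_node))      # untolerated taint nvidia.com/gpu
print(blocking_reason(trainer, gpu_node))  # None
```

The nodes that would "glow green" for a pod are then simply those where `blocking_reason` returns `None`.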

Operating the cluster

+ Add node — provision capacity. New nodes take time to boot and cost $/hr the whole time.

Cordon — mark a node unschedulable without disturbing its pods.

Drain — cordon and gracefully evict workload pods back to the queue (small penalty). DaemonSet pods stay, just like kubectl drain.

Delete — terminate a node; any pods still on it are force-killed (big penalty). Drain first!

⤴ Upgrade — appears on out-of-date nodes: drains, then reboots the node onto the new version.
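
To make the difference between cordon, drain, and delete concrete, here is a small Python sketch of the three operations as the rules describe them. Cordon only blocks new pods. Drain cordons, then requeues workload pods while DaemonSet pods stay. Delete force-kills whatever workload is left. Counting only non-DaemonSet pods as "killed" is an assumption (it is why draining first avoids the big penalty). The data model and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    daemonset: bool = False  # DaemonSet pods are pinned to their node


@dataclass
class Node:
    name: str
    pods: List[Pod] = field(default_factory=list)
    cordoned: bool = False


def cordon(node: Node) -> None:
    """Mark the node unschedulable; pods already on it keep running."""
    node.cordoned = True


def drain(node: Node, pending: List[Pod]) -> int:
    """Cordon, then evict workload pods back to the pending queue.

    DaemonSet pods stay put. Returns the number of evicted pods.
    """
    cordon(node)
    evicted = [p for p in node.pods if not p.daemonset]
    node.pods = [p for p in node.pods if p.daemonset]
    pending.extend(evicted)
    return len(evicted)


def delete(node: Node, cluster: List[Node]) -> int:
    """Remove the node; workload pods still on it are force-killed, not requeued.

    Returns how many workload pods were killed.
    """
    killed = sum(1 for p in node.pods if not p.daemonset)
    cluster.remove(node)
    return killed


pending: List[Pod] = []
n1 = Node("n1", pods=[Pod("fluentbit", daemonset=True), Pod("web-1"), Pod("api-1")])
cluster = [n1]
print(drain(n1, pending))         # 2
print([p.name for p in n1.pods])  # ['fluentbit']
print(delete(n1, cluster))        # 0: nothing left to force-kill
```

An upgrade in this model would be `drain` followed by a reboot onto the new version.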

Curveballs

DaemonSets ⚙ — node agents (logging, metrics, kube-proxy, GPU plugin) that the controller runs on every node automatically. They're per-node overhead you can't move, so fewer/larger nodes waste less.

Spot nodes ⚡ — much cheaper, but billed at a fluctuating spot price and reclaimed with little warning: you get only a short Reclaiming countdown, so drain the node to save its pods. Run only fault-tolerant work (batch) on spot.

Cluster upgrades — every so often the control plane jumps a version and every node falls behind (amber v1.x). Restart them responsibly, a few at a time, so workloads always have somewhere to land. Outdated nodes leak score until the rollout completes; finishing it pays a bonus.
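
The claim that fewer, larger nodes waste less follows from simple arithmetic: DaemonSet overhead is paid once per node, so splitting the same capacity into more nodes multiplies it. The Python sketch below buys the same 32 vCPU as nodes of different sizes. It assumes a hypothetical 0.5 vCPU of DaemonSet requests per node; the game's actual agent sizes may differ.

```python
import math


def node_count(total_cpu: float, node_cpu: float) -> int:
    """Whole nodes needed to provide total_cpu of raw capacity."""
    return math.ceil(total_cpu / node_cpu)


def daemonset_overhead(total_cpu: float, node_cpu: float, ds_cpu_per_node: float) -> float:
    """CPU consumed by per-node agents across the whole fleet."""
    return node_count(total_cpu, node_cpu) * ds_cpu_per_node


# Same 32 vCPU of raw capacity, assuming 0.5 vCPU of DaemonSet requests per node.
TOTAL, DS = 32, 0.5
for size in (4, 8, 16, 32):
    overhead = daemonset_overhead(TOTAL, size, DS)
    print(f"{node_count(TOTAL, size):2d} x {size:2d}-vCPU nodes: "
          f"{overhead:.1f} vCPU to agents ({overhead / TOTAL:.1%} of capacity)")
```

Going from eight 4-vCPU nodes to two 16-vCPU nodes cuts agent overhead from 4 vCPU to 1 vCPU. The cost is that each large node is a bigger loss when it is drained for an upgrade or reclaimed as spot.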

Score

Each tick you earn points for cluster utilization and lose points for pending pods (scheduling latency) and node cost (spot nodes are charged at the live price). Finishing jobs pays a bonus, and completing a version rollout pays a bigger one; SLA breaches, force-kills, spot losses, and lingering out-of-date nodes all cost points. Pack tightly, schedule fast, run only the nodes you need, and keep the fleet patched.
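
The scoring rules can be read as a per-tick formula: a utilization reward, minus penalties for pending pods, node spend, and out-of-date nodes, plus one-off points for events. The Python sketch below writes that structure out. Every weight and event name is a hypothetical placeholder, since the game does not publish its coefficients; only the signs and the "rollout bonus beats job bonus" ordering come from the rules above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights: the game does not publish its coefficients.
UTIL_REWARD = 10.0       # points per tick at 100% utilization
PENDING_PENALTY = 1.0    # points per pending pod per tick (scheduling latency)
COST_PENALTY = 0.5       # points per $/hr of running nodes per tick
OUTDATED_PENALTY = 2.0   # points per out-of-date node per tick
EVENT_POINTS = {
    "job_completed": 5.0,
    "rollout_completed": 20.0,  # bigger than a job bonus, per the rules
    "sla_breach": -10.0,
    "force_kill": -8.0,
    "spot_loss": -4.0,
}


@dataclass
class TickState:
    utilization: float        # fraction of cluster capacity in use, 0..1
    pending_pods: int         # pods still waiting to be bound
    node_prices: List[float]  # $/hr per running node (spot at its live price)
    outdated_nodes: int = 0   # nodes still behind the control-plane version


def tick_score(state: TickState, events: List[str]) -> float:
    """Utilization reward minus latency, cost and version-drift penalties,
    plus one-off points for this tick's events."""
    score = UTIL_REWARD * state.utilization
    score -= PENDING_PENALTY * state.pending_pods
    score -= COST_PENALTY * sum(state.node_prices)
    score -= OUTDATED_PENALTY * state.outdated_nodes
    score += sum(EVENT_POINTS[e] for e in events)  # KeyError on unknown events
    return score


state = TickState(utilization=0.8, pending_pods=2,
                  node_prices=[0.40, 0.40, 0.12], outdated_nodes=1)
print(round(tick_score(state, ["job_completed"]), 2))  # 8.54
```

Under any weights of this shape, an idle node costs points every tick while adding nothing to utilization, which is why running only the nodes you need pays off.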

Tip: turn on Auto-schedule / Cluster autoscaler to watch a baseline policy play, then try to beat it by hand.
