Provisioning a Private Talos Kubernetes Cluster on Hetzner Cloud


Onat Mercan’s Blog

This is a follow-up to Private Networking on Hetzner Cloud with Tailscale.

The previous post was about the network. This one is about what I put inside that network: a private Kubernetes cluster running Talos on Hetzner Cloud.

The important part is not just “Kubernetes on Hetzner Cloud”. There are many posts about it. The part I cared about was making the cluster private from the first boot. No public IPs on the control plane. No public IPs on the workers. Access only through the Tailnet.

That made Talos a good fit. No package manager, no SSH. You give it a machine configuration, it becomes a Kubernetes node, and that is mostly it.

Mostly.

What I Wanted from the Cluster

Private-only nodes: every Kubernetes node should live only on the Hetzner private network.

Terraform-managed bootstrap: machines, Talos config, kubeconfig, and base add-ons should come from code.

Talos: no manual server maintenance.

Separate node pools: platform components should not fight application workloads.

GitOps: Terraform can bootstrap ArgoCD, then ArgoCD owns the platform.

The goal was to build something small enough that I could understand every moving part, but powerful enough that I could run actual projects on it.
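The Terraform-managed bootstrap from the list above maps onto the resources of the siderolabs/talos Terraform provider. Below is a minimal sketch of that chain for the control plane. The variable names are hypothetical and this is not the post's actual module. The nodes only have private IPs, so Terraform has to reach the Talos API (port 50000) on them, presumably over the Tailnet:

```hcl
terraform {
  required_providers {
    talos = {
      source = "siderolabs/talos"
    }
  }
}

# Hypothetical inputs for this sketch.
variable "cluster_name" {
  type = string
}

variable "cluster_endpoint" {
  type = string # e.g. "https://<first control plane private IP>:6443"
}

variable "control_plane_ips" {
  type = list(string)
}

# Cluster-wide secrets (CAs, tokens), generated once and kept in Terraform state.
resource "talos_machine_secrets" "this" {}

# Rendered machine configuration for control plane nodes.
data "talos_machine_configuration" "control_plane" {
  cluster_name     = var.cluster_name
  cluster_endpoint = var.cluster_endpoint
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.this.machine_secrets
}

# Push the configuration to each control plane node over the Talos API.
resource "talos_machine_configuration_apply" "control_plane" {
  count                       = length(var.control_plane_ips)
  client_configuration        = talos_machine_secrets.this.client_configuration
  machine_configuration_input = data.talos_machine_configuration.control_plane.machine_configuration
  node                        = var.control_plane_ips[count.index]
}

# Bootstrap etcd exactly once, on the first control plane node.
resource "talos_machine_bootstrap" "this" {
  depends_on           = [talos_machine_configuration_apply.control_plane]
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = var.control_plane_ips[0]
}

# Retrieve an admin kubeconfig once the cluster has been bootstrapped.
resource "talos_cluster_kubeconfig" "this" {
  depends_on           = [talos_machine_bootstrap.this]
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = var.control_plane_ips[0]
}
```

`talos_cluster_kubeconfig.this.kubeconfig_raw` then holds a kubeconfig that points at `cluster_endpoint`. On Hetzner, you could also pass the same rendered configuration as the server's `user_data` at creation time instead of applying it afterwards.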

Cluster Shape

The private network from the previous post gives the cluster a /24 (10.0.128.0/24) to live in. I split that range into explicit chunks for the nodes and reserved separate ranges for Kubernetes services and pods:

Control plane: 10.0.128.16/28

Platform workers: 10.0.128.32/27

General workers: 10.0.128.64/27

Service network: 10.0.192.0/21

Pod network: 10.0.200.0/19

The control plane has three nodes. Platform workers run ArgoCD and other platform components. General workers run applications like snapbyte.dev.
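The node ranges above line up with Terraform's built-in `cidrsubnet` function, so they can be computed from the /24 instead of being typed by hand. Here is a minimal sketch. The local names are hypothetical, and the original code may well define these ranges as literals:

```hcl
locals {
  # Hypothetical name for the /24 the cluster lives in.
  cluster_subnet = "10.0.128.0/24"

  # cidrsubnet(prefix, newbits, netnum):
  #   /24 + 4 bits = /28, netnum 1 -> 10.0.128.16/28
  #   /24 + 3 bits = /27, netnum 1 -> 10.0.128.32/27
  #   /24 + 3 bits = /27, netnum 2 -> 10.0.128.64/27
  control_plane_cidr    = cidrsubnet(local.cluster_subnet, 4, 1)
  platform_workers_cidr = cidrsubnet(local.cluster_subnet, 3, 1)
  general_workers_cidr  = cidrsubnet(local.cluster_subnet, 3, 2)
}
```

In `terraform console`, `cidrsubnet("10.0.128.0/24", 4, 1)` evaluates to `"10.0.128.16/28"`. Deriving the ranges this way makes it harder to accidentally give two pools overlapping ranges.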

```mermaid
flowchart TB
    Tailnet((Tailnet))
    Internet((Internet))

    subgraph VPC["Private network 10.0.0.0/16"]
        subgraph Subnet["Subnet 10.0.128.0/24"]
            NAT["NAT Gateway"]

            subgraph CP["Control Plane 10.0.128.16/28"]
                CP1["cp-1"]
                CP2["cp-2"]
                CP3["cp-3"]
            end

            subgraph Platform["Platform Workers 10.0.128.32/27"]
                ArgoCD["ArgoCD"]
                PlatformApps["Platform components"]
            end

            subgraph General["General Workers 10.0.128.64/27"]
                PublicApps["Public apps"]
                InternalApps["Internal apps"]
            end
        end
    end

    Tailnet -->|kubectl and talosctl| CP1
    Tailnet --> Platform
    Tailnet --> General
    CP --> NAT
    Platform --> NAT
    General --> NAT
    NAT --> Internet
```

The Kubernetes API endpoint is the first control plane node’s private IP:

```hcl
locals {
  cluster_endpoint = "https://${local.control_plane_private_ips[0]}:6443"
}
```

That endpoint is only useful if you are already inside the private network through Tailscale.
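The excerpt does not show where `local.control_plane_private_ips` comes from. Here is one possible definition. It assumes each control plane node takes a fixed address from the start of its /28, and that host offset is itself an assumption:

```hcl
locals {
  # Hypothetical: pin control plane IPs to fixed host numbers inside 10.0.128.16/28.
  # cidrhost(prefix, hostnum) with hostnum 1, 2, 3 -> 10.0.128.17, .18, .19
  control_plane_private_ips = [
    for i in range(var.control_plane.count) : cidrhost("10.0.128.16/28", i + 1)
  ]
}
```

With three control plane nodes this yields 10.0.128.17, 10.0.128.18 and 10.0.128.19, so `cluster_endpoint` would become `https://10.0.128.17:6443`. Pinning the addresses keeps the endpoint stable when a server is rebuilt.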

Building the Talos Image

Before Terraform could create any nodes, I needed a Talos image that Hetzner could boot.

I started this cluster on Talos v1.11.3. The later v1.12.6 upgrade came from an operational incident, not the initial design.

Hetzner does not give you Talos as an image option, so I build my own snapshot with Packer. The flow is based on hcloud-talos/terraform-hcloud-talos.

It starts a temporary Hetzner server, downloads the Talos raw image from the Talos Image Factory, writes it to disk, and saves the result as a snapshot.

```hcl
variable "talos_version" {
  type    = string
  default = "v1.11.3"
}

source "hcloud" "talos" {
  rescue       = "linux64"
  image        = "debian-11"
  location     = "nbg1"
  server_type  = "cx22"
  ssh_username = "root"

  snapshot_name = "talos-${var.talos_version}-amd64"
  snapshot_labels = {
    type    = "infra"
    os      = "talos"
    version = var.talos_version
    arch    = "amd64"
  }
}
```

The label part is the important bit. Terraform can later find the image by selector instead of relying on the snapshot name:

```hcl
data "hcloud_image" "talos" {
  with_selector = "os=talos,type=infra,version=${var.talos_version},arch=amd64"
}
```
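The Packer `source` block only describes the temporary server. The steps described earlier (download the raw image from the Image Factory, write it to disk) belong in a `build` block. Here is a minimal sketch. The `talos_schematic_id` variable, the Image Factory URL layout and the `/dev/sda` target disk are assumptions to check against the Image Factory and your server type. This is not a copy of the original template:

```hcl
# Hypothetical: the Image Factory schematic to build from.
variable "talos_schematic_id" {
  type = string
}

locals {
  # Assumed URL layout for the Hetzner Cloud raw image on the Talos Image Factory.
  image_url = "https://factory.talos.dev/image/${var.talos_schematic_id}/${var.talos_version}/hcloud-amd64.raw.xz"
}

build {
  sources = ["source.hcloud.talos"]

  # Runs inside the Hetzner rescue system, so the server's own disk is not mounted.
  provisioner "shell" {
    inline = [
      "apt-get install -y wget",
      "wget -O /tmp/talos.raw.xz ${local.image_url}",
      # Decompress straight onto the disk, replacing the Debian install.
      "xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync",
    ]
  }
}
```

Because the provisioner runs from the rescue system, overwriting the disk is safe. When the build finishes, Packer saves the disk as a snapshot carrying the labels the `hcloud_image` data source selects on.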

Worker Pools

Before creating the machines, I needed a way to describe what kind of nodes I wanted.

This is basically the same idea as node pools in managed Kubernetes offerings. GKE, EKS, and AKS all let you create groups of nodes with different sizes, labels, or taints. I wanted the same mental model.

Each pool also gets its own Hetzner placement group. That tells Hetzner to spread the nodes in that pool across different physical hosts where possible. It does not make the pool highly available, but it avoids the failure mode where every platform worker ends up on the same machine.
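A single `for_each` can declare one spread placement group per pool. Here is a minimal sketch. It assumes the pool map shown below is exposed as `var.worker_pools`, and the resource and group names are hypothetical:

```hcl
# Hypothetical: one spread placement group per worker pool, keyed by pool name.
resource "hcloud_placement_group" "worker" {
  for_each = var.worker_pools

  name   = "${each.key}-workers"
  type   = "spread"
  labels = each.value.labels
}
```

Each server in a pool would then set `placement_group_id = hcloud_placement_group.worker["platform"].id`, or the equivalent for its own pool. Hetzner limits a spread group to 10 servers, which is plenty for pools of three.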

The pool config looks like this:

```hcl
worker_pools = {
  platform = {
    count      = 3
    sku        = "cx33"
    cidr       = "10.0.128.32/27"
    datacenter = "nbg1-dc3"
    labels     = { purpose = "platform" }
  }

  general = {
    count      = 3
    sku        = "cx23"
    cidr       = "10.0.128.64/27"
    datacenter = "nbg1-dc3"
    labels     = { purpose = "general" }
  }
}
```

This makes the Terraform code easier to reason about. It lets me create named groups of machines with known CIDR ranges, placement groups, and labels.
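To turn pools into servers, you can expand the map into one entry per node. Here is a sketch that assumes `var.worker_pools` has the shape above. The key format, the host offset and the `talos_patch` field are hypothetical. `machine.nodeLabels` is the Talos machine config field that sets Kubernetes node labels:

```hcl
locals {
  # Hypothetical: one entry per node, e.g.
  # "platform-1" => { pool = "platform", ip = "10.0.128.33", ... }
  worker_nodes = merge([
    for pool_name, pool in var.worker_pools : {
      for i in range(pool.count) : "${pool_name}-${i + 1}" => {
        pool        = pool_name
        server_type = pool.sku
        datacenter  = pool.datacenter
        # Fixed address inside the pool's CIDR, skipping the network address.
        ip = cidrhost(pool.cidr, i + 1)
        # Talos config patch that applies the pool labels to the Kubernetes node.
        talos_patch = yamlencode({
          machine = {
            nodeLabels = pool.labels
          }
        })
      }
    }
  ]...)
}
```

`local.worker_nodes` can then drive `for_each` on the worker servers. You can also pass each node's `talos_patch` through `config_patches` when rendering its machine configuration, so the `purpose` label reaches Kubernetes without a manual `kubectl label` step.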

Terraform Creates the Machines

The node resources are just regular Hetzner servers, but with the public network disabled.

```hcl
resource "hcloud_server" "control_plane" {
  count = var.control_plane.count

  name =...
```
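Here is a minimal sketch of what a private-only node can look like with the hcloud provider. It is not the original resource. `var.network_id` and the `sku` and `datacenter` fields on `var.control_plane` are assumptions, and the addresses use a hypothetical offset inside the control plane /28:

```hcl
# Sketch only: var.network_id, var.control_plane.sku and
# var.control_plane.datacenter are assumed inputs.
resource "hcloud_server" "private_node" {
  count       = var.control_plane.count
  name        = "cp-${count.index + 1}"
  image       = data.hcloud_image.talos.id
  server_type = var.control_plane.sku
  datacenter  = var.control_plane.datacenter

  # No public IPv4 or IPv6 address is attached to the server.
  public_net {
    ipv4_enabled = false
    ipv6_enabled = false
  }

  # Attach to the private network with a fixed address (hypothetical offset).
  network {
    network_id = var.network_id
    ip         = cidrhost("10.0.128.16/28", count.index + 1)
  }
}
```

With `public_net` disabled, the node's only way out is the private network, which is why the NAT gateway in the diagram matters. For Talos, the rendered machine configuration would typically be handed over through the server's `user_data`.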
