
Multi-Node K3s Scaling

This document describes the plan for expanding from a single-node K3s cluster to a multi-node setup for improved reliability and zero-downtime deployments.

Current State: Single Node

The apps server (CX53: 16 vCPU, 32GB RAM) runs both the K3s control plane and all workloads. This creates limitations:

| Limitation | Impact |
| --- | --- |
| Single point of failure | Server down = all services down |
| Rolling updates require 2x RAM | New pod must start before the old one stops |
| Maintenance causes downtime | OS updates and resizes require a reboot |
| No workload isolation | Heavy services affect others |

Target Architecture: 1 Server + N Agents

K3s supports separating control plane (server) from workloads (agents):

flowchart TB
    subgraph Internet
        Users([Users])
    end

    subgraph Cloudflare
        Tunnel[Tunnel]
    end

    subgraph Hetzner["Hetzner Cloud"]
        subgraph Private["Private Network 10.0.0.0/16"]
            Server[apps - Server<br/>10.0.2.1<br/>CX53]
            Agent1[worker-1 - Agent<br/>10.0.2.2<br/>CX32]
        end
    end

    subgraph Storage["Hetzner Volumes"]
        HV[Cloud Block Storage]
    end

    Users --> Tunnel --> Server
    Server <--> Agent1
    Server --> HV
    Agent1 --> HV

Node Roles

| Node | Type | Specs | Purpose |
| --- | --- | --- | --- |
| apps | Server | CX53 (16 vCPU, 32GB) | Control plane + heavy workloads |
| worker-1 | Agent | CX32 (8 vCPU, 16GB) | Additional capacity, rolling updates |

Benefits of Multi-Node

Zero-Downtime Rolling Updates

With a single node, the RollingUpdate strategy needs 2x RAM because the old and new pods run side by side:

# Current: Requires double RAM on same node
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Creates new pod first
    maxUnavailable: 0  # Keeps old running

With multiple nodes, the new pod is scheduled on a different node:

Before:  [Node 1: Pod v1] [Node 2: empty]
During:  [Node 1: Pod v1] [Node 2: Pod v2 starting]
After:   [Node 1: empty]  [Node 2: Pod v2 running]
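
The scheduler will generally pick the node with more free resources, but spreading replicas can be made explicit. A minimal sketch using topologySpreadConstraints in a Deployment's pod template (the app label is a placeholder):

# Spread replicas across hostnames (sketch)
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway   # prefer spreading, never block scheduling
    labelSelector:
      matchLabels:
        app: my-app                     # placeholder label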

Maintenance Without Downtime

# Drain node for maintenance
kubectl drain worker-1 --ignore-daemonsets

# Pods migrate to other nodes
# Do maintenance (reboot, resize, etc)

# Bring back
kubectl uncordon worker-1
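
A drain evicts pods; to guarantee at least one replica of a service stays up while pods move, a PodDisruptionBudget can be added. A sketch (name and label are placeholders):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # placeholder name
spec:
  minAvailable: 1           # keep at least one replica running during the drain
  selector:
    matchLabels:
      app: my-app           # placeholder label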

Database High Availability

CloudNative-PG supports standby replicas:

spec:
  instances: 2  # Primary on Node 1, Standby on Node 2

If the primary fails, the standby is promoted automatically.

Implementation Steps

Phase 1: Infrastructure

  1. Add worker node in Terraform
# infra/live/hetzner/servers.tf
module "worker_1" {
  source = "../../modules/hetzner/vps"

  name        = "worker-1"
  server_type = "cx32"
  image       = "debian-12"
  location    = "fsn1"
  ssh_keys    = [hcloud_ssh_key.phcurado.id]
  firewall_ids = [module.firewall_private.id]

  network_id = hcloud_network.private.id
  private_ip = "10.0.2.2"

  labels = {
    environment = "production"
    role        = "worker"
  }
}
  2. Bootstrap with Ansible
# New playbook: worker.yml
- hosts: worker_1
  roles:
    - common
    - k3s-agent
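
The k3s-agent role referenced above is not shown; a minimal sketch of its main task, assuming the server address and join token are passed in as variables (k3s_server_ip and k3s_token are hypothetical names):

# roles/k3s-agent/tasks/main.yml (sketch)
- name: Install K3s agent and join the cluster
  ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -
  environment:
    K3S_URL: "https://{{ k3s_server_ip }}:6443"    # e.g. 10.0.2.1
    K3S_TOKEN: "{{ k3s_token }}"                   # from /var/lib/rancher/k3s/server/node-token
  args:
    creates: /usr/local/bin/k3s-agent-uninstall.sh # skip if the agent is already installed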

Phase 2: K3s Agent Join

The agent needs a token from the server to join:

# On server, get token
cat /var/lib/rancher/k3s/server/node-token

# On agent, join cluster
curl -sfL https://get.k3s.io | K3S_URL=https://10.0.2.1:6443 K3S_TOKEN=<token> sh -

Verify:

kubectl get nodes
# NAME       STATUS   ROLES                  AGE
# apps       Ready    control-plane,master   90d
# worker-1   Ready    <none>                 1m
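
To keep cluster traffic on the private network, the agent's node IP (and flannel interface) can be pinned via K3s's config file. A sketch, assuming the private NIC is enp7s0 (verify with ip addr):

# /etc/rancher/k3s/config.yaml on worker-1 (sketch)
server: https://10.0.2.1:6443
token: <token>              # same node-token as above
node-ip: 10.0.2.2           # advertise the private IP
flannel-iface: enp7s0       # assumed private interface name; check with `ip addr`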

Phase 3: Install Hetzner CSI Driver

Before stateful pods can move between nodes, their storage must be attachable from any node:

helm repo add hcloud https://charts.hetzner.cloud
helm install hcloud-csi hcloud/hcloud-csi \
  --namespace kube-system \
  --set storageClasses[0].name=hcloud-volumes \
  --set storageClasses[0].defaultStorageClass=true

This enables Hetzner Volumes to detach/attach across nodes automatically.
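
The driver also needs a Hetzner Cloud API token, typically provided as a secret in kube-system (check the chart's documentation for the exact name it expects). Once installed, claims simply reference the new storage class. A sketch (name and size are placeholders):

# Example PVC backed by a Hetzner Volume (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data        # placeholder name
spec:
  accessModes:
    - ReadWriteOnce         # block volumes attach to one node at a time
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi         # Hetzner Volumes start at 10GB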

Phase 4: Migrate Workloads

  1. Migrate PVCs to Hetzner Volumes (see Storage Strategy)
  2. Enable pod scheduling across nodes
  3. Test rolling updates (example below)
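
To verify placement and exercise step 3, a rollout can be triggered and watched (the deployment name is a placeholder):

# Trigger a rolling update and watch where pods land (sketch)
kubectl rollout restart deployment/<name>
kubectl rollout status deployment/<name>
kubectl get pods -o wide    # the NODE column shows pod placement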

Phase 5: Database HA (Optional)

For critical databases, add standby replicas:

# Example: Authentik PostgreSQL
spec:
  instances: 2
  # Primary will be on one node, standby on another
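
A fuller sketch of the same thing as a CloudNative-PG Cluster resource (the name and storage size are placeholders):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: authentik-postgres    # placeholder name
spec:
  instances: 2                # one primary + one standby
  storage:
    size: 10Gi
    storageClass: hcloud-volumes
  # CloudNative-PG applies preferred pod anti-affinity by default,
  # so the two instances should land on different nodes.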

Cost Analysis

| Configuration | Monthly Cost | RAM | Benefit |
| --- | --- | --- | --- |
| Single CX53 | ~€45 | 32GB | Current |
| CX53 + CX32 | ~€60 | 48GB | Multi-node, zero-downtime |
| CX53 + CX22 | ~€50 | 36GB | Basic multi-node |

Recommendation: CX32 as worker provides good capacity for ~€15/month extra.

Node Affinity and Taints

For workload placement control:

# Force pod to specific node
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - apps

# Or use labels
nodeSelector:
  role: worker
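
Taints work the other way around: pods are kept off a tainted node unless they explicitly tolerate it. A sketch that reserves worker-1 for workloads that opt in (the dedicated key/value pair is a placeholder):

# Taint the node, then tolerate it in the pod spec (sketch)
kubectl taint nodes worker-1 dedicated=worker:NoSchedule

# In the pod spec:
tolerations:
  - key: dedicated
    operator: Equal
    value: worker
    effect: NoSchedule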

Monitoring Multi-Node

Prometheus already scrapes node metrics. After adding nodes:

  1. Node Exporter DaemonSet runs on new nodes automatically
  2. Grafana Node Exporter dashboard shows all nodes
  3. Alerts fire for any unhealthy node (example rule below)
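
As an example of item 3, a node-readiness alert could look like this. A sketch using the Prometheus Operator's PrometheusRule CRD, assuming kube-state-metrics is deployed (names and labels are placeholders):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-health             # placeholder name
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not Ready"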

Rollback Plan

If multi-node causes issues:

  1. Drain worker node: kubectl drain worker-1
  2. All pods return to apps server
  3. Delete worker from Terraform
  4. Continue with single-node
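
The corresponding commands, roughly (sketch):

# Drain and remove the worker from the cluster
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1

# Remove the worker_1 module from Terraform, then apply
terraform plan     # should only show worker-1 being destroyed
terraform apply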