# Multi-Node K3s Scaling
This document describes the plan for expanding from a single-node K3s cluster to a multi-node setup for improved reliability and zero-downtime deployments.
## Current State: Single Node
The apps server (CX53: 16 vCPU, 32GB RAM) runs both the K3s control plane and all workloads. This creates limitations:
| Limitation | Impact |
|---|---|
| Single point of failure | Server down = all services down |
| Rolling updates require 2x RAM | New pod must start before old stops |
| Maintenance causes downtime | OS updates, resizes require reboot |
| No workload isolation | Heavy services affect others |
## Target Architecture: 1 Server + N Agents
K3s supports separating control plane (server) from workloads (agents):
```mermaid
flowchart TB
    subgraph Internet
        Users([Users])
    end
    subgraph Cloudflare
        Tunnel[Tunnel]
    end
    subgraph Hetzner["Hetzner Cloud"]
        subgraph Private["Private Network 10.0.0.0/16"]
            Server[apps - Server<br/>10.0.2.1<br/>CX53]
            Agent1[worker-1 - Agent<br/>10.0.2.2<br/>CX32]
        end
    end
    subgraph Storage["Hetzner Volumes"]
        HV[Cloud Block Storage]
    end
    Users --> Tunnel --> Server
    Server <--> Agent1
    Server --> HV
    Agent1 --> HV
```
### Node Roles
| Node | Type | Specs | Purpose |
|---|---|---|---|
| apps | Server | CX53 (16 vCPU, 32GB) | Control plane + heavy workloads |
| worker-1 | Agent | CX32 (8 vCPU, 16GB) | Additional capacity, rolling updates |
## Benefits of Multi-Node

### Zero-Downtime Rolling Updates

On a single node, the RollingUpdate strategy needs twice the pod's RAM, because the old and new pods run side by side:
```yaml
# Current: requires double RAM on the same node
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Creates new pod first
    maxUnavailable: 0  # Keeps old pod running
```
With multiple nodes, the new pod can schedule on a different node:

```text
Before: [Node 1: Pod v1]  [Node 2: empty]
During: [Node 1: Pod v1]  [Node 2: Pod v2 starting]
After:  [Node 1: empty]   [Node 2: Pod v2 running]
```
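To nudge the surge pod onto the other node instead of competing for RAM on the same one, a soft anti-affinity rule can be added to the Deployment. A minimal sketch, assuming the pods carry an `app: my-app` label (placeholder):

```yaml
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname  # spread across nodes
                labelSelector:
                  matchLabels:
                    app: my-app  # placeholder: match the Deployment's pod labels
```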
### Maintenance Without Downtime

```bash
# Drain the node for maintenance
kubectl drain worker-1 --ignore-daemonsets
# Pods migrate to the other nodes

# Do the maintenance (reboot, resize, etc.)

# Bring the node back
kubectl uncordon worker-1
```
### Database High Availability
CloudNative-PG supports standby replicas:
```yaml
spec:
  instances: 2  # Primary on Node 1, standby on Node 2
```
If primary fails, standby promotes automatically.
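A minimal sketch of what a two-instance cluster could look like as a CloudNative-PG `Cluster` resource; the name and storage values are placeholders:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db              # placeholder name
spec:
  instances: 2                  # one primary + one standby
  storage:
    size: 10Gi                  # placeholder size
    storageClass: hcloud-volumes
  affinity:
    enablePodAntiAffinity: true           # keep primary and standby on different nodes
    topologyKey: kubernetes.io/hostname
```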
## Implementation Steps

### Phase 1: Infrastructure

- Add the worker node in Terraform:
```hcl
# infra/live/hetzner/servers.tf
module "worker_1" {
  source       = "../../modules/hetzner/vps"
  name         = "worker-1"
  server_type  = "cx32"
  image        = "debian-12"
  location     = "fsn1"
  ssh_keys     = [hcloud_ssh_key.phcurado.id]
  firewall_ids = [module.firewall_private.id]
  network_id   = hcloud_network.private.id
  private_ip   = "10.0.2.2"

  labels = {
    environment = "production"
    role        = "worker"
  }
}
```
- Bootstrap it with Ansible:

```yaml
# New playbook: worker.yml
- hosts: worker_1
  roles:
    - common
    - k3s-agent
```
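The `k3s-agent` role does not exist yet; a rough sketch of the core task it might run, assuming the join token is provided as a `k3s_token` variable (for example from Vault or a lookup on the server):

```yaml
# roles/k3s-agent/tasks/main.yml (sketch)
- name: Install K3s agent and join the cluster
  ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -
  environment:
    K3S_URL: "https://10.0.2.1:6443"
    K3S_TOKEN: "{{ k3s_token }}"   # assumed variable holding the server's node-token
  args:
    creates: /usr/local/bin/k3s    # skip if K3s is already installed
```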
### Phase 2: K3s Agent Join
The agent needs a token from the server to join:
```bash
# On the server, get the join token
cat /var/lib/rancher/k3s/server/node-token

# On the agent, join the cluster
curl -sfL https://get.k3s.io | K3S_URL=https://10.0.2.1:6443 K3S_TOKEN=<token> sh -
```
Verify:
```bash
kubectl get nodes
# NAME       STATUS   ROLES                  AGE
# apps       Ready    control-plane,master   90d
# worker-1   Ready    <none>                 1m
```
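Agents join with no role label, so the `role: worker` selector used later in this document only works after labelling the node, for example:

```bash
# Label the agent so nodeSelector/affinity rules can target it
kubectl label node worker-1 role=worker
# Optional: make the ROLES column show "worker" in kubectl get nodes
kubectl label node worker-1 node-role.kubernetes.io/worker=true
```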
### Phase 3: Install Hetzner CSI Driver
Before pods can schedule across nodes, storage must support multi-node:
```bash
helm repo add hcloud https://charts.hetzner.cloud
helm install hcloud-csi hcloud/hcloud-csi \
  --namespace kube-system \
  --set storageClasses[0].name=hcloud-volumes \
  --set storageClasses[0].defaultStorageClass=true
```
This enables Hetzner Volumes to detach/attach across nodes automatically.
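Once the driver is installed, a PVC only needs to reference the storage class; a sketch with placeholder name and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data          # placeholder
spec:
  accessModes:
    - ReadWriteOnce           # Hetzner Volumes attach to one node at a time
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi           # placeholder
```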
### Phase 4: Migrate Workloads
- Migrate PVCs to Hetzner Volumes (see Storage Strategy)
- Enable pod scheduling across nodes
- Test rolling updates (see the sketch below)
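A quick way to exercise the last step, assuming a Deployment named `example-app` (placeholder):

```bash
kubectl rollout restart deployment/example-app
kubectl rollout status deployment/example-app --timeout=120s
kubectl get pods -o wide   # confirm the new pod landed on worker-1
```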
### Phase 5: Database HA (Optional)
For critical databases, add standby replicas:
```yaml
# Example: Authentik PostgreSQL
spec:
  instances: 2
  # Primary will be on one node, standby on another
```
## Cost Analysis
| Configuration | Monthly Cost | RAM | Benefit |
|---|---|---|---|
| Single CX53 | ~€45 | 32GB | Current |
| CX53 + CX32 | ~€60 | 48GB | Multi-node, zero-downtime |
| CX53 + CX22 | ~€50 | 36GB | Basic multi-node |
Recommendation: CX32 as worker provides good capacity for ~€15/month extra.
## Node Affinity and Taints
For workload placement control:
```yaml
# Force a pod onto a specific node
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - apps

# Or use node labels
nodeSelector:
  role: worker
```
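Taints work in the opposite direction: they repel pods from a node unless the pod tolerates them. A sketch, where the `dedicated=control-plane` key/value is a placeholder:

```yaml
# First taint the node:
#   kubectl taint nodes apps dedicated=control-plane:NoSchedule
# Then pods that should still run on apps need a matching toleration:
tolerations:
  - key: dedicated
    value: control-plane
    effect: NoSchedule
```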
## Monitoring Multi-Node
Prometheus already scrapes node metrics. After adding nodes:
- Node Exporter DaemonSet runs on new nodes automatically
- Grafana Node Exporter dashboard shows all nodes
- Alerts fire for any unhealthy node (example rule sketched below)
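A sketch of such a rule, assuming kube-state-metrics and the Prometheus Operator's `PrometheusRule` CRD are available (names are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-health             # placeholder
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not Ready"
```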
## Rollback Plan
If multi-node causes issues:
- Drain the worker node with `kubectl drain worker-1`; all pods return to the apps server (see the command sketch below)
- Delete the worker from Terraform
- Continue with single-node operation
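The same steps as commands, assuming the Terraform module name and path from Phase 1:

```bash
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1
terraform -chdir=infra/live/hetzner destroy -target=module.worker_1
```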
## Related Documentation
- Storage Strategy - Hetzner Volumes + NFS setup
- Architecture Overview - Current infrastructure
- Implementation Status - Phase 8 tracking