# Multi-Node K3s Scaling
This document describes the plan for expanding from a single-node K3s cluster to a multi-node setup for improved reliability and zero-downtime deployments.
## Current State: Single Node
The apps server (CX53: 16 vCPU, 32GB RAM) runs both the K3s control plane and all workloads. This creates limitations:
| Limitation | Impact |
|---|---|
| Single point of failure | Server down = all services down |
| Rolling updates require 2x RAM | New pod must start before old stops |
| Maintenance causes downtime | OS updates, resizes require reboot |
| No workload isolation | Heavy services affect others |
## Target Architecture: 1 Server + N Agents
K3s supports separating control plane (server) from workloads (agents):
```mermaid
flowchart TB
    subgraph Internet
        Users([Users])
    end
    subgraph Cloudflare
        Tunnel[Tunnel]
    end
    subgraph Hetzner["Hetzner Cloud"]
        subgraph Private["Private Network 10.0.0.0/16"]
            Server[apps - Server<br/>10.0.2.1<br/>CX53]
            Agent1[worker-1 - Agent<br/>10.0.2.2<br/>CX32]
        end
    end
    subgraph Storage["Hetzner Volumes"]
        HV[Cloud Block Storage]
    end
    Users --> Tunnel --> Server
    Server <--> Agent1
    Server --> HV
    Agent1 --> HV
```
### Node Roles
| Node | Type | Specs | Purpose |
|---|---|---|---|
| apps | Server | CX53 (16 vCPU, 32GB) | Control plane + heavy workloads |
| worker-1 | Agent | CX32 (8 vCPU, 16GB) | Additional capacity, rolling updates |
## Benefits of Multi-Node

### Zero-Downtime Rolling Updates

On a single node, the RollingUpdate strategy needs twice the pod's RAM, because the old and new pods run side by side:
```yaml
# Current: requires double RAM on the same node
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # Creates new pod first
    maxUnavailable: 0  # Keeps old pod running
```
With multiple nodes, the new pod can schedule on a different node:

```text
Before: [Node 1: Pod v1]  [Node 2: empty]
During: [Node 1: Pod v1]  [Node 2: Pod v2 starting]
After:  [Node 1: empty]   [Node 2: Pod v2 running]
```
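To nudge the surge pod onto the other node instead of competing for RAM on the same one, a soft anti-affinity rule can be added to the Deployment. A minimal sketch, assuming the pods carry an `app: my-app` label (placeholder):

```yaml
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname  # spread across nodes
                labelSelector:
                  matchLabels:
                    app: my-app  # placeholder: match the Deployment's pod labels
```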
### Maintenance Without Downtime

```bash
# Drain the node for maintenance
kubectl drain worker-1 --ignore-daemonsets
# Pods migrate to the other nodes

# Do the maintenance (reboot, resize, etc.)

# Bring the node back
kubectl uncordon worker-1
```
### Database High Availability
CloudNative-PG supports standby replicas:
```yaml
spec:
  instances: 2  # Primary on Node 1, standby on Node 2
```
If primary fails, standby promotes automatically.
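A minimal sketch of what a two-instance cluster could look like as a CloudNative-PG `Cluster` resource; the name and storage values are placeholders:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db              # placeholder name
spec:
  instances: 2                  # one primary + one standby
  storage:
    size: 10Gi                  # placeholder size
    storageClass: hcloud-volumes
  affinity:
    enablePodAntiAffinity: true           # keep primary and standby on different nodes
    topologyKey: kubernetes.io/hostname
```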
## Implementation Steps

### Phase 1: Infrastructure

- Add the worker node in Terraform:
```hcl
# infra/live/hetzner/servers.tf
module "worker_1" {
  source       = "../../modules/hetzner/vps"
  name         = "worker-1"
  server_type  = "cx32"
  image        = "debian-12"
  location     = "fsn1"
  ssh_keys     = [hcloud_ssh_key.phcurado.id]
  firewall_ids = [module.firewall_private.id]
  network_id   = hcloud_network.private.id
  private_ip   = "10.0.2.2"

  labels = {
    environment = "production"
    role        = "worker"
  }
}
```
- Bootstrap it with Ansible:

```yaml
# New playbook: worker.yml
- hosts: worker_1
  roles:
    - common
    - k3s-agent
```
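The `k3s-agent` role does not exist yet; a rough sketch of the core task it might run, assuming the join token is provided as a `k3s_token` variable (for example from Vault or a lookup on the server):

```yaml
# roles/k3s-agent/tasks/main.yml (sketch)
- name: Install K3s agent and join the cluster
  ansible.builtin.shell: curl -sfL https://get.k3s.io | sh -
  environment:
    K3S_URL: "https://10.0.2.1:6443"
    K3S_TOKEN: "{{ k3s_token }}"   # assumed variable holding the server's node-token
  args:
    creates: /usr/local/bin/k3s    # skip if K3s is already installed
```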
### Phase 2: K3s Agent Join
The agent needs a token from the server to join:
```bash
# On the server, get the join token
cat /var/lib/rancher/k3s/server/node-token

# On the agent, join the cluster
curl -sfL https://get.k3s.io | K3S_URL=https://10.0.2.1:6443 K3S_TOKEN=<token> sh -
```
Verify:
```bash
kubectl get nodes
# NAME       STATUS   ROLES                  AGE
# apps       Ready    control-plane,master   90d
# worker-1   Ready    <none>                 1m
```
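Agents join with no role label, so the `role: worker` selector used later in this document only works after labelling the node, for example:

```bash
# Label the agent so nodeSelector/affinity rules can target it
kubectl label node worker-1 role=worker
# Optional: make the ROLES column show "worker" in kubectl get nodes
kubectl label node worker-1 node-role.kubernetes.io/worker=true
```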
### Phase 3: Install Hetzner CSI Driver
Before pods can schedule across nodes, storage must support multi-node:
```bash
helm repo add hcloud https://charts.hetzner.cloud
helm install hcloud-csi hcloud/hcloud-csi \
  --namespace kube-system \
  --set storageClasses[0].name=hcloud-volumes \
  --set storageClasses[0].defaultStorageClass=true
```
This enables Hetzner Volumes to detach/attach across nodes automatically.
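Once the driver is installed, a PVC only needs to reference the storage class; a sketch with placeholder name and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data          # placeholder
spec:
  accessModes:
    - ReadWriteOnce           # Hetzner Volumes attach to one node at a time
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi           # placeholder
```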
### Phase 4: Migrate Workloads
- Migrate PVCs to Hetzner Volumes (see Storage Strategy)
- Enable pod scheduling across nodes
- Test rolling updates (see the sketch below)
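A quick way to exercise the last step, assuming a Deployment named `example-app` (placeholder):

```bash
kubectl rollout restart deployment/example-app
kubectl rollout status deployment/example-app --timeout=120s
kubectl get pods -o wide   # confirm the new pod landed on worker-1
```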
### Phase 5: Database HA (Optional)
For critical databases, add standby replicas:
```yaml
# Example: Authentik PostgreSQL
spec:
  instances: 2
  # Primary will be on one node, standby on another
```
## Cost Analysis
| Configuration | Monthly Cost | RAM | Benefit |
|---|---|---|---|
| Single CX53 | ~€45 | 32GB | Current |
| CX53 + CX32 | ~€60 | 48GB | Multi-node, zero-downtime |
| CX53 + CX22 | ~€50 | 36GB | Basic multi-node |
Recommendation: CX32 as worker provides good capacity for ~€15/month extra.
## Node Affinity and Taints
For workload placement control:
```yaml
# Force a pod onto a specific node
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - apps

# Or use node labels
nodeSelector:
  role: worker
```
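Taints work in the opposite direction: they repel pods from a node unless the pod tolerates them. A sketch, where the `dedicated=control-plane` key/value is a placeholder:

```yaml
# First taint the node:
#   kubectl taint nodes apps dedicated=control-plane:NoSchedule
# Then pods that should still run on apps need a matching toleration:
tolerations:
  - key: dedicated
    value: control-plane
    effect: NoSchedule
```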
## Monitoring Multi-Node
Prometheus already scrapes node metrics. After adding nodes:
- Node Exporter DaemonSet runs on new nodes automatically
- Grafana Node Exporter dashboard shows all nodes
- Alerts fire for any unhealthy node (example rule sketched below)
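A sketch of such a rule, assuming kube-state-metrics and the Prometheus Operator's `PrometheusRule` CRD are available (names are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-health             # placeholder
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not Ready"
```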
## Rollback Plan
If multi-node causes issues:
- Drain the worker node with `kubectl drain worker-1`; all pods return to the apps server (see the command sketch below)
- Delete the worker from Terraform
- Continue with single-node operation
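The same steps as commands, assuming the Terraform module name and path from Phase 1:

```bash
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-1
terraform -chdir=infra/live/hetzner destroy -target=module.worker_1
```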
## Related Documentation
- Storage Strategy - Hetzner Volumes + NFS setup
- Architecture Overview - Current infrastructure
- Implementation Status - Phase 8 tracking