# Storage Strategy

This document outlines the storage architecture for the Minnova K3s cluster, including current limitations and planned improvements.
## Current State: Local-Path Provisioner
The cluster currently uses K3s's built-in local-path provisioner for persistent storage. All PVCs are stored at /var/lib/rancher/k3s/storage/ on the apps server.
```text
/var/lib/rancher/k3s/storage/
├── pvc-xxx_authentik_authentik-pg-1/        # PostgreSQL data
├── pvc-xxx_zulip_zulip-pg-1/                # PostgreSQL data
├── pvc-xxx_nextcloud_nextcloud-nextcloud/   # Nextcloud files
├── pvc-xxx_monitoring_prometheus-*/         # Prometheus TSDB
└── ...
```
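The provisioner creates each directory lazily, the first time a pod mounts the claim on a node. A minimal claim against the built-in class looks like this (metadata names are illustrative, not taken from the cluster):

```yaml
# Sketch: a PVC bound to the K3s local-path provisioner.
# The backing directory is created under /var/lib/rancher/k3s/storage/
# on whichever node first schedules a pod that mounts this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data        # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi          # advisory only: local-path does not enforce size
```

Note that the `storage` request illustrates the "no size enforcement" limitation above: local-path records the value but does not cap actual usage.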
### Limitations
| Issue | Description | Impact |
|---|---|---|
| Single-node bound | PVCs are tied to the local disk | Pods can't move to other nodes |
| No replication | Data exists only on one disk | Disk failure = data loss |
| No size enforcement | PVCs can grow beyond requested size | Must monitor manually |
| Rolling updates | Requires 2x resources on same node | Memory spikes during deploys |
### Current Mitigations

- CloudNative-PG backups: All PostgreSQL databases back up to Cloudflare R2 (WAL archiving + daily full backups)
- Velero/Kopia: Application-level backups to R2
- Prometheus retention: Limited to 5 days / 15GB to prevent disk exhaustion
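With a Prometheus Operator deployment, the retention cap can be expressed directly on the Prometheus resource. A sketch, assuming the operator is in use and using hypothetical metadata names; only the `5d`/`15GB` limits come from this document:

```yaml
# Sketch: capping Prometheus retention via the Prometheus Operator CR.
# 'retention' and 'retentionSize' match the limits described above.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus          # hypothetical
  namespace: monitoring     # hypothetical
spec:
  retention: 5d             # drop TSDB blocks older than 5 days
  retentionSize: 15GB       # and cap total TSDB size on disk
```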
## Planned: Hetzner Volumes (CSI Driver)
Hetzner Volumes are cloud block storage that can be attached to any server in the same datacenter. Using the Hetzner CSI Driver, Kubernetes can dynamically provision and attach volumes.
### Why Hetzner Volumes (Not Longhorn)
| Consideration | Hetzner Volumes | Longhorn |
|---|---|---|
| Complexity | Simple CSI driver | Complex distributed system |
| Management | Hetzner manages durability | Self-managed replication |
| Multi-node | Volumes move with pods | Data replicated across nodes |
| Failover speed | ~30-60s (detach/attach) | Near-instant |
| Cost | €0.052/GB/month | "Free" (uses local disk) |
| Our DB HA | CloudNative-PG handles replication | Redundant with CNPG |
Decision: Hetzner Volumes + CloudNative-PG replication is simpler and sufficient for our scale. Longhorn adds complexity without significant benefit when databases already replicate at the application layer.
### How It Works

```mermaid
flowchart TB
    subgraph Node1["Node 1 (apps)"]
        Pod1[PostgreSQL Pod]
    end
    subgraph Node2["Node 2 (worker)"]
        Pod2[PostgreSQL Standby]
    end
    subgraph Hetzner["Hetzner Cloud Storage"]
        Vol1[(Volume 1)]
        Vol2[(Volume 2)]
    end
    Pod1 --> Vol1
    Pod2 --> Vol2
    Pod1 <-->|Streaming Replication| Pod2
```
When a pod moves:

1. The CSI driver detaches the volume from the old node
2. The CSI driver attaches the volume to the new node
3. The pod starts with its data intact (~30-60s total)

For databases, the CloudNative-PG standby takes over almost immediately while the primary recovers.
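Once the CSI driver is installed, workloads opt in simply by naming the new StorageClass; attachment then follows the pod across nodes. A sketch with hypothetical metadata names:

```yaml
# Sketch: a PVC backed by a Hetzner Volume through the CSI driver.
# The volume is single-attach and follows the pod on reschedule
# (~30-60s detach/attach, as described above).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # hypothetical
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce         # Hetzner Volumes attach to one node at a time
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi         # 10 GB is the minimum Hetzner Volume size
```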
### Volume Characteristics
- Persistent: Survives server reboots, upgrades, even server deletion
- Portable: Can detach and attach to different servers
- Scalable: Resize without downtime
- Durable: Hetzner manages redundancy at storage layer
- Limitation: Single-attach only (one node at a time)
## Planned: Hetzner Storage Box (NFS)
Hetzner Storage Box provides NFS-accessible network storage, separate from VPS instances. Useful for data that doesn't need block storage performance.
### Use Cases
| Use Case | Why NFS Works |
|---|---|
| Nextcloud files | Large files, shared access okay |
| Media storage | Read-heavy, write-light |
| Backup targets | Sequential writes, not latency-sensitive |
| Shared configs | Read by multiple pods |
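Shared NFS storage would be wired in as a statically provisioned PV/PVC pair rather than through a dynamic provisioner. A sketch, assuming the Storage Box is reachable over NFS as planned; the server hostname, export path, and sizes are placeholders:

```yaml
# Sketch: static NFS PersistentVolume + claim for shared file storage.
# Hostname and export path are placeholders, not real endpoints.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-files
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany          # NFS allows shared access from multiple nodes
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: storagebox.example.net   # placeholder hostname
    path: /export/nextcloud          # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-files
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # empty: bind to the static PV, not a provisioner
  volumeName: nextcloud-files
  resources:
    requests:
      storage: 500Gi
```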
### When NOT to Use NFS
- Databases (PostgreSQL, MySQL) - Need block storage for ACID guarantees
- Redis/caching - Latency-sensitive
- High-IOPS workloads - Network latency adds up
## Storage Classes After Migration

| StorageClass | Backend | Use Case | Notes |
|---|---|---|---|
| `hcloud-volumes` | Hetzner Volumes | Databases, stateful apps | Default |
| `nfs` | Hetzner Storage Box | Large files, Nextcloud | Shared access |
| `local-path` | Local disk | Ephemeral, non-critical | Keep for testing |
## Migration Plan

### Phase 1: Install Hetzner CSI Driver

- Create a Hetzner API token with volume permissions
- Deploy the CSI driver via Helm:

  ```shell
  helm repo add hcloud https://charts.hetzner.cloud
  helm install hcloud-csi hcloud/hcloud-csi -n kube-system
  ```

- Verify the `hcloud-volumes` StorageClass is available
- Test with a non-critical workload
### Phase 2: Migrate Databases

For each CloudNative-PG cluster:

- Verify the backup is current (`kubectl cnpg status <cluster>`)
- Update the cluster spec to use the `hcloud-volumes` StorageClass
- Let CNPG handle migration via backup/restore
- Verify replication is healthy
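The spec change in question is a single field on the CNPG `Cluster` resource. A sketch; the cluster name, namespace, instance count, and size are illustrative:

```yaml
# Sketch: pointing a CloudNative-PG cluster at the new StorageClass.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: authentik-pg        # illustrative
  namespace: authentik      # illustrative
spec:
  instances: 2              # primary + standby for fast failover
  storage:
    storageClass: hcloud-volumes   # previously local-path
    size: 10Gi
```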
### Phase 3: Migrate Other PVCs
- Prometheus: Delete and recreate (historical data not critical, 5-day retention)
- Grafana: Export dashboards, recreate PVC
- Nextcloud files: Migrate to NFS (Storage Box)
- Vaultwarden: Backup, recreate with Hetzner Volume
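For the backup-then-recreate steps, a scheduled Velero backup covering the affected namespaces keeps a safety net during the cutover. A sketch; the schedule name, namespaces, and TTL are assumptions:

```yaml
# Sketch: a Velero Schedule snapshotting app namespaces daily to R2.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-apps          # hypothetical
  namespace: velero
spec:
  schedule: "0 3 * * *"     # daily at 03:00
  template:
    includedNamespaces:     # namespaces assumed from the list above
      - nextcloud
      - vaultwarden
    ttl: 168h0m0s           # keep 7 days of snapshots
```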
### Phase 4: Decommission local-path (Optional)

- Verify all critical workloads are on Hetzner Volumes/NFS
- Keep `local-path` for non-critical workloads (lower cost)
- Clean up old PVC directories
## Backup Strategy

Backups remain multi-layered:

```text
┌─────────────────────────────────────────────────┐
│ Application Layer                               │
│ - CloudNative-PG → R2 (WAL + daily backups)     │
│ - Velero/Kopia → R2 (app snapshots)             │
└─────────────────────────────────────────────────┘
                        +
┌─────────────────────────────────────────────────┐
│ Storage Layer                                   │
│ - Hetzner Volumes (Hetzner-managed durability)  │
│ - Storage Box (separate from compute)           │
└─────────────────────────────────────────────────┘
                        +
┌─────────────────────────────────────────────────┐
│ Disaster Recovery                               │
│ - All backups in Cloudflare R2                  │
│ - Cross-region by default                       │
└─────────────────────────────────────────────────┘
```
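The application-layer piece is configured per CNPG cluster, since R2 is S3-compatible. A sketch of the relevant backup section; the bucket, account ID, secret names, and retention are placeholders, not the cluster's actual values:

```yaml
# Sketch: CNPG WAL archiving + base backups to Cloudflare R2.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg          # hypothetical
spec:
  instances: 2
  storage:
    size: 10Gi
  backup:
    retentionPolicy: "30d"  # placeholder retention
    barmanObjectStore:
      destinationPath: s3://pg-backups/example-pg          # placeholder bucket
      endpointURL: https://ACCOUNT_ID.r2.cloudflarestorage.com
      s3Credentials:
        accessKeyId:
          name: r2-credentials   # placeholder Secret name
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: r2-credentials
          key: SECRET_ACCESS_KEY
```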
## Cost Estimate

| Storage | Size | Monthly Cost |
|---|---|---|
| Hetzner Volumes (DBs) | ~50GB | ~€2.60 |
| Storage Box BX11 (files) | 1TB | €3.81 |
| Total | | ~€6.50 |
## Related Documentation
- Orchestration - K3s components and storage
- Multi-Node Scaling - Adding worker nodes
- Implementation Status - Phase 7 tracking