Container Orchestration¶
This document covers how containers are deployed and managed on Minnova infrastructure.
Current Setup: K3s + ArgoCD¶
All services run on K3s with ArgoCD handling GitOps deployments. K3s is a CNCF-certified Kubernetes distribution that strips out optional components and runs in roughly 512 MB of RAM.
Benefits we get from K3s:
- Declarative deployments: Apply a manifest describing what you want. K3s figures out how to get there.
- Rolling updates: Deploy a new version and K3s gradually replaces old pods with new ones.
- Self-healing: If a container crashes, K3s restarts it automatically.
- Portable workflows: Same kubectl commands and Helm charts work anywhere.
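The declarative model and rolling updates from the list above can be seen in a single minimal Deployment manifest. This is an illustrative sketch (the image tag and port are assumptions, not the actual homepage config):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: homepage
spec:
  replicas: 1
  strategy:
    type: RollingUpdate   # on update, new pods come up before old ones are removed
  selector:
    matchLabels:
      app: homepage
  template:
    metadata:
      labels:
        app: homepage
    spec:
      containers:
        - name: homepage
          image: ghcr.io/gethomepage/homepage:latest   # illustrative image
          ports:
            - containerPort: 3000
```

Applying this manifest describes the desired state; K3s creates the pod, restarts it on crash, and replaces it gradually when the image changes.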
GitOps with ArgoCD¶
All deployments follow GitOps principles - Git is the source of truth, not the cluster.
Deployment workflow:
1. Edit manifests in `infra/kubernetes/<service>/` or Helm values
2. Commit and push to `main`
3. ArgoCD detects the change and syncs cluster state to match Git
ArgoCD UI is at argocd.minnova.io, protected by Cloudflare Zero Trust with proxy auth.
App of Apps Pattern¶
ArgoCD manages applications using the App of Apps pattern. A single ApplicationSet auto-discovers directories in kubernetes/* and creates Applications for each. Helm-based services have dedicated Application manifests in argocd/apps/.
This means adding a new raw-manifest service is as simple as creating a directory - no ArgoCD config needed.
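A sketch of what the auto-discovering ApplicationSet could look like, following the ArgoCD ApplicationSet API with a git directory generator (the repo URL is hypothetical and the actual manifest may set additional fields):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://example.com/infra.git   # hypothetical repo URL
        revision: main
        directories:
          - path: kubernetes/*    # one Application per discovered directory
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://example.com/infra.git
        targetRevision: main
        path: "{{path}}"
        directory:
          recurse: true           # pick up secrets/ subdirectories
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Creating `kubernetes/myservice/` in Git is then enough for the generator to stamp out a matching Application.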
What ArgoCD Manages¶
All runtime services: homepage, authentik, monitoring stack (Prometheus, Grafana, Loki, Alloy), Portainer, Forgejo, Gatus/status, Nextcloud, Umami, Traefik, cloudflared tunnel sidecar, CloudNativePG operator, SOPS secrets operator.
What Ansible Still Handles (Bootstrap Only)¶
Some components can't be managed through GitOps due to chicken-and-egg problems:
- K3s installation - must exist before ArgoCD
- Traefik config - K3s built-in component
- SOPS age key - private key on server, not in Git
- ArgoCD itself - can't deploy itself initially
Ansible is run once for initial setup or when updating ArgoCD config. Day-to-day deployments go through Git.
Key Decisions¶
- ServerSideApply: Enabled for apps with large CRDs (Prometheus, CloudNativePG) to avoid annotation size limits
- Directory recursion: Enabled to support `secrets/` subdirectories
- Auto-sync with self-heal: Changes made directly to the cluster get reverted to match Git
- Prune enabled: Resources deleted from Git get deleted from cluster
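These decisions map to the `syncPolicy` of an Application spec. A sketch for one of the large-CRD apps (chart version is illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: "58.x"        # illustrative version pin
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true      # resources deleted from Git get deleted from cluster
      selfHeal: true   # manual cluster drift is reverted to match Git
    syncOptions:
      - ServerSideApply=true   # avoids the last-applied annotation size limit on large CRDs
```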
Migration Status¶
Migration to K3s + ArgoCD GitOps is complete:
| Service | Managed By | Notes |
|---|---|---|
| Homepage | ArgoCD (ApplicationSet) | Raw manifests |
| Authentik | ArgoCD (ApplicationSet) | Raw manifests + CloudNativePG |
| Portainer | ArgoCD (Helm) | Helm chart with extra manifests |
| Monitoring | ArgoCD (Helm) | kube-prometheus-stack |
| Loki | ArgoCD (Helm) | Log aggregation |
| Alloy | ArgoCD (Helm) | Log collection |
| CloudNativePG | ArgoCD (Helm) | PostgreSQL operator |
| SOPS Operator | ArgoCD (Helm) | Secrets decryption |
Old Podman setup has been removed. Ansible is now bootstrap-only (install K3s, Traefik config, ArgoCD + age key); all runtime apps stay in sync via ArgoCD GitOps.
K3s Architecture¶
Components¶
K3s bundles several components. Some we use, some we disable:
| Component | Status | Purpose |
|---|---|---|
| Traefik | Enabled | Internal routing between services via IngressRoute CRDs. Even with Cloudflare Tunnel handling external traffic, Traefik routes internally. |
| CoreDNS | Enabled | Service discovery. Pods find each other via DNS like postgres.database.svc.cluster.local. |
| Local-path provisioner | Enabled | Automatic PersistentVolume creation on local disk when pods request storage. Single-node only. |
| Metrics server | Enabled | Resource monitoring via kubectl top. Required for autoscaling. |
| ServiceLB (Klipper) | Disabled | LoadBalancer IP allocation. Not needed - Cloudflare Tunnel handles ingress. |
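Disabling ServiceLB happens at K3s install time. A sketch of the config, assuming the standard `/etc/rancher/k3s/config.yaml` location (the actual install may pass the equivalent `--disable` flag instead):

```yaml
# /etc/rancher/k3s/config.yaml
disable:
  - servicelb   # Cloudflare Tunnel handles ingress; no LoadBalancer IPs needed
```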
Storage Limitations¶
The default local-path provisioner has important limitations:
| Limitation | Impact | Future Solution |
|---|---|---|
| Single-node only | PVCs are tied to one node's disk | Longhorn |
| No replication | Data loss if disk fails | Longhorn replication |
| No multi-node | Pods can't move between nodes | Longhorn |
| No size enforcement | PVCs can exceed requested size | Longhorn quotas |
For multi-node clusters, Longhorn (by Rancher, same team as K3s) provides distributed block storage with replication across nodes. See Storage Strategy for migration planning.
Traffic Flow with Traefik¶
```mermaid
flowchart LR
    Internet --> CF[Cloudflare Tunnel]
    CF --> CD[cloudflared]
    CD --> T[Traefik :80]
    T --> G[Grafana]
    T --> A[Authentik]
    T --> R[ArgoCD]
    T --> P[Portainer]
    T --> F[Forgejo]
    T --> N[Nextcloud]
    T --> S[Gatus / Status]
    T --> H[Homepage]
    T --> U[Umami]
```
Traefik uses IngressRoute CRDs to match hostnames and route to the correct service. This keeps routing config declarative and in Git.
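A minimal IngressRoute sketch for one of the services above (the hostname and backend service name/port are assumptions for illustration):

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - web                                  # Traefik's :80 entrypoint
  routes:
    - match: Host(`grafana.minnova.io`)    # hostname match, assumed subdomain
      kind: Rule
      services:
        - name: grafana                    # backend Service name, illustrative
          port: 80
```

Because this is just another manifest in Git, ArgoCD syncs routing changes the same way as any other resource.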
Secrets Management¶
Secrets use SOPS with Age encryption, decrypted in-cluster via the SOPS Secrets Operator.
```mermaid
flowchart LR
    Dev[Developer] -->|sops edit| Encrypted[SopsSecret CRD]
    Encrypted -->|git push| Repo[Git]
    Repo -->|kubectl apply| K3s
    K3s -->|operator + age key| Secret[K8s Secret]
    Secret --> Pod
```
The Age private key is stored as a Kubernetes Secret during cluster bootstrap. The operator watches for SopsSecret resources and creates corresponding Secret resources with decrypted values.
This keeps secrets encrypted in Git while following Kubernetes-native patterns.
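A sketch of a SopsSecret resource as the isindir operator defines it (field names per the `v1alpha3` CRD; the resource name and the encrypted value are illustrative, and real files also carry a `sops:` metadata section added by `sops` itself):

```yaml
apiVersion: isindir.github.com/v1alpha3
kind: SopsSecret
metadata:
  name: app-secrets
spec:
  secretTemplates:
    - name: db-credentials              # K8s Secret the operator will create
      stringData:
        password: ENC[AES256_GCM,data:...,type:str]   # ciphertext stays in Git
```

Only the ciphertext is committed; the operator decrypts it in-cluster using the Age key and materializes a plain `db-credentials` Secret for pods to consume.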
Package Management¶
Helm is the standard package manager for Kubernetes. Charts bundle manifests with templating and versioning, enabling reproducible deployments across environments.
Most tools in this stack are deployed via Helm:
| Component | Helm Chart | Status |
|---|---|---|
| Portainer | `portainer/portainer` | Deployed |
| CloudNativePG | `cloudnative-pg/cloudnative-pg` | Deployed |
| SOPS Operator | `isindir/sops-secrets-operator` | Deployed |
| Prometheus/Grafana | `prometheus-community/kube-prometheus-stack` | Deployed |
| Loki | `grafana/loki` | Deployed |
| Alloy | `grafana/alloy` | Deployed |
| Gatus | `gatus/gatus` | Deployed |
| Nextcloud | `nextcloud/nextcloud` | Deployed |
| Traefik | Built-in K3s chart (`HelmChartConfig`) | Deployed |
Some services use custom manifests instead of Helm (Authentik, Redis).
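The built-in Traefik chart is customized through a `HelmChartConfig` overlay rather than a regular Helm release. A minimal sketch (the values shown are illustrative, not the actual overrides):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system    # must target the bundled chart's name and namespace
spec:
  valuesContent: |-
    additionalArguments:
      - "--log.level=INFO"   # example override; merged into the bundled chart's values
```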
Container Management¶
Portainer¶
Portainer provides a web UI for K3s management:
- Visual pod/deployment management
- Log viewing and container shell access
- Resource monitoring per workload
- Helm chart deployments from UI
- YAML editor for manifests
Exposed via Cloudflare Tunnel at portainer.minnova.io with Authentik OIDC protection.
Database Strategy¶
Databases run inside K3s via operators.
**PostgreSQL: CloudNativePG**
CNCF Sandbox project for PostgreSQL in Kubernetes. Handles replication, automated backups, and failover without external tools like Patroni.
Features:
- Declarative cluster configuration
- Automated failover (promotes standby if primary fails)
- Point-in-time recovery via WAL archiving to S3/R2
- Built-in Prometheus metrics exporter
- Rolling updates without downtime
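A hedged sketch of a CloudNativePG `Cluster` resource showing the declarative configuration and WAL archiving to R2 (all names, sizes, and the endpoint are illustrative assumptions):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: authentik-db          # illustrative cluster name
  namespace: authentik
spec:
  instances: 2                # primary + standby; operator promotes standby on failure
  storage:
    size: 5Gi
  backup:
    barmanObjectStore:        # WAL archiving for point-in-time recovery
      destinationPath: s3://backups/authentik-db
      endpointURL: https://<account>.r2.cloudflarestorage.com   # R2 S3-compatible endpoint
      s3Credentials:
        accessKeyId:
          name: backup-creds  # K8s Secret holding R2 credentials, illustrative
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
```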
**Redis-compatible: Dragonfly or Valkey**

| | Redis | Valkey | Dragonfly |
|---|---|---|---|
| License | AGPLv3 | BSD 3-clause | BSL → Apache 2.0 |
| Architecture | Single-threaded | Single-threaded | Multi-threaded |
| Throughput | Baseline | ~Same | Up to ~25x (vendor benchmark) |
| API | Native | 100% compatible | 100% compatible |
Dragonfly recommended for new deployments due to multi-threaded performance on modern hardware.
Both have Kubernetes operators available.
Current Choice¶
| Component | Choice | Rationale |
|---|---|---|
| PostgreSQL | CloudNativePG | Kubernetes-native operations, automated backups, Prometheus integration |
| Cache | Redis | Simple, sufficient for current workloads |
Current Architecture¶
```mermaid
flowchart TB
    subgraph Internet
        Users([Users])
        GitHub([GitHub])
    end
    subgraph Cloudflare
        Tunnel[Tunnel]
    end
    subgraph Apps["apps server (10.0.2.1)"]
        CD[cloudflared]
        subgraph K3s
            subgraph System["kube-system"]
                Traefik
                CoreDNS
            end
            subgraph ArgoNS["argocd"]
                ArgoCD[ArgoCD]
            end
            subgraph Mon["monitoring"]
                Grafana
                Prometheus
                Loki
                Alloy
            end
            subgraph Auth["authentik"]
                AuthServer[Authentik Server]
                AuthWorker[Authentik Worker]
                AuthPG[PostgreSQL - CloudNativePG]
                AuthRedis[Redis]
            end
            subgraph Tools["portainer"]
                Portainer
            end
            subgraph Home["homepage"]
                Homepage
            end
        end
    end
    Users --> Tunnel --> CD --> Traefik
    GitHub -->|GitOps| ArgoCD
    ArgoCD -->|deploys| Mon
    ArgoCD -->|deploys| Auth
    ArgoCD -->|deploys| Tools
    ArgoCD -->|deploys| Home
    Traefik --> Grafana
    Traefik --> AuthServer
    Traefik --> Portainer
    Traefik --> Homepage
    Traefik --> ArgoCD
    AuthServer --> AuthPG
    AuthServer --> AuthRedis
```
All services run in K3s with:
- ArgoCD for GitOps deployments
- Traefik for internal routing (IngressRoutes)
- Portainer for visual management
- CloudNativePG for PostgreSQL with metrics
- Prometheus/Grafana/Loki for observability
- SOPS Secrets Operator for encrypted secrets