Container Orchestration

This document covers how containers are deployed and managed on Minnova infrastructure.

Current Setup: K3s + ArgoCD

All services run on K3s, with ArgoCD handling GitOps deployments. K3s is a CNCF-certified Kubernetes distribution that strips out optional components to run in about 512 MB of RAM.

Benefits we get from K3s:

  • Declarative deployments: Apply a manifest describing what you want, and K3s figures out how to get there (see the sketch after this list).
  • Rolling updates: Deploy a new version and K3s gradually replaces old pods with new ones.
  • Self-healing: If a container crashes, K3s restarts it automatically.
  • Portable workflows: Same kubectl commands and Helm charts work anywhere.
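
For example, a minimal Deployment manifest declares the desired state and leaves reconciliation to K3s (names and image tag here are illustrative, not the actual homepage manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: homepage            # illustrative name
  namespace: homepage
spec:
  replicas: 1               # desired state; K3s keeps one pod running
  selector:
    matchLabels:
      app: homepage
  template:
    metadata:
      labels:
        app: homepage
    spec:
      containers:
        - name: homepage
          image: ghcr.io/gethomepage/homepage:latest   # illustrative image tag
          ports:
            - containerPort: 3000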

GitOps with ArgoCD

All deployments follow GitOps principles - Git is the source of truth, not the cluster.

Deployment workflow:

  1. Edit manifests in infra/kubernetes/<service>/ or Helm values
  2. Commit and push to main
  3. ArgoCD detects changes and syncs cluster state to match Git

The ArgoCD UI is at argocd.minnova.io, protected by Cloudflare Zero Trust with proxy auth.

App of Apps Pattern

ArgoCD manages applications using the App of Apps pattern. A single ApplicationSet auto-discovers directories in kubernetes/* and creates Applications for each. Helm-based services have dedicated Application manifests in argocd/apps/.

This means adding a new raw-manifest service is as simple as creating a directory - no ArgoCD config needed.
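
A sketch of what that ApplicationSet could look like (repo URL, names, and sync options are illustrative assumptions, not the exact manifest):

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/minnova/infra.git   # illustrative repo URL
        revision: main
        directories:
          - path: kubernetes/*        # one Application per directory
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/minnova/infra.git   # illustrative repo URL
        targetRevision: main
        path: '{{path}}'
        directory:
          recurse: true               # picks up secrets/ subdirectories
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true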

What ArgoCD Manages

All runtime services: homepage, authentik, monitoring stack (Prometheus, Grafana, Loki, Alloy), Portainer, Forgejo, Gatus/status, Nextcloud, Umami, Traefik, cloudflared tunnel sidecar, CloudNativePG operator, SOPS secrets operator.

What Ansible Still Handles (Bootstrap Only)

Some components can't be GitOps'd because of chicken-and-egg problems:

  • K3s installation - must exist before ArgoCD
  • Traefik config - K3s built-in component
  • SOPS age key - private key on server, not in Git
  • ArgoCD itself - can't deploy itself initially

Ansible is run once for initial setup or when updating ArgoCD config. Day-to-day deployments go through Git.

Key Decisions

  • ServerSideApply: Enabled for apps with large CRDs (Prometheus, CloudNativePG) to avoid annotation size limits
  • Directory recursion: Enabled to support secrets/ subdirectories
  • Auto-sync with self-heal: Changes made directly to cluster get reverted to match Git
  • Prune enabled: Resources deleted from Git get deleted from cluster
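
In an Application spec, these decisions translate roughly to the following syncPolicy excerpt:

syncPolicy:
  automated:
    selfHeal: true          # manual cluster changes are reverted to Git state
    prune: true             # resources removed from Git are deleted
  syncOptions:
    - ServerSideApply=true  # sidesteps the 256 KiB last-applied annotation limit on large CRDs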

Migration Status

Migration to K3s + ArgoCD GitOps is complete:

Service       | Managed By              | Notes
Homepage      | ArgoCD (ApplicationSet) | Raw manifests
Authentik     | ArgoCD (ApplicationSet) | Raw manifests + CloudNativePG
Portainer     | ArgoCD (Helm)           | Helm chart with extra manifests
Monitoring    | ArgoCD (Helm)           | kube-prometheus-stack
Loki          | ArgoCD (Helm)           | Log aggregation
Alloy         | ArgoCD (Helm)           | Log collection
CloudNativePG | ArgoCD (Helm)           | PostgreSQL operator
SOPS Operator | ArgoCD (Helm)           | Secrets decryption

Old Podman setup has been removed. Ansible is now bootstrap-only (install K3s, Traefik config, ArgoCD + age key); all runtime apps stay in sync via ArgoCD GitOps.

K3s Architecture

Components

K3s bundles several components. Some we use, some we disable:

Component              | Status   | Purpose
Traefik                | Enabled  | Internal routing between services via IngressRoute CRDs. Even with Cloudflare Tunnel handling external traffic, Traefik routes internally.
CoreDNS                | Enabled  | Service discovery. Pods find each other via DNS names like postgres.database.svc.cluster.local.
Local-path provisioner | Enabled  | Automatic PersistentVolume creation on local disk when pods request storage. Single-node only.
Metrics server         | Enabled  | Resource monitoring via kubectl top. Required for autoscaling.
ServiceLB (Klipper)    | Disabled | LoadBalancer IP allocation. Not needed - Cloudflare Tunnel handles ingress.
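
For reference, a minimal PVC that the local-path provisioner will satisfy (name, namespace, and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                  # illustrative name
  namespace: homepage
spec:
  accessModes:
    - ReadWriteOnce           # local disk, so single-node access only
  resources:
    requests:
      storage: 5Gi
  storageClassName: local-path   # K3s default provisioner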

Storage Limitations

The default local-path provisioner has important limitations:

Limitation          | Impact                             | Future Solution
Single-node only    | PVCs are tied to one node's disk   | Longhorn
No replication      | Data loss if the disk fails        | Longhorn replication
No multi-node       | Pods can't move between nodes      | Longhorn
No size enforcement | PVCs can exceed the requested size | Longhorn quotas

For multi-node clusters, Longhorn (by Rancher, same team as K3s) provides distributed block storage with replication across nodes. See Storage Strategy for migration planning.

Traffic Flow with Traefik

flowchart LR
    Internet --> CF[Cloudflare Tunnel]
    CF --> CD[cloudflared]
    CD --> T[Traefik :80]
    T --> G[Grafana]
    T --> A[Authentik]
    T --> R[ArgoCD]
    T --> P[Portainer]
    T --> F[Forgejo]
    T --> N[Nextcloud]
    T --> S[Gatus / Status]
    T --> H[Homepage]
    T --> U[Umami]

Traefik uses IngressRoute CRDs to match hostnames and route to the correct service. This keeps routing config declarative and in Git.
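
A typical IngressRoute looks roughly like this (hostname, namespace, and service details are illustrative):

apiVersion: traefik.io/v1alpha1   # traefik.containo.us/v1alpha1 on older Traefik releases
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`grafana.minnova.io`)   # illustrative hostname
      kind: Rule
      services:
        - name: grafana
          port: 80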

Secrets Management

Secrets use SOPS with Age encryption, decrypted in-cluster via the SOPS Secrets Operator.

flowchart LR
    Dev[Developer] -->|sops edit| Encrypted[SopsSecret CRD]
    Encrypted -->|git push| Repo[Git]
    Repo -->|kubectl apply| K3s
    K3s -->|operator + age key| Secret[K8s Secret]
    Secret --> Pod

The Age private key is stored as a Kubernetes Secret during cluster bootstrap. The operator watches for SopsSecret resources and creates corresponding Secret resources with decrypted values.

This keeps secrets encrypted in Git while following Kubernetes-native patterns.
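
A sketch of a SopsSecret as it looks before encryption (the apiVersion and field names follow the isindir operator's CRD; names and values are illustrative, and in the repo the file is encrypted with sops before commit):

apiVersion: isindir.github.com/v1alpha3
kind: SopsSecret
metadata:
  name: app-secrets            # illustrative name
  namespace: homepage
spec:
  secretTemplates:
    - name: app-credentials    # name of the K8s Secret the operator creates
      stringData:
        API_KEY: changeme      # stored encrypted in Git, decrypted in-cluster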

Package Management

Helm is the standard package manager for Kubernetes. Charts bundle manifests with templating and versioning, enabling reproducible deployments across environments.
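
As a sketch, a Helm-based service is declared to ArgoCD roughly like this (chart version and value overrides are illustrative, not the actual manifest in argocd/apps/):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: loki
    targetRevision: 6.16.0           # illustrative version pin
    helm:
      values: |
        deploymentMode: SingleBinary   # illustrative value override
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true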

Most tools in this stack are deployed via Helm:

Component          | Helm Chart                                 | Status
Portainer          | portainer/portainer                        | Deployed
CloudNativePG      | cloudnative-pg/cloudnative-pg              | Deployed
SOPS Operator      | isindir/sops-secrets-operator              | Deployed
Prometheus/Grafana | prometheus-community/kube-prometheus-stack | Deployed
Loki               | grafana/loki                               | Deployed
Alloy              | grafana/alloy                              | Deployed
Gatus              | gatus/gatus                                | Deployed
Nextcloud          | nextcloud/nextcloud                        | Deployed
Traefik            | Built-in K3s chart (HelmChartConfig)       | Deployed

Some services use custom manifests instead of Helm (Authentik, Redis).

Container Management

Portainer

Portainer provides a web UI for K3s management:

  • Visual pod/deployment management
  • Log viewing and container shell access
  • Resource monitoring per workload
  • Helm chart deployments from UI
  • YAML editor for manifests

Exposed via Cloudflare Tunnel at portainer.minnova.io with Authentik OIDC protection.

Database Strategy

Databases run inside K3s via operators.

PostgreSQL: CloudNativePG

CloudNativePG is a CNCF Sandbox project for running PostgreSQL in Kubernetes. It handles replication, automated backups, and failover without external tools like Patroni.

Features:

  • Declarative cluster configuration
  • Automated failover (promotes standby if primary fails)
  • Point-in-time recovery via WAL archiving to S3/R2
  • Built-in Prometheus metrics exporter
  • Rolling updates without downtime
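
A minimal Cluster resource showing the declarative style (name, namespace, size, and instance count are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: authentik-db           # illustrative name
  namespace: authentik
spec:
  instances: 1                 # >1 adds standbys with automated failover
  storage:
    size: 5Gi
  monitoring:
    enablePodMonitor: true     # expose the built-in Prometheus metrics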

Redis-compatible: Dragonfly or Valkey

             | Redis           | Valkey          | Dragonfly
License      | AGPLv3          | BSD 3-clause    | BSL → Apache 2.0
Architecture | Single-threaded | Single-threaded | Multi-threaded
Throughput   | Baseline        | ~Same           | Up to 25x higher (vendor benchmark)
API          | Native          | 100% compatible | 100% compatible

Dragonfly is recommended for new deployments because its multi-threaded architecture makes better use of modern hardware.

Both alternatives have Kubernetes operators.

Current Choice

Component  | Choice        | Rationale
PostgreSQL | CloudNativePG | Kubernetes-native operations, automated backups, Prometheus integration
Cache      | Redis         | Simple, sufficient for current workloads

Current Architecture

flowchart TB
    subgraph Internet
        Users([Users])
        GitHub([GitHub])
    end

    subgraph Cloudflare
        Tunnel[Tunnel]
    end

    subgraph Apps["apps server (10.0.2.1)"]
        CD[cloudflared]

        subgraph K3s
            subgraph System["kube-system"]
                Traefik
                CoreDNS
            end

            subgraph ArgoNS["argocd"]
                ArgoCD[ArgoCD]
            end

            subgraph Mon["monitoring"]
                Grafana
                Prometheus
                Loki
                Alloy
            end

            subgraph Auth["authentik"]
                AuthServer[Authentik Server]
                AuthWorker[Authentik Worker]
                AuthPG[PostgreSQL - CloudNativePG]
                AuthRedis[Redis]
            end

            subgraph Tools["portainer"]
                Portainer
            end

            subgraph Home["homepage"]
                Homepage
            end
        end
    end

    Users --> Tunnel --> CD --> Traefik
    GitHub -->|GitOps| ArgoCD
    ArgoCD -->|deploys| Mon
    ArgoCD -->|deploys| Auth
    ArgoCD -->|deploys| Tools
    ArgoCD -->|deploys| Home
    Traefik --> Grafana
    Traefik --> AuthServer
    Traefik --> Portainer
    Traefik --> Homepage
    Traefik --> ArgoCD
    AuthServer --> AuthPG
    AuthServer --> AuthRedis

All services run in K3s with:

  • ArgoCD for GitOps deployments
  • Traefik for internal routing (IngressRoutes)
  • Portainer for visual management
  • CloudNativePG for PostgreSQL with metrics
  • Prometheus/Grafana/Loki for observability
  • SOPS Secrets Operator for encrypted secrets