
Infrastructure Implementation Status

Track ongoing infrastructure work.

Last updated: 2025-12-21


Current Focus: Storage & Multi-Node Preparation

  • Install Hetzner CSI driver for cloud volumes
  • Plan multi-node k3s expansion for zero-downtime deployments
  • Configure Google Workspace SMTP for Authentik (password reset, notifications)
  • Keep docs synced with infra main (last sweep: 2025-12-27)
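Zero-downtime deployments depend partly on how a Deployment's rolling update is sized. The sketch below applies the rounding rules from the Kubernetes Deployment documentation. Percentages resolve against the replica count, `maxSurge` rounds up, `maxUnavailable` rounds down, and both default to 25%. The helper name `rollout_bounds` is hypothetical and not part of this repo.

```python
def rollout_bounds(replicas: int, max_surge: str = "25%", max_unavailable: str = "25%") -> tuple[int, int]:
    """Return (extra pods allowed, pods allowed to be unavailable) for a rolling update.

    Hypothetical helper modelling the Kubernetes Deployment rules:
    percentages are taken of the replica count, maxSurge rounds up,
    maxUnavailable rounds down.
    """

    def resolve(value: str, round_up: bool) -> int:
        if value.endswith("%"):
            scaled = replicas * int(value[:-1])  # integer maths avoids float rounding
            return -(-scaled // 100) if round_up else scaled // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    if surge == 0 and unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return surge, unavailable


# With one replica the defaults resolve to surge=1, unavailable=0:
# the new pod must start (and fit on a node) before the old one stops.
print(rollout_bounds(1))   # (1, 0)
print(rollout_bounds(10))  # (3, 2)
```

On a single node the surge pod still lands on the same machine. These settings therefore remove downtime only for application rollouts, not for node reboots or k3s upgrades.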

Server Inventory

| Name | Type | Specs | Private IP | Tailscale IP | Status |
|---|---|---|---|---|---|
| headscale | CX22 | 2 vCPU, 4GB RAM, 40GB disk | - | 100.64.0.2 | Live |
| bastion | CX22 | 2 vCPU, 4GB RAM, 40GB disk | 10.0.1.1 | 100.64.0.6 | Live |
| apps | CX53 | 16 vCPU, 32GB RAM, 300GB disk | 10.0.2.1 | - | Live |

Services

| Service | URL | Runs On | Database | Status |
|---|---|---|---|---|
| Authentik | auth.minnova.io | K8s (CloudNative-PG + Redis) | PostgreSQL | Live |
| Headscale | headscale.minnova.io | systemd | SQLite | Live |
| ArgoCD | argocd.minnova.io | K8s | - | Live |
| Grafana | grafana.minnova.io | K8s (kube-prometheus-stack) | SQLite (embedded) | Live |
| Prometheus | prometheus.minnova.io | K8s (5d retention, 15GB limit) | TSDB | Live |
| Loki | (internal) | K8s (7d retention) | TSDB | Live |
| Portainer | portainer.minnova.io | K8s | - | Live |
| Forgejo | forgejo.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Gatus | status.minnova.io | K8s | SQLite | Live |
| Traefik | traefik.minnova.io | K8s | - | Live |
| Homepage | homepage.minnova.io | K8s | - | Live |
| Glance | glance.minnova.io | K8s | - | Live |
| Nextcloud | nextcloud.minnova.io | K8s (CloudNative-PG + Redis) | PostgreSQL | Live |
| Umami | analytics.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Zulip | zulip.minnova.io | K8s (CloudNative-PG + Redis + RabbitMQ) | PostgreSQL | Live |
| Kimai | kimai.minnova.io | K8s (MariaDB) | MariaDB | Live |
| Hoop | hoop.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Oracle | oracle.minnova.io | Cloudflare Pages (Access protected) | - | Live |
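Prometheus is listed with both a 5-day time limit and a 15GB size limit. Prometheus applies whichever limit is reached first and deletes whole blocks, starting with the oldest. The sketch below is a simplified model of how the two limits interact. It ignores the WAL and head chunks, which Prometheus also counts toward the size limit. The `Block` type and `blocks_to_keep` function are hypothetical. In kube-prometheus-stack these limits are normally set via `prometheus.prometheusSpec.retention` and `retentionSize`, but the exact keys used in this repo's `values.yaml` are an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Block:
    max_time: datetime  # timestamp of the newest sample in the block
    size_bytes: int


def blocks_to_keep(blocks, now, retention_time, retention_bytes):
    """Simplified Prometheus retention: apply the time limit, then the size limit.

    A block expires once its newest sample is older than retention_time.
    The survivors are kept newest-first until adding another would exceed
    retention_bytes; everything older than that is deleted.
    """
    fresh = [b for b in blocks if now - b.max_time <= retention_time]
    fresh.sort(key=lambda b: b.max_time, reverse=True)

    kept, total = [], 0
    for block in fresh:
        if total + block.size_bytes > retention_bytes:
            break  # this block and all older ones exceed the size budget
        kept.append(block)
        total += block.size_bytes
    return kept


GiB = 2**30  # assumption: the size limit is interpreted in binary units
now = datetime(2025, 12, 21)
blocks = [Block(now - timedelta(days=d), 6 * GiB) for d in (1, 3, 4, 6)]
kept = blocks_to_keep(blocks, now, timedelta(days=5), 15 * GiB)
print([(now - b.max_time).days for b in kept])  # [1, 3]
```

In this example the size limit, not the 5-day window, determines how much history survives.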

Phase Status

Phase 1: SSH Protection

| Task | Status |
|---|---|
| Architecture decisions documented | Done |
| Headscale deployed with Traefik | Done |
| Bastion accessible via Tailscale | Done |
| Firewall restricted to Tailscale only | Done |
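Restricting the firewall to Tailscale means SSH is reachable only from tailnet addresses. The sketch below models that policy with Python's `ipaddress` module. It assumes Headscale allocates addresses from its default prefixes, `100.64.0.0/10` and `fd7a:115c:a1e0::/48`. That assumption is consistent with the 100.64.0.x addresses in the server inventory. The firewall itself does the actual enforcement; `ssh_allowed` is only an illustrative model.

```python
import ipaddress

# Assumption: Headscale's default IP prefixes; check the prefix settings
# in the Headscale config if they have been customised.
TAILNET_PREFIXES = [
    ipaddress.ip_network("100.64.0.0/10"),
    ipaddress.ip_network("fd7a:115c:a1e0::/48"),
]


def ssh_allowed(source_ip: str) -> bool:
    """Illustrative Tailscale-only SSH policy: accept tailnet sources only."""
    addr = ipaddress.ip_address(source_ip)
    # Membership across IP versions is simply False, so mixing v4/v6 is safe.
    return any(addr in net for net in TAILNET_PREFIXES)


for ip in ("100.64.0.6", "10.0.1.1", "203.0.113.7", "100.128.0.1"):
    print(ip, ssh_allowed(ip))
```

Under this policy even the bastion's private address (10.0.1.1) is rejected, and `100.64.0.0/10` stops at 100.127.255.255.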

Phase 2: Identity

| Task | Status |
|---|---|
| Authentik on K8s (CloudNative-PG + Redis) | Done |
| SOPS Secrets Operator for K8s secrets | Done |
| Headscale OIDC integration | Done |
| Initial user setup | Done |
| Google SSO login | Done |
| SMTP configuration (Google Workspace) | Pending |

Phase 3: Email

| Task | Status |
|---|---|
| Google Workspace setup | Pending |
| Corporate mailboxes | Pending |
| MX records in Cloudflare | Pending |
| Configure Authentik SMTP | Pending |

Phase 4: Observability

| Task | Status |
|---|---|
| Add tunnel route (grafana.minnova.io) | Done |
| Deploy kube-prometheus-stack (Grafana + Prometheus) | Done |
| Deploy Node Exporter (via kube-prometheus-stack) | Done |
| Deploy kube-state-metrics | Done |
| CloudNative-PG metrics (PodMonitor) | Done |
| Cross-namespace Prometheus scraping | Done |
| Import CloudNative-PG dashboard (ID: 20417) | Done |
| Import Node Exporter Full dashboard (ID: 1860) | Done |
| Alertmanager Discord integration | Done |
| Deploy Loki | Done |
| Deploy Alloy log collection | Done |
| Deploy Node Exporter (headscale, bastion) | Pending |
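Alertmanager routes an alert by walking its route tree. Child routes are tried in order, and the first one whose matchers fit the alert's labels wins unless it sets `continue`. Alerts that match no child fall back to the root receiver. The sketch below is a flat, equality-only model of that behaviour. It assumes critical alerts are selected by a `severity="critical"` label. The receiver names `zulip-critical` and `null` and the alert names are hypothetical and are not taken from `alertmanager-config.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: dict = field(default_factory=dict)  # label -> required value (equality only)
    continue_matching: bool = False  # mirrors Alertmanager's `continue` flag


def receivers_for(labels: dict, routes: list, root_receiver: str) -> list:
    """Return the receivers an alert is delivered to under a one-level route tree."""
    chosen = []
    for route in routes:
        if all(labels.get(name) == value for name, value in route.matchers.items()):
            chosen.append(route.receiver)
            if not route.continue_matching:
                break  # first match wins unless `continue` is set
    return chosen or [root_receiver]  # unmatched alerts use the root receiver


routes = [Route(receiver="zulip-critical", matchers={"severity": "critical"})]
print(receivers_for({"alertname": "NodeDown", "severity": "critical"}, routes, "null"))
print(receivers_for({"alertname": "HighMemory", "severity": "warning"}, routes, "null"))
```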

Phase 5: Data Protection

| Task | Status |
|---|---|
| PostgreSQL backups to R2 (CloudNative-PG barman) | Done |
| WAL archiving with compression | Done |
| 3-day retention policy | Done |
| Test restore procedure | Done |
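A 3-day retention policy for CloudNative-PG's barman-based backups is a recovery window, not a count of backups. The goal is to be able to restore to any point in the last 3 days. That requires keeping every base backup inside the window plus the newest one taken before it, because point-in-time recovery to the window's start replays WAL from that older backup. The sketch below is a simplified model of those semantics. The function is hypothetical and ignores the WAL bookkeeping and backup status that barman also tracks.

```python
from datetime import datetime, timedelta


def backups_to_keep(backup_times: list, now: datetime, window: timedelta) -> list:
    """Recovery-window retention: keep what is needed to restore to any point
    between (now - window) and now."""
    window_start = now - window
    keep = sorted(t for t in backup_times if t >= window_start)
    older = [t for t in backup_times if t < window_start]
    if older:
        # The newest backup older than the window is the base for PITR
        # to the very start of the window, so it must survive too.
        keep.insert(0, max(older))
    return keep


now = datetime(2025, 12, 21, 12, 0)
daily = [datetime(2025, 12, day) for day in range(15, 22)]  # midnight backups, Dec 15-21
for t in backups_to_keep(daily, now, timedelta(days=3)):
    print(t.date())
```

Under this model, a 3-day window with daily backups holds four base backups (Dec 18 to Dec 21), not three.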

Phase 6: Secrets Migration

| Task | Status |
|---|---|
| AWS KMS key | Pending |
| IAM OIDC federation with Authentik | Pending |
| SOPS KMS configuration | Pending |

Phase 7: Storage Strategy

| Task | Status |
|---|---|
| Document local-path limitations | Done |
| Evaluate Longhorn for distributed block storage | Planned |
| Evaluate Hetzner Storage Box for NFS | Planned |
| Plan PVC migration strategy | Pending |
| Implement Longhorn | Pending |
| Migrate databases to Longhorn | Pending |

Phase 8: Multi-Node Scaling

| Task | Status |
|---|---|
| Document multi-node architecture | Done |
| Add second node in Terraform | Pending |
| Configure k3s agent join | Pending |
| Enable Longhorn replication across nodes | Pending |
| Test rolling updates with zero downtime | Pending |
| Configure CNPG standby replicas | Pending |
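Adding one second node does not by itself give three-replica Longhorn volumes full redundancy. This sketch assumes Longhorn's default replica count of 3 and assumes node-level soft anti-affinity is left disabled, which is Longhorn's default. Under those assumptions each replica needs a distinct node, so on two nodes a three-replica volume can place only two replicas and stays degraded. The model below is rough. The function names are hypothetical, and it ignores disk space, node tags and zone anti-affinity.

```python
def schedulable_replicas(replica_count: int, nodes: int, soft_anti_affinity: bool = False) -> int:
    """How many replicas of one volume can be placed on `nodes` schedulable nodes."""
    if nodes <= 0:
        return 0
    if soft_anti_affinity:
        return replica_count  # replicas may share a node when nothing else is free
    return min(replica_count, nodes)  # hard anti-affinity: one replica per node


def volume_robustness(replica_count: int, nodes: int, soft_anti_affinity: bool = False) -> str:
    """Rough mapping onto Longhorn's healthy / degraded / faulted states."""
    placed = schedulable_replicas(replica_count, nodes, soft_anti_affinity)
    if placed == 0:
        return "faulted"
    return "healthy" if placed == replica_count else "degraded"


for nodes in (1, 2, 3):
    print(nodes, "node(s):", volume_robustness(3, nodes), "| 2 replicas:", volume_robustness(2, nodes))
```

With two nodes, one option is to lower the replica count to 2 through the `numberOfReplicas` StorageClass parameter. The alternative is to plan for a third node before relying on replication.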

Architecture Decisions

| Decision | Choice | Notes |
|---|---|---|
| Edge/CDN | Cloudflare | |
| Ingress | Cloudflare Tunnel → Traefik (K8s) | |
| VPN/Network | Headscale (self-hosted Tailscale coordinator) | |
| Identity | Authentik (K8s) | |
| Database | CloudNative-PG (PostgreSQL operator) | Daily scheduled backups + WAL to R2 |
| Cache | Redis (per-app) | Considering Dragonfly for the future |
| Storage (current) | local-path provisioner | Single-node only, no replication |
| Storage (planned) | Longhorn | Distributed block storage for multi-node |
| File Storage | Hetzner Storage Box (planned) | NFS for large files and backups |
| Secrets (K8s) | SOPS Secrets Operator + Age | |
| Secrets (Ansible) | SOPS + Age | |
| Email | Google Workspace (pending) | |
| Orchestration | K3s + ArgoCD | GitOps |
| Analytics | Umami (self-hosted, PostgreSQL) | |
| Zero Trust | Cloudflare Access + Authentik OIDC | |
| Security/IDS | CrowdSec (Traefik + Cloudflare bouncers) | |
| Alerting | Alertmanager → Zulip | Critical alerts to team chat |
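The ingress path sends public hostnames through a Cloudflare Tunnel to Traefik. cloudflared evaluates its ingress rules from top to bottom, uses the first rule whose hostname matches, and requires the last rule to be a catch-all. The sketch below is a small model of that matching, with path matching omitted. The in-cluster Traefik address is a hypothetical placeholder, and the real rules belong to the tunnel configuration in `infra/live/cloudflare/tunnel.tf`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-cluster Traefik address; substitute the real service URL.
TRAEFIK = "http://traefik.kube-system.svc.cluster.local:80"


@dataclass
class IngressRule:
    service: str
    hostname: Optional[str] = None  # None means catch-all


def host_matches(pattern: Optional[str], host: str) -> bool:
    if pattern is None:
        return True
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])  # "*.example.com" matches any subdomain
    return host == pattern


def route(rules: list, host: str) -> str:
    """Return the service for `host`; the first matching rule wins."""
    if not rules or rules[-1].hostname is not None:
        raise ValueError("the last ingress rule must be a catch-all")
    for rule in rules:
        if host_matches(rule.hostname, host):
            return rule.service
    raise AssertionError("unreachable: the catch-all always matches")


rules = [
    IngressRule(TRAEFIK, "grafana.minnova.io"),
    IngressRule(TRAEFIK, "auth.minnova.io"),
    IngressRule("http_status:404"),  # catch-all for unknown hostnames
]
print(route(rules, "grafana.minnova.io"))
print(route(rules, "unknown.example.com"))
```

Traefik then routes by hostname inside the cluster, so in this model every tunnel rule points at the same service.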

Key Files

| File | Purpose |
|---|---|
| `infra/live/hetzner/servers.tf` | Server definitions (CX53 apps server) |
| `infra/live/cloudflare/tunnel.tf` | Tunnel configuration |
| `infra/live/cloudflare/zero_trust.tf` | Cloudflare Access applications |
| `infra/ansible/playbooks/headscale.yml` | Headscale + Traefik |
| `infra/ansible/playbooks/bastion.yml` | Bastion with Tailscale |
| `infra/ansible/playbooks/k3s.yml` | K3s bootstrap |
| `infra/kubernetes/monitoring/values.yaml` | Prometheus/Grafana config |
| `infra/kubernetes/monitoring/alert-rules.yaml` | Custom Prometheus alerts |
| `infra/kubernetes/monitoring/alertmanager-config.yaml` | Zulip alerting config |
| `infra/kubernetes/monitoring/loki-values.yaml` | Loki configuration |
| `infra/kubernetes/authentik/` | Authentik K8s manifests |
| `infra/kubernetes/zulip/` | Zulip chat deployment |
| `infra/kubernetes/nextcloud/` | Nextcloud file sharing |
| `infra/kubernetes/forgejo/` | Forgejo git server |
| `infra/kubernetes/*/postgres-cluster.yaml` | CloudNative-PG clusters |
| `infra/kubernetes/*/scheduled-backup.yaml` | Daily PostgreSQL backups |
| `infra/kubernetes/cloudnative-pg/` | CloudNative-PG operator |
| `infra/kubernetes/sops-secrets-operator/` | SOPS operator |
| `infra/kubernetes/crowdsec/` | CrowdSec IDS/IPS deployment |
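Every CloudNative-PG cluster follows the `infra/kubernetes/*/postgres-cluster.yaml` and `infra/kubernetes/*/scheduled-backup.yaml` naming pattern. Because of this, a quick consistency check can flag any app directory that defines a cluster without a daily backup. The sketch below assumes it runs from the repository root and that both filenames are used exactly as listed in the table. The function names are hypothetical.

```python
from pathlib import Path, PurePosixPath

CLUSTER_FILE = "postgres-cluster.yaml"
BACKUP_FILE = "scheduled-backup.yaml"


def missing_backups(files) -> list:
    """Given repo-relative paths, list app directories that define a
    CloudNative-PG cluster but no ScheduledBackup manifest."""
    paths = [PurePosixPath(f) for f in files]
    clusters = {p.parent for p in paths if p.name == CLUSTER_FILE}
    backups = {p.parent for p in paths if p.name == BACKUP_FILE}
    return sorted(d.name for d in clusters - backups)


def scan_repo(root: str = ".") -> list:
    """Run the check against infra/kubernetes/*/ in a checked-out repository."""
    base = Path(root)
    files = [p.relative_to(base).as_posix() for p in base.glob("infra/kubernetes/*/*.yaml")]
    return missing_backups(files)


if __name__ == "__main__":
    missing = scan_repo()
    print("apps without scheduled backups:", missing or "none")
```

The pure `missing_backups` function keeps the rule testable without a checkout. `scan_repo` is a thin filesystem wrapper around it.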