Infrastructure Implementation Status
Track ongoing infrastructure work.
Last updated: 2025-12-21
Current Focus: Storage & Multi-Node Preparation
- Install Hetzner CSI driver for cloud volumes
- Plan multi-node k3s expansion for zero-downtime deployments
- Configure Google Workspace SMTP for Authentik (password reset, notifications)
- Keep docs synced with infra `main` (last sweep: 2025-12-27)
Server Inventory
| Name | Type | Specs | Private IP | Tailscale IP | Status |
|------|------|-------|------------|--------------|--------|
| headscale | CX22 | 2 vCPU, 4GB RAM, 40GB disk | - | 100.64.0.2 | Live |
| bastion | CX22 | 2 vCPU, 4GB RAM, 40GB disk | 10.0.1.1 | 100.64.0.6 | Live |
| apps | CX53 | 16 vCPU, 32GB RAM, 300GB disk | 10.0.2.1 | - | Live |
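Because the firewall admits SSH only over Tailscale, one quick sanity check on this inventory is that every Tailscale IP falls inside Headscale's default IPv4 prefix, 100.64.0.0/10, and every private IP sits inside the Hetzner private network. The sketch below transcribes the table by hand. It assumes the private network is 10.0.0.0/16, which is inferred from the 10.0.1.1 and 10.0.2.1 addresses and is not stated in this document. The `INVENTORY` dict and the `check_inventory` helper are hypothetical names.

```python
import ipaddress

# Hand-transcribed from the Server Inventory table; None marks a "-" cell.
INVENTORY = {
    "headscale": {"private_ip": None, "tailscale_ip": "100.64.0.2"},
    "bastion": {"private_ip": "10.0.1.1", "tailscale_ip": "100.64.0.6"},
    "apps": {"private_ip": "10.0.2.1", "tailscale_ip": None},
}

# Headscale's default IPv4 prefix (the CGNAT range Tailscale also uses).
TAILNET = ipaddress.ip_network("100.64.0.0/10")
# ASSUMPTION: Hetzner private network CIDR; adjust to match the Terraform config.
PRIVATE_NET = ipaddress.ip_network("10.0.0.0/16")


def check_inventory(inventory):
    """Return a list of problems; an empty list means the inventory is consistent."""
    problems = []
    seen = set()
    for name, addrs in inventory.items():
        for field, network in (("private_ip", PRIVATE_NET), ("tailscale_ip", TAILNET)):
            raw = addrs[field]
            if raw is None:  # "-" in the table: host has no address of this kind
                continue
            ip = ipaddress.ip_address(raw)
            if ip not in network:
                problems.append(f"{name}: {field} {ip} not in {network}")
            if ip in seen:
                problems.append(f"{name}: duplicate address {ip}")
            seen.add(ip)
    return problems


if __name__ == "__main__":
    for line in check_inventory(INVENTORY) or ["inventory OK"]:
        print(line)
```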
Services
| Service | URL | Runs On | Database | Status |
|---------|-----|---------|----------|--------|
| Authentik | auth.minnova.io | K8s (CloudNative-PG + Redis) | PostgreSQL | Live |
| Headscale | headscale.minnova.io | Systemd | SQLite | Live |
| ArgoCD | argocd.minnova.io | K8s | - | Live |
| Grafana | grafana.minnova.io | K8s (kube-prometheus-stack) | SQLite (embedded) | Live |
| Prometheus | prometheus.minnova.io | K8s (5d retention, 15GB limit) | TSDB | Live |
| Loki | (internal) | K8s (7d retention) | TSDB | Live |
| Portainer | portainer.minnova.io | K8s | - | Live |
| Forgejo | forgejo.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Gatus | status.minnova.io | K8s | SQLite | Live |
| Traefik | traefik.minnova.io | K8s | - | Live |
| Homepage | homepage.minnova.io | K8s | - | Live |
| Glance | glance.minnova.io | K8s | - | Live |
| Nextcloud | nextcloud.minnova.io | K8s (CloudNative-PG + Redis) | PostgreSQL | Live |
| Umami | analytics.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Zulip | zulip.minnova.io | K8s (CloudNative-PG + Redis + RabbitMQ) | PostgreSQL | Live |
| Kimai | kimai.minnova.io | K8s (MariaDB) | MariaDB | Live |
| Hoop | hoop.minnova.io | K8s (CloudNative-PG) | PostgreSQL | Live |
| Oracle | oracle.minnova.io | Cloudflare Pages (Access protected) | - | Live |
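The Prometheus row pairs a 5-day time limit with a 15GB size cap, and Prometheus applies whichever limit triggers first. The Prometheus storage documentation estimates disk use as retention seconds × ingested samples per second × bytes per sample, where 1 to 2 bytes per sample is typical. Prometheus size units are powers of 2. The sketch below applies that rule of thumb. The ingestion rate in the example is a placeholder, not a measurement from this cluster; the real rate can be read from `rate(prometheus_tsdb_head_samples_appended_total[5m])`. The estimate also ignores WAL and compaction headroom.

```python
# Rough Prometheus TSDB sizing using the capacity rule of thumb from the
# Prometheus storage docs: disk = retention_seconds * samples_per_second * bytes_per_sample.

DAY_SECONDS = 24 * 60 * 60
GIB = 1024 ** 3  # Prometheus size units are powers of 2, so "15GB" = 15 * 2**30 bytes


def estimated_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Disk needed to keep `retention_days` of data at a steady ingestion rate."""
    return retention_days * DAY_SECONDS * samples_per_second * bytes_per_sample


def effective_retention_days(time_limit_days, size_limit_bytes, samples_per_second,
                             bytes_per_sample=2.0):
    """Days of data actually kept when both limits are set: whichever triggers first wins."""
    bytes_per_day = DAY_SECONDS * samples_per_second * bytes_per_sample
    return min(time_limit_days, size_limit_bytes / bytes_per_day)


if __name__ == "__main__":
    rate = 20_000  # PLACEHOLDER samples/s; measure the real rate before trusting this
    need = estimated_bytes(5, rate)
    print(f"5d at {rate}/s needs ~{need / GIB:.1f} GiB")
    print(f"effective retention: {effective_retention_days(5, 15 * GIB, rate):.1f} days")
```

If the effective retention comes out below 5 days, the 15GB cap rather than the time limit governs how much history Grafana can query.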
Phase Status
Phase 1: SSH Protection
| Task | Status |
|------|--------|
| Architecture decisions documented | Done |
| Headscale deployed with Traefik | Done |
| Bastion accessible via Tailscale | Done |
| Firewall restricted to Tailscale only | Done |
Phase 2: Identity
| Task | Status |
|------|--------|
| Authentik on K8s (CloudNative-PG + Redis) | Done |
| SOPS Secrets Operator for K8s secrets | Done |
| Headscale OIDC integration | Done |
| Initial user setup | Done |
| Google SSO login | Done |
| SMTP configuration (Google Workspace) | Pending |
Phase 3: Email
| Task | Status |
|------|--------|
| Google Workspace setup | Pending |
| Corporate mailboxes | Pending |
| MX records in Cloudflare | Pending |
| Configure Authentik SMTP | Pending |
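Once Google Workspace is live, the mailbox that Authentik will use can be smoke-tested outside Authentik first. The sketch below uses Python's standard `smtplib` against `smtp.gmail.com` on port 587 with STARTTLS, which is Google's submission endpoint. It assumes authenticated submission with an app password; Workspace's SMTP relay service is an alternative. The helper names and any addresses passed to them are hypothetical, not settings taken from this repo.

```python
import smtplib
import ssl
from email.message import EmailMessage

# ASSUMPTIONS: Gmail submission endpoint, STARTTLS, app-password login.
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 587


def build_test_message(sender, recipient):
    """Build a plain-text message similar to a password-reset notification."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Authentik SMTP smoke test"
    msg.set_content("If you can read this, SMTP submission works.")
    return msg


def send(msg, username, app_password, host=SMTP_HOST, port=SMTP_PORT):
    """Send `msg` over STARTTLS; raises smtplib.SMTPException on failure."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=context)  # upgrade to TLS before sending credentials
        smtp.login(username, app_password)
        smtp.send_message(msg)
```

Usage: `send(build_test_message(sender, recipient), sender, app_password)`. A run that succeeds here but fails inside Authentik points at the Authentik-side configuration rather than the mailbox.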
Phase 4: Observability
| Task | Status |
|------|--------|
| Add tunnel route (grafana.minnova.io) | Done |
| Deploy kube-prometheus-stack (Grafana + Prometheus) | Done |
| Deploy Node Exporter (via kube-prometheus-stack) | Done |
| Deploy kube-state-metrics | Done |
| CloudNative-PG metrics (PodMonitor) | Done |
| Cross-namespace Prometheus scraping | Done |
| Import CloudNative-PG dashboard (ID: 20417) | Done |
| Import Node Exporter Full dashboard (ID: 1860) | Done |
| Alertmanager Discord integration | Done |
| Deploy Loki | Done |
| Deploy Alloy log collection | Done |
| Deploy Node Exporter (headscale, bastion) | Pending |
Phase 5: Data Protection
| Task | Status |
|------|--------|
| PostgreSQL backups to R2 (CloudNative-PG barman) | Done |
| WAL archiving with compression | Done |
| 3-day retention policy | Done |
| Test restore procedure | Done |
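The 3-day retention policy behaves as a recovery window when it uses Barman-style semantics, which is an assumption here. Under that model, every base backup needed to restore to any point in the last three days is kept. That set includes the newest backup taken before the window opens, because restoring to the start of the window replays WAL on top of it. The sketch below models that rule in plain Python with made-up timestamps. It illustrates the semantics and is not the code Barman or CloudNative-PG actually runs.

```python
from datetime import datetime, timedelta


def backups_to_keep(backup_times, now, window=timedelta(days=3)):
    """Return the base backups a recovery window of `window` must retain.

    Every backup inside the window is kept, plus the most recent one that
    started before the window: recovering to the very start of the window
    replays WAL on top of that older backup.
    """
    cutoff = now - window
    ordered = sorted(backup_times)
    inside = [t for t in ordered if t >= cutoff]
    before = [t for t in ordered if t < cutoff]
    return ([before[-1]] if before else []) + inside


if __name__ == "__main__":
    now = datetime(2025, 12, 21, 12, 0)
    # HYPOTHETICAL daily backups at 02:00 over the past week.
    daily = [datetime(2025, 12, d, 2, 0) for d in range(14, 22)]
    for t in backups_to_keep(daily, now):
        print(t.isoformat())
```

With daily backups, this model means R2 normally holds four base backups rather than three, plus the WAL between them.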
Phase 6: Secrets Migration
| Task | Status |
|------|--------|
| AWS KMS key | Pending |
| IAM OIDC federation with Authentik | Pending |
| SOPS KMS configuration | Pending |
Phase 7: Storage Strategy
| Task | Status |
|------|--------|
| Document local-path limitations | Done |
| Evaluate Longhorn for distributed block | Planned |
| Evaluate Hetzner Storage Box for NFS | Planned |
| Plan PVC migration strategy | Pending |
| Implement Longhorn | Pending |
| Migrate databases to Longhorn | Pending |
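Moving to Longhorn does not remove the single-node limitation by itself. Longhorn's defaults are assumed here to be 3 replicas per volume, with node-level soft anti-affinity disabled so that two replicas of one volume never share a node. Under those defaults, a volume stays degraded until there are as many nodes as replicas. The sketch below models that placement rule without calling any Longhorn API. The `Placement` type and `place_replicas` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    scheduled: int         # replicas that can actually be placed
    degraded: bool         # fewer replicas than requested
    node_failures_ok: int  # nodes that can fail without losing the volume's data


def place_replicas(nodes, replicas=3, soft_anti_affinity=False):
    """Model replica placement under Longhorn-style node anti-affinity (ASSUMED defaults)."""
    if nodes < 1 or replicas < 1:
        raise ValueError("need at least one node and one replica")
    # Hard anti-affinity: at most one replica of a volume per node.
    scheduled = replicas if soft_anti_affinity else min(replicas, nodes)
    # Assume replicas are spread over as many distinct nodes as possible;
    # data survives while at least one node holding a replica is up.
    distinct_nodes = min(scheduled, nodes)
    return Placement(scheduled, scheduled < replicas, distinct_nodes - 1)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, place_replicas(n))
```

On the current single `apps` node, enabling soft anti-affinity silences the degraded state but still tolerates zero node failures. That is why replication only pays off after Phase 8 adds nodes.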
Phase 8: Multi-Node Scaling
| Task | Status |
|------|--------|
| Document multi-node architecture | Done |
| Add second node in Terraform | Pending |
| Configure k3s agent join | Pending |
| Enable Longhorn replication across nodes | Pending |
| Test rolling updates with zero downtime | Pending |
| Configure CNPG standby replicas | Pending |
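For "Test rolling updates with zero downtime", the relevant Deployment knobs are `maxUnavailable` and `maxSurge`. Both default to 25%. Kubernetes rounds a percentage `maxUnavailable` down and a percentage `maxSurge` up, so the floor on available pods during a rollout is `replicas - maxUnavailable`. The sketch below does only that arithmetic and does not talk to a cluster. The function names are hypothetical.

```python
def resolve(value, replicas, round_up):
    """Turn an int or an 'NN%' string into a pod count, rounding as Kubernetes does."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_unavailable="25%", max_surge="25%"):
    """Return (min_available, max_total) pods during a RollingUpdate."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    if unavailable == 0 and surge == 0:
        raise ValueError("maxUnavailable and maxSurge cannot both be 0")
    return replicas - unavailable, replicas + surge


def zero_downtime(replicas, **kwargs):
    """True if the strategy keeps at least one pod available throughout the rollout."""
    min_available, _ = rollout_bounds(replicas, **kwargs)
    return min_available >= 1


if __name__ == "__main__":
    for r in (1, 2, 4):
        print(r, rollout_bounds(r), zero_downtime(r))
```

The arithmetic is necessary but not sufficient. The surge pod also needs room to schedule and a readiness probe that gates traffic. Surviving node drains additionally needs a second node and a PodDisruptionBudget.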
Architecture Decisions
| Decision | Choice | Notes |
|----------|--------|-------|
| Edge/CDN | Cloudflare | |
| Ingress | Cloudflare Tunnel → Traefik (K8s) | |
| VPN/Network | Headscale (self-hosted Tailscale coordinator) | |
| Identity | Authentik (K8s) | |
| Database | CloudNative-PG (PostgreSQL operator) | Daily scheduled backups + WAL archiving to R2 |
| Cache | Redis (per-app) | Considering Dragonfly for the future |
| Storage (current) | local-path provisioner | Single-node only, no replication |
| Storage (planned) | Longhorn | Distributed block storage for multi-node |
| File Storage | Hetzner Storage Box (planned) | NFS for large files and backups |
| Secrets (K8s) | SOPS Secrets Operator + Age | |
| Secrets (Ansible) | SOPS + Age | |
| Email | Google Workspace (pending) | |
| Orchestration | K3s + ArgoCD GitOps | |
| Analytics | Umami (self-hosted, PostgreSQL) | |
| Zero Trust | Cloudflare Access + Authentik OIDC | |
| Security/IDS | CrowdSec (Traefik + Cloudflare bouncers) | |
| Alerting | Alertmanager → Zulip | Critical alerts to team chat |
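Alertmanager delivers notifications as a JSON webhook with a top-level `status`, `commonLabels`, and an `alerts` list whose entries carry their own `labels` and `annotations`. Zulip ships its own Alertmanager integration, so this setup may not need custom code. The sketch below only illustrates how such a payload maps onto a Zulip topic and message body. The one-topic-per-`alertname` scheme, the reliance on a `summary` annotation, and the sample payload are all assumptions.

```python
def format_for_zulip(payload):
    """Map an Alertmanager webhook payload to a (topic, markdown body) pair."""
    common = payload.get("commonLabels", {})
    topic = common.get("alertname", "alerts")  # ASSUMED: one Zulip topic per alertname
    status = payload.get("status", "unknown").upper()
    lines = [f"**{status}**: {topic}"]
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Prefer the most specific location label that is present.
        where = labels.get("namespace") or labels.get("instance") or "cluster"
        summary = annotations.get("summary", "(no summary)")
        lines.append(f"* [{alert.get('status', '?')}] {where}: {summary}")
    return topic, "\n".join(lines)


# HYPOTHETICAL payload, trimmed to the fields the formatter reads.
SAMPLE_PAYLOAD = {
    "status": "firing",
    "commonLabels": {"alertname": "HighDiskUsage", "severity": "critical"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighDiskUsage", "instance": "apps"},
            "annotations": {"summary": "Root filesystem above 90%"},
        }
    ],
}

if __name__ == "__main__":
    topic, body = format_for_zulip(SAMPLE_PAYLOAD)
    print(topic)
    print(body)
```

Grouping by `alertname` keeps a firing alert and its later resolution in the same Zulip topic, which makes the team chat history easier to follow.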
Key Files
| File | Purpose |
|------|---------|
| `infra/live/hetzner/servers.tf` | Server definitions (CX53 apps server) |
| `infra/live/cloudflare/tunnel.tf` | Tunnel configuration |
| `infra/live/cloudflare/zero_trust.tf` | Cloudflare Access applications |
| `infra/ansible/playbooks/headscale.yml` | Headscale + Traefik |
| `infra/ansible/playbooks/bastion.yml` | Bastion with Tailscale |
| `infra/ansible/playbooks/k3s.yml` | K3s bootstrap |
| `infra/kubernetes/monitoring/values.yaml` | Prometheus/Grafana config |
| `infra/kubernetes/monitoring/alert-rules.yaml` | Custom Prometheus alerts |
| `infra/kubernetes/monitoring/alertmanager-config.yaml` | Zulip alerting config |
| `infra/kubernetes/monitoring/loki-values.yaml` | Loki configuration |
| `infra/kubernetes/authentik/` | Authentik K8s manifests |
| `infra/kubernetes/zulip/` | Zulip chat deployment |
| `infra/kubernetes/nextcloud/` | Nextcloud file sharing |
| `infra/kubernetes/forgejo/` | Forgejo git server |
| `infra/kubernetes/*/postgres-cluster.yaml` | CloudNative-PG clusters |
| `infra/kubernetes/*/scheduled-backup.yaml` | Daily PostgreSQL backups |
| `infra/kubernetes/cloudnative-pg/` | CloudNative-PG operator |
| `infra/kubernetes/sops-secrets-operator/` | SOPS operator |
| `infra/kubernetes/crowdsec/` | CrowdSec IDS/IPS deployment |
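One current-focus item is keeping these docs synced with infra, and a cheap check toward that is confirming that every path in the Key Files table still exists. The sketch below assumes it runs against the repository root. It treats entries containing `*` as `pathlib` glob patterns and entries ending in `/` as directories. `KEY_PATHS` is copied from the table above, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Copied from the Key Files table; a trailing "/" marks a directory.
KEY_PATHS = [
    "infra/live/hetzner/servers.tf",
    "infra/live/cloudflare/tunnel.tf",
    "infra/live/cloudflare/zero_trust.tf",
    "infra/ansible/playbooks/headscale.yml",
    "infra/ansible/playbooks/bastion.yml",
    "infra/ansible/playbooks/k3s.yml",
    "infra/kubernetes/monitoring/values.yaml",
    "infra/kubernetes/monitoring/alert-rules.yaml",
    "infra/kubernetes/monitoring/alertmanager-config.yaml",
    "infra/kubernetes/monitoring/loki-values.yaml",
    "infra/kubernetes/authentik/",
    "infra/kubernetes/zulip/",
    "infra/kubernetes/nextcloud/",
    "infra/kubernetes/forgejo/",
    "infra/kubernetes/*/postgres-cluster.yaml",
    "infra/kubernetes/*/scheduled-backup.yaml",
    "infra/kubernetes/cloudnative-pg/",
    "infra/kubernetes/sops-secrets-operator/",
    "infra/kubernetes/crowdsec/",
]


def missing_paths(root, paths=KEY_PATHS):
    """Return table entries with no matching file or directory under `root`."""
    root = Path(root)
    missing = []
    for entry in paths:
        pattern = entry.rstrip("/")
        if "*" in pattern:
            # Glob entries count as present if at least one path matches.
            ok = any(True for _ in root.glob(pattern))
        elif entry.endswith("/"):
            ok = (root / pattern).is_dir()
        else:
            ok = (root / pattern).is_file()
        if not ok:
            missing.append(entry)
    return missing
```

Usage: call `missing_paths(".")` from the repository root and fail a CI job if the returned list is non-empty. That catches renames that leave this table stale.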