# Observability
The observability stack provides visibility into what's happening across the infrastructure - server health, application performance, and Kubernetes workloads. Everything is accessible through Grafana at grafana.minnova.io.
## Stack
All components run on K3s and are deployed via ArgoCD:
| Component | Namespace | Purpose |
|---|---|---|
| Grafana | monitoring | Dashboards and visualization |
| Prometheus | monitoring | Metrics storage and querying |
| Loki | monitoring | Log aggregation |
| Alloy | monitoring | Pod log collection via Kubernetes API |
| Node Exporter | monitoring | System metrics (CPU, memory, disk, network) |
| kube-state-metrics | monitoring | Kubernetes object metrics |
## Architecture

```mermaid
flowchart LR
    NE[Node Exporter] -->|metrics| PR[Prometheus] --> GF[Grafana]
    KSM[kube-state-metrics] -->|k8s metrics| PR
    CNPG[CloudNative-PG] -->|pg metrics| PR
    Alloy[Alloy] -->|logs| Loki --> GF
```
## Metrics
Node Exporter runs as a DaemonSet and exposes metrics about the host system - CPU usage, memory, disk space, network traffic, etc.
kube-state-metrics exposes Kubernetes object metrics - pod status, deployment replicas, PVC usage, etc.
CloudNative-PG exports PostgreSQL metrics via PodMonitor - connections, queries, replication status, etc.
Prometheus scrapes all these endpoints and stores the data as time series with short retention to keep disk usage low. It handles aggregation and provides PromQL for querying, and it is configured to pick up PodMonitors/ServiceMonitors from all namespaces.
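The cross-namespace scraping comes down to a couple of Helm values. A minimal sketch of the relevant keys (illustrative only; the deployed settings live in `infra/kubernetes/monitoring/values.yaml`):

```yaml
# kube-prometheus-stack values sketch -- scrape PodMonitors/ServiceMonitors everywhere
prometheus:
  prometheusSpec:
    # Don't restrict monitors to those carrying the Helm release label
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    # Empty selector = discover monitors in all namespaces
    podMonitorNamespaceSelector: {}
    serviceMonitorNamespaceSelector: {}
```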
## Logs
Alloy tails pod logs via the Kubernetes API (no node-level file access) and ships them to Loki. Logs are available in Grafana through the Loki datasource with short retention tuned for the single-node cluster.
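For reference, a sketch of what that can look like in the Alloy Helm values, assuming the `grafana/alloy` chart and a Loki push URL of `http://loki.monitoring.svc:3100` (both assumptions; the real config is in `infra/kubernetes/monitoring/alloy-values.yaml`):

```yaml
# Alloy Helm values sketch -- tail pod logs via the Kubernetes API and ship to Loki
alloy:
  configMap:
    content: |
      // Discover pods through the Kubernetes API
      discovery.kubernetes "pods" {
        role = "pod"
      }

      // Tail container logs via the API server (no node-level file access)
      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.write.default.receiver]
      }

      loki.write "default" {
        endpoint {
          // Assumed Service name -- check the actual Loki release
          url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
        }
      }
```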
## Dashboards
Pre-installed from kube-prometheus-stack:
- Kubernetes cluster/namespace/pod resource dashboards
- Node Exporter dashboards
Manually imported:
- CloudNative-PG (ID: 20417) - PostgreSQL cluster monitoring
- Node Exporter Full (ID: 1860) - Detailed host metrics
## Authentication
Grafana uses Cloudflare Access for authentication, which chains back to Authentik. When you visit grafana.minnova.io:
- Cloudflare Access intercepts the request and checks if you're authenticated
- If not, it redirects to Authentik for login
- After successful login, Cloudflare adds your email to a header (`Cf-Access-Authenticated-User-Email`)
- Grafana reads this header and auto-creates an account for you
The result is that anyone who can authenticate with Authentik automatically gets a Grafana account. No separate user management needed in Grafana itself.
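The Grafana side of this is the auth-proxy configuration. A minimal sketch of the corresponding Helm values (the real settings live in `infra/kubernetes/monitoring/values.yaml`):

```yaml
# Grafana auth-proxy sketch -- trust the email header set by Cloudflare Access
grafana:
  grafana.ini:
    auth.proxy:
      enabled: true
      header_name: Cf-Access-Authenticated-User-Email
      header_property: email
      auto_sign_up: true   # create the Grafana account on first visit
```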
## Retention
Short-term retention on local PVCs to keep disk usage low. Tune values in the Helm chart configs if more history is needed.
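As a rough sketch of where those knobs live (the numbers below are placeholders, not the deployed values):

```yaml
# values.yaml (kube-prometheus-stack) -- Prometheus retention, example numbers
prometheus:
  prometheusSpec:
    retention: 7d
    retentionSize: 5GB

# loki-values.yaml -- Loki retention, example numbers
loki:
  limits_config:
    retention_period: 168h          # 7 days
  compactor:
    retention_enabled: true
    delete_request_store: filesystem  # required alongside retention in Loki 3.x
```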
## Alerting
Alerts are sent to Discord using Prometheus Alertmanager. The setup uses:
- AlertmanagerConfig CRD - Defines routing rules and Discord receiver
- SOPS Secret - Stores Discord webhook URL securely
- kube-prometheus-stack default rules - Built-in alerts for K8s, node, and Prometheus issues
- Config lives in `infra/kubernetes/monitoring/alertmanager-config.yaml`
### Known Limitations

The prometheus-operator doesn't support `webhook_url_file` for Discord yet (Issue #7159). Use the AlertmanagerConfig CRD with an `apiURL` secret reference instead of raw Helm values.
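A minimal sketch of that pattern (the Secret name and key here are assumptions; the real manifest is `infra/kubernetes/monitoring/alertmanager-config.yaml`):

```yaml
# AlertmanagerConfig sketch -- Discord receiver with the webhook read from a Secret
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: discord-alerts
  namespace: monitoring
spec:
  route:
    receiver: discord
    groupBy: ["alertname"]
  receivers:
    - name: discord
      discordConfigs:
        - apiURL:
            name: discord-webhook   # SOPS-managed Secret (name is an assumption)
            key: webhook-url        # key is an assumption
```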
## Current Limitations
Node Exporter is only deployed on the apps server (K3s node). Headscale and bastion servers are not yet monitored.
To get full infrastructure monitoring, we need to deploy standalone Node Exporter on headscale and bastion. See Implementation Status for progress.
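Once those exporters exist, one way to pull them in is Prometheus `additionalScrapeConfigs` in the kube-prometheus-stack values. A sketch with hypothetical hostnames:

```yaml
# values.yaml sketch -- scrape node_exporter on hosts outside the cluster
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: external-node-exporter
        static_configs:
          - targets:
              - headscale.internal:9100   # hypothetical address
              - bastion.internal:9100     # hypothetical address
```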
## Configuration

Deployment is GitOps-managed by ArgoCD:

- `infra/argocd/apps/monitoring.yaml` installs kube-prometheus-stack with `infra/kubernetes/monitoring/values.yaml`
- `infra/argocd/apps/loki.yaml` deploys Loki with `infra/kubernetes/monitoring/loki-values.yaml`
- `infra/argocd/apps/alloy.yaml` deploys Alloy log collection with `infra/kubernetes/monitoring/alloy-values.yaml`
Edit the values files and let ArgoCD sync to apply changes.
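For orientation, a sketch of what one of these Applications could look like using ArgoCD's multi-source pattern (the repo URL and chart version are placeholders; the actual manifest is `infra/argocd/apps/monitoring.yaml`):

```yaml
# ArgoCD Application sketch -- upstream chart plus values file from the infra repo
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  sources:
    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "65.0.0"                          # placeholder version
      helm:
        valueFiles:
          - $values/infra/kubernetes/monitoring/values.yaml
    - repoURL: https://github.com/example/infra.git     # placeholder repo URL
      targetRevision: main
      ref: values
  syncPolicy:
    automated: {}
```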
## Web Analytics (Umami)
Umami is a privacy-focused, self-hosted web analytics platform. It runs on K3s with PostgreSQL (CloudNative-PG) as the backend.
| Component | Details |
|---|---|
| URL | analytics.minnova.io |
| Database | PostgreSQL via CloudNative-PG |
| Storage | 5Gi PVC |
| Authentication | Umami's built-in username/password |
### Why Umami?
- Privacy-first: No cookies, GDPR compliant out of the box
- Self-hosted: Full data ownership, no third-party tracking
- Lightweight: ~2KB tracking script, minimal performance impact
- Simple: Clean UI, easy to understand metrics
### Zero Trust Compatibility
Umami is intentionally not behind Cloudflare Zero Trust (Access). This is a deliberate architecture decision:
- Analytics endpoints must be public - The tracking script endpoint (`/api/send`) needs to accept POST requests from external sites' visitors
- Cloudflare Access blocks unauthenticated requests - Would prevent analytics from being recorded
- Umami has built-in authentication - The dashboard requires login, protecting sensitive data
For internal services that need authentication, Cloudflare Access with Authentik OIDC remains the standard approach. Umami is the exception because its primary function requires public accessibility.
### Configuration
Deployment is GitOps-managed by ArgoCD via the App-of-Apps generator (auto-discovers infra/kubernetes/umami/).
Key files:
- `infra/kubernetes/umami/deployment.yaml` - Umami deployment
- `infra/kubernetes/umami/postgres-cluster.yaml` - PostgreSQL cluster
- `infra/kubernetes/umami/ingress.yaml` - Traefik IngressRoute
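A sketch of how the Umami container can be wired to the CNPG-managed database through the auto-generated `<cluster>-app` Secret (the cluster name `umami-db` is an assumption; see `infra/kubernetes/umami/` for the real manifests):

```yaml
# Deployment fragment sketch -- connection string from the CloudNative-PG app Secret
containers:
  - name: umami
    image: ghcr.io/umami-software/umami:postgresql-latest
    env:
      - name: DATABASE_URL
        valueFrom:
          secretKeyRef:
            name: umami-db-app   # Secret created automatically by CloudNative-PG
            key: uri             # full postgresql:// connection string
```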