
Observability

The observability stack provides visibility into what's happening across the infrastructure - server health, application performance, and Kubernetes workloads. Everything is accessible through Grafana at grafana.minnova.io.

Stack

All components run on K3s and are deployed via ArgoCD:

| Component | Namespace | Purpose |
| --- | --- | --- |
| Grafana | monitoring | Dashboards and visualization |
| Prometheus | monitoring | Metrics storage and querying |
| Loki | monitoring | Log aggregation |
| Alloy | monitoring | Pod log collection via Kubernetes API |
| Node Exporter | monitoring | System metrics (CPU, memory, disk, network) |
| kube-state-metrics | monitoring | Kubernetes object metrics |

Architecture

flowchart LR
    NE[Node Exporter] -->|metrics| PR[Prometheus] --> GF[Grafana]
    KSM[kube-state-metrics] -->|k8s metrics| PR
    CNPG[CloudNative-PG] -->|pg metrics| PR
    Alloy[Alloy] -->|logs| Loki --> GF

Metrics

Node Exporter runs as a DaemonSet and exposes metrics about the host system - CPU usage, memory, disk space, network traffic, etc.

kube-state-metrics exposes Kubernetes object metrics - pod status, deployment replicas, PVC usage, etc.

CloudNative-PG exports PostgreSQL metrics via PodMonitor - connections, queries, replication status, etc.
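
Enabling that exporter is a single toggle on the CNPG Cluster resource. The snippet below is a minimal sketch; the cluster name, namespace, and sizes are placeholders, not the actual manifests in this repo:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg          # placeholder name
  namespace: example-app    # placeholder namespace
spec:
  instances: 1
  monitoring:
    enablePodMonitor: true  # CNPG creates a PodMonitor that Prometheus can discover
  storage:
    size: 5Gi               # example size
```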

Prometheus scrapes all of these endpoints and stores the data as time series, with short retention to keep disk usage low. It handles aggregation, provides PromQL for querying, and is configured to discover PodMonitors and ServiceMonitors from all namespaces.
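
In kube-prometheus-stack values this mostly comes down to the selector and retention settings below. A sketch with illustrative numbers, not the exact contents of values.yaml:

```yaml
# Excerpt in the shape of infra/kubernetes/monitoring/values.yaml (illustrative values)
prometheus:
  prometheusSpec:
    retention: 7d                                  # short retention; raise if more history is needed
    # With these flags set to false, Prometheus picks up PodMonitors and
    # ServiceMonitors from every namespace, not only the chart's own.
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi                        # local PVC size (example)
```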

Logs

Alloy tails pod logs via the Kubernetes API (no node-level file access) and ships them to Loki. Logs are available in Grafana through the Loki datasource with short retention tuned for the single-node cluster.
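
The retention knobs live in loki-values.yaml. The sketch below assumes the single-binary deployment mode of the Loki Helm chart and uses illustrative numbers; newer Loki versions may require additional compactor settings (such as a delete request store):

```yaml
# Excerpt in the shape of infra/kubernetes/monitoring/loki-values.yaml (illustrative values)
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
loki:
  commonConfig:
    replication_factor: 1
  compactor:
    retention_enabled: true      # the compactor enforces the retention policy
  limits_config:
    retention_period: 168h       # e.g. keep 7 days of logs
```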

Dashboards

Pre-installed from kube-prometheus-stack:

  • Kubernetes cluster/namespace/pod resource dashboards
  • Node Exporter dashboards

Manually imported:

  • CloudNative-PG (ID: 20417) - PostgreSQL cluster monitoring
  • Node Exporter Full (ID: 1860) - Detailed host metrics

Authentication

Grafana uses Cloudflare Access for authentication, which chains back to Authentik. When you visit grafana.minnova.io:

  1. Cloudflare Access intercepts the request and checks if you're authenticated
  2. If not, it redirects to Authentik for login
  3. After successful login, Cloudflare adds your email to a header (Cf-Access-Authenticated-User-Email)
  4. Grafana reads this header and auto-creates an account for you

The result is that anyone who can authenticate with Authentik automatically gets a Grafana account. No separate user management needed in Grafana itself.
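
On the Grafana side this maps to the auth proxy settings in grafana.ini, set through the Helm values. A sketch under the assumption that Grafana is configured via kube-prometheus-stack; confirm against the actual values file:

```yaml
# Excerpt in the shape of infra/kubernetes/monitoring/values.yaml (sketch)
grafana:
  grafana.ini:
    auth.proxy:
      enabled: true
      header_name: Cf-Access-Authenticated-User-Email  # header set by Cloudflare Access
      header_property: email
      auto_sign_up: true            # create a Grafana user on first visit
    users:
      auto_assign_org_role: Viewer  # default role for auto-created users (example)
```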

Retention

Both Prometheus and Loki use short-term retention on local PVCs to keep disk usage low. Tune the retention values in the Helm chart configs if more history is needed.

Alerting

Alerts are sent to Discord through Prometheus Alertmanager (see the sketch after the list below). The setup uses:

  • AlertmanagerConfig CRD - Defines routing rules and Discord receiver
  • SOPS Secret - Stores Discord webhook URL securely
  • kube-prometheus-stack default rules - Built-in alerts for K8s, node, and Prometheus issues
  • Config lives in infra/kubernetes/monitoring/alertmanager-config.yaml
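
Put together, the CRD plus the secret reference looks roughly like this; the resource and Secret names are hypothetical, the real definition lives in alertmanager-config.yaml:

```yaml
# Sketch in the shape of infra/kubernetes/monitoring/alertmanager-config.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: discord
  namespace: monitoring
spec:
  route:
    receiver: discord
    groupBy: ["alertname"]
    repeatInterval: 12h
  receivers:
    - name: discord
      discordConfigs:
        - apiURL:                    # secret reference instead of a plain webhook URL
            name: discord-webhook    # SOPS-decrypted Secret (hypothetical name)
            key: webhook-url
```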

Known Limitations

The prometheus-operator doesn't support webhook_url_file for Discord yet (Issue #7159). Use the AlertmanagerConfig CRD with an apiURL secret reference instead of raw Helm values.

Current Limitations

Node Exporter is only deployed on the apps server (K3s node). Headscale and bastion servers are not yet monitored.

To get full infrastructure monitoring, standalone Node Exporters need to be deployed on headscale and bastion and added as Prometheus scrape targets (see the sketch below). See Implementation Status for progress.
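
Once those exporters exist, Prometheus could reach them with a static scrape config via additionalScrapeConfigs. A sketch with placeholder hostnames:

```yaml
# Sketch: scraping standalone Node Exporters on the non-K3s hosts (hostnames are placeholders)
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: external-nodes
        static_configs:
          - targets:
              - headscale.internal:9100   # placeholder address
              - bastion.internal:9100     # placeholder address
```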

Configuration

Deployment is GitOps-managed by ArgoCD:

  • infra/argocd/apps/monitoring.yaml installs kube-prometheus-stack with infra/kubernetes/monitoring/values.yaml
  • infra/argocd/apps/loki.yaml deploys Loki with infra/kubernetes/monitoring/loki-values.yaml
  • infra/argocd/apps/alloy.yaml deploys Alloy log collection with infra/kubernetes/monitoring/alloy-values.yaml

Edit the values files and let ArgoCD sync to apply changes.
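
For orientation, an Argo CD Application that pairs a Helm chart with a values file from Git commonly looks like the multi-source sketch below; the Git repo URL and chart version are placeholders, not the actual contents of monitoring.yaml:

```yaml
# Sketch in the shape of infra/argocd/apps/monitoring.yaml (placeholders marked)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  sources:
    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "*"                        # pin a real chart version in practice
      helm:
        valueFiles:
          - $values/infra/kubernetes/monitoring/values.yaml
    - repoURL: https://example.com/infra.git     # placeholder Git repo URL
      targetRevision: main
      ref: values
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```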

Web Analytics (Umami)

Umami is a privacy-focused, self-hosted web analytics platform. It runs on K3s with PostgreSQL (CloudNative-PG) as the backend.

| Component | Details |
| --- | --- |
| URL | analytics.minnova.io |
| Database | PostgreSQL via CloudNative-PG |
| Storage | 5Gi PVC |
| Authentication | Umami's built-in username/password |

Why Umami?

  • Privacy-first: No cookies, GDPR compliant out of the box
  • Self-hosted: Full data ownership, no third-party tracking
  • Lightweight: ~2KB tracking script, minimal performance impact
  • Simple: Clean UI, easy to understand metrics

Zero Trust Compatibility

Umami is intentionally not behind Cloudflare Zero Trust (Access). This is a deliberate architecture decision:

  1. Analytics endpoints must be public - The tracking script (/api/send) needs to accept POST requests from external sites' visitors
  2. Cloudflare Access blocks unauthenticated requests - Would prevent analytics from being recorded
  3. Umami has built-in authentication - The dashboard requires login, protecting sensitive data

For internal services that need authentication, Cloudflare Access with Authentik OIDC remains the standard approach. Umami is the exception because its primary function requires public accessibility.

Configuration

Deployment is GitOps-managed by ArgoCD via the App-of-Apps generator (auto-discovers infra/kubernetes/umami/).

Key files:

  • infra/kubernetes/umami/deployment.yaml - Umami deployment (sketched below)
  • infra/kubernetes/umami/postgres-cluster.yaml - PostgreSQL cluster
  • infra/kubernetes/umami/ingress.yaml - Traefik IngressRoute
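
The core of the deployment is wiring Umami to the CNPG-generated connection secret. A minimal sketch, assuming a CNPG cluster named umami-db (CNPG creates a "<cluster>-app" Secret with a uri key); image tag and names are illustrative:

```yaml
# Sketch in the shape of infra/kubernetes/umami/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: umami
  namespace: umami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: umami
  template:
    metadata:
      labels:
        app: umami
    spec:
      containers:
        - name: umami
          image: ghcr.io/umami-software/umami:postgresql-latest  # pin a version in practice
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: umami-db-app   # connection secret generated by CloudNative-PG
                  key: uri
```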