taskplaner/.planning/phases/08-observability-stack/08-02-SUMMARY.md
Thomas Richter de82532bcd
docs(08-02): complete Promtail to Alloy migration plan
Tasks completed: 2/2
- Deploy Grafana Alloy via Helm (DaemonSet on all 5 nodes)
- Verify log flow and remove Promtail

SUMMARY: .planning/phases/08-observability-stack/08-02-SUMMARY.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:13:22 +01:00


phase: 08-observability-stack
plan: 02
subsystem: infra
tags: alloy, grafana, loki, logging, daemonset, helm
requires: phase 08-01 (provides: Prometheus ServiceMonitor pattern for TaskPlanner)
provides:
  • Grafana Alloy DaemonSet replacing Promtail
  • Log forwarding to Loki via loki.write endpoint
  • Helm chart wrapper for alloy configuration
affects: 08-03-verification, future-logging
tech-stack (added, patterns): grafana-alloy, river-config, daemonset-tolerations, helm-umbrella-chart
key-files (created): helm/alloy/Chart.yaml, helm/alloy/values.yaml
key-decisions:
  • Match Promtail labels (namespace, pod, container) for query compatibility
  • Add control-plane tolerations to run on all 5 nodes
  • Disable Promtail in loki-stack rather than manual delete
patterns-established:
  • River config: Alloy uses River language not YAML for log pipelines
  • DaemonSet tolerations: control-plane nodes need explicit tolerations
duration: 8min
completed: 2026-02-03

Phase 8 Plan 02: Promtail to Alloy Migration Summary

Grafana Alloy DaemonSet deployed on all 5 nodes, forwarding logs to Loki with Promtail removed

Performance

  • Duration: 8 min
  • Started: 2026-02-03T21:04:24Z
  • Completed: 2026-02-03T21:12:07Z
  • Tasks: 2
  • Files created: 2

Accomplishments

  • Deployed Grafana Alloy as DaemonSet via Helm umbrella chart
  • Wrote a River config for Kubernetes pod log discovery, with labels matching those Promtail produced
  • Verified log flow to Loki before and after Promtail removal
  • Cleanly removed Promtail by disabling it in the loki-stack values
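
For illustration, the River pipeline described above might look roughly like the following inside helm/alloy/values.yaml. This is a minimal sketch, not the committed file: it assumes the umbrella chart's dependency is named alloy (so upstream values nest under alloy:), that pod logs are tailed through the Kubernetes API with loki.source.kubernetes, and that Loki is reachable at the placeholder URL shown. Only the three label names and the use of loki.write come from this summary.

```yaml
# helm/alloy/values.yaml (sketch; key layout assumes a dependency named "alloy")
alloy:
  alloy:
    configMap:
      content: |
        // Discover every pod in the cluster.
        discovery.kubernetes "pods" {
          role = "pod"
        }

        // Reuse the label names Promtail produced so existing queries still match.
        discovery.relabel "pods" {
          targets = discovery.kubernetes.pods.targets

          rule {
            source_labels = ["__meta_kubernetes_namespace"]
            target_label  = "namespace"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_name"]
            target_label  = "pod"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_container_name"]
            target_label  = "container"
          }
        }

        // Tail container logs for the relabeled targets and hand them to loki.write.
        loki.source.kubernetes "pods" {
          targets    = discovery.relabel.pods.output
          forward_to = [loki.write.default.receiver]
        }

        loki.write "default" {
          endpoint {
            // Assumed URL; replace with the cluster's actual Loki push endpoint.
            url = "http://loki:3100/loki/api/v1/push"
          }
        }
```

Keeping namespace, pod, and container as the target labels is what lets dashboards written against Promtail's streams keep working unchanged.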

Task Commits

Each task was committed atomically:

  1. Task 1: Deploy Grafana Alloy via Helm - c295228 (feat)
  2. Task 2: Verify log flow and remove Promtail - no code changes (kubectl operations only)

Plan metadata: Pending

Files Created/Modified

  • helm/alloy/Chart.yaml - Umbrella chart for grafana/alloy dependency
  • helm/alloy/values.yaml - Alloy River config for Loki forwarding with DaemonSet tolerations
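
As a rough sketch of what an umbrella chart of this kind typically contains, a Chart.yaml could look like the one below. The chart name, description, wrapper version, and dependency version constraint are all assumptions for illustration; the summary states only that the chart wraps the grafana/alloy dependency.

```yaml
# helm/alloy/Chart.yaml (sketch; versions are hypothetical)
apiVersion: v2
name: alloy
description: Umbrella chart wrapping the grafana/alloy chart
type: application
version: 0.1.0            # hypothetical wrapper chart version
dependencies:
  - name: alloy
    repository: https://grafana.github.io/helm-charts
    version: ">=0.1.0"    # assumption: the pinned version is not stated in this summary
```

With a layout like this, values intended for the upstream chart are nested under the dependency name (alloy:) in values.yaml.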

Decisions Made

  • Match Promtail labels: Kept same label extraction (namespace, pod, container) for query compatibility with existing dashboards
  • Control-plane tolerations: Added tolerations for master/control-plane nodes to ensure Alloy runs on all 5 nodes (not just 2 workers)
  • Promtail removal via Helm: Upgraded loki-stack with promtail.enabled=false rather than manual deletion for clean state management
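
The Promtail decision amounts to a one-key values override for the loki-stack release, sketched below. The release name and the values file it lives in are not given in this summary, so treat the placement as an assumption.

```yaml
# loki-stack values override (sketch)
promtail:
  enabled: false   # the chart no longer renders Promtail's resources
```

On the next helm upgrade of that release, Helm deletes the resources that are no longer rendered, which is why this keeps the release state consistent where a manual kubectl delete would not.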

Deviations from Plan

Auto-fixed Issues

1. [Rule 3 - Blocking] Installed Helm locally

  • Found during: Task 1 (helm dependency build)
  • Issue: helm command not found on local system
  • Fix: Downloaded and installed Helm 3.20.0 to ~/.local/bin/
  • Files modified: None (binary installation)
  • Verification: helm version returns correct version
  • Committed in: N/A (environment setup)

2. [Rule 1 - Bug] Added control-plane tolerations

  • Found during: Task 1 (DaemonSet verification)
  • Issue: Alloy only scheduled on 2 nodes (workers), not all 5
  • Fix: Added tolerations for the node-role.kubernetes.io/master and node-role.kubernetes.io/control-plane taints
  • Files modified: helm/alloy/values.yaml
  • Verification: DaemonSet shows DESIRED=5, READY=5
  • Committed in: c295228 (Task 1 commit)
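
A fix of this kind could be expressed in helm/alloy/values.yaml roughly as follows. This is a hedged sketch: it assumes the dependency is named alloy, that the upstream chart's controller.tolerations key is used, and that the taints carry the NoSchedule effect, none of which is spelled out in this summary.

```yaml
# helm/alloy/values.yaml (sketch; nesting assumes a dependency named "alloy")
alloy:
  controller:
    type: daemonset
    tolerations:
      # Newer clusters taint control-plane nodes with this key.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule   # assumed taint effect
      # Older clusters may still use the legacy "master" key.
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule   # assumed taint effect
```

Without matching tolerations, the scheduler skips tainted nodes, which is consistent with the DaemonSet initially landing only on the 2 workers.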

Total deviations: 2 auto-fixed (1 blocking, 1 bug)
Impact on plan: Both fixes necessary for correct operation. No scope creep.

Issues Encountered

  • Initial "entry too far behind" errors in Alloy logs - expected behavior: Loki rejects entries that are too old while Alloy catches up on existing logs, and the errors stop on their own
  • TaskPlanner logs show "too many open files" warning - unrelated to Alloy migration, pre-existing application issue

Next Phase Readiness

  • Alloy collecting logs from all pods cluster-wide
  • Loki receiving logs via Alloy loki.write endpoint
  • Ready for 08-03 verification of end-to-end observability

Phase: 08-observability-stack
Completed: 2026-02-03