Thomas Richter de82532bcd
docs(08-02): complete Promtail to Alloy migration plan
Tasks completed: 2/2
- Deploy Grafana Alloy via Helm (DaemonSet on all 5 nodes)
- Verify log flow and remove Promtail

SUMMARY: .planning/phases/08-observability-stack/08-02-SUMMARY.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:13:22 +01:00


---
phase: 08-observability-stack
plan: 02
subsystem: infra
tags: [alloy, grafana, loki, logging, daemonset, helm]
# Dependency graph
requires:
  - phase: 08-01
    provides: Prometheus ServiceMonitor pattern for TaskPlanner
provides:
  - Grafana Alloy DaemonSet replacing Promtail
  - Log forwarding to Loki via loki.write endpoint
  - Helm chart wrapper for alloy configuration
affects: [08-03-verification, future-logging]
# Tech tracking
tech-stack:
  added: [grafana-alloy, river-config]
  patterns: [daemonset-tolerations, helm-umbrella-chart]
key-files:
  created:
    - helm/alloy/Chart.yaml
    - helm/alloy/values.yaml
  modified: []
key-decisions:
  - "Match Promtail labels (namespace, pod, container) for query compatibility"
  - "Add control-plane tolerations to run on all 5 nodes"
  - "Disable Promtail in loki-stack rather than manual delete"
patterns-established:
  - "River config: Alloy uses River language not YAML for log pipelines"
  - "DaemonSet tolerations: control-plane nodes need explicit tolerations"
# Metrics
duration: 8min
completed: 2026-02-03
---
# Phase 8 Plan 02: Promtail to Alloy Migration Summary
**Grafana Alloy DaemonSet deployed on all 5 nodes and forwarding logs to Loki; Promtail removed**
## Performance
- **Duration:** 8 min
- **Started:** 2026-02-03T21:04:24Z
- **Completed:** 2026-02-03T21:12:07Z
- **Tasks:** 2
- **Files created:** 2
## Accomplishments
- Deployed Grafana Alloy as DaemonSet via Helm umbrella chart
- Configured River config for Kubernetes pod log discovery with matching labels
- Verified log flow to Loki before and after Promtail removal
- Cleanly removed Promtail by disabling in loki-stack values
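
The River pipeline described above could look roughly like the following fragment of `helm/alloy/values.yaml`. This is a minimal sketch, not the committed file: the nesting under `alloy.alloy.configMap.content` assumes the upstream grafana/alloy chart is pulled in as a dependency named `alloy`, the component labels (`pods`, `default`) are illustrative, the Loki URL is a hypothetical in-cluster address, and the use of `loki.source.kubernetes` (rather than file-based tailing) is an assumption.

```yaml
# Sketch only: keys are nested under the dependency name ("alloy")
# because this is assumed to be an umbrella chart.
alloy:
  alloy:
    configMap:
      content: |-
        // Discover all pods in the cluster.
        discovery.kubernetes "pods" {
          role = "pod"
        }

        // Expose the same labels Promtail produced so existing
        // dashboard queries keep working.
        discovery.relabel "pods" {
          targets = discovery.kubernetes.pods.targets

          rule {
            source_labels = ["__meta_kubernetes_namespace"]
            target_label  = "namespace"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_name"]
            target_label  = "pod"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_container_name"]
            target_label  = "container"
          }
        }

        // Read container logs and hand them to the Loki writer.
        loki.source.kubernetes "pods" {
          targets    = discovery.relabel.pods.output
          forward_to = [loki.write.default.receiver]
        }

        // Hypothetical Loki push URL; adjust service name and namespace.
        loki.write "default" {
          endpoint {
            url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
          }
        }
```

Keeping the `namespace`, `pod`, and `container` label names identical to Promtail's is what allows existing LogQL selectors to match the new streams without changes.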
## Task Commits
Each task was committed atomically:
1. **Task 1: Deploy Grafana Alloy via Helm** - `c295228` (feat)
2. **Task 2: Verify log flow and remove Promtail** - no code changes (kubectl and Helm operations against the cluster only)

**Plan metadata:** Pending
## Files Created/Modified
- `helm/alloy/Chart.yaml` - Umbrella chart for grafana/alloy dependency
- `helm/alloy/values.yaml` - Alloy River config for Loki forwarding with DaemonSet tolerations
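
For orientation, an umbrella chart of this kind usually consists of little more than a dependency declaration. The sketch below shows what `helm/alloy/Chart.yaml` might contain; the chart `name`, `description`, `version`, and the dependency version constraint are placeholders rather than values taken from the commit, and the dependency should be pinned to the chart version actually deployed.

```yaml
# Sketch of an umbrella chart; all field values here are illustrative.
apiVersion: v2
name: alloy
description: Umbrella chart wrapping the upstream grafana/alloy chart
type: application
version: 0.1.0
dependencies:
  - name: alloy
    # Placeholder constraint: pin this to the version in use.
    version: "*"
    repository: https://grafana.github.io/helm-charts
```

With this layout, `helm dependency build` fetches the upstream chart, and every setting meant for it goes under a top-level `alloy:` key in `values.yaml`.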
## Decisions Made
- **Match Promtail labels:** Kept same label extraction (namespace, pod, container) for query compatibility with existing dashboards
- **Control-plane tolerations:** Added tolerations for master/control-plane nodes to ensure Alloy runs on all 5 nodes (not just 2 workers)
- **Promtail removal via Helm:** Upgraded loki-stack with `promtail.enabled=false` rather than manual deletion for clean state management
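
The Promtail removal comes down to a one-key override on the loki-stack release. A minimal sketch of that override as a values fragment (the file it lives in and the release name are not given in this summary, so treat the placement as an assumption):

```yaml
# loki-stack values override: stop rendering the Promtail DaemonSet.
# Helm deletes the Promtail resources on the next upgrade of the release.
promtail:
  enabled: false
```

Because Helm owns the Promtail resources, disabling the subchart lets the upgrade remove them and keeps the release state consistent, which a manual `kubectl delete` would not.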
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 - Blocking] Installed Helm locally**
- **Found during:** Task 1 (helm dependency build)
- **Issue:** `helm` command not found on the local system
- **Fix:** Downloaded and installed Helm 3.20.0 to `~/.local/bin/`
- **Files modified:** None (binary installation)
- **Verification:** `helm version` reports the installed version
- **Committed in:** N/A (environment setup)

**2. [Rule 1 - Bug] Added control-plane tolerations**
- **Found during:** Task 1 (DaemonSet verification)
- **Issue:** Alloy only scheduled on 2 nodes (workers), not all 5
- **Fix:** Added tolerations for the `node-role.kubernetes.io/master` and `node-role.kubernetes.io/control-plane` taints
- **Files modified:** helm/alloy/values.yaml
- **Verification:** DaemonSet shows DESIRED=5, READY=5
- **Committed in:** c295228 (Task 1 commit)
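
The toleration fix can be sketched as the following `values.yaml` fragment. It assumes the upstream grafana/alloy chart's `controller.tolerations` key, nested under the `alloy` dependency name, and a `NoSchedule` effect on the control-plane taints; the exact entries committed in `c295228` may differ.

```yaml
# Sketch only: lets the DaemonSet schedule onto tainted control-plane nodes.
alloy:
  controller:
    type: daemonset
    tolerations:
      # Legacy taint key, still present on older clusters.
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      # Current taint key for control-plane nodes.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
```
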
---
**Total deviations:** 2 auto-fixed (1 blocking, 1 bug)
**Impact on plan:** Both fixes necessary for correct operation. No scope creep.
## Issues Encountered
- Initial "entry too far behind" errors in Alloy logs: expected behavior, since Loki rejects old log entries while Alloy catches up; the errors settle without intervention
- TaskPlanner logs show a "too many open files" warning: a pre-existing application issue, unrelated to the Alloy migration
## Next Phase Readiness
- Alloy collecting logs from all pods cluster-wide
- Loki receiving logs via Alloy loki.write endpoint
- Ready for 08-03 verification of end-to-end observability
---
*Phase: 08-observability-stack*
*Completed: 2026-02-03*