Tasks completed: 2/2 - Deploy Grafana Alloy via Helm (DaemonSet on all 5 nodes) - Verify log flow and remove Promtail SUMMARY: .planning/phases/08-observability-stack/08-02-SUMMARY.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | patterns-established | duration | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 08-observability-stack | 02 | infra | | | | | | | | | 8min | 2026-02-03 |
Phase 8 Plan 02: Promtail to Alloy Migration Summary
Grafana Alloy DaemonSet deployed on all 5 nodes, forwarding logs to Loki with Promtail removed
Performance
- Duration: 8 min
- Started: 2026-02-03T21:04:24Z
- Completed: 2026-02-03T21:12:07Z
- Tasks: 2
- Files created: 2
Accomplishments
- Deployed Grafana Alloy as DaemonSet via Helm umbrella chart
- Wrote the Alloy River config for Kubernetes pod log discovery, keeping the same labels Promtail produced (namespace, pod, container)
- Verified log flow to Loki before and after Promtail removal
- Cleanly removed Promtail by disabling in loki-stack values
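The deployed River config is not reproduced in this summary. As a rough sketch of what pod log discovery with Promtail-compatible labels could look like in helm/alloy/values.yaml, the fragment below makes several assumptions: the umbrella chart's dependency is named alloy (so the subchart's values nest under a top-level alloy key), logs are tailed with loki.source.kubernetes, and Loki is reachable at a placeholder URL.

```yaml
# Sketch only; the actual helm/alloy/values.yaml may differ.
alloy:                        # assumed dependency name in the umbrella chart
  controller:
    type: daemonset           # one Alloy pod per node
  alloy:
    configMap:
      content: |
        // Discover every pod in the cluster.
        discovery.kubernetes "pods" {
          role = "pod"
        }

        // Copy discovery metadata into the label names Promtail used,
        // so existing dashboard queries keep working.
        discovery.relabel "pod_logs" {
          targets = discovery.kubernetes.pods.targets

          rule {
            source_labels = ["__meta_kubernetes_namespace"]
            target_label  = "namespace"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_name"]
            target_label  = "pod"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_container_name"]
            target_label  = "container"
          }
        }

        // Tail container logs for the discovered pods (assumed source component).
        loki.source.kubernetes "pods" {
          targets    = discovery.relabel.pod_logs.output
          forward_to = [loki.write.default.receiver]
        }

        // Push to Loki. The URL is a placeholder, not the cluster's real endpoint.
        loki.write "default" {
          endpoint {
            url = "http://loki.example.svc:3100/loki/api/v1/push"
          }
        }
```

Keeping the three target labels identical to the Promtail ones is what lets existing LogQL selectors such as {namespace="..."} continue to match without dashboard changes.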
Task Commits
Each task was committed atomically:
- Task 1: Deploy Grafana Alloy via Helm - c295228 (feat)
- Task 2: Verify log flow and remove Promtail - no code changes (kubectl operations only)
Plan metadata: Pending
Files Created/Modified
- helm/alloy/Chart.yaml - Umbrella chart for grafana/alloy dependency
- helm/alloy/values.yaml - Alloy River config for Loki forwarding with DaemonSet tolerations
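For readers unfamiliar with the umbrella-chart pattern, a minimal helm/alloy/Chart.yaml might look like the sketch below. The chart name, description, local version, and the dependency version constraint are illustrative placeholders; only the grafana/alloy dependency itself comes from this summary.

```yaml
# Sketch of an umbrella chart that wraps grafana/alloy; field values are assumptions.
apiVersion: v2
name: alloy                   # hypothetical local chart name
description: Umbrella chart for Grafana Alloy log collection
type: application
version: 0.1.0                # hypothetical local chart version
dependencies:
  - name: alloy
    repository: https://grafana.github.io/helm-charts
    version: "*"              # placeholder: pin to the chart version actually deployed
```

With a file like this in place, helm dependency build fetches the grafana/alloy chart into the charts/ directory before install or upgrade.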
Decisions Made
- Match Promtail labels: Kept same label extraction (namespace, pod, container) for query compatibility with existing dashboards
- Control-plane tolerations: Added tolerations for master/control-plane nodes to ensure Alloy runs on all 5 nodes (not just 2 workers)
- Promtail removal via Helm: Upgraded loki-stack with promtail.enabled=false rather than manual deletion for clean state management
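The Promtail removal decision amounts to a one-key change in the loki-stack release values. A minimal sketch, assuming the values are kept in a file rather than passed with --set (the file's name and location are not given in this summary):

```yaml
# Values override for the loki-stack release (file location is an assumption).
promtail:
  enabled: false              # Helm deletes the Promtail resources on the next upgrade
```

Because Helm owns the Promtail resources, disabling the subchart removes them as part of the release upgrade and keeps the release state consistent with what is running in the cluster.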
Deviations from Plan
Auto-fixed Issues
1. [Rule 3 - Blocking] Installed Helm locally
- Found during: Task 1 (helm dependency build)
- Issue: helm command not found on local system
- Fix: Downloaded and installed Helm 3.20.0 to ~/.local/bin/
- Files modified: None (binary installation)
- Verification: helm version returns correct version
- Committed in: N/A (environment setup)
2. [Rule 1 - Bug] Added control-plane tolerations
- Found during: Task 1 (DaemonSet verification)
- Issue: Alloy only scheduled on 2 nodes (workers), not all 5
- Fix: Added tolerations for node-role.kubernetes.io/master and control-plane
- Files modified: helm/alloy/values.yaml
- Verification: DaemonSet shows DESIRED=5, READY=5
- Committed in: c295228 (Task 1 commit)
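The toleration fix described above could be expressed roughly as follows in helm/alloy/values.yaml. This is a sketch: it assumes the subchart values nest under a top-level alloy key and that the control-plane taints use the NoSchedule effect, neither of which is stated in this summary.

```yaml
# Sketch only; nesting and taint effects are assumptions.
alloy:                        # assumed dependency name in the umbrella chart
  controller:
    tolerations:
      # Older clusters taint control-plane nodes with the "master" role key.
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      # Newer clusters use the "control-plane" role key.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
```

Tolerating both keys covers clusters that still carry the legacy master taint as well as those using only the control-plane taint, which is what allows the DaemonSet to reach DESIRED=5 instead of 2.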
Total deviations: 2 auto-fixed (1 blocking, 1 bug)
Impact on plan: Both fixes necessary for correct operation. No scope creep.
Issues Encountered
- Initial "entry too far behind" errors in Alloy logs - expected behavior: Loki rejects old log entries while Alloy catches up, and the errors stop on their own
- TaskPlanner logs show "too many open files" warning - unrelated to Alloy migration, pre-existing application issue
Next Phase Readiness
- Alloy collecting logs from all pods cluster-wide
- Loki receiving logs via Alloy loki.write endpoint
- Ready for 08-03 verification of end-to-end observability
Phase: 08-observability-stack
Completed: 2026-02-03