docs(08-02): complete Promtail to Alloy migration plan
Some checks failed
Build and Push / build (push) Has been cancelled
Tasks completed: 2/2
- Deploy Grafana Alloy via Helm (DaemonSet on all 5 nodes)
- Verify log flow and remove Promtail

SUMMARY: .planning/phases/08-observability-stack/08-02-SUMMARY.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-02-01)
 ## Current Position
 
 Phase: 8 of 9 (Observability Stack) - IN PROGRESS
-Plan: 1 of 3 in current phase - COMPLETE
+Plan: 2 of 3 in current phase - COMPLETE
 Status: In progress
-Last activity: 2026-02-03 — Completed 08-01-PLAN.md (TaskPlanner /metrics and ServiceMonitor)
+Last activity: 2026-02-03 — Completed 08-02-PLAN.md (Promtail to Alloy Migration)
 
-Progress: [█████████████████████░░░░░░░░░] 84% (21/25 plans complete)
+Progress: [██████████████████████░░░░░░░░] 88% (22/25 plans complete)
 
 ## Performance Metrics
 
@@ -26,8 +26,8 @@ Progress: [█████████████████████░░
 - Requirements satisfied: 31/31
 
 **v2.0 Progress:**
-- Plans completed: 3/7
-- Total execution time: 30 min
+- Plans completed: 4/7
+- Total execution time: 38 min
 
 **By Phase (v1.0):**
 
@@ -45,7 +45,7 @@ Progress: [█████████████████████░░
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
 | 07-gitops-foundation | 2/2 | 26 min | 13 min |
-| 08-observability-stack | 1/3 | 4 min | 4 min |
+| 08-observability-stack | 2/3 | 12 min | 6 min |
 
 ## Accumulated Context
 
@@ -72,6 +72,11 @@ For v2.0, key decisions from research:
 - Use prom-client default metrics only (no custom metrics for initial setup)
 - ServiceMonitor enabled by default in values.yaml
 
+**From Phase 8-02:**
+- Alloy uses River config language (not YAML)
+- Match Promtail labels for Loki query compatibility
+- Control-plane node tolerations required for full DaemonSet coverage
+
 ### Pending Todos
 
 - Deploy Gitea Actions runner for automatic CI builds
@@ -83,10 +88,10 @@ For v2.0, key decisions from research:
 
 ## Session Continuity
 
-Last session: 2026-02-03 21:08 UTC
-Stopped at: Completed 08-01-PLAN.md
+Last session: 2026-02-03 21:12 UTC
+Stopped at: Completed 08-02-PLAN.md
 Resume file: None
 
 ---
 *State initialized: 2026-01-29*
-*Last updated: 2026-02-03 — Completed 08-01-PLAN.md (TaskPlanner /metrics and ServiceMonitor)*
+*Last updated: 2026-02-03 — Completed 08-02-PLAN.md (Promtail to Alloy Migration)*
.planning/phases/08-observability-stack/08-02-SUMMARY.md (new file, 114 lines)
@@ -0,0 +1,114 @@
+---
+phase: 08-observability-stack
+plan: 02
+subsystem: infra
+tags: [alloy, grafana, loki, logging, daemonset, helm]
+
+# Dependency graph
+requires:
+  - phase: 08-01
+    provides: Prometheus ServiceMonitor pattern for TaskPlanner
+provides:
+  - Grafana Alloy DaemonSet replacing Promtail
+  - Log forwarding to Loki via loki.write endpoint
+  - Helm chart wrapper for alloy configuration
+affects: [08-03-verification, future-logging]
+
+# Tech tracking
+tech-stack:
+  added: [grafana-alloy, river-config]
+  patterns: [daemonset-tolerations, helm-umbrella-chart]
+
+key-files:
+  created:
+    - helm/alloy/Chart.yaml
+    - helm/alloy/values.yaml
+  modified: []
+
+key-decisions:
+  - "Match Promtail labels (namespace, pod, container) for query compatibility"
+  - "Add control-plane tolerations to run on all 5 nodes"
+  - "Disable Promtail in loki-stack rather than manual delete"
+
+patterns-established:
+  - "River config: Alloy uses River language not YAML for log pipelines"
+  - "DaemonSet tolerations: control-plane nodes need explicit tolerations"
+
+# Metrics
+duration: 8min
+completed: 2026-02-03
+---
+
+# Phase 8 Plan 02: Promtail to Alloy Migration Summary
+
+**Grafana Alloy DaemonSet deployed on all 5 nodes, forwarding logs to Loki with Promtail removed**
+
+## Performance
+
+- **Duration:** 8 min
+- **Started:** 2026-02-03T21:04:24Z
+- **Completed:** 2026-02-03T21:12:07Z
+- **Tasks:** 2
+- **Files created:** 2
+
+## Accomplishments
+- Deployed Grafana Alloy as DaemonSet via Helm umbrella chart
+- Configured River config for Kubernetes pod log discovery with matching labels
+- Verified log flow to Loki before and after Promtail removal
+- Cleanly removed Promtail by disabling in loki-stack values
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Deploy Grafana Alloy via Helm** - `c295228` (feat)
+2. **Task 2: Verify log flow and remove Promtail** - no code changes (kubectl operations only)
+
+**Plan metadata:** Pending
+
+## Files Created/Modified
+- `helm/alloy/Chart.yaml` - Umbrella chart for grafana/alloy dependency
+- `helm/alloy/values.yaml` - Alloy River config for Loki forwarding with DaemonSet tolerations
+
+## Decisions Made
+- **Match Promtail labels:** Kept same label extraction (namespace, pod, container) for query compatibility with existing dashboards
+- **Control-plane tolerations:** Added tolerations for master/control-plane nodes to ensure Alloy runs on all 5 nodes (not just 2 workers)
+- **Promtail removal via Helm:** Upgraded loki-stack with `promtail.enabled=false` rather than manual deletion for clean state management
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 3 - Blocking] Installed Helm locally**
+- **Found during:** Task 1 (helm dependency build)
+- **Issue:** helm command not found on local system
+- **Fix:** Downloaded and installed Helm 3.20.0 to ~/.local/bin/
+- **Files modified:** None (binary installation)
+- **Verification:** `helm version` returns correct version
+- **Committed in:** N/A (environment setup)
+
+**2. [Rule 1 - Bug] Added control-plane tolerations**
+- **Found during:** Task 1 (DaemonSet verification)
+- **Issue:** Alloy only scheduled on 2 nodes (workers), not all 5
+- **Fix:** Added tolerations for node-role.kubernetes.io/master and control-plane
+- **Files modified:** helm/alloy/values.yaml
+- **Verification:** DaemonSet shows DESIRED=5, READY=5
+- **Committed in:** c295228 (Task 1 commit)
+
+---
+
+**Total deviations:** 2 auto-fixed (1 blocking, 1 bug)
+**Impact on plan:** Both fixes necessary for correct operation. No scope creep.
+
+## Issues Encountered
+- Initial "entry too far behind" errors in Alloy logs - expected Loki behavior rejecting old log entries during catch-up, settles automatically
+- TaskPlanner logs show "too many open files" warning - unrelated to Alloy migration, pre-existing application issue
+
+## Next Phase Readiness
+- Alloy collecting logs from all pods cluster-wide
+- Loki receiving logs via Alloy loki.write endpoint
+- Ready for 08-03 verification of end-to-end observability
+
+---
+*Phase: 08-observability-stack*
+*Completed: 2026-02-03*
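
The control-plane toleration fix recorded in the summary could look roughly like the fragment below in `helm/alloy/values.yaml`. This is a minimal sketch and not the committed file. It assumes the umbrella chart pulls in the upstream chart under the dependency name `alloy` (so its values nest under a top-level `alloy:` key), that the chart exposes pod tolerations as `controller.tolerations`, and that the control-plane nodes carry the standard `NoSchedule` taints.

```yaml
# Sketch only: key layout assumes a dependency named "alloy" in Chart.yaml.
alloy:
  controller:
    # One Alloy pod per node.
    type: daemonset
    tolerations:
      # Current taint key used on control-plane nodes.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      # Legacy taint key, still present on some clusters.
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
```

Without tolerations like these, the scheduler skips tainted nodes, which matches the reported symptom of the DaemonSet running on only the 2 workers until the fix brought it to DESIRED=5, READY=5.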
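
Disabling Promtail through the loki-stack release, as the summary describes with `promtail.enabled=false`, amounts to a one-key values override. The fragment below is a sketch under the assumption that the override is kept in a values file passed to the loki-stack Helm upgrade; the commit does not show where that setting actually lives.

```yaml
# Hypothetical values override for the loki-stack release.
# With this set, upgrading the release lets Helm remove the Promtail
# resources itself instead of deleting them by hand with kubectl.
promtail:
  enabled: false
```

Keeping the switch in Helm values means the release state and the cluster state stay in agreement, which is the "clean state management" rationale given in the decisions list.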