docs(08): create phase plan
Phase 08: Observability Stack - 3 plans in 2 waves

- Wave 1: 08-01 (metrics), 08-02 (Alloy) - parallel
- Wave 2: 08-03 (verification) - depends on both

Ready for execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
.planning/phases/08-observability-stack/08-02-PLAN.md (new file, 229 lines)
---
phase: 08-observability-stack
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - helm/alloy/values.yaml (new)
  - helm/alloy/Chart.yaml (new)
autonomous: true

must_haves:
  truths:
    - "Alloy DaemonSet runs on all nodes"
    - "Alloy forwards logs to Loki"
    - "Promtail DaemonSet is removed"
  artifacts:
    - path: "helm/alloy/Chart.yaml"
      provides: "Alloy Helm chart wrapper"
      contains: "name: alloy"
    - path: "helm/alloy/values.yaml"
      provides: "Alloy configuration for Loki forwarding"
      contains: "loki.write"
  key_links:
    - from: "Alloy pods"
      to: "loki-stack:3100"
      via: "loki.write endpoint"
      pattern: "endpoint.*loki"
---

<objective>
Migrate from Promtail to Grafana Alloy for log collection.

Purpose: Replace Promtail (EOL March 2026) with a Grafana Alloy DaemonSet (OBS-04)
Output: Alloy DaemonSet forwarding logs to Loki, Promtail removed
</objective>

<execution_context>
@/home/tho/.claude/get-shit-done/workflows/execute-plan.md
@/home/tho/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/08-observability-stack/CONTEXT.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Deploy Grafana Alloy via Helm</name>
<files>
helm/alloy/Chart.yaml
helm/alloy/values.yaml
</files>
<action>
1. Create the helm/alloy directory and Chart.yaml as an umbrella chart:

```yaml
apiVersion: v2
name: alloy
description: Grafana Alloy log collector
version: 0.1.0
dependencies:
  - name: alloy
    version: "0.12.*"
    repository: https://grafana.github.io/helm-charts
```

2. Create helm/alloy/values.yaml with a minimal config for Loki forwarding:

```yaml
alloy:
  alloy:
    configMap:
      content: |
        // Discover pods and collect logs
        discovery.kubernetes "pods" {
          role = "pod"
        }

        // Relabel to extract pod metadata
        discovery.relabel "pods" {
          targets = discovery.kubernetes.pods.targets

          rule {
            source_labels = ["__meta_kubernetes_namespace"]
            target_label  = "namespace"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_name"]
            target_label  = "pod"
          }
          rule {
            source_labels = ["__meta_kubernetes_pod_container_name"]
            target_label  = "container"
          }
        }

        // Collect logs from discovered pods
        loki.source.kubernetes "pods" {
          targets    = discovery.relabel.pods.output
          forward_to = [loki.write.default.receiver]
        }

        // Forward to Loki
        loki.write "default" {
          endpoint {
            url = "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/push"
          }
        }

  controller:
    type: daemonset

  serviceAccount:
    create: true
```
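Before deploying, the double `alloy:` nesting can be sanity-checked by rendering the chart locally and confirming the config reaches the generated ConfigMap. A minimal sketch, assuming the chart lives at `./helm/alloy` and `helm dependency build` has already run:

```shell
# Sketch only: render the umbrella chart and count lines mentioning loki.write.
# Chart path ./helm/alloy is an assumption; adjust to your repo layout.
render_has_loki_write() {
  helm template alloy ./helm/alloy -n monitoring | grep -c 'loki.write'
}
```

A non-zero count means the `loki.write` block survived the nesting; zero usually means the config landed at the wrong indentation level.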

3. Add the Grafana Helm repo and build dependencies:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
cd helm/alloy && helm dependency build
```

4. Deploy Alloy to the monitoring namespace:

```bash
helm upgrade --install alloy ./helm/alloy -n monitoring --create-namespace
```

5. Verify Alloy pods are running:

```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
```

Expected: 5 pods (one per node) in Running state

NOTE:
- Alloy uses its own configuration syntax (formerly called River), not YAML
- Labels (namespace, pod, container) match the existing Promtail labels for query compatibility
- The Loki endpoint is cluster-internal: loki-stack.monitoring.svc.cluster.local:3100
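The step-5 check can also be scripted as a readiness gate, comparing desired versus ready pod counts on the DaemonSet. A sketch, assuming the release installed the DaemonSet under the name `alloy` in `monitoring`:

```shell
# Sketch: gate later steps on the Alloy DaemonSet being fully ready.
# The DaemonSet name "alloy" is an assumption; confirm with `kubectl get ds -n monitoring`.
check_alloy_ready() {
  desired=$(kubectl get daemonset alloy -n monitoring \
    -o jsonpath='{.status.desiredNumberScheduled}')
  ready=$(kubectl get daemonset alloy -n monitoring \
    -o jsonpath='{.status.numberReady}')
  if [ -n "$desired" ] && [ "$desired" = "$ready" ]; then
    echo "alloy ready: $ready/$desired"
  else
    echo "alloy NOT ready: ${ready:-0}/${desired:-?}"
    return 1
  fi
}
```

Running this before Task 2 avoids removing Promtail while some nodes still lack a ready Alloy pod.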
</action>
<verify>
1. `kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy` shows 5 Running pods
2. `kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=20` shows no errors
3. Alloy logs show the "loki.write" component started successfully
</verify>
<done>
Alloy DaemonSet deployed with 5 pods collecting logs and forwarding to Loki
</done>
</task>

<task type="auto">
<name>Task 2: Verify log flow and remove Promtail</name>
<files>
(no files - kubectl operations)
</files>
<action>
1. Generate a test log by restarting the TaskPlanner pod:

```bash
kubectl rollout restart deployment taskplaner
```

2. Wait for the pod to be ready:

```bash
kubectl rollout status deployment taskplaner --timeout=60s
```

3. Verify logs appear in Loki via LogCLI or curl:

```bash
# Query recent TaskPlanner logs via the Loki API
kubectl run --rm -it logtest --image=curlimages/curl --restart=Never -- \
  curl -s "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="default",pod=~"taskplaner.*"}' \
  --data-urlencode 'limit=5'
```

Expected: JSON response with "result" containing log entries
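To go beyond eyeballing the JSON, the entry count can be extracted with jq. A sketch, assuming jq is available; the sample payload below is hypothetical and stands in for the live API response:

```shell
# Sketch: count log entries across all streams in a Loki query_range response.
count_loki_entries() {
  jq '[.data.result[].values[]] | length'
}

# Hypothetical sample standing in for the live API output:
sample='{"status":"success","data":{"resultType":"streams","result":[{"stream":{"pod":"taskplaner-abc"},"values":[["1700000000000000000","line 1"],["1700000000000000001","line 2"]]}]}}'
printf '%s' "$sample" | count_loki_entries   # prints 2 for this sample
```

A count of 0 with status "success" means the query matched no streams, which is the failure mode to watch for after the Promtail cutover.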

4. Once logs are confirmed flowing via Alloy, remove Promtail:

```bash
# Find and delete the Promtail release
helm list -n monitoring | grep promtail
# If promtail is found:
helm uninstall loki-stack-promtail -n monitoring 2>/dev/null || \
  helm uninstall promtail -n monitoring 2>/dev/null || \
  kubectl delete daemonset -n monitoring -l app=promtail
```

5. Verify Promtail is gone:

```bash
kubectl get pods -n monitoring | grep -i promtail
```

Expected: No promtail pods

6. Verify logs are still flowing after Promtail removal (repeat step 3)

NOTE: Promtail may be installed as part of loki-stack or separately. Check both.
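The NOTE's ambiguity (loki-stack vs standalone) can be resolved by reading the `meta.helm.sh/release-name` annotation that Helm stamps on resources it manages. A sketch; the `app=promtail` label selector is an assumption, so adjust it to your install:

```shell
# Sketch: report which Helm release owns the Promtail DaemonSet, if any.
# The app=promtail selector is an assumption; check the DaemonSet's labels first.
promtail_release() {
  kubectl get daemonset -n monitoring -l app=promtail \
    -o jsonpath='{.items[*].metadata.annotations.meta\.helm\.sh/release-name}'
}
```

Uninstalling whichever release comes back keeps Helm's state consistent; falling back to a raw `kubectl delete daemonset` is only needed when nothing is returned.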
</action>
<verify>
1. The Loki API returns TaskPlanner log entries
2. `kubectl get pods -n monitoring` shows NO promtail pods
3. `kubectl get pods -n monitoring` shows Alloy pods still running
4. A second Loki query after Promtail removal still returns logs
</verify>
<done>
Logs confirmed flowing from Alloy to Loki, Promtail DaemonSet removed from cluster
</done>
</task>

</tasks>

<verification>
- [ ] Alloy DaemonSet has 5 Running pods (one per node)
- [ ] Alloy pods show no errors in logs
- [ ] Loki API returns TaskPlanner log entries
- [ ] Promtail pods no longer exist
- [ ] Log flow continues after Promtail removal
</verification>

<success_criteria>
1. Alloy DaemonSet running on all 5 nodes
2. Logs from TaskPlanner appear in Loki within 60 seconds of generation
3. Promtail DaemonSet completely removed
4. No log collection gap (Alloy verified before Promtail removal)
</success_criteria>

<output>
After completion, create `.planning/phases/08-observability-stack/08-02-SUMMARY.md`
</output>