Thomas Richter 8c3dc137ca docs(08): create phase plan
Phase 08: Observability Stack
- 3 plans in 2 waves
- Wave 1: 08-01 (metrics), 08-02 (Alloy) - parallel
- Wave 2: 08-03 (verification) - depends on both
- Ready for execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 21:24:24 +01:00


---
phase: 08-observability-stack
plan: 02
type: execute
wave: 1
depends_on: []
files_modified:
  - helm/alloy/values.yaml (new)
  - helm/alloy/Chart.yaml (new)
autonomous: true
must_haves:
  truths:
    - Alloy DaemonSet runs on all nodes
    - Alloy forwards logs to Loki
    - Promtail DaemonSet is removed
  artifacts:
    - path: helm/alloy/Chart.yaml
      provides: Alloy Helm chart wrapper
      contains: "name: alloy"
    - path: helm/alloy/values.yaml
      provides: Alloy configuration for Loki forwarding
      contains: loki.write
  key_links:
    - from: Alloy pods
      to: loki-stack:3100
      via: loki.write endpoint
      pattern: endpoint.*loki
---
# Migrate from Promtail to Grafana Alloy for log collection

**Purpose:** Replace Promtail (EOL March 2026) with a Grafana Alloy DaemonSet (OBS-04).
**Output:** Alloy DaemonSet forwarding logs to Loki; Promtail removed.

<execution_context> @/home/tho/.claude/get-shit-done/workflows/execute-plan.md @/home/tho/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/08-observability-stack/CONTEXT.md

## Task 1: Deploy Grafana Alloy via Helm

Files: helm/alloy/Chart.yaml, helm/alloy/values.yaml

1. Create the helm/alloy directory and Chart.yaml as an umbrella chart:
   ```yaml
   apiVersion: v2
   name: alloy
   description: Grafana Alloy log collector
   version: 0.1.0
   dependencies:
     - name: alloy
       version: "0.12.*"
       repository: https://grafana.github.io/helm-charts
   ```
2. Create helm/alloy/values.yaml with minimal config for Loki forwarding:
   ```yaml
   alloy:
     alloy:
       configMap:
         content: |
           // Discover pods and collect logs
           discovery.kubernetes "pods" {
             role = "pod"
           }

           // Relabel to extract pod metadata
           discovery.relabel "pods" {
             targets = discovery.kubernetes.pods.targets

             rule {
               source_labels = ["__meta_kubernetes_namespace"]
               target_label  = "namespace"
             }
             rule {
               source_labels = ["__meta_kubernetes_pod_name"]
               target_label  = "pod"
             }
             rule {
               source_labels = ["__meta_kubernetes_pod_container_name"]
               target_label  = "container"
             }
           }

           // Collect logs from discovered pods
           loki.source.kubernetes "pods" {
             targets    = discovery.relabel.pods.output
             forward_to = [loki.write.default.receiver]
           }

           // Forward to Loki
           loki.write "default" {
             endpoint {
               url = "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/push"
             }
           }

     controller:
       type: daemonset

     serviceAccount:
       create: true
   ```

3. Add Grafana Helm repo and build dependencies:
   ```bash
   helm repo add grafana https://grafana.github.io/helm-charts
   helm repo update
   cd helm/alloy && helm dependency build
   ```
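`helm dependency build` fetches the subchart archive into `charts/` and writes a `Chart.lock`. A sketch of what a successful build leaves behind and how to check for it — purely offline, the archive name below is illustrative (the patch version depends on what `0.12.*` resolves to):

```shell
# Simulate the post-build layout of helm/alloy to show what to check for.
dir=$(mktemp -d)
mkdir -p "$dir/charts"
touch "$dir/Chart.yaml" "$dir/Chart.lock"
touch "$dir/charts/alloy-0.12.5.tgz"   # illustrative resolved version

# The same two checks, run against the real helm/alloy directory, confirm
# the dependency build: a lockfile plus the fetched subchart archive.
built="no"
if [ -f "$dir/Chart.lock" ] && ls "$dir/charts"/alloy-*.tgz >/dev/null 2>&1; then
  built="yes"
fi
echo "dependencies built: $built"
rm -rf "$dir"
```

Against the real chart directory, the same `[ -f Chart.lock ]` plus `ls charts/alloy-*.tgz` pair tells you whether the build needs re-running.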

4. Deploy Alloy to monitoring namespace:
   ```bash
   helm upgrade --install alloy ./helm/alloy -n monitoring --create-namespace
   ```

5. Verify Alloy pods are running:
   ```bash
   kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
   ```
   Expected: 5 pods (one per node) in Running state

NOTE:
- Alloy uses River configuration language (not YAML)
- Labels (namespace, pod, container) match existing Promtail labels for query compatibility
- Loki endpoint is cluster-internal: loki-stack.monitoring.svc.cluster.local:3100
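Because the relabel rules copy each `__meta_kubernetes_*` label to the same short names Promtail used, existing LogQL selectors keep working. A shell sketch of that mapping and of building a selector from it (`rename_label` and `build_selector` are illustrative helpers, not part of Alloy or Loki):

```shell
# Mirror of the discovery.relabel rules in values.yaml: each
# __meta_kubernetes_* source label is renamed to a short target label.
rename_label() {
  case "$1" in
    __meta_kubernetes_namespace)          echo "namespace" ;;
    __meta_kubernetes_pod_name)           echo "pod" ;;
    __meta_kubernetes_pod_container_name) echo "container" ;;
    *)                                    echo "$1" ;;  # no rule: label passes through
  esac
}

# Build a LogQL selector over the renamed labels, as used in Task 2.
build_selector() {
  printf '{%s="%s",%s=~"%s"}' \
    "$(rename_label __meta_kubernetes_namespace)" "$1" \
    "$(rename_label __meta_kubernetes_pod_name)" "$2"
}

build_selector default 'taskplaner.*'
# → {namespace="default",pod=~"taskplaner.*"}
```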
Verification:
1. `kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy` shows 5 Running pods
2. `kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=20` shows no errors
3. Alloy logs show the "loki.write" component started successfully

Done when: Alloy DaemonSet deployed with 5 pods collecting logs and forwarding to Loki.

## Task 2: Verify log flow and remove Promtail

Files: none (kubectl operations only)

1. Generate a test log by restarting the TaskPlanner pod:
   ```bash
   kubectl rollout restart deployment taskplaner
   ```
2. Wait for pod to be ready:
   ```bash
   kubectl rollout status deployment taskplaner --timeout=60s
   ```

3. Verify logs appear in Loki via LogCLI or curl:
   ```bash
   # Query recent TaskPlanner logs via Loki API
   kubectl run --rm -it logtest --image=curlimages/curl --restart=Never -- \
     curl -G -s "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/query_range" \
     --data-urlencode 'query={namespace="default",pod=~"taskplaner.*"}' \
     --data-urlencode 'limit=5'
   ```
   Expected: JSON response with "result" containing log entries
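The query_range endpoint returns `"result":[]` when no streams match, so success vs. an empty answer can be distinguished without installing jq. A sketch against a sample response (the payload below only mimics the Loki response shape; values are made up):

```shell
# Sample query_range response with one stream (illustrative values).
resp='{"status":"success","data":{"resultType":"streams","result":[{"stream":{"pod":"taskplaner-abc"},"values":[["1700000000000000000","server started"]]}]}}'

# An empty match comes back as "result":[]; anything else means log
# entries reached Loki through Alloy.
if printf '%s' "$resp" | grep -q '"result":\[\]'; then
  echo "no logs returned"
else
  echo "logs present"
fi
```

Piping the real curl output from step 3 through the same grep gives a pass/fail signal usable in a script.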

4. Once logs confirmed flowing via Alloy, remove Promtail:
   ```bash
   # Find and delete Promtail release
   helm list -n monitoring | grep promtail
   # If promtail found:
   helm uninstall loki-stack-promtail -n monitoring 2>/dev/null || \
   helm uninstall promtail -n monitoring 2>/dev/null || \
   kubectl delete daemonset -n monitoring -l app=promtail
   ```
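The fallback chain above can be expressed as a small helper that stops at the first matching release (`remove_first_release` is a hypothetical helper sketched here; in a live cluster the echo would be a real `helm uninstall`):

```shell
# Walk candidate release names in order and report the first one that
# exists; fall back to deleting the DaemonSet directly if none match.
remove_first_release() {
  for rel in "$@"; do
    if printf '%s\n' "$EXISTING_RELEASES" | grep -qx "$rel"; then
      # Live version: helm uninstall "$rel" -n monitoring
      echo "would uninstall: $rel"
      return 0
    fi
  done
  echo "no release matched; would run: kubectl delete daemonset -n monitoring -l app=promtail"
  return 1
}

# Simulated "helm list -n monitoring" release names.
EXISTING_RELEASES="loki-stack
loki-stack-promtail
alloy"

remove_first_release loki-stack-promtail promtail
# → would uninstall: loki-stack-promtail
```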

5. Verify Promtail is gone:
   ```bash
   kubectl get pods -n monitoring | grep -i promtail
   ```
   Expected: No promtail pods

6. Verify logs still flowing after Promtail removal (repeat step 3)

NOTE: Promtail may be installed as part of loki-stack or separately. Check both.
Verification:
1. Loki API returns TaskPlanner log entries
2. `kubectl get pods -n monitoring` shows NO promtail pods
3. `kubectl get pods -n monitoring` shows Alloy pods still running
4. A second Loki query after Promtail removal still returns logs

Done when: Logs confirmed flowing from Alloy to Loki; Promtail DaemonSet removed from the cluster.

- [ ] Alloy DaemonSet has 5 Running pods (one per node)
- [ ] Alloy pods show no errors in logs
- [ ] Loki API returns TaskPlanner log entries
- [ ] Promtail pods no longer exist
- [ ] Log flow continues after Promtail removal

<success_criteria>

  1. Alloy DaemonSet running on all 5 nodes
  2. Logs from TaskPlanner appear in Loki within 60 seconds of generation
  3. Promtail DaemonSet completely removed
  4. No log collection gap (Alloy verified before Promtail removal)

</success_criteria>
After completion, create `.planning/phases/08-observability-stack/08-02-SUMMARY.md`