Phase 08: Observability Stack - 3 plans in 2 waves - Wave 1: 08-01 (metrics), 08-02 (Alloy) - parallel - Wave 2: 08-03 (verification) - depends on both - Ready for execution Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
6.9 KiB
6.9 KiB
phase, plan, type, wave, depends_on, files_modified, autonomous, must_haves
| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 08-observability-stack | 02 | execute | 1 |
|
true |
|
Purpose: Replace EOL Promtail (March 2026) with Grafana Alloy DaemonSet (OBS-04) Output: Alloy DaemonSet forwarding logs to Loki, Promtail removed
<execution_context> @/home/tho/.claude/get-shit-done/workflows/execute-plan.md @/home/tho/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/08-observability-stack/CONTEXT.md Task 1: Deploy Grafana Alloy via Helm helm/alloy/Chart.yaml helm/alloy/values.yaml 1. Create helm/alloy directory and Chart.yaml as umbrella chart: ```yaml apiVersion: v2 name: alloy description: Grafana Alloy log collector version: 0.1.0 dependencies: - name: alloy version: "0.12.*" repository: https://grafana.github.io/helm-charts ```2. Create helm/alloy/values.yaml with minimal config for Loki forwarding:
```yaml
alloy:
alloy:
configMap:
content: |
// Discover pods and collect logs
discovery.kubernetes "pods" {
role = "pod"
}
// Relabel to extract pod metadata
discovery.relabel "pods" {
targets = discovery.kubernetes.pods.targets
rule {
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
rule {
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_name"]
target_label = "container"
}
}
// Collect logs from discovered pods
loki.source.kubernetes "pods" {
targets = discovery.relabel.pods.output
forward_to = [loki.write.default.receiver]
}
// Forward to Loki
loki.write "default" {
endpoint {
url = "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/push"
}
}
controller:
type: daemonset
serviceAccount:
create: true
```
3. Add Grafana Helm repo and build dependencies:
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
cd helm/alloy && helm dependency build
```
4. Deploy Alloy to monitoring namespace:
```bash
helm upgrade --install alloy ./helm/alloy -n monitoring --create-namespace
```
5. Verify Alloy pods are running:
```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
```
Expected: 5 pods (one per node) in Running state
NOTE:
- Alloy uses River configuration language (not YAML)
- Labels (namespace, pod, container) match existing Promtail labels for query compatibility
- Loki endpoint is cluster-internal: loki-stack.monitoring.svc.cluster.local:3100
1. kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy shows 5 Running pods
2. kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=20 shows no errors
3. Alloy logs show "loki.write" component started successfully
Alloy DaemonSet deployed with 5 pods collecting logs and forwarding to Loki
Task 2: Verify log flow and remove Promtail
(no files - kubectl operations)
1. Generate a test log by restarting TaskPlanner pod:
```bash
kubectl rollout restart deployment taskplaner
```
2. Wait for pod to be ready:
```bash
kubectl rollout status deployment taskplaner --timeout=60s
```
3. Verify logs appear in Loki via LogCLI or curl:
```bash
# Query recent TaskPlanner logs via Loki API
kubectl run --rm -it logtest --image=curlimages/curl --restart=Never -- \
curl -s "http://loki-stack.monitoring.svc.cluster.local:3100/loki/api/v1/query_range" \
--data-urlencode 'query={namespace="default",pod=~"taskplaner.*"}' \
--data-urlencode 'limit=5'
```
Expected: JSON response with "result" containing log entries
4. Once logs confirmed flowing via Alloy, remove Promtail:
```bash
# Find and delete Promtail release
helm list -n monitoring | grep promtail
# If promtail found:
helm uninstall loki-stack-promtail -n monitoring 2>/dev/null || \
helm uninstall promtail -n monitoring 2>/dev/null || \
kubectl delete daemonset -n monitoring -l app=promtail
```
5. Verify Promtail is gone:
```bash
kubectl get pods -n monitoring | grep -i promtail
```
Expected: No promtail pods
6. Verify logs still flowing after Promtail removal (repeat step 3)
NOTE: Promtail may be installed as part of loki-stack or separately. Check both.
1. Loki API returns TaskPlanner log entries
2. kubectl get pods -n monitoring shows NO promtail pods
3. kubectl get pods -n monitoring shows Alloy pods still running
4. Second Loki query after Promtail removal still returns logs
Logs confirmed flowing from Alloy to Loki, Promtail DaemonSet removed from cluster
- [ ] Alloy DaemonSet has 5 Running pods (one per node)
- [ ] Alloy pods show no errors in logs
- [ ] Loki API returns TaskPlanner log entries
- [ ] Promtail pods no longer exist
- [ ] Log flow continues after Promtail removal
<success_criteria>
- Alloy DaemonSet running on all 5 nodes
- Logs from TaskPlanner appear in Loki within 60 seconds of generation
- Promtail DaemonSet completely removed
- No log collection gap (Alloy verified before Promtail removal) </success_criteria>