Phase 08: Observability Stack
- 3 plans in 2 waves
- Wave 1: 08-01 (metrics), 08-02 (Alloy) - parallel
- Wave 2: 08-03 (verification) - depends on both
- Ready for execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 08-observability-stack | 01 | execute | 1 | | | true | |
Purpose: Enable Prometheus to collect application metrics from TaskPlanner (OBS-08, OBS-01)
Output: /metrics endpoint returning prom-client default metrics, ServiceMonitor in Helm chart
<execution_context> @/home/tho/.claude/get-shit-done/workflows/execute-plan.md @/home/tho/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/08-observability-stack/CONTEXT.md @package.json @src/routes/health/+server.ts @helm/taskplaner/values.yaml @helm/taskplaner/templates/service.yaml
Task 1: Add prom-client and create /metrics endpoint
package.json
src/lib/server/metrics.ts
src/routes/metrics/+server.ts
1. Install prom-client:
```bash
npm install prom-client
```
2. Create src/lib/server/metrics.ts:
- Import prom-client's Registry, collectDefaultMetrics
- Create a new Registry instance
- Call collectDefaultMetrics({ register: registry }) to collect Node.js process metrics
- Export the registry
- Keep it minimal - just default metrics (memory, CPU, event loop lag)
3. Create src/routes/metrics/+server.ts:
- Import the registry from $lib/server/metrics
- Create a GET handler that awaits registry.metrics() (it returns a Promise in prom-client 13 and later) and returns the result with Content-Type: text/plain; version=0.0.4
- Handle errors gracefully (return 500 on failure)
- Pattern follows existing /health endpoint structure
NOTE: prom-client is the standard Node.js Prometheus client. Use default metrics only - no custom metrics needed for this phase.
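Steps 2 and 3 could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: `MetricsSource` and `createMetricsHandler` are hypothetical names introduced here so the handler logic can run without prom-client installed. A prom-client `Registry` is assumed to fit that shape because it exposes an async `metrics()` method; the wiring into the two real files is shown only as comments.

```typescript
// Hypothetical shape for whatever produces the exposition text.
// A prom-client Registry is assumed to satisfy it, since its metrics()
// method returns a Promise<string> in prom-client 13 and later.
interface MetricsSource {
  metrics(): Promise<string>;
}

// Content type named in step 3 of the plan.
const PROMETHEUS_CONTENT_TYPE = "text/plain; version=0.0.4";

// Builds a GET handler that serializes metrics, or answers 500 if collection fails.
function createMetricsHandler(source: MetricsSource): () => Promise<Response> {
  return async () => {
    try {
      const body = await source.metrics();
      return new Response(body, {
        status: 200,
        headers: { "Content-Type": PROMETHEUS_CONTENT_TYPE },
      });
    } catch {
      // Do not leak error details to the scraper; just signal failure.
      return new Response("Failed to collect metrics", { status: 500 });
    }
  };
}

// Assumed wiring in the real files (needs prom-client, so shown as comments):
//   src/lib/server/metrics.ts
//     import { Registry, collectDefaultMetrics } from "prom-client";
//     export const registry = new Registry();
//     collectDefaultMetrics({ register: registry });
//   src/routes/metrics/+server.ts
//     import { registry } from "$lib/server/metrics";
//     export const GET = createMetricsHandler(registry);
```

Passing the registry in, rather than importing it inside the handler, is only a convenience for testing the error path with a stub; importing the registry directly in +server.ts would work just as well.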
1. npm run build completes without errors
2. npm run dev, then curl http://localhost:5173/metrics returns text starting with "# HELP" or "# TYPE"
3. Response Content-Type header includes "text/plain"
/metrics endpoint returns Prometheus-format metrics including process_cpu_seconds_total, nodejs_heap_size_total_bytes
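The verification checks above can be approximated with a small helper that inspects a response body. A minimal sketch: `checkExposition` is a hypothetical helper name, and the sample body in the usage lines is illustrative rather than real output from the app.

```typescript
// Result of inspecting a /metrics response body.
interface ExpositionCheck {
  startsWithComment: boolean; // body begins with "# HELP" or "# TYPE"
  missing: string[]; // expected metric names not found in the body
}

// Checks a Prometheus text-format body against the plan's verification criteria.
function checkExposition(body: string, expected: string[]): ExpositionCheck {
  const startsWithComment = body.startsWith("# HELP") || body.startsWith("# TYPE");

  // Collect metric names from sample lines (non-empty, non-comment lines).
  // The name is the leading run of characters valid in a metric name.
  const names = new Set<string>();
  for (const line of body.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const match = /^[a-zA-Z_:][a-zA-Z0-9_:]*/.exec(trimmed);
    if (match) names.add(match[0]);
  }

  return { startsWithComment, missing: expected.filter((n) => !names.has(n)) };
}

// Usage with an illustrative (not real) body that lacks the heap metric:
const sample = [
  "# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.",
  "# TYPE process_cpu_seconds_total counter",
  "process_cpu_seconds_total 0.12",
].join("\n");

const result = checkExposition(sample, [
  "process_cpu_seconds_total",
  "nodejs_heap_size_total_bytes",
]);
console.log(result.startsWithComment, result.missing);
```

In practice the body would come from fetching http://localhost:5173/metrics while `npm run dev` is running; an empty `missing` array would indicate that both metrics named in the done criterion are present.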
Task 2: Add ServiceMonitor to Helm chart
helm/taskplaner/templates/servicemonitor.yaml
helm/taskplaner/values.yaml
1. Create helm/taskplaner/templates/servicemonitor.yaml:
```yaml
{{- if .Values.metrics.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "taskplaner.fullname" . }}
  labels:
    {{- include "taskplaner.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "taskplaner.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      path: /metrics
      interval: {{ .Values.metrics.interval | default "30s" }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
{{- end }}
```
2. Update helm/taskplaner/values.yaml - add metrics section:
```yaml
# Prometheus metrics
metrics:
  enabled: true
  interval: 30s
```
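3. Ensure the service template defines a port named "http". The ServiceMonitor endpoint matches on the Service port's name, not on targetPort, so check that the port entry in the existing service.yaml carries name: http (it likely already does).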
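NOTE: The ServiceMonitor uses the monitoring.coreos.com/v1 API, whose CRDs kube-prometheus-stack installs. The namespaceSelector limits discovery to the release namespace ({{ .Release.Namespace }}), so Prometheus finds TaskPlanner there (the default namespace, if that is where the chart is installed).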
1. helm template ./helm/taskplaner includes ServiceMonitor resource
2. helm template output shows selector matching app.kubernetes.io/name: taskplaner
3. No helm lint errors
ServiceMonitor template renders correctly with selector matching TaskPlanner service, ready for Prometheus to discover
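How the ServiceMonitor's selector relates to the Service's labels can be illustrated with a small sketch of matchLabels semantics: every key/value pair in the selector must appear on the Service, while extra labels on the Service are ignored. `matchesLabels` is a hypothetical helper, and the label values (including the release name `my-release`) are assumptions, since the chart's helper templates are not shown here.

```typescript
type Labels = Record<string, string>;

// matchLabels semantics: the selector matches when each of its pairs
// is present, with the same value, on the object's labels.
function matchesLabels(selector: Labels, objectLabels: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => objectLabels[key] === value);
}

// Assumed output of the chart's selectorLabels helper (values are illustrative).
const selector: Labels = {
  "app.kubernetes.io/name": "taskplaner",
  "app.kubernetes.io/instance": "my-release",
};

// Assumed labels on the rendered Service: the selector labels plus a common label.
const serviceLabels: Labels = {
  ...selector,
  "app.kubernetes.io/managed-by": "Helm",
};

// A Service carrying all selector labels is matched.
console.log(matchesLabels(selector, serviceLabels));

// A Service missing the instance label is not matched.
console.log(matchesLabels(selector, { "app.kubernetes.io/name": "taskplaner" }));
```

When comparing real `helm template` output, the labels under the Service's metadata.labels are the ones to check against the ServiceMonitor's spec.selector.matchLabels.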
- [ ] npm run build succeeds
- [ ] curl localhost:5173/metrics returns Prometheus-format text
- [ ] helm template ./helm/taskplaner shows ServiceMonitor resource
- [ ] ServiceMonitor selector matches service labels
<success_criteria>
- /metrics endpoint returns Prometheus-format metrics (process metrics, heap size, event loop)
- ServiceMonitor added to Helm chart templates
- ServiceMonitor enabled by default in values.yaml
- Build and type check pass
</success_criteria>