---
phase: 08-observability-stack
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - package.json
  - src/routes/metrics/+server.ts
  - src/lib/server/metrics.ts
  - helm/taskplaner/templates/servicemonitor.yaml
  - helm/taskplaner/values.yaml
autonomous: true

must_haves:
  truths:
    - "TaskPlanner /metrics endpoint returns Prometheus-format text"
    - "ServiceMonitor exists in Helm chart templates"
    - "Prometheus can discover TaskPlanner via ServiceMonitor"
  artifacts:
    - path: "src/routes/metrics/+server.ts"
      provides: "Prometheus metrics HTTP endpoint"
      exports: ["GET"]
    - path: "src/lib/server/metrics.ts"
      provides: "prom-client registry and metrics definitions"
      contains: "collectDefaultMetrics"
    - path: "helm/taskplaner/templates/servicemonitor.yaml"
      provides: "ServiceMonitor for Prometheus Operator"
      contains: "kind: ServiceMonitor"
  key_links:
    - from: "src/routes/metrics/+server.ts"
      to: "src/lib/server/metrics.ts"
      via: "import register"
      pattern: "import.*register.*from.*metrics"
    - from: "helm/taskplaner/templates/servicemonitor.yaml"
      to: "tp-app service"
      via: "selector matchLabels"
      pattern: "selector.*matchLabels"
---

Add a Prometheus metrics endpoint to TaskPlanner and a ServiceMonitor for scraping.

Purpose: Enable Prometheus to collect application metrics from TaskPlanner (OBS-08, OBS-01)
Output: /metrics endpoint returning prom-client default metrics, ServiceMonitor in Helm chart

@/home/tho/.claude/get-shit-done/workflows/execute-plan.md
@/home/tho/.claude/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/08-observability-stack/CONTEXT.md
@package.json
@src/routes/health/+server.ts
@helm/taskplaner/values.yaml
@helm/taskplaner/templates/service.yaml

Task 1: Add prom-client and create /metrics endpoint

package.json
src/lib/server/metrics.ts
src/routes/metrics/+server.ts

1. Install prom-client:
   ```bash
   npm install prom-client
   ```

2. Create src/lib/server/metrics.ts:
   - Import prom-client's Registry and collectDefaultMetrics
   - Create a new Registry instance
   - Call collectDefaultMetrics({ register: registry }) to collect Node.js process metrics
   - Export the registry
   - Keep it minimal - just default metrics (memory, CPU, event loop lag)

3. Create src/routes/metrics/+server.ts:
   - Import the registry from $lib/server/metrics
   - Create a GET handler that returns the result of registry.metrics() with Content-Type: text/plain; version=0.0.4
   - Handle errors gracefully (return 500 on failure)
   - Follow the structure of the existing /health endpoint

NOTE: prom-client is the standard Node.js Prometheus client. Use default metrics only - no custom metrics are needed for this phase.

1. npm run build completes without errors
2. npm run dev, then curl http://localhost:5173/metrics returns text starting with "# HELP" or "# TYPE"
3. Response Content-Type header includes "text/plain"

/metrics endpoint returns Prometheus-format metrics including process_cpu_seconds_total and nodejs_heap_size_total_bytes

Task 2: Add ServiceMonitor to Helm chart

helm/taskplaner/templates/servicemonitor.yaml
helm/taskplaner/values.yaml

1. Create helm/taskplaner/templates/servicemonitor.yaml:
   ```yaml
   {{- if .Values.metrics.enabled }}
   apiVersion: monitoring.coreos.com/v1
   kind: ServiceMonitor
   metadata:
     name: {{ include "taskplaner.fullname" . }}
     labels:
       {{- include "taskplaner.labels" . | nindent 4 }}
   spec:
     selector:
       matchLabels:
         {{- include "taskplaner.selectorLabels" . | nindent 6 }}
     endpoints:
       - port: http
         path: /metrics
         interval: {{ .Values.metrics.interval | default "30s" }}
     namespaceSelector:
       matchNames:
         - {{ .Release.Namespace }}
   {{- end }}
   ```

2. Update helm/taskplaner/values.yaml - add metrics section:
   ```yaml
   # Prometheus metrics
   metrics:
     enabled: true
     interval: 30s
   ```

3. Ensure the service template exposes a port named "http" (check existing service.yaml). The ServiceMonitor's `port: http` is matched against the Service port's `name`, which the chart likely already sets alongside targetPort: http.

NOTE: The ServiceMonitor uses the monitoring.coreos.com/v1 API, which kube-prometheus-stack provides. The namespaceSelector points Prometheus at the namespace the chart is released into (the default namespace in this setup).

1. helm template ./helm/taskplaner includes a ServiceMonitor resource
2. helm template output shows selector matching app.kubernetes.io/name: taskplaner
3. No helm lint errors

ServiceMonitor template renders correctly with a selector matching the TaskPlanner service, ready for Prometheus to discover

- [ ] npm run build succeeds
- [ ] curl localhost:5173/metrics returns Prometheus-format text
- [ ] helm template ./helm/taskplaner shows ServiceMonitor resource
- [ ] ServiceMonitor selector matches service labels

1. /metrics endpoint returns Prometheus-format metrics (process metrics, heap size, event loop)
2. ServiceMonitor added to Helm chart templates
3. ServiceMonitor enabled by default in values.yaml
4. Build and type check pass

After completion, create `.planning/phases/08-observability-stack/08-01-SUMMARY.md`
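
For reference, the GET handler described in Task 1 could be sketched roughly as below. This is an illustration under assumptions, not the final implementation: `MetricsSource`, `makeMetricsHandler`, and `fakeRegistry` are hypothetical names introduced here, and the stand-in registry replaces the real prom-client Registry so the sketch runs without installing anything. It assumes a runtime with the global `Response` (Node 18+).

```typescript
// Hypothetical structural type: the two members of a metrics registry that
// this sketch relies on. A prom-client Registry is assumed to fit this shape.
interface MetricsSource {
  contentType: string;
  metrics(): Promise<string>;
}

// Builds a GET-style handler: 200 with the metrics text on success,
// 500 with a plain-text message if collection throws.
function makeMetricsHandler(source: MetricsSource): () => Promise<Response> {
  return async () => {
    try {
      const body = await source.metrics();
      return new Response(body, {
        status: 200,
        headers: { "Content-Type": source.contentType },
      });
    } catch (err) {
      console.error("metrics collection failed", err);
      return new Response("metrics unavailable", {
        status: 500,
        headers: { "Content-Type": "text/plain" },
      });
    }
  };
}

// Stand-in registry for illustration only; the sample value is made up.
const fakeRegistry: MetricsSource = {
  contentType: "text/plain; version=0.0.4; charset=utf-8",
  metrics: async () =>
    "# HELP process_cpu_seconds_total Total CPU time spent in seconds.\n" +
    "# TYPE process_cpu_seconds_total counter\n" +
    "process_cpu_seconds_total 0.12\n",
};

// In src/routes/metrics/+server.ts this might be wired up as (assumption):
//   import { registry } from "$lib/server/metrics";
//   export const GET = makeMetricsHandler(registry);
const GET = makeMetricsHandler(fakeRegistry);
```

Passing the registry in as a parameter keeps the error path easy to exercise with a registry whose `metrics()` rejects, which matches the "return 500 on failure" requirement above.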