docs(08): create phase plan
Phase 08: Observability Stack

- 3 plans in 2 waves
- Wave 1: 08-01 (metrics), 08-02 (Alloy), run in parallel
- Wave 2: 08-03 (verification), depends on both
- Ready for execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
.planning/phases/08-observability-stack/08-01-PLAN.md (new file, 174 lines)
---
phase: 08-observability-stack
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - package.json
  - src/routes/metrics/+server.ts
  - src/lib/server/metrics.ts
  - helm/taskplaner/templates/servicemonitor.yaml
  - helm/taskplaner/values.yaml
autonomous: true

must_haves:
  truths:
    - "TaskPlanner /metrics endpoint returns Prometheus-format text"
    - "ServiceMonitor exists in Helm chart templates"
    - "Prometheus can discover TaskPlanner via ServiceMonitor"
  artifacts:
    - path: "src/routes/metrics/+server.ts"
      provides: "Prometheus metrics HTTP endpoint"
      exports: ["GET"]
    - path: "src/lib/server/metrics.ts"
      provides: "prom-client registry and metrics definitions"
      contains: "collectDefaultMetrics"
    - path: "helm/taskplaner/templates/servicemonitor.yaml"
      provides: "ServiceMonitor for Prometheus Operator"
      contains: "kind: ServiceMonitor"
  key_links:
    - from: "src/routes/metrics/+server.ts"
      to: "src/lib/server/metrics.ts"
      via: "import registry"
      pattern: "import.*registry.*from.*metrics"
    - from: "helm/taskplaner/templates/servicemonitor.yaml"
      to: "tp-app service"
      via: "selector matchLabels"
      pattern: "selector.*matchLabels"
---

<objective>
Add Prometheus metrics endpoint to TaskPlanner and ServiceMonitor for scraping

Purpose: Enable Prometheus to collect application metrics from TaskPlanner (OBS-08, OBS-01)
Output: /metrics endpoint returning prom-client default metrics, ServiceMonitor in Helm chart
</objective>

<execution_context>
@/home/tho/.claude/get-shit-done/workflows/execute-plan.md
@/home/tho/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/08-observability-stack/CONTEXT.md
@package.json
@src/routes/health/+server.ts
@helm/taskplaner/values.yaml
@helm/taskplaner/templates/service.yaml
</context>

<tasks>

<task type="auto">
<name>Task 1: Add prom-client and create /metrics endpoint</name>
<files>
package.json
src/lib/server/metrics.ts
src/routes/metrics/+server.ts
</files>
<action>
1. Install prom-client:
```bash
npm install prom-client
```

2. Create src/lib/server/metrics.ts:
   - Import `Registry` and `collectDefaultMetrics` from prom-client
   - Create a new `Registry` instance
   - Call `collectDefaultMetrics({ register: registry })` to collect Node.js process metrics into it
   - Export the registry as `registry`
   - Keep it minimal: default metrics only (memory, CPU, event loop lag)

3. Create src/routes/metrics/+server.ts:
   - Import `registry` from `$lib/server/metrics`
   - Create a GET handler that returns `await registry.metrics()` (it returns a Promise in current prom-client versions) with the `Content-Type` header set to `registry.contentType`, which defaults to `text/plain; version=0.0.4; charset=utf-8`
   - Handle errors gracefully (return 500 on failure)
   - Follow the structure of the existing /health endpoint
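
A minimal sketch of the step 3 handler logic, assuming a prom-client version whose `metrics()` returns a Promise and a runtime with the web-standard `Response` global (Node 18+). `MetricsSource` and `metricsResponse` are hypothetical names. The interface mirrors only the two `Registry` members the handler uses, so the logic can be exercised without a running server.

```typescript
// The two members of a prom-client Registry that the handler relies on.
// (Hypothetical interface; assumes a prom-client version where metrics() is async.)
export interface MetricsSource {
  readonly contentType: string;
  metrics(): Promise<string>;
}

// Serialize the registry into a Prometheus text response, or return a 500
// (logging the error server-side instead of echoing it) if collection fails.
export async function metricsResponse(source: MetricsSource): Promise<Response> {
  try {
    const body = await source.metrics();
    return new Response(body, {
      status: 200,
      headers: { "Content-Type": source.contentType },
    });
  } catch (err) {
    console.error("Failed to collect metrics", err);
    return new Response("Failed to collect metrics", {
      status: 500,
      headers: { "Content-Type": "text/plain; charset=utf-8" },
    });
  }
}
```

A prom-client `Registry` satisfies `MetricsSource` structurally. Suppose the helper lives in `$lib/server/metrics.ts` next to `registry`. Then src/routes/metrics/+server.ts reduces to `export const GET: RequestHandler = () => metricsResponse(registry);`, with `RequestHandler` imported from `./$types`. Reading the header from `registry.contentType` rather than hard-coding it keeps the header consistent with the registry's exposition format.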

NOTE: prom-client is the standard Node.js Prometheus client. Use default metrics only; no custom metrics are needed for this phase.
</action>
<verify>
1. `npm run build` completes without errors
2. With `npm run dev` running, `curl http://localhost:5173/metrics` returns text starting with "# HELP" or "# TYPE"
3. `curl -i http://localhost:5173/metrics` shows a `Content-Type` header that includes "text/plain"
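
These checks, plus the default metric names this task expects (process_cpu_seconds_total, nodejs_heap_size_total_bytes), can also be scripted. The sketch below assumes a runtime with global `fetch` (Node 18+). `checkMetricsResponse` and `checkEndpoint` are hypothetical helper names. The metric-presence rule is a simplification of the Prometheus text format: it only looks for sample lines of the form `name value` or `name{labels} value`.

```typescript
// Default prom-client metrics this plan expects to see on /metrics.
const EXPECTED_METRICS = ["process_cpu_seconds_total", "nodejs_heap_size_total_bytes"];

// Returns a list of problems; an empty list means the response passes.
export function checkMetricsResponse(contentType: string | null, body: string): string[] {
  const problems: string[] = [];
  if (!contentType?.includes("text/plain")) {
    problems.push(`unexpected Content-Type: ${contentType ?? "<missing>"}`);
  }
  if (!body.startsWith("# HELP") && !body.startsWith("# TYPE")) {
    problems.push("body does not start with '# HELP' or '# TYPE'");
  }
  const lines = body.split("\n");
  for (const name of EXPECTED_METRICS) {
    // Sample lines look like `name value` or `name{labels} value`; comment lines start with '#'.
    const present = lines.some((line) => line.startsWith(`${name} `) || line.startsWith(`${name}{`));
    if (!present) problems.push(`missing metric: ${name}`);
  }
  return problems;
}

// Fetch the endpoint and run the checks (the default URL assumes the Vite dev server port used above).
export async function checkEndpoint(url = "http://localhost:5173/metrics"): Promise<string[]> {
  const res = await fetch(url);
  if (!res.ok) return [`HTTP ${res.status} from ${url}`];
  return checkMetricsResponse(res.headers.get("content-type"), await res.text());
}
```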
</verify>
<done>
/metrics endpoint returns Prometheus-format metrics including process_cpu_seconds_total, nodejs_heap_size_total_bytes
</done>
</task>

<task type="auto">
<name>Task 2: Add ServiceMonitor to Helm chart</name>
<files>
helm/taskplaner/templates/servicemonitor.yaml
helm/taskplaner/values.yaml
</files>
<action>
1. Create helm/taskplaner/templates/servicemonitor.yaml:
```yaml
{{- if .Values.metrics.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "taskplaner.fullname" . }}
  labels:
    {{- include "taskplaner.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "taskplaner.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: http
      path: /metrics
      interval: {{ .Values.metrics.interval | default "30s" }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
{{- end }}
```

2. Update helm/taskplaner/values.yaml by adding a metrics section:
```yaml
# Prometheus metrics
metrics:
  enabled: true
  interval: 30s
```

3. Ensure the Service exposes a port named `http`. The ServiceMonitor's `port: http` is matched against the Service port's `name`, not its `targetPort`. Check that the existing service.yaml declares `name: http` under `ports`, and add it if missing: a `targetPort: http` entry alone is not enough.
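
A ServiceMonitor endpoint's `port` field is matched against Service port names, never against `targetPort` values. The sketch below shows that lookup. It is simplified: the Prometheus Operator actually matches the port names on the Service's endpoints, which carry over the Service port names. `ServicePort` and `findServicePort` are hypothetical and model only the fields involved.

```typescript
// Only the Service port fields relevant here (hypothetical, simplified type).
interface ServicePort {
  name?: string;
  port: number;
  targetPort?: number | string;
}

// An endpoint `port: <name>` selects the Service port whose `name` equals <name>.
// `targetPort` plays no part in this match, so a port declared only as
// `targetPort: http` (with no `name`) is not found.
export function findServicePort(ports: ServicePort[], endpointPort: string): ServicePort | undefined {
  return ports.find((p) => p.name === endpointPort);
}
```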
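
NOTE: The ServiceMonitor uses the monitoring.coreos.com/v1 API, whose CRDs kube-prometheus-stack installs. The namespaceSelector restricts discovery to the release namespace (`.Release.Namespace`, e.g. `default` if TaskPlanner is installed there). Prometheus must also select the ServiceMonitor itself. By default, kube-prometheus-stack only picks up ServiceMonitors labeled `release: <its Helm release name>`, unless `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues` is set to `false`.
</action>
<verify>
1. `helm template ./helm/taskplaner` output includes a ServiceMonitor resource
2. `helm template` output shows the ServiceMonitor selector matching `app.kubernetes.io/name: taskplaner`
3. `helm lint ./helm/taskplaner` reports no errors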
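
Checking that the ServiceMonitor selector matches the Service comes down to Kubernetes `matchLabels` semantics. Below is a minimal sketch of that rule. The `matchesLabels` name is hypothetical. The example labels in the usage check assume the chart's label helpers follow the `helm create` scaffold.

```typescript
type Labels = Record<string, string>;

// Kubernetes `matchLabels` semantics: every key/value pair in the selector must
// be present with an identical value on the object. Extra object labels are
// ignored, and an empty selector matches every object.
export function matchesLabels(matchLabels: Labels, objectLabels: Labels): boolean {
  return Object.entries(matchLabels).every(([key, value]) => objectLabels[key] === value);
}
```

Both label maps can be copied from the `helm template ./helm/taskplaner` output: the ServiceMonitor's `spec.selector.matchLabels` and the Service's `metadata.labels`.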
</verify>
<done>
ServiceMonitor template renders correctly with selector matching TaskPlanner service, ready for Prometheus to discover
</done>
</task>

</tasks>

<verification>
- [ ] npm run build succeeds
- [ ] curl localhost:5173/metrics returns Prometheus-format text
- [ ] helm template ./helm/taskplaner shows ServiceMonitor resource
- [ ] ServiceMonitor selector matches service labels
</verification>

<success_criteria>
1. /metrics endpoint returns Prometheus-format metrics (process metrics, heap size, event loop)
2. ServiceMonitor added to Helm chart templates
3. ServiceMonitor enabled by default in values.yaml
4. Build and type check pass
</success_criteria>

<output>
After completion, create `.planning/phases/08-observability-stack/08-01-SUMMARY.md`
</output>