Thomas Richter 5dbabe6a2d docs: complete v2.0 CI/CD and observability research
Files:
- STACK-v2-cicd-observability.md (ArgoCD, Prometheus, Loki, Alloy)
- FEATURES.md (updated with CI/CD and observability section)
- ARCHITECTURE.md (updated with v2.0 integration architecture)
- PITFALLS-CICD-OBSERVABILITY.md (14 critical/moderate/minor pitfalls)
- SUMMARY-v2-cicd-observability.md (synthesis with roadmap implications)

Key findings:
- Stack: kube-prometheus-stack + Loki monolithic + Alloy (Promtail EOL March 2026)
- Architecture: 3-phase approach - GitOps first, observability second, CI tests last
- Critical pitfall: ArgoCD TLS redirect loop, Loki disk exhaustion, k3s metrics config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 03:29:23 +01:00


# Technology Stack: CI/CD Testing, ArgoCD GitOps, and Observability
**Project:** TaskPlanner v2.0 Production Operations
**Researched:** 2026-02-03
**Scope:** Stack additions for existing k3s-deployed SvelteKit app
## Executive Summary
This research covers three areas: (1) adding tests to the existing Gitea Actions pipeline, (2) ArgoCD for GitOps deployment automation, and (3) Prometheus/Grafana/Loki observability. The existing setup already has ArgoCD configured; research focuses on validating that configuration and adding the observability stack.
**Key finding:** Promtail is EOL on 2026-03-02. Use Grafana Alloy instead for log collection.
---
## 1. CI/CD Testing Stack
### Recommended Stack
| Component | Version | Purpose | Rationale |
|-----------|---------|---------|-----------|
| Playwright | ^1.58.1 (existing) | E2E testing | Already configured, comprehensive browser automation |
| Vitest | ^3.0.0 | Unit/component tests | Official Svelte recommendation for Vite-based projects |
| @testing-library/svelte | ^5.0.0 | Component testing utilities | Streamlined component assertions |
| mcr.microsoft.com/playwright | v1.58.1 | CI browser execution | Pre-installed browsers, eliminates install step |
### Why This Stack
**Playwright (keep existing):** Already configured with `playwright.config.ts` and `tests/docker-deployment.spec.ts`. The existing tests cover critical paths: health endpoint, CSRF-protected form submissions, and data persistence. Extend rather than replace.
**Vitest (add):** Svelte officially recommends Vitest for unit and component testing when using Vite (which SvelteKit uses). Vitest shares Vite's config, eliminating configuration overhead. Its API is largely Jest-compatible, so existing Jest experience transfers directly.
**NOT recommended:**
- Jest: Requires separate configuration, slower than Vitest, no Vite integration
- Cypress: Overlaps with Playwright; adding both creates maintenance burden
- @vitest/browser with Playwright: Adds complexity; save for later if jsdom proves insufficient
### Gitea Actions Workflow Updates
The existing workflow at `.gitea/workflows/build.yaml` needs a test stage. Gitea Actions uses GitHub Actions syntax.
**Recommended workflow structure:**
```yaml
name: Build and Push
on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]
env:
  REGISTRY: git.kube2.tricnet.de
  IMAGE_NAME: tho/taskplaner
jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.58.1-noble
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run type check
        run: npm run check
      - name: Run unit tests
        run: npm run test:unit
      - name: Run E2E tests
        run: npm run test:e2e
        env:
          CI: true
  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    steps:
      # ... existing build steps ...
```
**Key decisions:**
- Use Playwright Docker image to avoid browser installation (saves 2-3 minutes)
- Run tests before build to fail fast
- Only build/push on pushes to `master` or `main`, never on pull requests
- Type checking (`svelte-check`) catches errors before runtime
### Package.json Scripts to Add
```json
{
  "scripts": {
    "test": "npm run test:unit && npm run test:e2e",
    "test:unit": "vitest run",
    "test:unit:watch": "vitest",
    "test:e2e": "playwright test",
    "test:e2e:docker": "BASE_URL=http://localhost:3000 playwright test tests/docker-deployment.spec.ts"
  }
}
```
### Installation
```bash
# Add Vitest and testing utilities
npm install -D vitest @testing-library/svelte jsdom
```
### Vitest Configuration
Create `vitest.config.ts`:
```typescript
import { defineConfig } from 'vitest/config';
import { sveltekit } from '@sveltejs/kit/vite';
export default defineConfig({
  plugins: [sveltekit()],
  test: {
    include: ['src/**/*.{test,spec}.{js,ts}'],
    environment: 'jsdom',
    globals: true,
    setupFiles: ['./src/test-setup.ts']
  }
});
```
### Confidence: HIGH
Sources:
- [Svelte Testing Documentation](https://svelte.dev/docs/svelte/testing) - Official recommendation for Vitest
- [Playwright CI Setup](https://playwright.dev/docs/ci-intro) - Docker image and CI best practices
- Existing `playwright.config.ts` in project
---
## 2. ArgoCD GitOps Stack
### Current State
ArgoCD is already configured in `argocd/application.yaml`. The configuration is correct and follows best practices:
```yaml
syncPolicy:
  automated:
    prune: true     # Removes resources deleted from Git
    selfHeal: true  # Reverts manual changes
```
### Recommended Stack
| Component | Version | Purpose | Rationale |
|-----------|---------|---------|-----------|
| ArgoCD Helm Chart | 9.4.0 | GitOps controller | Latest stable, deploys ArgoCD v3.3.0 |
### What's Already Done (No Changes Needed)
1. **Application manifest:** `argocd/application.yaml` correctly points to `helm/taskplaner`
2. **Auto-sync enabled:** `automated.prune` and `automated.selfHeal` are configured
3. **Git-based image tags:** Pipeline updates `values.yaml` with new image tag
4. **Namespace creation:** `CreateNamespace=true` is set
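Item 3 above is what connects the pipeline to ArgoCD: the pipeline commits a new image tag to `values.yaml`, and ArgoCD syncs the change. A minimal sketch of that tag bump follows. It assumes `values.yaml` has a top-level `image:` block with an indented `tag:` key; the real layout of `helm/taskplaner/values.yaml` is not shown in this document, and `setImageTag` is a hypothetical helper, not part of the existing pipeline.

```typescript
// Hypothetical helper: rewrite the first `tag:` key found under a
// top-level `image:` block of a Helm values file (layout is assumed).
function setImageTag(valuesYaml: string, newTag: string): string {
  let inImageBlock = false;
  let replaced = false;
  const out = valuesYaml.split("\n").map((line) => {
    if (/^image:\s*$/.test(line)) {
      inImageBlock = true;
      return line;
    }
    // Any other non-indented, non-empty line ends the image block.
    if (inImageBlock && /^\S/.test(line)) {
      inImageBlock = false;
    }
    if (inImageBlock && !replaced && /^\s+tag:/.test(line)) {
      replaced = true;
      // Keep the original indentation, swap only the value.
      return line.replace(/tag:.*$/, () => 'tag: "' + newTag + '"');
    }
    return line;
  });
  if (!replaced) {
    throw new Error("no image.tag key found in values file");
  }
  return out.join("\n");
}

// Example input is made up for illustration.
const exampleValues = 'image:\n  repository: example/app\n  tag: "old"\n';
console.log(setImageTag(exampleValues, "5dbabe6"));
```

Because the change lands as a Git commit, every deployed tag stays traceable in the repository history, which is the audit-trail benefit of this approach.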
### What May Need Verification
1. **ArgoCD installation:** Verify ArgoCD is actually deployed on the k3s cluster
2. **Repository credentials:** If the Gitea repo is private, ArgoCD needs credentials
3. **Registry secret:** The `gitea-registry-secret` placeholder needs real credentials
### Installation (if ArgoCD not yet installed)
```bash
# Add ArgoCD Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# Install ArgoCD (minimal for single-node k3s)
helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set server.service.type=ClusterIP \
  --set 'configs.params.server\.insecure=true'  # If behind Traefik TLS termination; quoted so the shell keeps the backslash
```
### Apply Application
```bash
kubectl apply -f argocd/application.yaml
```
### NOT Recommended
- **ArgoCD Image Updater:** Overkill for a single-app deployment; the current approach of updating `values.yaml` in Git is simpler and provides a better audit trail
- **ApplicationSets:** Unnecessary for single environment
- **App of Apps pattern:** Unnecessary complexity for one application
### Confidence: HIGH
Sources:
- [ArgoCD Helm Chart on Artifact Hub](https://artifacthub.io/packages/helm/argo/argo-cd) - Version 9.4.0 confirmed
- [ArgoCD Helm GitHub Releases](https://github.com/argoproj/argo-helm/releases) - Release notes
- Existing `argocd/application.yaml` in project
---
## 3. Observability Stack
### Recommended Stack
| Component | Chart | Version | Purpose |
|-----------|-------|---------|---------|
| kube-prometheus-stack | prometheus-community/kube-prometheus-stack | 81.4.2 | Prometheus + Grafana + Alertmanager |
| Loki | grafana/loki | 6.51.0 | Log aggregation (monolithic mode) |
| Grafana Alloy | grafana/alloy | 1.5.3 | Log collection agent |
### Why This Stack
**kube-prometheus-stack (not standalone Prometheus):** Single chart deploys Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics. Pre-configured with Kubernetes dashboards. This is the standard approach.
**Loki (not ELK/Elasticsearch):** "Like Prometheus, but for logs." Integrates natively with Grafana, has a much lower resource footprint than Elasticsearch, and uses the same label-based query model as Prometheus.
**Grafana Alloy (not Promtail):** CRITICAL - Promtail reaches End-of-Life on 2026-03-02, less than a month after this research date (2026-02-03). Grafana Alloy is the official replacement. It is built on the OpenTelemetry Collector and handles logs, metrics, and traces in one agent.
### NOT Recommended
- **Promtail:** EOL 2026-03-02. Do not install; use Alloy
- **loki-stack Helm chart:** Deprecated, no longer maintained
- **Elasticsearch/ELK:** Resource-heavy, complex, overkill for single-user app
- **Loki microservices mode:** Requires 3+ nodes, object storage; overkill for personal app
- **Separate Prometheus + Grafana charts:** kube-prometheus-stack bundles them correctly
### Architecture
```
                                 +------------------+
                                 |     Grafana      |
                                 | (Dashboards/UI)  |
                                 +--------+---------+
                                          |
                     +--------------------+--------------------+
                     |                                         |
            +--------v---------+                    +----------v---------+
            |    Prometheus    |                    |        Loki        |
            |    (Metrics)     |                    |       (Logs)       |
            +--------+---------+                    +----------+---------+
                     |                                         |
      +--------------+---------------+                         |
      |              |               |                         |
+-----v-----+  +-----v-----+  +------v------+         +--------v---------+
|   node-   |  |   kube-   |  | TaskPlanner |         |  Grafana Alloy   |
| exporter  |  |  state-   |  |  /metrics   |         |  (Log Shipper)   |
|           |  |  metrics  |  |             |         |                  |
+-----------+  +-----------+  +-------------+         +------------------+
```
### Installation
```bash
# Add Helm repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Create monitoring namespace
kubectl create namespace monitoring
# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml
# Install Loki (monolithic mode for single-node)
helm install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml
# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace monitoring \
  --values alloy-values.yaml
```
### Recommended Values Files
#### prometheus-values.yaml (minimal for k3s single-node)
```yaml
# Reduce resource usage for single-node k3s
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 2Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn  # Use existing Longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
grafana:
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi
  # Grafana will be exposed via Traefik
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de
# Disable components not needed for single-node
kubeControllerManager:
  enabled: false  # k3s bundles this differently
kubeScheduler:
  enabled: false  # k3s bundles this differently
kubeProxy:
  enabled: false  # k3s uses different proxy
```
#### loki-values.yaml (monolithic mode)
```yaml
deploymentMode: SingleBinary
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi
# Disable components not needed for monolithic
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
# Gateway not needed for internal access
gateway:
  enabled: false
```
#### alloy-values.yaml
```yaml
alloy:
  configMap:
    content: |-
      // Discover and collect logs from all pods
      discovery.kubernetes "pods" {
        role = "pod"
      }
      discovery.relabel "pods" {
        targets = discovery.kubernetes.pods.targets
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label = "container"
        }
      }
      loki.source.kubernetes "pods" {
        targets = discovery.relabel.pods.output
        forward_to = [loki.write.local.receiver]
      }
      loki.write "local" {
        endpoint {
          url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
        }
      }
controller:
  type: daemonset
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 256Mi
```
### TaskPlanner Metrics Endpoint
The app needs a `/metrics` endpoint for Prometheus to scrape. SvelteKit options:
1. **prom-client library** (recommended): Standard Prometheus client for Node.js
2. **Custom endpoint**: Simple counter/gauge implementation
Install the dependency (this adds it to `package.json`):
```bash
npm install prom-client
```
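For comparison, option 2 above (a custom endpoint) can be sketched without any dependency. The block below is a minimal illustration, not a replacement for `prom-client`: it keeps counters and gauges in memory and renders them in the Prometheus text exposition format. All function and metric names (`registerMetric`, `taskplaner_http_requests_total`, `taskplaner_open_tasks`) are hypothetical.

```typescript
// Minimal in-process metrics registry. Names are illustrative only.
type MetricKind = "counter" | "gauge";

interface Metric {
  kind: MetricKind;
  help: string;
  value: number;
}

const registry: Record<string, Metric> = {};

function registerMetric(name: string, kind: MetricKind, help: string): void {
  registry[name] = { kind, help, value: 0 };
}

function lookupMetric(name: string, kind: MetricKind): Metric {
  const metric = registry[name];
  if (!metric || metric.kind !== kind) {
    throw new Error("unknown " + kind + ": " + name);
  }
  return metric;
}

// Counters may only increase.
function incCounter(name: string, by: number = 1): void {
  if (by < 0) throw new Error("counters cannot decrease");
  lookupMetric(name, "counter").value += by;
}

// Gauges can be set to any value.
function setGauge(name: string, value: number): void {
  lookupMetric(name, "gauge").value = value;
}

// Render each metric as a "# HELP" line, a "# TYPE" line, and one sample.
function renderMetrics(): string {
  const lines: string[] = [];
  Object.keys(registry).forEach((name) => {
    const metric = registry[name];
    lines.push("# HELP " + name + " " + metric.help);
    lines.push("# TYPE " + name + " " + metric.kind);
    lines.push(name + " " + metric.value);
  });
  return lines.join("\n") + "\n";
}

registerMetric("taskplaner_http_requests_total", "counter", "Total HTTP requests handled.");
registerMetric("taskplaner_open_tasks", "gauge", "Number of open tasks.");
incCounter("taskplaner_http_requests_total");
setGauge("taskplaner_open_tasks", 3);
console.log(renderMetrics());
```

In SvelteKit, the rendered string would typically be returned from a `GET` handler in a route file such as `src/routes/metrics/+server.ts`, with the content type `text/plain; version=0.0.4`. The hand-rolled version omits labels, histograms, and default process metrics, which is the main reason `prom-client` remains the recommended option.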
Add ServiceMonitor for Prometheus to scrape TaskPlanner:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```
### Resource Summary
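Total additional resource requirements for observability, based on the requests in the values files above. The Grafana figures are an estimate, since `prometheus-values.yaml` sets no Grafana resource requests: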
| Component | CPU Request | Memory Request | Storage |
|-----------|-------------|----------------|---------|
| Prometheus | 200m | 512Mi | 20Gi |
| Alertmanager | 50m | 64Mi | 5Gi |
| Grafana | 100m | 128Mi | 5Gi |
| Loki | 100m | 256Mi | 10Gi |
| Alloy (per node) | 50m | 64Mi | - |
| **Total** | ~500m | ~1Gi | 40Gi |
This fits comfortably on a single k3s node with 4+ cores and 8GB+ RAM.
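The totals in the table can be re-derived with a few lines of arithmetic. The sketch below copies the per-component requests from the Resource Summary table and sums them, counting Alloy once because the cluster has a single node; the type and variable names are illustrative.

```typescript
// Requests copied from the Resource Summary table:
// CPU in millicores, memory in Mi, storage in Gi.
interface ComponentRequests {
  name: string;
  cpuMilli: number;
  memoryMi: number;
  storageGi: number;
}

const components: ComponentRequests[] = [
  { name: "Prometheus", cpuMilli: 200, memoryMi: 512, storageGi: 20 },
  { name: "Alertmanager", cpuMilli: 50, memoryMi: 64, storageGi: 5 },
  { name: "Grafana", cpuMilli: 100, memoryMi: 128, storageGi: 5 },
  { name: "Loki", cpuMilli: 100, memoryMi: 256, storageGi: 10 },
  { name: "Alloy", cpuMilli: 50, memoryMi: 64, storageGi: 0 }, // one pod on a single node
];

// Sum each column across all components.
function totalRequests(items: ComponentRequests[]): ComponentRequests {
  return items.reduce(
    (sum, item) => ({
      name: "Total",
      cpuMilli: sum.cpuMilli + item.cpuMilli,
      memoryMi: sum.memoryMi + item.memoryMi,
      storageGi: sum.storageGi + item.storageGi,
    }),
    { name: "Total", cpuMilli: 0, memoryMi: 0, storageGi: 0 }
  );
}

const totals = totalRequests(components);
console.log(totals.cpuMilli + "m CPU, " + totals.memoryMi + "Mi memory, " + totals.storageGi + "Gi storage");
// prints "500m CPU, 1024Mi memory, 40Gi storage"
```

The sum covers only the components listed in the table; anything else the charts deploy (for example node-exporter or kube-state-metrics) is not included in these figures.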
### Confidence: HIGH
Sources:
- [kube-prometheus-stack on Artifact Hub](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack) - Version 81.4.2
- [Grafana Loki Helm Installation](https://grafana.com/docs/loki/latest/setup/install/helm/) - Monolithic mode guidance
- [Grafana Alloy Kubernetes Deployment](https://grafana.com/docs/alloy/latest/set-up/install/kubernetes/) - Alloy setup
- [Promtail Deprecation Notice](https://grafana.com/docs/loki/latest/send-data/promtail/installation/) - EOL 2026-03-02
- [Migrate from Promtail to Alloy](https://grafana.com/docs/alloy/latest/set-up/migrate/from-promtail/) - Migration guide
---
## Summary: What to Install
### Immediate Actions
| Category | Add | Version | Notes |
|----------|-----|---------|-------|
| Testing | vitest | ^3.0.0 | Unit tests |
| Testing | @testing-library/svelte | ^5.0.0 | Component testing |
| Metrics | prom-client | ^15.0.0 | Prometheus metrics from app |
### Helm Charts to Deploy
| Chart | Repository | Version | Namespace |
|-------|------------|---------|-----------|
| kube-prometheus-stack | prometheus-community | 81.4.2 | monitoring |
| loki | grafana | 6.51.0 | monitoring |
| alloy | grafana | 1.5.3 | monitoring |
### Already Configured (Verify, Don't Re-install)
| Component | Status | Action |
|-----------|--------|--------|
| ArgoCD Application | Configured in `argocd/application.yaml` | Verify ArgoCD is running |
| Playwright | Configured in `playwright.config.ts` | Keep, extend tests |
### Do NOT Install
| Component | Reason |
|-----------|--------|
| Promtail | EOL 2026-03-02, use Alloy instead |
| loki-stack chart | Deprecated, unmaintained |
| Elasticsearch/ELK | Overkill, resource-heavy |
| Jest | Vitest is better for Vite projects |
| ArgoCD Image Updater | Current Git-based approach is simpler |
---
## Helm Repository Commands
```bash
# Add all needed repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# Verify
helm search repo prometheus-community/kube-prometheus-stack
helm search repo grafana/loki
helm search repo grafana/alloy
helm search repo argo/argo-cd
```
---
## Sources
### Official Documentation
- [Svelte Testing](https://svelte.dev/docs/svelte/testing)
- [Playwright CI Setup](https://playwright.dev/docs/ci-intro)
- [ArgoCD Helm Chart](https://artifacthub.io/packages/helm/argo/argo-cd)
- [kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack)
- [Grafana Loki Helm](https://grafana.com/docs/loki/latest/setup/install/helm/)
- [Grafana Alloy](https://grafana.com/docs/alloy/latest/set-up/install/kubernetes/)
### Critical Updates
- [Promtail EOL Notice](https://grafana.com/docs/loki/latest/send-data/promtail/installation/) - EOL 2026-03-02
- [Promtail to Alloy Migration](https://grafana.com/docs/alloy/latest/set-up/migrate/from-promtail/)