taskplaner/.planning/research/STACK-v2-cicd-observability.md
docs: complete v2.0 CI/CD and observability research (Thomas Richter, 2026-02-03)

Technology Stack: CI/CD Testing, ArgoCD GitOps, and Observability

Project: TaskPlanner v2.0 Production Operations
Researched: 2026-02-03
Scope: Stack additions for the existing k3s-deployed SvelteKit app

Executive Summary

This research covers three areas: (1) adding tests to the existing Gitea Actions pipeline, (2) ArgoCD for GitOps deployment automation, and (3) Prometheus/Grafana/Loki observability. The existing setup already has ArgoCD configured; research focuses on validating that configuration and adding the observability stack.

Key finding: Promtail is EOL on 2026-03-02. Use Grafana Alloy instead for log collection.


1. CI/CD Testing Stack

| Component | Version | Purpose | Rationale |
| --- | --- | --- | --- |
| Playwright | ^1.58.1 (existing) | E2E testing | Already configured, comprehensive browser automation |
| Vitest | ^3.0.0 | Unit/component tests | Official Svelte recommendation for Vite-based projects |
| @testing-library/svelte | ^5.0.0 | Component testing utilities | Streamlined component assertions |
| mcr.microsoft.com/playwright | v1.58.1 | CI browser execution | Pre-installed browsers, eliminates install step |

Why This Stack

Playwright (keep existing): Already configured with playwright.config.ts and tests/docker-deployment.spec.ts. The existing tests cover critical paths: health endpoint, CSRF-protected form submissions, and data persistence. Extend rather than replace.

Vitest (add): Svelte officially recommends Vitest for unit and component testing when using Vite (which SvelteKit uses). Vitest shares Vite's config, eliminating configuration overhead. Jest muscle memory transfers directly.
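To make the unit/E2E split concrete, Vitest would cover pure logic rather than browser flows. A minimal sketch — `totalEstimate` and its file paths are hypothetical, not existing TaskPlanner code:

```typescript
// Hypothetical helper (e.g. src/lib/estimate.ts) — pure logic suited to Vitest
export function totalEstimate(tasks: { estimateMinutes?: number }[]): number {
  // Sum defined estimates; treat missing ones as zero
  return tasks.reduce((sum, t) => sum + (t.estimateMinutes ?? 0), 0);
}

// A matching Vitest spec (src/lib/estimate.test.ts) would look like:
//
//   import { describe, it, expect } from 'vitest';
//   import { totalEstimate } from './estimate';
//
//   describe('totalEstimate', () => {
//     it('sums defined estimates and ignores missing ones', () => {
//       expect(totalEstimate([{ estimateMinutes: 30 }, {}])).toBe(30);
//     });
//   });
```

Tests like this run in milliseconds under `vitest run`, which is why they belong in the workflow's test job ahead of the slower Playwright suite.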

NOT recommended:

  • Jest: Requires separate configuration, slower than Vitest, no Vite integration
  • Cypress: Overlaps with Playwright; adding both creates maintenance burden
  • @vitest/browser with Playwright: Adds complexity; save for later if jsdom proves insufficient

Gitea Actions Workflow Updates

The existing workflow at .gitea/workflows/build.yaml needs a test stage. Gitea Actions uses GitHub Actions syntax.

Recommended workflow structure:

name: Build and Push

on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]

env:
  REGISTRY: git.kube2.tricnet.de
  IMAGE_NAME: tho/taskplaner

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.58.1-noble
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run type check
        run: npm run check

      - name: Run unit tests
        run: npm run test:unit

      - name: Run E2E tests
        run: npm run test:e2e
        env:
          CI: true

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    steps:
      # ... existing build steps ...

Key decisions:

  • Use Playwright Docker image to avoid browser installation (saves 2-3 minutes)
  • Run tests before build to fail fast
  • Only build/push on push to master, not PRs
  • Type checking (svelte-check) catches errors before runtime

Package.json Scripts to Add

{
  "scripts": {
    "test": "npm run test:unit && npm run test:e2e",
    "test:unit": "vitest run",
    "test:unit:watch": "vitest",
    "test:e2e": "playwright test",
    "test:e2e:docker": "BASE_URL=http://localhost:3000 playwright test tests/docker-deployment.spec.ts"
  }
}

Installation

# Add Vitest and testing utilities
npm install -D vitest @testing-library/svelte jsdom

Vitest Configuration

Create vitest.config.ts:

import { defineConfig } from 'vitest/config';
import { sveltekit } from '@sveltejs/kit/vite';

export default defineConfig({
  plugins: [sveltekit()],
  test: {
    include: ['src/**/*.{test,spec}.{js,ts}'],
    environment: 'jsdom',
    globals: true,
    setupFiles: ['./src/test-setup.ts']
  }
});

Confidence: HIGH


2. ArgoCD GitOps Stack

Current State

ArgoCD is already configured in argocd/application.yaml. The configuration is correct and follows best practices:

syncPolicy:
  automated:
    prune: true      # Removes resources deleted from Git
    selfHeal: true   # Reverts manual changes

| Component | Version | Purpose | Rationale |
| --- | --- | --- | --- |
| ArgoCD Helm Chart | 9.4.0 | GitOps controller | Latest stable, deploys ArgoCD v3.3.0 |

What's Already Done (No Changes Needed)

  1. Application manifest: argocd/application.yaml correctly points to helm/taskplaner
  2. Auto-sync enabled: automated.prune and selfHeal are configured
  3. Git-based image tags: Pipeline updates values.yaml with new image tag
  4. Namespace creation: CreateNamespace=true is set

What May Need Verification

  1. ArgoCD installation: Verify ArgoCD is actually deployed on the k3s cluster
  2. Repository credentials: If the Gitea repo is private, ArgoCD needs credentials
  3. Registry secret: The gitea-registry-secret placeholder needs real credentials
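For item 2, ArgoCD accepts repository credentials declaratively as a Secret labeled `argocd.argoproj.io/secret-type: repository`. A sketch with placeholder values — the repository URL is inferred from the registry host used elsewhere in this document, and the username/token are stand-ins:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gitea-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository  # ArgoCD discovers repo credentials via this label
stringData:
  type: git
  url: https://git.kube2.tricnet.de/tho/taskplaner.git
  username: <gitea-username>
  password: <gitea-access-token>
```

Applying this Secret is enough; no ArgoCD restart is required for it to pick up the credentials.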

Installation (if ArgoCD not yet installed)

# Add ArgoCD Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Install ArgoCD (minimal for single-node k3s)
helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set server.service.type=ClusterIP \
  --set configs.params.server\.insecure=true  # If behind Traefik TLS termination

Apply Application

kubectl apply -f argocd/application.yaml

NOT recommended:

  • ArgoCD Image Updater: Overkill for single-app deployment; the current approach of updating values.yaml in Git is simpler and provides a better audit trail
  • ApplicationSets: Unnecessary for single environment
  • App of Apps pattern: Unnecessary complexity for one application

Confidence: HIGH



3. Observability Stack

| Component | Chart | Version | Purpose |
| --- | --- | --- | --- |
| kube-prometheus-stack | prometheus-community/kube-prometheus-stack | 81.4.2 | Prometheus + Grafana + Alertmanager |
| Loki | grafana/loki | 6.51.0 | Log aggregation (monolithic mode) |
| Grafana Alloy | grafana/alloy | 1.5.3 | Log collection agent |

Why This Stack

kube-prometheus-stack (not standalone Prometheus): Single chart deploys Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics. Pre-configured with Kubernetes dashboards. This is the standard approach.

Loki (not ELK/Elasticsearch): "Like Prometheus, but for logs." Integrates natively with Grafana. Much lower resource footprint than Elasticsearch. Uses same label-based querying as Prometheus.

Grafana Alloy (not Promtail): CRITICAL - Promtail reaches End-of-Life on 2026-03-02 (next month). Grafana Alloy is the official replacement. It's based on OpenTelemetry Collector and supports logs, metrics, and traces in one agent.

NOT recommended:

  • Promtail: EOL 2026-03-02. Do not install; use Alloy
  • loki-stack Helm chart: Deprecated, no longer maintained
  • Elasticsearch/ELK: Resource-heavy, complex, overkill for single-user app
  • Loki microservices mode: Requires 3+ nodes, object storage; overkill for personal app
  • Separate Prometheus + Grafana charts: kube-prometheus-stack bundles them correctly

Architecture

                                    +------------------+
                                    |     Grafana      |
                                    | (Dashboards/UI)  |
                                    +--------+---------+
                                             |
                        +--------------------+--------------------+
                        |                                         |
               +--------v---------+                    +----------v---------+
               |    Prometheus    |                    |       Loki         |
               |    (Metrics)     |                    |      (Logs)        |
               +--------+---------+                    +----------+---------+
                        |                                         |
         +--------------+---------------+                         |
         |              |               |                         |
   +-----v-----+  +-----v-----+  +------v------+         +--------v---------+
   |  node-    |  |  kube-    |  | TaskPlanner |         |   Grafana Alloy  |
   |  exporter |  |  state-   |  |   /metrics  |         |  (Log Shipper)   |
   |           |  |  metrics  |  |             |         |                  |
   +-----------+  +-----------+  +-------------+         +------------------+

Installation

# Add Helm repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# Install Loki (monolithic mode for single-node)
helm install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml

# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace monitoring \
  --values alloy-values.yaml

prometheus-values.yaml (minimal for k3s single-node)

# Reduce resource usage for single-node k3s
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 2Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn  # Use existing Longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

grafana:
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi
  # Grafana will be exposed via Traefik
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de

# Disable components not needed for single-node
kubeControllerManager:
  enabled: false  # k3s bundles this differently
kubeScheduler:
  enabled: false  # k3s bundles this differently
kubeProxy:
  enabled: false  # k3s uses different proxy
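One gap worth closing in the same values file: Grafana does not know about Loki by default. kube-prometheus-stack exposes `grafana.additionalDataSources` for this; a sketch to merge into the `grafana:` section above, assuming the in-cluster Loki service URL:

```yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.monitoring.svc:3100
```

With this in place, logs and metrics are browsable from the same Grafana instance without manual datasource setup.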

loki-values.yaml (monolithic mode)

deploymentMode: SingleBinary

loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi

# Disable components not needed for monolithic
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

# Gateway not needed for internal access
gateway:
  enabled: false

alloy-values.yaml

alloy:
  configMap:
    content: |-
      // Discover and collect logs from all pods
      discovery.kubernetes "pods" {
        role = "pod"
      }

      discovery.relabel "pods" {
        targets = discovery.kubernetes.pods.targets

        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label  = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label  = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label  = "container"
        }
      }

      loki.source.kubernetes "pods" {
        targets    = discovery.relabel.pods.output
        forward_to = [loki.write.local.receiver]
      }

      loki.write "local" {
        endpoint {
          url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
        }
      }

controller:
  type: daemonset

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
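With the relabel rules above, every log line carries namespace/pod/container labels, so TaskPlanner logs become queryable in Grafana with LogQL. Example queries — the label values are assumptions about how the app is deployed:

```logql
# All logs from the app's container, filtered to lines containing "error"
{namespace="default", container="taskplaner"} |= "error"

# Rate of log lines per pod over the last 5 minutes
sum by (pod) (rate({namespace="default", container="taskplaner"}[5m]))
```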

TaskPlanner Metrics Endpoint

The app needs a /metrics endpoint for Prometheus to scrape. SvelteKit options:

  1. prom-client library (recommended): Standard Prometheus client for Node.js
  2. Custom endpoint: Simple counter/gauge implementation

Install the dependency:

npm install prom-client
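Option 2 can be sketched in plain TypeScript to show the Prometheus text exposition format a /metrics endpoint must return; prom-client produces the same format with histograms and default Node.js metrics included, so this is illustrative only (all names here are hypothetical):

```typescript
// Minimal hand-rolled counter registry (hypothetical), producing
// Prometheus text exposition format.
const counters = new Map<string, number>();

// Increment a named counter (creates it at zero on first use)
export function inc(name: string, by = 1): void {
  counters.set(name, (counters.get(name) ?? 0) + by);
}

// Render all counters in Prometheus text format
export function renderMetrics(): string {
  let out = '';
  for (const [name, value] of counters) {
    out += `# TYPE ${name} counter\n${name} ${value}\n`;
  }
  return out;
}

// In SvelteKit this would back a GET handler, roughly
// (src/routes/metrics/+server.ts):
//
//   export function GET() {
//     return new Response(renderMetrics(), {
//       headers: { 'Content-Type': 'text/plain; version=0.0.4' }
//     });
//   }
```

In practice prom-client's `register.metrics()` replaces `renderMetrics()` here; the hand-rolled version is only useful to understand what Prometheus scrapes.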

Add ServiceMonitor for Prometheus to scrape TaskPlanner:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Resource Summary

Total additional resource requirements for observability:

| Component | CPU Request | Memory Request | Storage |
| --- | --- | --- | --- |
| Prometheus | 200m | 512Mi | 20Gi |
| Alertmanager | 50m | 64Mi | 5Gi |
| Grafana | 100m | 128Mi | 5Gi |
| Loki | 100m | 256Mi | 10Gi |
| Alloy (per node) | 50m | 64Mi | - |
| Total | ~500m | ~1Gi | 40Gi |

This fits comfortably on a single k3s node with 4+ cores and 8GB+ RAM.

Confidence: HIGH



Summary: What to Install

Immediate Actions

| Category | Add | Version | Notes |
| --- | --- | --- | --- |
| Testing | vitest | ^3.0.0 | Unit tests |
| Testing | @testing-library/svelte | ^5.0.0 | Component testing |
| Metrics | prom-client | ^15.0.0 | Prometheus metrics from app |

Helm Charts to Deploy

| Chart | Repository | Version | Namespace |
| --- | --- | --- | --- |
| kube-prometheus-stack | prometheus-community | 81.4.2 | monitoring |
| loki | grafana | 6.51.0 | monitoring |
| alloy | grafana | 1.5.3 | monitoring |

Already Configured (Verify, Don't Re-install)

| Component | Status | Action |
| --- | --- | --- |
| ArgoCD Application | Configured in argocd/application.yaml | Verify ArgoCD is running |
| Playwright | Configured in playwright.config.ts | Keep, extend tests |

Do NOT Install

| Component | Reason |
| --- | --- |
| Promtail | EOL 2026-03-02, use Alloy instead |
| loki-stack chart | Deprecated, unmaintained |
| Elasticsearch/ELK | Overkill, resource-heavy |
| Jest | Vitest is better for Vite projects |
| ArgoCD Image Updater | Current Git-based approach is simpler |

Helm Repository Commands

# Add all needed repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Verify
helm search repo prometheus-community/kube-prometheus-stack
helm search repo grafana/loki
helm search repo grafana/alloy
helm search repo argo/argo-cd
