taskplaner/.planning/research/STACK-v2-cicd-observability.md
docs: complete v2.0 CI/CD and observability research (Thomas Richter, 2026-02-03)

Technology Stack: CI/CD Testing, ArgoCD GitOps, and Observability

Project: TaskPlanner v2.0 Production Operations
Researched: 2026-02-03
Scope: Stack additions for the existing k3s-deployed SvelteKit app

Executive Summary

This research covers three areas: (1) adding tests to the existing Gitea Actions pipeline, (2) ArgoCD for GitOps deployment automation, and (3) Prometheus/Grafana/Loki observability. The existing setup already has ArgoCD configured; research focuses on validating that configuration and adding the observability stack.

Key finding: Promtail is EOL on 2026-03-02. Use Grafana Alloy instead for log collection.


1. CI/CD Testing Stack

| Component | Version | Purpose | Rationale |
| --- | --- | --- | --- |
| Playwright | ^1.58.1 (existing) | E2E testing | Already configured, comprehensive browser automation |
| Vitest | ^3.0.0 | Unit/component tests | Official Svelte recommendation for Vite-based projects |
| @testing-library/svelte | ^5.0.0 | Component testing utilities | Streamlined component assertions |
| mcr.microsoft.com/playwright | v1.58.1 | CI browser execution | Pre-installed browsers, eliminates install step |

Why This Stack

Playwright (keep existing): Already configured with playwright.config.ts and tests/docker-deployment.spec.ts. The existing tests cover critical paths: health endpoint, CSRF-protected form submissions, and data persistence. Extend rather than replace.

Vitest (add): Svelte officially recommends Vitest for unit and component testing when using Vite (which SvelteKit uses). Vitest shares Vite's config, eliminating configuration overhead. Jest muscle memory transfers directly.
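To make the unit/E2E split concrete, Vitest would cover pure logic rather than browser flows. A minimal sketch — `totalEstimate` and its file paths are hypothetical, not existing TaskPlanner code:

```typescript
// Hypothetical helper (e.g. src/lib/estimate.ts) — pure logic suited to Vitest
export function totalEstimate(tasks: { estimateMinutes?: number }[]): number {
  // Sum defined estimates; treat missing ones as zero
  return tasks.reduce((sum, t) => sum + (t.estimateMinutes ?? 0), 0);
}

// A matching Vitest spec (src/lib/estimate.test.ts) would look like:
//
//   import { describe, it, expect } from 'vitest';
//   import { totalEstimate } from './estimate';
//
//   describe('totalEstimate', () => {
//     it('sums defined estimates and ignores missing ones', () => {
//       expect(totalEstimate([{ estimateMinutes: 30 }, {}])).toBe(30);
//     });
//   });
```

Tests like this run in milliseconds under `vitest run`, which is why they belong in the workflow's test job ahead of the slower Playwright suite.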

NOT recommended:

  • Jest: Requires separate configuration, slower than Vitest, no Vite integration
  • Cypress: Overlaps with Playwright; adding both creates maintenance burden
  • @vitest/browser with Playwright: Adds complexity; save for later if jsdom proves insufficient

Gitea Actions Workflow Updates

The existing workflow at .gitea/workflows/build.yaml needs a test stage. Gitea Actions uses GitHub Actions syntax.

Recommended workflow structure:

name: Build and Push

on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]

env:
  REGISTRY: git.kube2.tricnet.de
  IMAGE_NAME: tho/taskplaner

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.58.1-noble
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run type check
        run: npm run check

      - name: Run unit tests
        run: npm run test:unit

      - name: Run E2E tests
        run: npm run test:e2e
        env:
          CI: true

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    steps:
      # ... existing build steps ...

Key decisions:

  • Use Playwright Docker image to avoid browser installation (saves 2-3 minutes)
  • Run tests before build to fail fast
  • Only build/push on push to master, not PRs
  • Type checking (svelte-check) catches errors before runtime

Package.json Scripts to Add

{
  "scripts": {
    "test": "npm run test:unit && npm run test:e2e",
    "test:unit": "vitest run",
    "test:unit:watch": "vitest",
    "test:e2e": "playwright test",
    "test:e2e:docker": "BASE_URL=http://localhost:3000 playwright test tests/docker-deployment.spec.ts"
  }
}

Installation

# Add Vitest and testing utilities
npm install -D vitest @testing-library/svelte jsdom

Vitest Configuration

Create vitest.config.ts:

import { defineConfig } from 'vitest/config';
import { sveltekit } from '@sveltejs/kit/vite';

export default defineConfig({
  plugins: [sveltekit()],
  test: {
    include: ['src/**/*.{test,spec}.{js,ts}'],
    environment: 'jsdom',
    globals: true,
    setupFiles: ['./src/test-setup.ts']
  }
});

Confidence: HIGH


2. ArgoCD GitOps Stack

Current State

ArgoCD is already configured in argocd/application.yaml. The configuration is correct and follows best practices:

syncPolicy:
  automated:
    prune: true      # Removes resources deleted from Git
    selfHeal: true   # Reverts manual changes

| Component | Version | Purpose | Rationale |
| --- | --- | --- | --- |
| ArgoCD Helm Chart | 9.4.0 | GitOps controller | Latest stable, deploys ArgoCD v3.3.0 |

What's Already Done (No Changes Needed)

  1. Application manifest: argocd/application.yaml correctly points to helm/taskplaner
  2. Auto-sync enabled: automated.prune and selfHeal are configured
  3. Git-based image tags: Pipeline updates values.yaml with new image tag
  4. Namespace creation: CreateNamespace=true is set

What May Need Verification

  1. ArgoCD installation: Verify ArgoCD is actually deployed on the k3s cluster
  2. Repository credentials: If the Gitea repo is private, ArgoCD needs credentials
  3. Registry secret: The gitea-registry-secret placeholder needs real credentials
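For item 2, ArgoCD accepts repository credentials declaratively as a Secret labeled `argocd.argoproj.io/secret-type: repository`. A sketch with placeholder values — the repository URL is inferred from the registry host used elsewhere in this document, and the username/token are stand-ins:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gitea-repo-creds
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository  # ArgoCD discovers repo credentials via this label
stringData:
  type: git
  url: https://git.kube2.tricnet.de/tho/taskplaner.git
  username: <gitea-username>
  password: <gitea-access-token>
```

Applying this Secret is enough; no ArgoCD restart is required for it to pick up the credentials.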

Installation (if ArgoCD not yet installed)

# Add ArgoCD Helm repository
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Install ArgoCD (minimal for single-node k3s)
helm install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --set server.service.type=ClusterIP \
  --set configs.params.server\.insecure=true  # If behind Traefik TLS termination

Apply Application

kubectl apply -f argocd/application.yaml

NOT recommended:

  • ArgoCD Image Updater: Overkill for single-app deployment; the current approach of updating values.yaml in Git is simpler and provides a better audit trail
  • ApplicationSets: Unnecessary for single environment
  • App of Apps pattern: Unnecessary complexity for one application

Confidence: HIGH



3. Observability Stack

| Component | Chart | Version | Purpose |
| --- | --- | --- | --- |
| kube-prometheus-stack | prometheus-community/kube-prometheus-stack | 81.4.2 | Prometheus + Grafana + Alertmanager |
| Loki | grafana/loki | 6.51.0 | Log aggregation (monolithic mode) |
| Grafana Alloy | grafana/alloy | 1.5.3 | Log collection agent |

Why This Stack

kube-prometheus-stack (not standalone Prometheus): Single chart deploys Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics. Pre-configured with Kubernetes dashboards. This is the standard approach.

Loki (not ELK/Elasticsearch): "Like Prometheus, but for logs." Integrates natively with Grafana. Much lower resource footprint than Elasticsearch. Uses same label-based querying as Prometheus.

Grafana Alloy (not Promtail): CRITICAL - Promtail reaches End-of-Life on 2026-03-02 (next month). Grafana Alloy is the official replacement. It's based on OpenTelemetry Collector and supports logs, metrics, and traces in one agent.

NOT recommended:

  • Promtail: EOL 2026-03-02. Do not install; use Alloy
  • loki-stack Helm chart: Deprecated, no longer maintained
  • Elasticsearch/ELK: Resource-heavy, complex, overkill for single-user app
  • Loki microservices mode: Requires 3+ nodes, object storage; overkill for personal app
  • Separate Prometheus + Grafana charts: kube-prometheus-stack bundles them correctly

Architecture

                                    +------------------+
                                    |     Grafana      |
                                    | (Dashboards/UI)  |
                                    +--------+---------+
                                             |
                        +--------------------+--------------------+
                        |                                         |
               +--------v---------+                    +----------v---------+
               |    Prometheus    |                    |       Loki         |
               |    (Metrics)     |                    |      (Logs)        |
               +--------+---------+                    +----------+---------+
                        |                                         |
         +--------------+---------------+                         |
         |              |               |                         |
   +-----v-----+  +-----v-----+  +------v------+         +--------v---------+
   |  node-    |  |  kube-    |  | TaskPlanner |         |   Grafana Alloy  |
   |  exporter |  |  state-   |  |   /metrics  |         |  (Log Shipper)   |
   |           |  |  metrics  |  |             |         |                  |
   +-----------+  +-----------+  +-------------+         +------------------+

Installation

# Add Helm repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# Install Loki (monolithic mode for single-node)
helm install loki grafana/loki \
  --namespace monitoring \
  --values loki-values.yaml

# Install Alloy for log collection
helm install alloy grafana/alloy \
  --namespace monitoring \
  --values alloy-values.yaml

prometheus-values.yaml (minimal for k3s single-node)

# Reduce resource usage for single-node k3s
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 2Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn  # Use existing Longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

grafana:
  persistence:
    enabled: true
    storageClassName: longhorn
    size: 5Gi
  # Grafana will be exposed via Traefik
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de

# Disable components not needed for single-node
kubeControllerManager:
  enabled: false  # k3s bundles this differently
kubeScheduler:
  enabled: false  # k3s bundles this differently
kubeProxy:
  enabled: false  # k3s uses different proxy
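One gap worth closing in the same values file: Grafana does not know about Loki by default. kube-prometheus-stack exposes `grafana.additionalDataSources` for this; a sketch to merge into the `grafana:` section above, assuming the in-cluster Loki service URL:

```yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.monitoring.svc:3100
```

With this in place, logs and metrics are browsable from the same Grafana instance without manual datasource setup.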

loki-values.yaml (monolithic mode)

deploymentMode: SingleBinary

loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h

singleBinary:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 1Gi
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi

# Disable components not needed for monolithic
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

# Gateway not needed for internal access
gateway:
  enabled: false

alloy-values.yaml

alloy:
  configMap:
    content: |-
      // Discover and collect logs from all pods
      discovery.kubernetes "pods" {
        role = "pod"
      }

      discovery.relabel "pods" {
        targets = discovery.kubernetes.pods.targets

        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label  = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label  = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label  = "container"
        }
      }

      loki.source.kubernetes "pods" {
        targets    = discovery.relabel.pods.output
        forward_to = [loki.write.local.receiver]
      }

      loki.write "local" {
        endpoint {
          url = "http://loki.monitoring.svc:3100/loki/api/v1/push"
        }
      }

controller:
  type: daemonset

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
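With the relabel rules above, every log line carries namespace/pod/container labels, so TaskPlanner logs become queryable in Grafana with LogQL. Example queries — the label values are assumptions about how the app is deployed:

```logql
# All logs from the app's container, filtered to lines containing "error"
{namespace="default", container="taskplaner"} |= "error"

# Rate of log lines per pod over the last 5 minutes
sum by (pod) (rate({namespace="default", container="taskplaner"}[5m]))
```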

TaskPlanner Metrics Endpoint

The app needs a /metrics endpoint for Prometheus to scrape. SvelteKit options:

  1. prom-client library (recommended): Standard Prometheus client for Node.js
  2. Custom endpoint: Simple counter/gauge implementation

Install the dependency:

npm install prom-client
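Option 2 can be sketched in plain TypeScript to show the Prometheus text exposition format a /metrics endpoint must return; prom-client produces the same format with histograms and default Node.js metrics included, so this is illustrative only (all names here are hypothetical):

```typescript
// Minimal hand-rolled counter registry (hypothetical), producing
// Prometheus text exposition format.
const counters = new Map<string, number>();

// Increment a named counter (creates it at zero on first use)
export function inc(name: string, by = 1): void {
  counters.set(name, (counters.get(name) ?? 0) + by);
}

// Render all counters in Prometheus text format
export function renderMetrics(): string {
  let out = '';
  for (const [name, value] of counters) {
    out += `# TYPE ${name} counter\n${name} ${value}\n`;
  }
  return out;
}

// In SvelteKit this would back a GET handler, roughly
// (src/routes/metrics/+server.ts):
//
//   export function GET() {
//     return new Response(renderMetrics(), {
//       headers: { 'Content-Type': 'text/plain; version=0.0.4' }
//     });
//   }
```

In practice prom-client's `register.metrics()` replaces `renderMetrics()` here; the hand-rolled version is only useful to understand what Prometheus scrapes.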

Add ServiceMonitor for Prometheus to scrape TaskPlanner:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

Resource Summary

Total additional resource requirements for observability:

| Component | CPU Request | Memory Request | Storage |
| --- | --- | --- | --- |
| Prometheus | 200m | 512Mi | 20Gi |
| Alertmanager | 50m | 64Mi | 5Gi |
| Grafana | 100m | 128Mi | 5Gi |
| Loki | 100m | 256Mi | 10Gi |
| Alloy (per node) | 50m | 64Mi | - |
| Total | ~500m | ~1Gi | 40Gi |

This fits comfortably on a single k3s node with 4+ cores and 8GB+ RAM.

Confidence: HIGH



Summary: What to Install

Immediate Actions

| Category | Add | Version | Notes |
| --- | --- | --- | --- |
| Testing | vitest | ^3.0.0 | Unit tests |
| Testing | @testing-library/svelte | ^5.0.0 | Component testing |
| Metrics | prom-client | ^15.0.0 | Prometheus metrics from app |

Helm Charts to Deploy

| Chart | Repository | Version | Namespace |
| --- | --- | --- | --- |
| kube-prometheus-stack | prometheus-community | 81.4.2 | monitoring |
| loki | grafana | 6.51.0 | monitoring |
| alloy | grafana | 1.5.3 | monitoring |

Already Configured (Verify, Don't Re-install)

| Component | Status | Action |
| --- | --- | --- |
| ArgoCD Application | Configured in argocd/application.yaml | Verify ArgoCD is running |
| Playwright | Configured in playwright.config.ts | Keep, extend tests |

Do NOT Install

| Component | Reason |
| --- | --- |
| Promtail | EOL 2026-03-02, use Alloy instead |
| loki-stack chart | Deprecated, unmaintained |
| Elasticsearch/ELK | Overkill, resource-heavy |
| Jest | Vitest is better for Vite projects |
| ArgoCD Image Updater | Current Git-based approach is simpler |

Helm Repository Commands

# Add all needed repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

# Verify
helm search repo prometheus-community/kube-prometheus-stack
helm search repo grafana/loki
helm search repo grafana/alloy
helm search repo argo/argo-cd
