Thomas Richter 5dbabe6a2d docs: complete v2.0 CI/CD and observability research
Files:
- STACK-v2-cicd-observability.md (ArgoCD, Prometheus, Loki, Alloy)
- FEATURES.md (updated with CI/CD and observability section)
- ARCHITECTURE.md (updated with v2.0 integration architecture)
- PITFALLS-CICD-OBSERVABILITY.md (14 critical/moderate/minor pitfalls)
- SUMMARY-v2-cicd-observability.md (synthesis with roadmap implications)

Key findings:
- Stack: kube-prometheus-stack + Loki monolithic + Alloy (Promtail EOL March 2026)
- Architecture: 3-phase approach - GitOps first, observability second, CI tests last
- Critical pitfall: ArgoCD TLS redirect loop, Loki disk exhaustion, k3s metrics config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 03:29:23 +01:00


Architecture Research

Domain: Personal task/notes web application with image attachments
Researched: 2026-01-29
Confidence: HIGH

Standard Architecture

System Overview

+------------------------------------------------------------------+
|                        CLIENT LAYER                               |
|  +-------------------+  +-------------------+  +----------------+ |
|  |  Desktop Browser  |  |  Mobile Browser   |  |  PWA (future)  | |
|  +--------+----------+  +--------+----------+  +-------+--------+ |
|           |                      |                     |          |
+-----------+----------------------+---------------------+----------+
            |                      |                     |
            v                      v                     v
+------------------------------------------------------------------+
|                       PRESENTATION LAYER                          |
|  +------------------------------------------------------------+  |
|  |                    Web Frontend (SPA)                       |  |
|  |  +--------+  +--------+  +--------+  +--------+  +--------+ |  |
|  |  | Notes  |  | Tasks  |  | Search |  | Tags   |  | Upload | |  |
|  |  | View   |  | View   |  | View   |  | View   |  | View   | |  |
|  |  +--------+  +--------+  +--------+  +--------+  +--------+ |  |
|  +------------------------------+-----------------------------+   |
+---------------------------------|--------------------------------+
                                  | HTTP/REST
                                  v
+------------------------------------------------------------------+
|                       APPLICATION LAYER                           |
|  +------------------------------------------------------------+  |
|  |                    REST API (Monolith)                      |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  | Notes API  |  | Tasks API  |  | Search API |             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  | Tags API   |  | Upload API |  | Auth API   |             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  +------------------------------+-----------------------------+   |
+---------------------------------|--------------------------------+
                                  |
            +---------------------+---------------------+
            |                     |                     |
            v                     v                     v
+------------------------------------------------------------------+
|                        DATA LAYER                                 |
|  +----------------+  +----------------+  +------------------+     |
|  |    SQLite      |  |  File Storage  |  |   FTS5 Index     |     |
|  |   (primary)    |  |   (images)     |  |  (full-text)     |     |
|  +----------------+  +----------------+  +------------------+     |
+------------------------------------------------------------------+

Component Responsibilities

| Component | Responsibility | Typical Implementation |
|---|---|---|
| Web Frontend | UI rendering, user interaction, client-side state | React/Vue/Svelte SPA |
| REST API | Business logic, validation, orchestration | Node.js/Go/Python monolith |
| Notes API | CRUD operations for thoughts/notes | API route handler |
| Tasks API | CRUD for tasks, status transitions | API route handler |
| Search API | Full-text search across notes/tasks | Wraps FTS5 queries |
| Tags API | Tag management, note-tag associations | API route handler |
| Upload API | Image upload, validation, storage | Handles multipart forms |
| Auth API | Session management (single user) | Simple token/session |
| SQLite | Primary data persistence | Single-file database |
| File Storage | Binary file storage (images) | Docker volume mount |
| FTS5 Index | Full-text search capabilities | SQLite virtual table |

Recommended Project Structure

project/
+-- docker/
|   +-- Dockerfile           # Multi-stage build for frontend + backend
|   +-- docker-compose.yml   # Service orchestration
|   +-- nginx.conf           # Reverse proxy config (optional)
+-- backend/
|   +-- cmd/
|   |   +-- server/
|   |       +-- main.go      # Entry point
|   +-- internal/
|   |   +-- api/             # HTTP handlers
|   |   |   +-- notes.go
|   |   |   +-- tasks.go
|   |   |   +-- tags.go
|   |   |   +-- search.go
|   |   |   +-- upload.go
|   |   +-- models/          # Domain entities
|   |   |   +-- note.go
|   |   |   +-- task.go
|   |   |   +-- tag.go
|   |   |   +-- attachment.go
|   |   +-- repository/      # Data access
|   |   |   +-- sqlite.go
|   |   |   +-- notes_repo.go
|   |   |   +-- tasks_repo.go
|   |   +-- service/         # Business logic
|   |   |   +-- notes_svc.go
|   |   |   +-- search_svc.go
|   |   +-- storage/         # File storage abstraction
|   |       +-- local.go
|   +-- migrations/          # Database migrations
|   +-- go.mod
+-- frontend/
|   +-- src/
|   |   +-- components/      # Reusable UI components
|   |   +-- pages/           # Route-level views
|   |   +-- stores/          # Client state management
|   |   +-- api/             # Backend API client
|   |   +-- utils/           # Helpers
|   +-- public/
|   +-- package.json
+-- data/                    # Mounted volume (gitignored)
|   +-- app.db               # SQLite database
|   +-- uploads/             # Image storage
+-- .planning/               # Project planning docs

Structure Rationale

  • Monorepo with backend/frontend split: Keeps deployment simple (single container possible) while maintaining clear separation
  • internal/ in Go: Prevents external packages from importing internals; enforces encapsulation
  • Repository pattern: Abstracts SQLite access, enables future database swap if needed
  • Service layer: Business logic separated from HTTP handlers for testability
  • data/ volume: Single mount point for all persistent data (database + files)

Architectural Patterns

Pattern 1: Modular Monolith

What: Single deployable unit with clear internal module boundaries. Each domain (notes, tasks, tags, search) has its own package but shares the same database and process.

When to use: Single-user or small-team applications where operational simplicity matters more than independent scaling.

Trade-offs:

  • Pro: Simple deployment, easy debugging, no network overhead between modules
  • Pro: Single database transaction across domains when needed
  • Con: All modules must use same language/runtime
  • Con: Cannot scale modules independently (not needed for single user)

Example:

// internal/api/routes.go
func SetupRoutes(r *mux.Router, services *Services) {
    // Each domain gets its own route group
    notes := r.PathPrefix("/api/notes").Subrouter()
    notes.HandleFunc("", services.Notes.List).Methods("GET")
    notes.HandleFunc("", services.Notes.Create).Methods("POST")

    tasks := r.PathPrefix("/api/tasks").Subrouter()
    tasks.HandleFunc("", services.Tasks.List).Methods("GET")
    // Clear boundaries, but same process
}

Pattern 2: Repository Pattern for Data Access

What: Abstract data access behind interfaces. Repositories handle all database queries; services call repositories, not raw SQL.

When to use: Always for anything beyond trivial apps. Enables testing with mocks and future database changes.

Trade-offs:

  • Pro: Testable services (mock repositories)
  • Pro: Database-agnostic business logic
  • Pro: Query logic centralized
  • Con: Additional abstraction layer
  • Con: Can become overly complex if over-engineered

Example:

// internal/repository/notes_repo.go
type NotesRepository interface {
    Create(ctx context.Context, note *models.Note) error
    GetByID(ctx context.Context, id string) (*models.Note, error)
    List(ctx context.Context, opts ListOptions) ([]*models.Note, error)
    Search(ctx context.Context, query string) ([]*models.Note, error)
}

type sqliteNotesRepo struct {
    db *sql.DB
}

func (r *sqliteNotesRepo) Search(ctx context.Context, query string) ([]*models.Note, error) {
    // FTS5 search; notes_fts is an external-content table keyed by rowid,
    // so join on n.rowid (not the TEXT id column)
    rows, err := r.db.QueryContext(ctx, `
        SELECT n.id, n.title, n.body, n.created_at
        FROM notes n
        JOIN notes_fts ON notes_fts.rowid = n.rowid
        WHERE notes_fts MATCH ?
        ORDER BY rank
    `, query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // ...
}

Pattern 3: Content-Addressable Image Storage

What: Store images using their content hash (SHA-256) as the filename. Prevents duplicates and enables cache-forever headers.

When to use: Any app storing user-uploaded images where deduplication and caching matter.

Trade-offs:

  • Pro: Automatic deduplication
  • Pro: Cache-forever possible (hash changes if content changes)
  • Pro: Simple to verify integrity
  • Con: Need reference counting for deletion
  • Con: Slightly more complex upload logic

Example:

// internal/storage/local.go
func (s *LocalStorage) Store(ctx context.Context, file io.Reader) (string, error) {
    // Hash while copying to a temp file
    hasher := sha256.New()
    tmp, err := os.CreateTemp(s.uploadDir, "upload-*")
    if err != nil {
        return "", err
    }
    defer os.Remove(tmp.Name()) // no-op after a successful rename
    defer tmp.Close()

    if _, err := io.Copy(io.MultiWriter(tmp, hasher), file); err != nil {
        return "", err
    }

    hash := hex.EncodeToString(hasher.Sum(nil))
    finalPath := filepath.Join(s.uploadDir, hash[:2], hash)

    // Move to final location (subdirs by first 2 chars prevent too many files in one dir)
    if err := os.MkdirAll(filepath.Dir(finalPath), 0o755); err != nil {
        return "", err
    }
    if err := os.Rename(tmp.Name(), finalPath); err != nil {
        return "", err
    }

    return hash, nil
}

Data Flow

Request Flow

[User Action: Create Note]
    |
    v
[Frontend Component] --HTTP POST /api/notes--> [Notes Handler]
    |                                               |
    | (optimistic UI update)                        v
    |                                          [Notes Service]
    |                                               |
    |                                               v
    |                                          [Notes Repository]
    |                                               |
    |                                               v
    |                                          [SQLite INSERT]
    |                                               |
    |                                               v
    |                                          [FTS5 trigger auto-updates index]
    |                                               v
    v                                               v
[UI shows new note] <--JSON response-- [Return created note]

Image Upload Flow

[User: Attach Image]
    |
    v
[Frontend: file input] --multipart POST /api/upload--> [Upload Handler]
    |                                                       |
    | (show progress)                                       v
    |                                                  [Validate: type, size]
    |                                                       |
    |                                                       v
    |                                                  [Hash content]
    |                                                       |
    |                                                       v
    |                                                  [Store to /data/uploads/{hash}]
    |                                                       |
    |                                                       v
    |                                                  [Create attachment record in DB]
    |                                                       |
    v                                                       v
[Insert image into note] <--{attachment_id, url}-- [Return attachment metadata]

Search Flow

[User: Type search query]
    |
    v
[Frontend: debounced input] --GET /api/search?q=...--> [Search Handler]
    |                                                       |
    | (show loading)                                        v
    |                                                  [Search Service]
    |                                                       |
    |                                                       v
    |                                                  [Query FTS5 virtual table]
    |                                                       |
    |                                                       v
    |                                                  [JOIN with notes/tasks tables]
    |                                                       |
    |                                                       v
    |                                                  [Apply ranking (bm25)]
    |                                                       |
    v                                                       v
[Display ranked results] <--JSON array-- [Return ranked results with snippets]

Key Data Flows

  1. Note/Task CRUD: Frontend -> API Handler -> Service -> Repository -> SQLite. FTS5 index auto-updates via triggers.
  2. Image Upload: Frontend -> Upload Handler -> File Storage (hash-based) -> DB record. Returns URL for embedding.
  3. Full-Text Search: Frontend -> Search Handler -> FTS5 Query -> Ranked results with snippets.
  4. Tag Association: Many-to-many through junction table. Tag changes trigger re-index if needed.

Scaling Considerations

| Scale | Architecture Adjustments |
|---|---|
| 1 user (target) | Single SQLite file, local file storage, single container. Current design is perfect. |
| 2-10 users | Still works fine. SQLite handles concurrent reads well. May want WAL mode for better write concurrency. |
| 10-100 users | Consider PostgreSQL for better write concurrency. Move files to S3-compatible storage (MinIO or Garage for self-hosted). |
| 100+ users | Out of scope for a personal app. Would need an auth system, PostgreSQL, object storage, potentially a message queue for uploads. |

Scaling Priorities (For Future)

  1. First bottleneck: SQLite write contention (if ever). Fix: WAL mode (simple) or PostgreSQL (more complex).
  2. Second bottleneck: File storage if hosting many large images. Fix: Object storage with content-addressing.

Note: For a single-user personal app, these scaling considerations are theoretical. SQLite with WAL mode can handle thousands of notes and tasks without issue.
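
Enabling WAL mode is a one-time pragma on the database file; the companion pragmas shown here are common pairings, not requirements:

```sql
PRAGMA journal_mode=WAL;   -- persisted in the database file; set once
PRAGMA busy_timeout=5000;  -- wait up to 5 s instead of failing on write contention
PRAGMA synchronous=NORMAL; -- common pairing with WAL; fsyncs at checkpoints
```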

Anti-Patterns

Anti-Pattern 1: Storing Images in Database

What people do: Store image bytes directly in SQLite as BLOBs.

Why it's wrong:

  • Bloats database file significantly
  • Slows down backups (entire DB must be copied for any change)
  • Cannot leverage filesystem caching
  • Makes database migrations more complex

Do this instead: Store images on filesystem, store path/hash reference in database. Use content-addressable storage (hash as filename) for deduplication.

Anti-Pattern 2: No Full-Text Search Index

What people do: Use LIKE '%query%' for search.

Why it's wrong:

  • Full table scan for every search
  • Cannot rank by relevance
  • No word stemming or tokenization
  • Gets unusably slow with a few thousand notes

Do this instead: Use SQLite FTS5 from the start. It's built-in, requires no external dependencies, and handles relevance ranking.
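
For reference, a ranked FTS5 query with snippets, assuming the notes_fts table from the schema overview in this document; 'grocery' is a stand-in query term:

```sql
SELECT n.id,
       snippet(notes_fts, 1, '<b>', '</b>', '…', 10) AS excerpt  -- column 1 = body
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH 'grocery'
ORDER BY bm25(notes_fts);  -- lower bm25 score = more relevant
```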

Anti-Pattern 3: Microservices for Single User

What people do: Split notes, tasks, search, auth into separate services "for scalability."

Why it's wrong:

  • Massive operational overhead for no benefit
  • Network latency between services
  • Distributed transactions become complex
  • Debugging across services is painful
  • 2024-2025 industry trend: many teams consolidating microservices back to monoliths

Do this instead: Build a well-structured modular monolith. Clear internal boundaries, single deployment. Extract services later only if needed (you won't need to for a personal app).

Anti-Pattern 4: Overengineering Auth for Single User

What people do: Implement full OAuth2/OIDC, JWT refresh tokens, role-based access control.

Why it's wrong:

  • Single user doesn't need roles
  • Complexity adds attack surface
  • More code to maintain
  • Personal app accessible only on your network

Do this instead: Simple session-based auth with a password. Consider basic HTTP auth behind a reverse proxy, or even IP-based allowlisting if only accessible from home network.

Integration Points

External Services

| Service | Integration Pattern | Notes |
|---|---|---|
| None required | N/A | Self-hosted, no external dependencies by design |
| Optional: Reverse proxy | HTTP | Nginx/Traefik for HTTPS termination if exposed to the internet |
| Optional: Backup | File copy | A simple rsync of the data/ directory captures everything |

Internal Boundaries

| Boundary | Communication | Notes |
|---|---|---|
| Frontend <-> Backend | REST/JSON over HTTP | OpenAPI spec recommended for documentation |
| API Handlers <-> Services | Direct function calls | Same process, no serialization |
| Services <-> Repositories | Interface calls | Enables mocking in tests |
| Services <-> File Storage | Interface calls | Abstracts local vs. future S3 |

Database Schema Overview

-- Core entities
CREATE TABLE notes (
    id TEXT PRIMARY KEY,
    title TEXT,
    body TEXT NOT NULL,
    type TEXT CHECK(type IN ('note', 'task')) NOT NULL,
    status TEXT CHECK(status IN ('open', 'done', 'archived')),  -- for tasks
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Full-text search (virtual table synced via triggers)
CREATE VIRTUAL TABLE notes_fts USING fts5(
    title,
    body,
    content='notes',
    content_rowid='rowid'
);
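
The triggers that keep notes_fts in sync follow the standard FTS5 external-content pattern; the special 'delete' command row is how FTS5 removes stale index entries:

```sql
CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts(rowid, title, body)
    VALUES (new.rowid, new.title, new.body);
END;

CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
    INSERT INTO notes_fts(notes_fts, rowid, title, body)
    VALUES ('delete', old.rowid, old.title, old.body);
END;

CREATE TRIGGER notes_au AFTER UPDATE ON notes BEGIN
    INSERT INTO notes_fts(notes_fts, rowid, title, body)
    VALUES ('delete', old.rowid, old.title, old.body);
    INSERT INTO notes_fts(rowid, title, body)
    VALUES (new.rowid, new.title, new.body);
END;
```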

-- Tags (many-to-many)
CREATE TABLE tags (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE note_tags (
    note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
    tag_id TEXT REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (note_id, tag_id)
);

-- Attachments (images)
CREATE TABLE attachments (
    id TEXT PRIMARY KEY,
    note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
    hash TEXT NOT NULL,           -- content hash, also filename
    filename TEXT,                -- original filename
    mime_type TEXT NOT NULL,
    size_bytes INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Indexes
CREATE INDEX idx_notes_type ON notes(type);
CREATE INDEX idx_notes_created ON notes(created_at DESC);
CREATE INDEX idx_attachments_note ON attachments(note_id);
CREATE INDEX idx_attachments_hash ON attachments(hash);

Build Order Implications

Based on component dependencies, suggested implementation order:

  1. Phase 1: Data Foundation

    • SQLite setup with migrations
    • Basic schema (notes table)
    • Repository layer for notes
    • No FTS5 yet (add later)
  2. Phase 2: Core API

    • REST API handlers for notes CRUD
    • Service layer
    • Basic frontend with note list/create/edit
  3. Phase 3: Tasks Differentiation

    • Add type column (note vs task)
    • Task-specific status handling
    • Frontend task views
  4. Phase 4: Tags

    • Tags table and junction table
    • Tag CRUD API
    • Tag filtering in frontend
  5. Phase 5: Image Attachments

    • File storage abstraction
    • Upload API with validation
    • Attachment records in DB
    • Frontend image upload/display
  6. Phase 6: Search

    • FTS5 virtual table and triggers
    • Search API with ranking
    • Search UI with highlighting
  7. Phase 7: Containerization

    • Dockerfile (multi-stage)
    • docker-compose.yml
    • Volume mounts for data persistence

Rationale: This order ensures each phase builds on a working foundation. Notes must work before tasks (which are just notes with extra fields). Tags and attachments can be added independently. Search comes later because it indexes existing content. Containerization comes last so development stays simple.


v2.0 Architecture: CI/CD and Observability Integration

Domain: GitOps CI/CD and Observability Stack
Researched: 2026-02-03
Confidence: HIGH (verified with official documentation)

Executive Summary

This section details how ArgoCD, Prometheus, Grafana, and Loki integrate with the existing k3s/Gitea/Traefik architecture. The integration follows established patterns for self-hosted Kubernetes observability stacks, with specific considerations for k3s's lightweight nature and Traefik as the ingress controller.

Key insight: The existing CI/CD foundation (Gitea Actions + ArgoCD Application) is already in place. This milestone adds observability and operational automation rather than building from scratch.

Current Architecture Overview

                                    Internet
                                        |
                                   [Traefik]
                                   (Ingress)
                                        |
              +-------------------------+-------------------------+
              |                         |                         |
        task.kube2          git.kube2               (future)
        .tricnet.de         .tricnet.de         argocd/grafana
              |                         |
        [TaskPlaner]              [Gitea]
         (default ns)           + Actions
              |                  Runner
              |                         |
        [Longhorn PVC]                  |
         (data store)                   |
                                        v
                            [Container Registry]
                             git.kube2.tricnet.de

Existing Components

| Component | Namespace | Purpose | Status |
|---|---|---|---|
| k3s | - | Kubernetes distribution | Running |
| Traefik | kube-system | Ingress controller | Running |
| Longhorn | longhorn-system | Persistent storage | Running |
| cert-manager | cert-manager | TLS certificates | Running |
| Gitea | gitea (assumed) | Git hosting + CI | Running |
| TaskPlaner | default | Application | Running |
| ArgoCD Application | argocd | GitOps deployment | Defined (may need install) |

Existing CI/CD Pipeline

From .gitea/workflows/build.yaml:

  1. Push to master triggers Gitea Actions
  2. Build Docker image with BuildX
  3. Push to Gitea Container Registry
  4. Update Helm values.yaml with new image tag
  5. Commit with [skip ci]
  6. ArgoCD detects change and syncs
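
The six steps above roughly correspond to a workflow like the following hypothetical sketch; action versions, the registry owner path (<owner> is a placeholder), and the values.yaml location are assumptions, not taken from the actual build.yaml:

```yaml
name: build
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: git.kube2.tricnet.de/<owner>/taskplaner:${{ github.sha }}
      - name: Update Helm values with new image tag
        run: |
          sed -i "s/tag: .*/tag: \"${{ github.sha }}\"/" chart/values.yaml
          git add chart/values.yaml
          git commit -m "ci: bump image tag [skip ci]"
          git push
```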

Current gap: ArgoCD may not be installed yet (Application manifest exists but needs ArgoCD server).

Integration Architecture

Target State

                                    Internet
                                        |
                                   [Traefik]
                                   (Ingress)
                                        |
     +----------+----------+----------+----------+----------+
     |          |          |          |          |          |
   task.*    git.*     argocd.*   grafana.*   (internal)
     |          |          |          |          |
[TaskPlaner] [Gitea]   [ArgoCD]  [Grafana] [Prometheus]
     |          |          |          |      [Loki]
     |          |          |          |      [Alloy]
     |          +---webhook--->       |          |
     |                     |          |          |
     +------ metrics ------+----------+--------->+
     +------ logs ---------+---------[Alloy]---->+ (to Loki)

Namespace Strategy

| Namespace | Components | Rationale |
|---|---|---|
| argocd | ArgoCD server, repo-server, application-controller | Standard convention; ClusterRoleBinding expects this |
| monitoring | Prometheus, Grafana, Alertmanager | Consolidates observability; kube-prometheus-stack default |
| loki | Loki, Alloy (DaemonSet) | Separate from metrics for resource isolation |
| default | TaskPlaner | Existing app deployment |
| gitea | Gitea + Actions Runner | Assumed existing |
gitea Gitea + Actions Runner Assumed existing

Alternative considered: all observability in a single namespace.

Decision: separate monitoring and loki namespaces because:

  • Different scaling characteristics (Alloy is DaemonSet, Prometheus is StatefulSet)
  • Easier resource quota management
  • Standard community practice

Component Integration Details

1. ArgoCD Integration

Installation Method: Helm chart from argo/argo-cd

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| Gitea repository | HTTPS clone | Repository credential in argocd-secret |
| Gitea webhook | POST to /api/webhook | Reduces sync delay from 3 min to seconds |
| Traefik ingress | IngressRoute or Ingress | server.insecure=true to avoid redirect loops |
| TLS | cert-manager annotation | Let's Encrypt via existing cluster-issuer |

Critical Configuration:

# Helm values for ArgoCD with Traefik
configs:
  params:
    server.insecure: true  # Required: Traefik handles TLS

server:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - argocd.kube2.tricnet.de
    tls:
      - secretName: argocd-tls
        hosts:
          - argocd.kube2.tricnet.de

Webhook Setup for Gitea:

  1. In ArgoCD secret, set webhook.gogs.secret (Gitea uses Gogs-compatible webhooks)
  2. In Gitea repository settings, add webhook:
    • URL: https://argocd.kube2.tricnet.de/api/webhook
    • Content type: application/json
    • Secret: Same as configured in ArgoCD
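
The shared secret from step 1 lives in the argocd-secret Secret; a minimal sketch, where <shared-secret> is a placeholder. In practice this key is merged into the existing argocd-secret (e.g. via kubectl patch) rather than applied as a fresh Secret, which would overwrite the keys ArgoCD already stores there:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
stringData:
  webhook.gogs.secret: <shared-secret>
```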

Known Limitation: Webhooks work for Applications but not ApplicationSets with Gitea.

2. Prometheus/Grafana Integration (kube-prometheus-stack)

Installation Method: Helm chart prometheus-community/kube-prometheus-stack

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| k3s metrics | Exposed kube-* endpoints | k3s config modification required |
| Traefik metrics | ServiceMonitor | Traefik exposes :9100/metrics |
| TaskPlaner metrics | ServiceMonitor (future) | App must expose a /metrics endpoint |
| Grafana UI | Traefik Ingress | Standard Kubernetes Ingress |

Critical k3s Configuration:

k3s binds the controller-manager, scheduler, and kube-proxy metrics endpoints to localhost by default. For Prometheus to scrape them, bind them to 0.0.0.0.

Create/modify /etc/rancher/k3s/config.yaml:

kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"

Then restart k3s: sudo systemctl restart k3s

k3s-specific Helm values:

# Disable etcd monitoring (k3s uses sqlite, not etcd)
defaultRules:
  rules:
    etcd: false

kubeEtcd:
  enabled: false

# Fix endpoint discovery for k3s
kubeControllerManager:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10257
    targetPort: 10257

kubeScheduler:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10259
    targetPort: 10259

kubeProxy:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10249
    targetPort: 10249

# Grafana ingress
grafana:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de

ServiceMonitor for TaskPlaner (future):

Once TaskPlaner exposes /metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match kube-prometheus-stack release
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

3. Loki + Alloy Integration (Log Aggregation)

Important: Promtail is deprecated (LTS until Feb 2026, EOL March 2026). Use Grafana Alloy instead.

Installation Method:

  • Loki: Helm chart grafana/loki (monolithic mode for single node)
  • Alloy: Helm chart grafana/alloy

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| Pod logs | Alloy DaemonSet | Mounts /var/log/pods |
| Loki storage | Longhorn PVC or MinIO | Single-binary mode uses filesystem storage |
| Grafana datasource | Auto-configured | kube-prometheus-stack integration |
| k3s node logs | Alloy journal reader | journalctl access |
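
If the Loki datasource does not appear in Grafana automatically, it can be declared through kube-prometheus-stack's Grafana values; the URL below assumes Loki runs as service loki in namespace loki, matching the Alloy configuration later in this section:

```yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.loki.svc.cluster.local:3100
```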

Deployment Mode Decision:

| Mode | When to Use | Our Choice |
|---|---|---|
| Monolithic (single-binary) | Small deployments, <100 GB/day | Yes: single-node k3s |
| Simple Scalable | Medium deployments | No |
| Microservices | Large scale, HA required | No |

Loki Helm values (monolithic):

deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi

# Disable components not needed in monolithic
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0

# Use filesystem storage (not S3/MinIO for simplicity)
loki:
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h

Alloy DaemonSet Configuration:

# alloy-values.yaml
alloy:
  configMap:
    create: true
    content: |
      // Kubernetes logs collection
      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.write.default.receiver]
      }

      // Send to Loki
      loki.write "default" {
        endpoint {
          url = "http://loki.loki.svc.cluster.local:3100/loki/api/v1/push"
        }
      }

      // Kubernetes discovery
      discovery.kubernetes "pods" {
        role = "pod"
      }

4. Traefik Metrics Integration

Traefik already exposes Prometheus metrics. Enable scraping:

Option A: ServiceMonitor (if using kube-prometheus-stack)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s

Option B: Verify Traefik metrics are enabled

Check Traefik deployment args include:

--entrypoints.metrics.address=:8888
--metrics.prometheus=true
--metrics.prometheus.entryPoint=metrics

Data Flow Diagrams

Metrics Flow

+------------------+     +------------------+     +------------------+
|   TaskPlaner     |     |     Traefik      |     |    k3s core      |
|   /metrics       |     |   :9100/metrics  |     |  :10249,10257... |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         +------------------------+------------------------+
                                  |
                                  v
                        +-------------------+
                        |   Prometheus      |
                        | (ServiceMonitors) |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |     Grafana       |
                        |   (Dashboards)    |
                        +-------------------+

Log Flow

+------------------+     +------------------+     +------------------+
|   TaskPlaner     |     |     Traefik      |     |   Other Pods     |
|   stdout/stderr  |     |   access logs    |     |   stdout/stderr  |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         +------------------------+------------------------+
                                  |
                            /var/log/pods
                                  |
                                  v
                        +-------------------+
                        |   Alloy DaemonSet |
                        |  (log collection) |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |      Loki         |
                        |  (log storage)    |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |     Grafana       |
                        |   (log queries)   |
                        +-------------------+
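To validate this pipeline end-to-end, a couple of LogQL queries in Grafana Explore (the namespace label value is an assumption about where the app runs):

```logql
# All TaskPlaner logs
{namespace="taskplaner"}

# Per-pod error rate over the last 5 minutes
sum by (pod) (rate({namespace="taskplaner"} |= "error" [5m]))
```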

GitOps Flow

+------------+     +------------+     +---------------+     +------------+
| Developer  | --> |   Gitea    | --> | Gitea Actions | --> | Container  |
| git push   |     | Repository |     | (build.yaml)  |     | Registry   |
+------------+     +-----+------+     +-------+-------+     +------------+
                         |                    |
                         |              (update values.yaml)
                         |                    |
                         v                    v
                   +------------+       +------------+
                   |  Webhook   | ----> |   ArgoCD   |
                   |  (notify)  |       |   Server   |
                   +------------+       +-----+------+
                                              |
                                        (sync app)
                                              |
                                              v
                                        +------------+
                                        | Kubernetes |
                                        |  (deploy)  |
                                        +------------+
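The ArgoCD side of this flow is a single Application manifest. A sketch with hypothetical repo URL and chart path:

```yaml
# Application manifest sketch; repoURL and path are assumptions
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: taskplaner
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.com/thomas/taskplaner.git   # hypothetical
    targetRevision: main
    path: deploy/helm
  destination:
    server: https://kubernetes.default.svc
    namespace: taskplaner
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift
```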

Build Order (Dependencies)

Based on component dependencies, recommended installation order:

Phase 1: ArgoCD (no dependencies on observability)

1. Install ArgoCD via Helm
   - Creates namespace: argocd
   - Verify existing Application manifest works
   - Configure Gitea webhook

Dependencies: None (Traefik already running)
Validates: GitOps pipeline end-to-end
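The Helm values for this step can stay minimal; the one flag that matters with Traefik in front is server.insecure (see anti-pattern 4):

```yaml
# argocd-values.yaml (sketch), installed with e.g.:
#   helm install argocd argo/argo-cd -n argocd --create-namespace -f argocd-values.yaml
configs:
  params:
    server.insecure: true   # Traefik terminates TLS; prevents the redirect loop
```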

Phase 2: kube-prometheus-stack (foundational observability)

2. Configure k3s metrics exposure
   - Modify /etc/rancher/k3s/config.yaml
   - Restart k3s
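A sketch of the relevant /etc/rancher/k3s/config.yaml entries; these bind the component metrics endpoints to all interfaces so Prometheus can reach them:

```yaml
# /etc/rancher/k3s/config.yaml (sketch)
kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
etcd-expose-metrics: true   # only relevant with embedded etcd
```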

3. Install kube-prometheus-stack via Helm
   - Creates namespace: monitoring
   - Includes: Prometheus, Grafana, Alertmanager
   - Includes: Default dashboards and alerts

Dependencies: k3s metrics exposed
Validates: Basic cluster monitoring working
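A values sketch for the kube-prometheus-stack install; setting serviceMonitorSelectorNilUsesHelmValues to false sidesteps the label-mismatch pitfall described under anti-patterns:

```yaml
# kube-prometheus-stack values sketch
prometheus:
  prometheusSpec:
    # Discover all ServiceMonitors, not only those carrying the release label
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 10Gi
```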

Phase 3: Loki + Alloy (log aggregation)

4. Install Loki via Helm (monolithic mode)
   - Creates namespace: loki
   - Configure storage with Longhorn
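A monolithic-mode values sketch for the Loki chart (retention is capped to guard against the disk-exhaustion pitfall; exact keys may shift between chart versions):

```yaml
# Loki chart values sketch
deploymentMode: SingleBinary
loki:
  auth_enabled: false          # single-tenant setup
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  limits_config:
    retention_period: 168h     # 7 days
singleBinary:
  replicas: 1
  persistence:
    storageClass: longhorn
    size: 10Gi
```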

5. Install Alloy via Helm
   - DaemonSet in loki namespace
   - Configure Kubernetes log discovery
   - Point to Loki endpoint
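A minimal Alloy configuration covering these three points — discover pods, tail their logs, push to Loki:

```alloy
// Discover all pods in the cluster
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail container logs via the Kubernetes API
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

// Push to the Loki service from step 4
loki.write "default" {
  endpoint {
    url = "http://loki.loki.svc.cluster.local:3100/loki/api/v1/push"
  }
}
```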

6. Add Loki datasource to Grafana
   - URL: http://loki.loki.svc.cluster.local:3100
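Since Grafana comes from kube-prometheus-stack, the datasource can be provisioned declaratively through that chart's values rather than clicked together in the UI:

```yaml
# kube-prometheus-stack values sketch (Grafana section)
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.loki.svc.cluster.local:3100
```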

Dependencies: Grafana from step 3, storage
Validates: Logs visible in Grafana Explore

Phase 4: Application Integration

7. Add TaskPlaner metrics endpoint (if not exists)
   - Expose /metrics in app
   - Create ServiceMonitor
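A ServiceMonitor sketch for step 7; the service labels, port name, and app namespace are assumptions about the TaskPlaner deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus   # must match the kube-prometheus-stack release name
spec:
  namespaceSelector:
    matchNames:
      - taskplaner        # assumed app namespace
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  endpoints:
    - port: http          # assumed Service port name
      path: /metrics
      interval: 30s
```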

8. Create application dashboards in Grafana
   - TaskPlaner-specific metrics
   - Request latency, error rates
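Two starter panel queries for step 8, assuming the app exports the conventional http_request_duration_seconds histogram and http_requests_total counter:

```promql
# p95 request latency
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Error ratio: 5xx responses over all responses
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
```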

Dependencies: All previous phases
Validates: Full observability of application

Resource Requirements

Component           CPU Request   Memory Request   Storage
ArgoCD (all)        500m          512Mi            -
Prometheus          200m          512Mi            10Gi (Longhorn)
Grafana             100m          256Mi            1Gi (Longhorn)
Alertmanager        50m           64Mi             1Gi (Longhorn)
Loki                200m          256Mi            10Gi (Longhorn)
Alloy (per node)    100m          128Mi            -

Total additional: ~1.2 CPU cores, ~1.7Gi RAM, ~22Gi storage

Security Considerations

Network Policies

Consider network policies to restrict:

  • Prometheus scraping only from monitoring namespace
  • Loki ingestion only from Alloy
  • Grafana access only via Traefik
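A sketch of the second restriction (Loki ingestion only from Alloy, plus Grafana from the monitoring namespace); the pod labels are assumptions about the respective charts:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: loki-ingress
  namespace: loki
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: loki
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: alloy   # Alloy DaemonSet pods
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # Grafana queries
      ports:
        - protocol: TCP
          port: 3100
```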

Secrets Management

Secret                        Location        Purpose
argocd-initial-admin-secret   argocd ns       Initial admin password
argocd-secret                 argocd ns       Webhook secrets, repo credentials
grafana-admin                 monitoring ns   Grafana admin password

Ingress Authentication

For production, consider:

  • ArgoCD: Built-in OIDC/OAuth integration
  • Grafana: Built-in auth (local, LDAP, OAuth)
  • Prometheus: Traefik BasicAuth middleware (already pattern in use)
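The existing BasicAuth pattern applied to Prometheus would look roughly like this (secret name hypothetical, containing htpasswd-formatted users):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: prometheus-basicauth
  namespace: monitoring
spec:
  basicAuth:
    secret: prometheus-credentials   # hypothetical htpasswd secret
```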

Anti-Patterns to Avoid

1. Skipping k3s Metrics Configuration

What happens: Prometheus installs but most dashboards show "No data".
Prevention: Configure k3s to expose metrics BEFORE installing kube-prometheus-stack.

2. Using Promtail Instead of Alloy

What happens: Technical debt - Promtail EOL is March 2026.
Prevention: Use Alloy from the start; migration documentation exists.

3. Running Loki in Microservices Mode for Small Clusters

What happens: Unnecessary complexity, resource overhead.
Prevention: Monolithic mode for clusters under 100GB/day log volume.

4. Forgetting server.insecure for ArgoCD with Traefik

What happens: Redirect loop (ERR_TOO_MANY_REDIRECTS).
Prevention: Always set configs.params.server.insecure=true when Traefik handles TLS.

5. ServiceMonitor Label Mismatch

What happens: Prometheus doesn't discover custom ServiceMonitors.
Prevention: Ensure the release: <helm-release-name> label matches the kube-prometheus-stack release.

Sources

ArgoCD:

Prometheus/Grafana:

Loki/Alloy:

Traefik Integration:


Last updated: 2026-02-03