Thomas Richter 5dbabe6a2d docs: complete v2.0 CI/CD and observability research
Files:
- STACK-v2-cicd-observability.md (ArgoCD, Prometheus, Loki, Alloy)
- FEATURES.md (updated with CI/CD and observability section)
- ARCHITECTURE.md (updated with v2.0 integration architecture)
- PITFALLS-CICD-OBSERVABILITY.md (14 critical/moderate/minor pitfalls)
- SUMMARY-v2-cicd-observability.md (synthesis with roadmap implications)

Key findings:
- Stack: kube-prometheus-stack + Loki monolithic + Alloy (Promtail EOL March 2026)
- Architecture: 3-phase approach - GitOps first, observability second, CI tests last
- Critical pitfall: ArgoCD TLS redirect loop, Loki disk exhaustion, k3s metrics config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 03:29:23 +01:00


Architecture Research

Domain: Personal task/notes web application with image attachments
Researched: 2026-01-29
Confidence: HIGH

Standard Architecture

System Overview

+------------------------------------------------------------------+
|                        CLIENT LAYER                               |
|  +-------------------+  +-------------------+  +----------------+ |
|  |  Desktop Browser  |  |  Mobile Browser   |  |  PWA (future)  | |
|  +--------+----------+  +--------+----------+  +-------+--------+ |
|           |                      |                     |          |
+-----------+----------------------+---------------------+----------+
            |                      |                     |
            v                      v                     v
+------------------------------------------------------------------+
|                       PRESENTATION LAYER                          |
|  +------------------------------------------------------------+  |
|  |                    Web Frontend (SPA)                       |  |
|  |  +--------+  +--------+  +--------+  +--------+  +--------+ |  |
|  |  | Notes  |  | Tasks  |  | Search |  | Tags   |  | Upload | |  |
|  |  | View   |  | View   |  | View   |  | View   |  | View   | |  |
|  |  +--------+  +--------+  +--------+  +--------+  +--------+ |  |
|  +------------------------------+-----------------------------+   |
+---------------------------------|--------------------------------+
                                  | HTTP/REST
                                  v
+------------------------------------------------------------------+
|                       APPLICATION LAYER                           |
|  +------------------------------------------------------------+  |
|  |                    REST API (Monolith)                      |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  | Notes API  |  | Tasks API  |  | Search API |             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  |  | Tags API   |  | Upload API |  | Auth API   |             |  |
|  |  +------------+  +------------+  +------------+             |  |
|  +------------------------------+-----------------------------+   |
+---------------------------------|--------------------------------+
                                  |
            +---------------------+---------------------+
            |                     |                     |
            v                     v                     v
+------------------------------------------------------------------+
|                        DATA LAYER                                 |
|  +----------------+  +----------------+  +------------------+     |
|  |    SQLite      |  |  File Storage  |  |   FTS5 Index     |     |
|  |   (primary)    |  |   (images)     |  |  (full-text)     |     |
|  +----------------+  +----------------+  +------------------+     |
+------------------------------------------------------------------+

Component Responsibilities

| Component | Responsibility | Typical Implementation |
|---|---|---|
| Web Frontend | UI rendering, user interaction, client-side state | React/Vue/Svelte SPA |
| REST API | Business logic, validation, orchestration | Node.js/Go/Python monolith |
| Notes API | CRUD operations for thoughts/notes | API route handler |
| Tasks API | CRUD for tasks, status transitions | API route handler |
| Search API | Full-text search across notes/tasks | Wraps FTS5 queries |
| Tags API | Tag management, note-tag associations | API route handler |
| Upload API | Image upload, validation, storage | Handles multipart forms |
| Auth API | Session management (single user) | Simple token/session |
| SQLite | Primary data persistence | Single-file database |
| File Storage | Binary file storage (images) | Docker volume mount |
| FTS5 Index | Full-text search capabilities | SQLite virtual table |

Recommended Project Structure

project/
+-- docker/
|   +-- Dockerfile           # Multi-stage build for frontend + backend
|   +-- docker-compose.yml   # Service orchestration
|   +-- nginx.conf           # Reverse proxy config (optional)
+-- backend/
|   +-- cmd/
|   |   +-- server/
|   |       +-- main.go      # Entry point
|   +-- internal/
|   |   +-- api/             # HTTP handlers
|   |   |   +-- notes.go
|   |   |   +-- tasks.go
|   |   |   +-- tags.go
|   |   |   +-- search.go
|   |   |   +-- upload.go
|   |   +-- models/          # Domain entities
|   |   |   +-- note.go
|   |   |   +-- task.go
|   |   |   +-- tag.go
|   |   |   +-- attachment.go
|   |   +-- repository/      # Data access
|   |   |   +-- sqlite.go
|   |   |   +-- notes_repo.go
|   |   |   +-- tasks_repo.go
|   |   +-- service/         # Business logic
|   |   |   +-- notes_svc.go
|   |   |   +-- search_svc.go
|   |   +-- storage/         # File storage abstraction
|   |       +-- local.go
|   +-- migrations/          # Database migrations
|   +-- go.mod
+-- frontend/
|   +-- src/
|   |   +-- components/      # Reusable UI components
|   |   +-- pages/           # Route-level views
|   |   +-- stores/          # Client state management
|   |   +-- api/             # Backend API client
|   |   +-- utils/           # Helpers
|   +-- public/
|   +-- package.json
+-- data/                    # Mounted volume (gitignored)
|   +-- app.db               # SQLite database
|   +-- uploads/             # Image storage
+-- .planning/               # Project planning docs

Structure Rationale

  • Monorepo with backend/frontend split: Keeps deployment simple (single container possible) while maintaining clear separation
  • internal/ in Go: Prevents external packages from importing internals; enforces encapsulation
  • Repository pattern: Abstracts SQLite access, enables future database swap if needed
  • Service layer: Business logic separated from HTTP handlers for testability
  • data/ volume: Single mount point for all persistent data (database + files)

Architectural Patterns

Pattern 1: Modular Monolith

What: Single deployable unit with clear internal module boundaries. Each domain (notes, tasks, tags, search) has its own package but shares the same database and process.

When to use: Single-user or small-team applications where operational simplicity matters more than independent scaling.

Trade-offs:

  • Pro: Simple deployment, easy debugging, no network overhead between modules
  • Pro: Single database transaction across domains when needed
  • Con: All modules must use same language/runtime
  • Con: Cannot scale modules independently (not needed for single user)

Example:

// internal/api/routes.go
func SetupRoutes(r *mux.Router, services *Services) {
    // Each domain gets its own route group
    notes := r.PathPrefix("/api/notes").Subrouter()
    notes.HandleFunc("", services.Notes.List).Methods("GET")
    notes.HandleFunc("", services.Notes.Create).Methods("POST")

    tasks := r.PathPrefix("/api/tasks").Subrouter()
    tasks.HandleFunc("", services.Tasks.List).Methods("GET")
    // Clear boundaries, but same process
}

Pattern 2: Repository Pattern for Data Access

What: Abstract data access behind interfaces. Repositories handle all database queries; services call repositories, not raw SQL.

When to use: Always for anything beyond trivial apps. Enables testing with mocks and future database changes.

Trade-offs:

  • Pro: Testable services (mock repositories)
  • Pro: Database-agnostic business logic
  • Pro: Query logic centralized
  • Con: Additional abstraction layer
  • Con: Can become overly complex if over-engineered

Example:

// internal/repository/notes_repo.go
type NotesRepository interface {
    Create(ctx context.Context, note *models.Note) error
    GetByID(ctx context.Context, id string) (*models.Note, error)
    List(ctx context.Context, opts ListOptions) ([]*models.Note, error)
    Search(ctx context.Context, query string) ([]*models.Note, error)
}

type sqliteNotesRepo struct {
    db *sql.DB
}

func (r *sqliteNotesRepo) Search(ctx context.Context, query string) ([]*models.Note, error) {
    // FTS5 search; notes_fts is an external-content table keyed by rowid,
    // so join on n.rowid (not the TEXT id column)
    rows, err := r.db.QueryContext(ctx, `
        SELECT n.id, n.title, n.body, n.created_at
        FROM notes n
        JOIN notes_fts ON notes_fts.rowid = n.rowid
        WHERE notes_fts MATCH ?
        ORDER BY rank
    `, query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // ...
}

Pattern 3: Content-Addressable Image Storage

What: Store images using their content hash (SHA-256) as the filename. Prevents duplicates and enables cache-forever headers.

When to use: Any app storing user-uploaded images where deduplication and caching matter.

Trade-offs:

  • Pro: Automatic deduplication
  • Pro: Cache-forever possible (hash changes if content changes)
  • Pro: Simple to verify integrity
  • Con: Need reference counting for deletion
  • Con: Slightly more complex upload logic

Example:

// internal/storage/local.go
func (s *LocalStorage) Store(ctx context.Context, file io.Reader) (string, error) {
    // Hash while copying to a temp file
    hasher := sha256.New()
    tmp, err := os.CreateTemp(s.uploadDir, "upload-*")
    if err != nil {
        return "", err
    }
    defer os.Remove(tmp.Name()) // no-op after a successful rename
    defer tmp.Close()

    if _, err := io.Copy(io.MultiWriter(tmp, hasher), file); err != nil {
        return "", err
    }

    hash := hex.EncodeToString(hasher.Sum(nil))
    finalPath := filepath.Join(s.uploadDir, hash[:2], hash)

    // Move to final location (subdirs by first 2 chars prevent too many files in one dir)
    if err := os.MkdirAll(filepath.Dir(finalPath), 0o755); err != nil {
        return "", err
    }
    if err := os.Rename(tmp.Name(), finalPath); err != nil {
        return "", err
    }

    return hash, nil
}

Data Flow

Request Flow

[User Action: Create Note]
    |
    v
[Frontend Component] --HTTP POST /api/notes--> [Notes Handler]
    |                                               |
    | (optimistic UI update)                        v
    |                                          [Notes Service]
    |                                               |
    |                                               v
    |                                          [Notes Repository]
    |                                               |
    |                                               v
    |                                          [SQLite INSERT]
    |                                               |
    |                                               v
    |                                          [FTS5 trigger auto-updates index]
    |                                               v
    v                                               v
[UI shows new note] <--JSON response-- [Return created note]

Image Upload Flow

[User: Attach Image]
    |
    v
[Frontend: file input] --multipart POST /api/upload--> [Upload Handler]
    |                                                       |
    | (show progress)                                       v
    |                                                  [Validate: type, size]
    |                                                       |
    |                                                       v
    |                                                  [Hash content]
    |                                                       |
    |                                                       v
    |                                                  [Store to /data/uploads/{hash}]
    |                                                       |
    |                                                       v
    |                                                  [Create attachment record in DB]
    |                                                       |
    v                                                       v
[Insert image into note] <--{attachment_id, url}-- [Return attachment metadata]

Search Flow

[User: Type search query]
    |
    v
[Frontend: debounced input] --GET /api/search?q=...--> [Search Handler]
    |                                                       |
    | (show loading)                                        v
    |                                                  [Search Service]
    |                                                       |
    |                                                       v
    |                                                  [Query FTS5 virtual table]
    |                                                       |
    |                                                       v
    |                                                  [JOIN with notes/tasks tables]
    |                                                       |
    |                                                       v
    |                                                  [Apply ranking (bm25)]
    |                                                       |
    v                                                       v
[Display ranked results] <--JSON array-- [Return ranked results with snippets]

Key Data Flows

  1. Note/Task CRUD: Frontend -> API Handler -> Service -> Repository -> SQLite. FTS5 index auto-updates via triggers.
  2. Image Upload: Frontend -> Upload Handler -> File Storage (hash-based) -> DB record. Returns URL for embedding.
  3. Full-Text Search: Frontend -> Search Handler -> FTS5 Query -> Ranked results with snippets.
  4. Tag Association: Many-to-many through junction table. Tag changes trigger re-index if needed.

Scaling Considerations

| Scale | Architecture Adjustments |
|---|---|
| 1 user (target) | Single SQLite file, local file storage, single container. Current design is perfect. |
| 2-10 users | Still works fine. SQLite handles concurrent reads well. May want WAL mode for better write concurrency. |
| 10-100 users | Consider PostgreSQL for better write concurrency. Move files to S3-compatible storage (MinIO or Garage for self-hosted). |
| 100+ users | Out of scope for a personal app. Would need an auth system, PostgreSQL, object storage, potentially a message queue for uploads. |

Scaling Priorities (For Future)

  1. First bottleneck: SQLite write contention (if ever). Fix: WAL mode (simple) or PostgreSQL (more complex).
  2. Second bottleneck: File storage if hosting many large images. Fix: Object storage with content-addressing.

Note: For a single-user personal app, these scaling considerations are theoretical. SQLite with WAL mode can handle thousands of notes and tasks without issue.
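
Enabling WAL mode is a one-time pragma on the database file; the companion pragmas shown here are common pairings, not requirements:

```sql
PRAGMA journal_mode=WAL;   -- persisted in the database file; set once
PRAGMA busy_timeout=5000;  -- wait up to 5 s instead of failing on write contention
PRAGMA synchronous=NORMAL; -- common pairing with WAL; fsyncs at checkpoints
```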

Anti-Patterns

Anti-Pattern 1: Storing Images in Database

What people do: Store image bytes directly in SQLite as BLOBs.

Why it's wrong:

  • Bloats database file significantly
  • Slows down backups (entire DB must be copied for any change)
  • Cannot leverage filesystem caching
  • Makes database migrations more complex

Do this instead: Store images on filesystem, store path/hash reference in database. Use content-addressable storage (hash as filename) for deduplication.

Anti-Pattern 2: No Full-Text Search Index

What people do: Use LIKE '%query%' for search.

Why it's wrong:

  • Full table scan for every search
  • Cannot rank by relevance
  • No word stemming or tokenization
  • Gets unusably slow with a few thousand notes

Do this instead: Use SQLite FTS5 from the start. It's built-in, requires no external dependencies, and handles relevance ranking.
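
For reference, a ranked FTS5 query with snippets, assuming the notes_fts table from the schema overview in this document; 'grocery' is a stand-in query term:

```sql
SELECT n.id,
       snippet(notes_fts, 1, '<b>', '</b>', '…', 10) AS excerpt  -- column 1 = body
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH 'grocery'
ORDER BY bm25(notes_fts);  -- lower bm25 score = more relevant
```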

Anti-Pattern 3: Microservices for Single User

What people do: Split notes, tasks, search, auth into separate services "for scalability."

Why it's wrong:

  • Massive operational overhead for no benefit
  • Network latency between services
  • Distributed transactions become complex
  • Debugging across services is painful
  • 2024-2025 industry trend: many teams consolidating microservices back to monoliths

Do this instead: Build a well-structured modular monolith. Clear internal boundaries, single deployment. Extract services later only if needed (you won't need to for a personal app).

Anti-Pattern 4: Overengineering Auth for Single User

What people do: Implement full OAuth2/OIDC, JWT refresh tokens, role-based access control.

Why it's wrong:

  • Single user doesn't need roles
  • Complexity adds attack surface
  • More code to maintain
  • Personal app accessible only on your network

Do this instead: Simple session-based auth with a password. Consider basic HTTP auth behind a reverse proxy, or even IP-based allowlisting if only accessible from home network.

Integration Points

External Services

| Service | Integration Pattern | Notes |
|---|---|---|
| None required | N/A | Self-hosted, no external dependencies by design |
| Optional: Reverse proxy | HTTP | Nginx/Traefik for HTTPS termination if exposed to the internet |
| Optional: Backup | File copy | A simple rsync of the data/ directory captures everything |

Internal Boundaries

| Boundary | Communication | Notes |
|---|---|---|
| Frontend <-> Backend | REST/JSON over HTTP | OpenAPI spec recommended for documentation |
| API Handlers <-> Services | Direct function calls | Same process, no serialization |
| Services <-> Repositories | Interface calls | Enables mocking in tests |
| Services <-> File Storage | Interface calls | Abstracts local vs. future S3 |

Database Schema Overview

-- Core entities
CREATE TABLE notes (
    id TEXT PRIMARY KEY,
    title TEXT,
    body TEXT NOT NULL,
    type TEXT CHECK(type IN ('note', 'task')) NOT NULL,
    status TEXT CHECK(status IN ('open', 'done', 'archived')),  -- for tasks
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Full-text search (virtual table synced via triggers)
CREATE VIRTUAL TABLE notes_fts USING fts5(
    title,
    body,
    content='notes',
    content_rowid='rowid'
);
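
The triggers that keep notes_fts in sync follow the standard FTS5 external-content pattern; the special 'delete' command row is how FTS5 removes stale index entries:

```sql
CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts(rowid, title, body)
    VALUES (new.rowid, new.title, new.body);
END;

CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
    INSERT INTO notes_fts(notes_fts, rowid, title, body)
    VALUES ('delete', old.rowid, old.title, old.body);
END;

CREATE TRIGGER notes_au AFTER UPDATE ON notes BEGIN
    INSERT INTO notes_fts(notes_fts, rowid, title, body)
    VALUES ('delete', old.rowid, old.title, old.body);
    INSERT INTO notes_fts(rowid, title, body)
    VALUES (new.rowid, new.title, new.body);
END;
```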

-- Tags (many-to-many)
CREATE TABLE tags (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE note_tags (
    note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
    tag_id TEXT REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (note_id, tag_id)
);

-- Attachments (images)
CREATE TABLE attachments (
    id TEXT PRIMARY KEY,
    note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
    hash TEXT NOT NULL,           -- content hash, also filename
    filename TEXT,                -- original filename
    mime_type TEXT NOT NULL,
    size_bytes INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Indexes
CREATE INDEX idx_notes_type ON notes(type);
CREATE INDEX idx_notes_created ON notes(created_at DESC);
CREATE INDEX idx_attachments_note ON attachments(note_id);
CREATE INDEX idx_attachments_hash ON attachments(hash);

Build Order Implications

Based on component dependencies, suggested implementation order:

  1. Phase 1: Data Foundation

    • SQLite setup with migrations
    • Basic schema (notes table)
    • Repository layer for notes
    • No FTS5 yet (add later)
  2. Phase 2: Core API

    • REST API handlers for notes CRUD
    • Service layer
    • Basic frontend with note list/create/edit
  3. Phase 3: Tasks Differentiation

    • Add type column (note vs task)
    • Task-specific status handling
    • Frontend task views
  4. Phase 4: Tags

    • Tags table and junction table
    • Tag CRUD API
    • Tag filtering in frontend
  5. Phase 5: Image Attachments

    • File storage abstraction
    • Upload API with validation
    • Attachment records in DB
    • Frontend image upload/display
  6. Phase 6: Search

    • FTS5 virtual table and triggers
    • Search API with ranking
    • Search UI with highlighting
  7. Phase 7: Containerization

    • Dockerfile (multi-stage)
    • docker-compose.yml
    • Volume mounts for data persistence

Rationale: This order ensures each phase builds on a working foundation. Notes must work before tasks (which are just notes with extra fields). Tags and attachments can be added independently. Search comes later because it indexes existing content. Containerization comes last so development stays simple.


v2.0 Architecture: CI/CD and Observability Integration

Domain: GitOps CI/CD and Observability Stack
Researched: 2026-02-03
Confidence: HIGH (verified with official documentation)

Executive Summary

This section details how ArgoCD, Prometheus, Grafana, and Loki integrate with the existing k3s/Gitea/Traefik architecture. The integration follows established patterns for self-hosted Kubernetes observability stacks, with specific considerations for k3s's lightweight nature and Traefik as the ingress controller.

Key insight: The existing CI/CD foundation (Gitea Actions + ArgoCD Application) is already in place. This milestone adds observability and operational automation rather than building from scratch.

Current Architecture Overview

                                    Internet
                                        |
                                   [Traefik]
                                   (Ingress)
                                        |
              +-------------------------+-------------------------+
              |                         |                         |
        task.kube2          git.kube2               (future)
        .tricnet.de         .tricnet.de         argocd/grafana
              |                         |
        [TaskPlaner]              [Gitea]
         (default ns)           + Actions
              |                  Runner
              |                         |
        [Longhorn PVC]                  |
         (data store)                   |
                                        v
                            [Container Registry]
                             git.kube2.tricnet.de

Existing Components

| Component | Namespace | Purpose | Status |
|---|---|---|---|
| k3s | - | Kubernetes distribution | Running |
| Traefik | kube-system | Ingress controller | Running |
| Longhorn | longhorn-system | Persistent storage | Running |
| cert-manager | cert-manager | TLS certificates | Running |
| Gitea | gitea (assumed) | Git hosting + CI | Running |
| TaskPlaner | default | Application | Running |
| ArgoCD Application | argocd | GitOps deployment | Defined (may need install) |

Existing CI/CD Pipeline

From .gitea/workflows/build.yaml:

  1. Push to master triggers Gitea Actions
  2. Build Docker image with BuildX
  3. Push to Gitea Container Registry
  4. Update Helm values.yaml with new image tag
  5. Commit with [skip ci]
  6. ArgoCD detects change and syncs
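
The six steps above roughly correspond to a workflow like the following hypothetical sketch; action versions, the registry owner path (<owner> is a placeholder), and the values.yaml location are assumptions, not taken from the actual build.yaml:

```yaml
name: build
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: git.kube2.tricnet.de/<owner>/taskplaner:${{ github.sha }}
      - name: Update Helm values with new image tag
        run: |
          sed -i "s/tag: .*/tag: \"${{ github.sha }}\"/" chart/values.yaml
          git add chart/values.yaml
          git commit -m "ci: bump image tag [skip ci]"
          git push
```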

Current gap: ArgoCD may not be installed yet (Application manifest exists but needs ArgoCD server).

Integration Architecture

Target State

                                    Internet
                                        |
                                   [Traefik]
                                   (Ingress)
                                        |
     +----------+----------+----------+----------+----------+
     |          |          |          |          |          |
   task.*    git.*     argocd.*   grafana.*   (internal)
     |          |          |          |          |
[TaskPlaner] [Gitea]   [ArgoCD]  [Grafana] [Prometheus]
     |          |          |          |      [Loki]
     |          |          |          |      [Alloy]
     |          +---webhook--->       |          |
     |                     |          |          |
     +------ metrics ------+----------+--------->+
     +------ logs ---------+---------[Alloy]---->+ (to Loki)

Namespace Strategy

| Namespace | Components | Rationale |
|---|---|---|
| argocd | ArgoCD server, repo-server, application-controller | Standard convention; ClusterRoleBinding expects this |
| monitoring | Prometheus, Grafana, Alertmanager | Consolidates observability; kube-prometheus-stack default |
| loki | Loki, Alloy (DaemonSet) | Separate from metrics for resource isolation |
| default | TaskPlaner | Existing app deployment |
| gitea | Gitea + Actions Runner | Assumed existing |
gitea Gitea + Actions Runner Assumed existing

Alternative considered: all observability in a single namespace.

Decision: separate monitoring and loki namespaces because:

  • Different scaling characteristics (Alloy is DaemonSet, Prometheus is StatefulSet)
  • Easier resource quota management
  • Standard community practice

Component Integration Details

1. ArgoCD Integration

Installation Method: Helm chart from argo/argo-cd

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| Gitea repository | HTTPS clone | Repository credential in argocd-secret |
| Gitea webhook | POST to /api/webhook | Reduces sync delay from 3 min to seconds |
| Traefik ingress | IngressRoute or Ingress | server.insecure=true to avoid redirect loops |
| TLS | cert-manager annotation | Let's Encrypt via existing cluster-issuer |

Critical Configuration:

# Helm values for ArgoCD with Traefik
configs:
  params:
    server.insecure: true  # Required: Traefik handles TLS

server:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - argocd.kube2.tricnet.de
    tls:
      - secretName: argocd-tls
        hosts:
          - argocd.kube2.tricnet.de

Webhook Setup for Gitea:

  1. In ArgoCD secret, set webhook.gogs.secret (Gitea uses Gogs-compatible webhooks)
  2. In Gitea repository settings, add webhook:
    • URL: https://argocd.kube2.tricnet.de/api/webhook
    • Content type: application/json
    • Secret: Same as configured in ArgoCD
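
The shared secret from step 1 lives in the argocd-secret Secret; a minimal sketch, where <shared-secret> is a placeholder. In practice this key is merged into the existing argocd-secret (e.g. via kubectl patch) rather than applied as a fresh Secret, which would overwrite the keys ArgoCD already stores there:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-secret
  namespace: argocd
stringData:
  webhook.gogs.secret: <shared-secret>
```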

Known Limitation: Webhooks work for Applications but not ApplicationSets with Gitea.

2. Prometheus/Grafana Integration (kube-prometheus-stack)

Installation Method: Helm chart prometheus-community/kube-prometheus-stack

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| k3s metrics | Exposed kube-* endpoints | k3s config modification required |
| Traefik metrics | ServiceMonitor | Traefik exposes :9100/metrics |
| TaskPlaner metrics | ServiceMonitor (future) | App must expose a /metrics endpoint |
| Grafana UI | Traefik Ingress | Standard Kubernetes Ingress |

Critical k3s Configuration:

k3s binds the controller-manager, scheduler, and kube-proxy metrics endpoints to localhost by default. For Prometheus to scrape them, bind them to 0.0.0.0.

Create/modify /etc/rancher/k3s/config.yaml:

kube-controller-manager-arg:
  - "bind-address=0.0.0.0"
kube-proxy-arg:
  - "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
  - "bind-address=0.0.0.0"

Then restart k3s: sudo systemctl restart k3s

k3s-specific Helm values:

# Disable etcd monitoring (k3s uses sqlite, not etcd)
defaultRules:
  rules:
    etcd: false

kubeEtcd:
  enabled: false

# Fix endpoint discovery for k3s
kubeControllerManager:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10257
    targetPort: 10257

kubeScheduler:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10259
    targetPort: 10259

kubeProxy:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10249
    targetPort: 10249

# Grafana ingress
grafana:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de

ServiceMonitor for TaskPlaner (future):

Once TaskPlaner exposes /metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match kube-prometheus-stack release
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

3. Loki + Alloy Integration (Log Aggregation)

Important: Promtail is deprecated (LTS until Feb 2026, EOL March 2026). Use Grafana Alloy instead.

Installation Method:

  • Loki: Helm chart grafana/loki (monolithic mode for single node)
  • Alloy: Helm chart grafana/alloy

Integration Points:

| Integration | How | Configuration |
|---|---|---|
| Pod logs | Alloy DaemonSet | Mounts /var/log/pods |
| Loki storage | Longhorn PVC or MinIO | Single-binary mode uses filesystem storage |
| Grafana datasource | Auto-configured | kube-prometheus-stack integration |
| k3s node logs | Alloy journal reader | journalctl access |
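
If the Loki datasource does not appear in Grafana automatically, it can be declared through kube-prometheus-stack's Grafana values; the URL below assumes Loki runs as service loki in namespace loki, matching the Alloy configuration later in this section:

```yaml
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.loki.svc.cluster.local:3100
```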

Deployment Mode Decision:

| Mode | When to Use | Our Choice |
|---|---|---|
| Monolithic (single-binary) | Small deployments, <100 GB/day | Yes: single-node k3s |
| Simple Scalable | Medium deployments | No |
| Microservices | Large scale, HA required | No |

Loki Helm values (monolithic):

deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi

# Disable components not needed in monolithic
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0

# Use filesystem storage (not S3/MinIO for simplicity)
loki:
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h

Alloy DaemonSet Configuration:

# alloy-values.yaml
alloy:
  configMap:
    create: true
    content: |
      // Kubernetes logs collection
      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.write.default.receiver]
      }

      // Send to Loki
      loki.write "default" {
        endpoint {
          url = "http://loki.loki.svc.cluster.local:3100/loki/api/v1/push"
        }
      }

      // Kubernetes discovery
      discovery.kubernetes "pods" {
        role = "pod"
      }

4. Traefik Metrics Integration

Traefik already exposes Prometheus metrics. Enable scraping:

Option A: ServiceMonitor (if using kube-prometheus-stack)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s

Option B: Verify Traefik metrics are enabled

Check Traefik deployment args include:

--entrypoints.metrics.address=:8888
--metrics.prometheus=true
--metrics.prometheus.entryPoint=metrics

Data Flow Diagrams

Metrics Flow

+------------------+     +------------------+     +------------------+
|   TaskPlaner     |     |     Traefik      |     |    k3s core      |
|   /metrics       |     |   :9100/metrics  |     |  :10249,10257... |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         +------------------------+------------------------+
                                  |
                                  v
                        +-------------------+
                        |   Prometheus      |
                        | (ServiceMonitors) |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |     Grafana       |
                        |   (Dashboards)    |
                        +-------------------+

Log Flow

+------------------+     +------------------+     +------------------+
|   TaskPlaner     |     |     Traefik      |     |   Other Pods     |
|   stdout/stderr  |     |   access logs    |     |   stdout/stderr  |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         +------------------------+------------------------+
                                  |
                            /var/log/pods
                                  |
                                  v
                        +-------------------+
                        |   Alloy DaemonSet |
                        |  (log collection) |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |      Loki         |
                        |  (log storage)    |
                        +--------+----------+
                                 |
                                 v
                        +-------------------+
                        |     Grafana       |
                        |   (log queries)   |
                        +-------------------+
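To validate this pipeline end-to-end, a couple of LogQL queries in Grafana Explore (the namespace label value is an assumption about where the app runs):

```logql
# All TaskPlaner logs
{namespace="taskplaner"}

# Per-pod error rate over the last 5 minutes
sum by (pod) (rate({namespace="taskplaner"} |= "error" [5m]))
```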

GitOps Flow

+------------+     +------------+     +---------------+     +------------+
| Developer  | --> |   Gitea    | --> | Gitea Actions | --> | Container  |
| git push   |     | Repository |     | (build.yaml)  |     | Registry   |
+------------+     +-----+------+     +-------+-------+     +------------+
                         |                    |
                         |              (update values.yaml)
                         |                    |
                         v                    v
                   +------------+       +------------+
                   |  Webhook   | ----> |   ArgoCD   |
                   |  (notify)  |       |   Server   |
                   +------------+       +-----+------+
                                              |
                                        (sync app)
                                              |
                                              v
                                        +------------+
                                        | Kubernetes |
                                        |  (deploy)  |
                                        +------------+
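The ArgoCD side of this flow is a single Application manifest. A sketch with hypothetical repo URL and chart path:

```yaml
# Application manifest sketch; repoURL and path are assumptions
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: taskplaner
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.com/thomas/taskplaner.git   # hypothetical
    targetRevision: main
    path: deploy/helm
  destination:
    server: https://kubernetes.default.svc
    namespace: taskplaner
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift
```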

Build Order (Dependencies)

Based on component dependencies, recommended installation order:

Phase 1: ArgoCD (no dependencies on observability)

1. Install ArgoCD via Helm
   - Creates namespace: argocd
   - Verify existing Application manifest works
   - Configure Gitea webhook

Dependencies: None (Traefik already running)
Validates: GitOps pipeline end-to-end
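The Helm values for this step can stay minimal; the one flag that matters with Traefik in front is server.insecure (see anti-pattern 4):

```yaml
# argocd-values.yaml (sketch), installed with e.g.:
#   helm install argocd argo/argo-cd -n argocd --create-namespace -f argocd-values.yaml
configs:
  params:
    server.insecure: true   # Traefik terminates TLS; prevents the redirect loop
```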

Phase 2: kube-prometheus-stack (foundational observability)

2. Configure k3s metrics exposure
   - Modify /etc/rancher/k3s/config.yaml
   - Restart k3s
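A sketch of the relevant /etc/rancher/k3s/config.yaml entries; these bind the component metrics endpoints to all interfaces so Prometheus can reach them:

```yaml
# /etc/rancher/k3s/config.yaml (sketch)
kube-controller-manager-arg:
  - bind-address=0.0.0.0
kube-scheduler-arg:
  - bind-address=0.0.0.0
kube-proxy-arg:
  - metrics-bind-address=0.0.0.0
etcd-expose-metrics: true   # only relevant with embedded etcd
```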

3. Install kube-prometheus-stack via Helm
   - Creates namespace: monitoring
   - Includes: Prometheus, Grafana, Alertmanager
   - Includes: Default dashboards and alerts

Dependencies: k3s metrics exposed
Validates: Basic cluster monitoring working
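A values sketch for the kube-prometheus-stack install; setting serviceMonitorSelectorNilUsesHelmValues to false sidesteps the label-mismatch pitfall described under anti-patterns:

```yaml
# kube-prometheus-stack values sketch
prometheus:
  prometheusSpec:
    # Discover all ServiceMonitors, not only those carrying the release label
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 10Gi
```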

Phase 3: Loki + Alloy (log aggregation)

4. Install Loki via Helm (monolithic mode)
   - Creates namespace: loki
   - Configure storage with Longhorn
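A monolithic-mode values sketch for the Loki chart (retention is capped to guard against the disk-exhaustion pitfall; exact keys may shift between chart versions):

```yaml
# Loki chart values sketch
deploymentMode: SingleBinary
loki:
  auth_enabled: false          # single-tenant setup
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  limits_config:
    retention_period: 168h     # 7 days
singleBinary:
  replicas: 1
  persistence:
    storageClass: longhorn
    size: 10Gi
```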

5. Install Alloy via Helm
   - DaemonSet in loki namespace
   - Configure Kubernetes log discovery
   - Point to Loki endpoint
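A minimal Alloy configuration covering these three points — discover pods, tail their logs, push to Loki:

```alloy
// Discover all pods in the cluster
discovery.kubernetes "pods" {
  role = "pod"
}

// Tail container logs via the Kubernetes API
loki.source.kubernetes "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [loki.write.default.receiver]
}

// Push to the Loki service from step 4
loki.write "default" {
  endpoint {
    url = "http://loki.loki.svc.cluster.local:3100/loki/api/v1/push"
  }
}
```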

6. Add Loki datasource to Grafana
   - URL: http://loki.loki.svc.cluster.local:3100
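Since Grafana comes from kube-prometheus-stack, the datasource can be provisioned declaratively through that chart's values rather than clicked together in the UI:

```yaml
# kube-prometheus-stack values sketch (Grafana section)
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.loki.svc.cluster.local:3100
```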

Dependencies: Grafana from step 3, storage
Validates: Logs visible in Grafana Explore

Phase 4: Application Integration

7. Add TaskPlaner metrics endpoint (if not exists)
   - Expose /metrics in app
   - Create ServiceMonitor
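A ServiceMonitor sketch for step 7; the service labels, port name, and app namespace are assumptions about the TaskPlaner deployment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus   # must match the kube-prometheus-stack release name
spec:
  namespaceSelector:
    matchNames:
      - taskplaner        # assumed app namespace
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  endpoints:
    - port: http          # assumed Service port name
      path: /metrics
      interval: 30s
```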

8. Create application dashboards in Grafana
   - TaskPlaner-specific metrics
   - Request latency, error rates
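Two starter panel queries for step 8, assuming the app exports the conventional http_request_duration_seconds histogram and http_requests_total counter:

```promql
# p95 request latency
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Error ratio: 5xx responses over all responses
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))
```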

Dependencies: All previous phases
Validates: Full observability of application

Resource Requirements

Component           CPU Request   Memory Request   Storage
ArgoCD (all)        500m          512Mi            -
Prometheus          200m          512Mi            10Gi (Longhorn)
Grafana             100m          256Mi            1Gi (Longhorn)
Alertmanager        50m           64Mi             1Gi (Longhorn)
Loki                200m          256Mi            10Gi (Longhorn)
Alloy (per node)    100m          128Mi            -

Total additional: ~1.2 CPU cores, ~1.7Gi RAM, ~22Gi storage

Security Considerations

Network Policies

Consider network policies to restrict:

  • Prometheus scraping only from monitoring namespace
  • Loki ingestion only from Alloy
  • Grafana access only via Traefik
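A sketch of the second restriction (Loki ingestion only from Alloy, plus Grafana from the monitoring namespace); the pod labels are assumptions about the respective charts:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: loki-ingress
  namespace: loki
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: loki
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: alloy   # Alloy DaemonSet pods
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring   # Grafana queries
      ports:
        - protocol: TCP
          port: 3100
```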

Secrets Management

Secret                        Location        Purpose
argocd-initial-admin-secret   argocd ns       Initial admin password
argocd-secret                 argocd ns       Webhook secrets, repo credentials
grafana-admin                 monitoring ns   Grafana admin password

Ingress Authentication

For production, consider:

  • ArgoCD: Built-in OIDC/OAuth integration
  • Grafana: Built-in auth (local, LDAP, OAuth)
  • Prometheus: Traefik BasicAuth middleware (already pattern in use)
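The existing BasicAuth pattern applied to Prometheus would look roughly like this (secret name hypothetical, containing htpasswd-formatted users):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: prometheus-basicauth
  namespace: monitoring
spec:
  basicAuth:
    secret: prometheus-credentials   # hypothetical htpasswd secret
```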

Anti-Patterns to Avoid

1. Skipping k3s Metrics Configuration

What happens: Prometheus installs but most dashboards show "No data".
Prevention: Configure k3s to expose metrics BEFORE installing kube-prometheus-stack.

2. Using Promtail Instead of Alloy

What happens: Technical debt - Promtail EOL is March 2026.
Prevention: Use Alloy from the start; migration documentation exists.

3. Running Loki in Microservices Mode for Small Clusters

What happens: Unnecessary complexity, resource overhead.
Prevention: Monolithic mode for clusters under 100GB/day log volume.

4. Forgetting server.insecure for ArgoCD with Traefik

What happens: Redirect loop (ERR_TOO_MANY_REDIRECTS).
Prevention: Always set configs.params.server.insecure=true when Traefik handles TLS.

5. ServiceMonitor Label Mismatch

What happens: Prometheus doesn't discover custom ServiceMonitors.
Prevention: Ensure the release: <helm-release-name> label matches the kube-prometheus-stack release.

Sources

ArgoCD:

Prometheus/Grafana:

Loki/Alloy:

Traefik Integration:


Last updated: 2026-02-03