Architecture Research
Domain: Personal task/notes web application with image attachments
Researched: 2026-01-29
Confidence: HIGH
Standard Architecture
System Overview
+------------------------------------------------------------------+
| CLIENT LAYER |
| +-------------------+ +-------------------+ +----------------+ |
| | Desktop Browser | | Mobile Browser | | PWA (future) | |
| +--------+----------+ +--------+----------+ +-------+--------+ |
| | | | |
+-----------+----------------------+---------------------+----------+
| | |
v v v
+------------------------------------------------------------------+
| PRESENTATION LAYER |
| +------------------------------------------------------------+ |
| | Web Frontend (SPA) | |
| | +--------+ +--------+ +--------+ +--------+ +--------+ | |
| | | Notes | | Tasks | | Search | | Tags | | Upload | | |
| | | View | | View | | View | | View | | View | | |
| | +--------+ +--------+ +--------+ +--------+ +--------+ | |
| +------------------------------+-----------------------------+ |
+---------------------------------|--------------------------------+
| HTTP/REST
v
+------------------------------------------------------------------+
| APPLICATION LAYER |
| +------------------------------------------------------------+ |
| | REST API (Monolith) | |
| | +------------+ +------------+ +------------+ | |
| | | Notes API | | Tasks API | | Search API | | |
| | +------------+ +------------+ +------------+ | |
| | +------------+ +------------+ +------------+ | |
| | | Tags API | | Upload API | | Auth API | | |
| | +------------+ +------------+ +------------+ | |
| +------------------------------+-----------------------------+ |
+---------------------------------|--------------------------------+
|
+---------------------+---------------------+
| | |
v v v
+------------------------------------------------------------------+
| DATA LAYER |
| +----------------+ +----------------+ +------------------+ |
| | SQLite | | File Storage | | FTS5 Index | |
| | (primary) | | (images) | | (full-text) | |
| +----------------+ +----------------+ +------------------+ |
+------------------------------------------------------------------+
Component Responsibilities
| Component | Responsibility | Typical Implementation |
|---|---|---|
| Web Frontend | UI rendering, user interaction, client-side state | React/Vue/Svelte SPA |
| REST API | Business logic, validation, orchestration | Node.js/Go/Python monolith |
| Notes API | CRUD operations for thoughts/notes | API route handler |
| Tasks API | CRUD for tasks, status transitions | API route handler |
| Search API | Full-text search across notes/tasks | Wraps FTS5 queries |
| Tags API | Tag management, note-tag associations | API route handler |
| Upload API | Image upload, validation, storage | Handles multipart forms |
| Auth API | Session management (single user) | Simple token/session |
| SQLite | Primary data persistence | Single file database |
| File Storage | Binary file storage (images) | Docker volume mount |
| FTS5 Index | Full-text search capabilities | SQLite virtual table |
Recommended Project Structure
project/
+-- docker/
| +-- Dockerfile # Multi-stage build for frontend + backend
| +-- docker-compose.yml # Service orchestration
| +-- nginx.conf # Reverse proxy config (optional)
+-- backend/
| +-- cmd/
| | +-- server/
| | +-- main.go # Entry point
| +-- internal/
| | +-- api/ # HTTP handlers
| | | +-- notes.go
| | | +-- tasks.go
| | | +-- tags.go
| | | +-- search.go
| | | +-- upload.go
| | +-- models/ # Domain entities
| | | +-- note.go
| | | +-- task.go
| | | +-- tag.go
| | | +-- attachment.go
| | +-- repository/ # Data access
| | | +-- sqlite.go
| | | +-- notes_repo.go
| | | +-- tasks_repo.go
| | +-- service/ # Business logic
| | | +-- notes_svc.go
| | | +-- search_svc.go
| | +-- storage/ # File storage abstraction
| | +-- local.go
| +-- migrations/ # Database migrations
| +-- go.mod
+-- frontend/
| +-- src/
| | +-- components/ # Reusable UI components
| | +-- pages/ # Route-level views
| | +-- stores/ # Client state management
| | +-- api/ # Backend API client
| | +-- utils/ # Helpers
| +-- public/
| +-- package.json
+-- data/ # Mounted volume (gitignored)
| +-- app.db # SQLite database
| +-- uploads/ # Image storage
+-- .planning/ # Project planning docs
Structure Rationale
- Monorepo with backend/frontend split: Keeps deployment simple (single container possible) while maintaining clear separation
- internal/ in Go: Prevents external packages from importing internals; enforces encapsulation
- Repository pattern: Abstracts SQLite access, enables future database swap if needed
- Service layer: Business logic separated from HTTP handlers for testability
- data/ volume: Single mount point for all persistent data (database + files)
Architectural Patterns
Pattern 1: Modular Monolith
What: Single deployable unit with clear internal module boundaries. Each domain (notes, tasks, tags, search) has its own package but shares the same database and process.
When to use: Single-user or small-team applications where operational simplicity matters more than independent scaling.
Trade-offs:
- Pro: Simple deployment, easy debugging, no network overhead between modules
- Pro: Single database transaction across domains when needed
- Con: All modules must use same language/runtime
- Con: Cannot scale modules independently (not needed for single user)
Example:
// internal/api/routes.go
func SetupRoutes(r *mux.Router, services *Services) {
// Each domain gets its own route group
notes := r.PathPrefix("/api/notes").Subrouter()
notes.HandleFunc("", services.Notes.List).Methods("GET")
notes.HandleFunc("", services.Notes.Create).Methods("POST")
tasks := r.PathPrefix("/api/tasks").Subrouter()
tasks.HandleFunc("", services.Tasks.List).Methods("GET")
// Clear boundaries, but same process
}
Pattern 2: Repository Pattern for Data Access
What: Abstract data access behind interfaces. Repositories handle all database queries; services call repositories, not raw SQL.
When to use: Always for anything beyond trivial apps. Enables testing with mocks and future database changes.
Trade-offs:
- Pro: Testable services (mock repositories)
- Pro: Database-agnostic business logic
- Pro: Query logic centralized
- Con: Additional abstraction layer
- Con: Can become overly complex if over-engineered
Example:
// internal/repository/notes_repo.go
type NotesRepository interface {
Create(ctx context.Context, note *models.Note) error
GetByID(ctx context.Context, id string) (*models.Note, error)
List(ctx context.Context, opts ListOptions) ([]*models.Note, error)
Search(ctx context.Context, query string) ([]*models.Note, error)
}
type sqliteNotesRepo struct {
db *sql.DB
}
func (r *sqliteNotesRepo) Search(ctx context.Context, query string) ([]*models.Note, error) {
// FTS5 search; join on rowid, since notes_fts is an external-content
// table keyed by the notes table's rowid, not its TEXT id column
rows, err := r.db.QueryContext(ctx, `
SELECT n.id, n.title, n.body, n.created_at
FROM notes_fts
JOIN notes n ON n.rowid = notes_fts.rowid
WHERE notes_fts MATCH ?
ORDER BY rank
`, query)
if err != nil {
return nil, err
}
defer rows.Close()
// ...
}
Pattern 3: Content-Addressable Image Storage
What: Store images using a content hash (e.g. SHA-256) as the filename. Prevents duplicates and enables cache-forever headers.
When to use: Any app storing user-uploaded images where deduplication and caching matter.
Trade-offs:
- Pro: Automatic deduplication
- Pro: Cache-forever possible (hash changes if content changes)
- Pro: Simple to verify integrity
- Con: Need reference counting for deletion
- Con: Slightly more complex upload logic
Example:
// internal/storage/local.go
func (s *LocalStorage) Store(ctx context.Context, file io.Reader) (string, error) {
// Hash while copying to a temp file
hasher := sha256.New()
tmp, err := os.CreateTemp(s.uploadDir, "upload-*")
if err != nil {
return "", err
}
defer os.Remove(tmp.Name()) // no-op once the file has been renamed
defer tmp.Close()
if _, err := io.Copy(io.MultiWriter(tmp, hasher), file); err != nil {
return "", err
}
hash := hex.EncodeToString(hasher.Sum(nil))
// Shard by first 2 hex chars to avoid too many files in one directory
finalPath := filepath.Join(s.uploadDir, hash[:2], hash)
if err := os.MkdirAll(filepath.Dir(finalPath), 0o755); err != nil {
return "", err
}
if err := os.Rename(tmp.Name(), finalPath); err != nil {
return "", err
}
return hash, nil
}
Data Flow
Request Flow
[User Action: Create Note]
|
v
[Frontend Component] --HTTP POST /api/notes--> [Notes Handler]
| |
| (optimistic UI update) v
| [Notes Service]
| |
| v
| [Notes Repository]
| |
| v
| [SQLite INSERT]
| |
| v
| [FTS5 trigger auto-updates index]
| v
v v
[UI shows new note] <--JSON response-- [Return created note]
Image Upload Flow
[User: Attach Image]
|
v
[Frontend: file input] --multipart POST /api/upload--> [Upload Handler]
| |
| (show progress) v
| [Validate: type, size]
| |
| v
| [Hash content]
| |
| v
| [Store to /data/uploads/{hash}]
| |
| v
| [Create attachment record in DB]
| |
v v
[Insert image into note] <--{attachment_id, url}-- [Return attachment metadata]
Search Flow
[User: Type search query]
|
v
[Frontend: debounced input] --GET /api/search?q=...--> [Search Handler]
| |
| (show loading) v
| [Search Service]
| |
| v
| [Query FTS5 virtual table]
| |
| v
| [JOIN with notes/tasks tables]
| |
| v
| [Apply ranking (bm25)]
| |
v v
[Display ranked results] <--JSON array-- [Return ranked results with snippets]
Key Data Flows
- Note/Task CRUD: Frontend -> API Handler -> Service -> Repository -> SQLite. FTS5 index auto-updates via triggers.
- Image Upload: Frontend -> Upload Handler -> File Storage (hash-based) -> DB record. Returns URL for embedding.
- Full-Text Search: Frontend -> Search Handler -> FTS5 Query -> Ranked results with snippets.
- Tag Association: Many-to-many through junction table. Tag changes trigger re-index if needed.
Scaling Considerations
| Scale | Architecture Adjustments |
|---|---|
| 1 user (target) | Single SQLite file, local file storage, single container. Current design is perfect. |
| 2-10 users | Still works fine. SQLite handles concurrent reads well. May want WAL mode for better write concurrency. |
| 10-100 users | Consider PostgreSQL for better write concurrency. Move files to S3-compatible storage (MinIO or Garage for self-hosted). |
| 100+ users | Out of scope for personal app. Would need auth system, PostgreSQL, object storage, potentially message queue for uploads. |
Scaling Priorities (For Future)
- First bottleneck: SQLite write contention (if ever). Fix: WAL mode (simple) or PostgreSQL (more complex).
- Second bottleneck: File storage if hosting many large images. Fix: Object storage with content-addressing.
Note: For a single-user personal app, these scaling considerations are theoretical. SQLite with WAL mode can handle thousands of notes and tasks without issue.
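Enabling WAL is a per-database setting plus a couple of connection pragmas. A sketch (values are illustrative defaults, not tuned for this app):

```sql
PRAGMA journal_mode = WAL;    -- persistent setting: readers stop blocking the writer
PRAGMA busy_timeout = 5000;   -- wait up to 5s on contention instead of erroring
PRAGMA synchronous = NORMAL;  -- safe with WAL, fewer fsyncs
PRAGMA foreign_keys = ON;     -- SQLite does not enforce REFERENCES by default
```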
Anti-Patterns
Anti-Pattern 1: Storing Images in Database
What people do: Store image bytes directly in SQLite as BLOBs.
Why it's wrong:
- Bloats database file significantly
- Slows down backups (entire DB must be copied for any change)
- Cannot leverage filesystem caching
- Makes database migrations more complex
Do this instead: Store images on filesystem, store path/hash reference in database. Use content-addressable storage (hash as filename) for deduplication.
Anti-Pattern 2: No Full-Text Search Index
What people do: Use LIKE '%query%' for search.
Why it's wrong:
- Full table scan for every search
- Cannot rank by relevance
- No word stemming or tokenization
- Gets unusably slow with a few thousand notes
Do this instead: Use SQLite FTS5 from the start. It's built-in, requires no external dependencies, and handles relevance ranking.
Anti-Pattern 3: Microservices for Single User
What people do: Split notes, tasks, search, auth into separate services "for scalability."
Why it's wrong:
- Massive operational overhead for no benefit
- Network latency between services
- Distributed transactions become complex
- Debugging across services is painful
- 2024-2025 industry trend: many teams consolidating microservices back to monoliths
Do this instead: Build a well-structured modular monolith. Clear internal boundaries, single deployment. Extract services later only if needed (you won't need to for a personal app).
Anti-Pattern 4: Overengineering Auth for Single User
What people do: Implement full OAuth2/OIDC, JWT refresh tokens, role-based access control.
Why it's wrong:
- Single user doesn't need roles
- Complexity adds attack surface
- More code to maintain
- Personal app accessible only on your network
Do this instead: Simple session-based auth with a password. Consider basic HTTP auth behind a reverse proxy, or even IP-based allowlisting if only accessible from home network.
Integration Points
External Services
| Service | Integration Pattern | Notes |
|---|---|---|
| None required | N/A | Self-hosted, no external dependencies by design |
| Optional: Reverse Proxy | HTTP | Nginx/Traefik for HTTPS termination if exposed to internet |
| Optional: Backup | File copy | Simple rsync/backup of data/ directory contains everything |
Internal Boundaries
| Boundary | Communication | Notes |
|---|---|---|
| Frontend <-> Backend | REST/JSON over HTTP | OpenAPI spec recommended for documentation |
| API Handlers <-> Services | Direct function calls | Same process, no serialization |
| Services <-> Repositories | Interface calls | Enables mocking in tests |
| Services <-> File Storage | Interface calls | Abstracts local vs future S3 |
Database Schema Overview
-- Core entities
CREATE TABLE notes (
id TEXT PRIMARY KEY,
title TEXT,
body TEXT NOT NULL,
type TEXT CHECK(type IN ('note', 'task')) NOT NULL,
status TEXT CHECK(status IN ('open', 'done', 'archived')), -- for tasks
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Full-text search (virtual table synced via triggers)
CREATE VIRTUAL TABLE notes_fts USING fts5(
title,
body,
content='notes',
content_rowid='rowid'
);
-- Tags (many-to-many)
CREATE TABLE tags (
id TEXT PRIMARY KEY,
name TEXT UNIQUE NOT NULL
);
CREATE TABLE note_tags (
note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
tag_id TEXT REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (note_id, tag_id)
);
-- Attachments (images)
CREATE TABLE attachments (
id TEXT PRIMARY KEY,
note_id TEXT REFERENCES notes(id) ON DELETE CASCADE,
hash TEXT NOT NULL, -- content hash, also filename
filename TEXT, -- original filename
mime_type TEXT NOT NULL,
size_bytes INTEGER,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Indexes
CREATE INDEX idx_notes_type ON notes(type);
CREATE INDEX idx_notes_created ON notes(created_at DESC);
CREATE INDEX idx_attachments_note ON attachments(note_id);
CREATE INDEX idx_attachments_hash ON attachments(hash);
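notes_fts above is an external-content table (content='notes'), so it does not update itself; the triggers the schema comment refers to follow the standard FTS5 pattern for external-content tables (sketch, matching the schema above):

```sql
CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
  INSERT INTO notes_fts(rowid, title, body)
  VALUES (new.rowid, new.title, new.body);
END;

CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
  INSERT INTO notes_fts(notes_fts, rowid, title, body)
  VALUES ('delete', old.rowid, old.title, old.body);
END;

CREATE TRIGGER notes_au AFTER UPDATE ON notes BEGIN
  INSERT INTO notes_fts(notes_fts, rowid, title, body)
  VALUES ('delete', old.rowid, old.title, old.body);
  INSERT INTO notes_fts(rowid, title, body)
  VALUES (new.rowid, new.title, new.body);
END;
```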
Build Order Implications
Based on component dependencies, suggested implementation order:
Phase 1: Data Foundation
- SQLite setup with migrations
- Basic schema (notes table)
- Repository layer for notes
- No FTS5 yet (add later)

Phase 2: Core API
- REST API handlers for notes CRUD
- Service layer
- Basic frontend with note list/create/edit

Phase 3: Tasks Differentiation
- Add type column (note vs task)
- Task-specific status handling
- Frontend task views

Phase 4: Tags
- Tags table and junction table
- Tag CRUD API
- Tag filtering in frontend

Phase 5: Image Attachments
- File storage abstraction
- Upload API with validation
- Attachment records in DB
- Frontend image upload/display

Phase 6: Search
- FTS5 virtual table and triggers
- Search API with ranking
- Search UI with highlighting

Phase 7: Containerization
- Dockerfile (multi-stage)
- docker-compose.yml
- Volume mounts for data persistence
Rationale: This order ensures each phase builds on working foundation. Notes must work before tasks (which are just notes with extra fields). Tags and attachments can be added independently. Search comes later as it indexes existing content. Containerization last so development is simpler.
Sources
- Standard Notes Self-Hosting Architecture - API Gateway, Syncing Server patterns
- Flatnotes - Database-less Architecture - Simple markdown file storage approach
- Evernote Data Structure - Note/Resource/Attachment model
- Task Manager Database Schema - Tags and task relationships
- SQLite FTS5 Extension - Full-text search implementation (HIGH confidence - official docs)
- Microservices vs Monoliths in 2026 - Modular monolith recommendation
- MinIO Alternatives - Self-hosted storage options
- Image Upload Architecture - Content-addressable storage pattern
- Modern Web Application Architecture 2026 - Container and deployment patterns
v2.0 Architecture: CI/CD and Observability Integration
Domain: GitOps CI/CD and Observability Stack
Researched: 2026-02-03
Confidence: HIGH (verified with official documentation)
Executive Summary
This section details how ArgoCD, Prometheus, Grafana, and Loki integrate with the existing k3s/Gitea/Traefik architecture. The integration follows established patterns for self-hosted Kubernetes observability stacks, with specific considerations for k3s's lightweight nature and Traefik as the ingress controller.
Key insight: The existing CI/CD foundation (Gitea Actions + ArgoCD Application) is already in place. This milestone adds observability and operational automation rather than building from scratch.
Current Architecture Overview
Internet
|
[Traefik]
(Ingress)
|
+-------------------------+-------------------------+
| | |
task.kube2 git.kube2 (future)
.tricnet.de .tricnet.de argocd/grafana
| |
[TaskPlaner] [Gitea]
(default ns) + Actions
| Runner
| |
[Longhorn PVC] |
(data store) |
v
[Container Registry]
git.kube2.tricnet.de
Existing Components
| Component | Namespace | Purpose | Status |
|---|---|---|---|
| k3s | - | Kubernetes distribution | Running |
| Traefik | kube-system | Ingress controller | Running |
| Longhorn | longhorn-system | Persistent storage | Running |
| cert-manager | cert-manager | TLS certificates | Running |
| Gitea | gitea (assumed) | Git hosting + CI | Running |
| TaskPlaner | default | Application | Running |
| ArgoCD Application | argocd | GitOps deployment | Defined (may need install) |
Existing CI/CD Pipeline
From .gitea/workflows/build.yaml:
- Push to master triggers Gitea Actions
- Build Docker image with BuildX
- Push to Gitea Container Registry
- Update Helm values.yaml with new image tag
- Commit with [skip ci]
- ArgoCD detects change and syncs
Current gap: ArgoCD may not be installed yet (Application manifest exists but needs ArgoCD server).
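The workflow file itself is not reproduced here; the shape such a pipeline typically takes looks roughly as follows (repository path, chart location, and secret names are assumptions, not taken from the real build.yaml):

```yaml
name: build
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: git.kube2.tricnet.de
          username: ${{ secrets.REGISTRY_USER }}
          password: ${{ secrets.REGISTRY_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: git.kube2.tricnet.de/<owner>/taskplaner:${{ github.sha }}
      - name: Bump image tag in Helm values
        run: |
          sed -i "s/^  tag:.*/  tag: ${{ github.sha }}/" chart/values.yaml
          git config user.name "ci"
          git config user.email "ci@noreply.local"
          git commit -am "chore: deploy ${{ github.sha }} [skip ci]"
          git push
```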
Integration Architecture
Target State
Internet
|
[Traefik]
(Ingress)
|
+----------+----------+----------+----------+----------+
| | | | | |
task.* git.* argocd.* grafana.* (internal)
| | | | |
[TaskPlaner] [Gitea] [ArgoCD] [Grafana] [Prometheus]
| | | | [Loki]
| | | | [Alloy]
| +---webhook---> | |
| | | |
+------ metrics ------+----------+--------->+
+------ logs ---------+---------[Alloy]---->+ (to Loki)
Namespace Strategy
| Namespace | Components | Rationale |
|---|---|---|
| argocd | ArgoCD server, repo-server, application-controller | Standard convention; ClusterRoleBinding expects this |
| monitoring | Prometheus, Grafana, Alertmanager | Consolidate observability; kube-prometheus-stack default |
| loki | Loki, Alloy (DaemonSet) | Separate from metrics for resource isolation |
| default | TaskPlaner | Existing app deployment |
| gitea | Gitea + Actions Runner | Assumed existing |
Alternative considered: All observability in single namespace
Decision: Separate monitoring and loki because:
- Different scaling characteristics (Alloy is DaemonSet, Prometheus is StatefulSet)
- Easier resource quota management
- Standard community practice
Component Integration Details
1. ArgoCD Integration
Installation Method: Helm chart from argo/argo-cd
Integration Points:
| Integration | How | Configuration |
|---|---|---|
| Gitea Repository | HTTPS clone | Repository credential in argocd-secret |
| Gitea Webhook | POST to /api/webhook | Reduces sync delay from ~3min to seconds |
| Traefik Ingress | IngressRoute or Ingress | server.insecure=true to avoid redirect loops |
| TLS | cert-manager annotation | Let's Encrypt via existing cluster-issuer |
Critical Configuration:
# Helm values for ArgoCD with Traefik
configs:
  params:
    server.insecure: true  # Required: Traefik handles TLS
server:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - argocd.kube2.tricnet.de
    tls:
      - secretName: argocd-tls
        hosts:
          - argocd.kube2.tricnet.de
Webhook Setup for Gitea:
1. In the ArgoCD secret, set webhook.gogs.secret (Gitea uses Gogs-compatible webhooks)
2. In the Gitea repository settings, add a webhook:
   - URL: https://argocd.kube2.tricnet.de/api/webhook
   - Content type: application/json
   - Secret: same as configured in ArgoCD
Known Limitation: Webhooks work for Applications but not ApplicationSets with Gitea.
2. Prometheus/Grafana Integration (kube-prometheus-stack)
Installation Method: Helm chart prometheus-community/kube-prometheus-stack
Integration Points:
| Integration | How | Configuration |
|---|---|---|
| k3s metrics | Exposed kube-* endpoints | k3s config modification required |
| Traefik metrics | ServiceMonitor | Traefik exposes :9100/metrics |
| TaskPlaner metrics | ServiceMonitor (future) | App must expose /metrics endpoint |
| Grafana UI | Traefik Ingress | Standard Kubernetes Ingress |
Critical k3s Configuration:
k3s binds controller-manager, scheduler, and proxy to localhost by default. For Prometheus scraping, expose on 0.0.0.0.
Create/modify /etc/rancher/k3s/config.yaml:
kube-controller-manager-arg:
- "bind-address=0.0.0.0"
kube-proxy-arg:
- "metrics-bind-address=0.0.0.0"
kube-scheduler-arg:
- "bind-address=0.0.0.0"
Then restart k3s: sudo systemctl restart k3s
k3s-specific Helm values:
# Disable etcd monitoring (k3s uses sqlite, not etcd)
defaultRules:
  rules:
    etcd: false
kubeEtcd:
  enabled: false

# Fix endpoint discovery for k3s
kubeControllerManager:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10257
    targetPort: 10257
kubeScheduler:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10259
    targetPort: 10259
kubeProxy:
  enabled: true
  endpoints:
    - <k3s-server-ip>
  service:
    enabled: true
    port: 10249
    targetPort: 10249

# Grafana ingress
grafana:
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - grafana.kube2.tricnet.de
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.kube2.tricnet.de
ServiceMonitor for TaskPlaner (future):
Once TaskPlaner exposes /metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: taskplaner
  namespace: monitoring
  labels:
    release: prometheus  # Must match kube-prometheus-stack release
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app.kubernetes.io/name: taskplaner
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
3. Loki + Alloy Integration (Log Aggregation)
Important: Promtail is deprecated (LTS until Feb 2026, EOL March 2026). Use Grafana Alloy instead.
Installation Method:
- Loki: Helm chart grafana/loki (monolithic mode for single node)
- Alloy: Helm chart grafana/alloy
Integration Points:
| Integration | How | Configuration |
|---|---|---|
| Pod logs | Alloy DaemonSet | Mounts /var/log/pods |
| Loki storage | Longhorn PVC or MinIO | Single-binary uses filesystem |
| Grafana datasource | Helm values or Grafana UI | Loki must be added as a datasource; not automatic |
| k3s node logs | Alloy journal reader | journalctl access |
Deployment Mode Decision:
| Mode | When to Use | Our Choice |
|---|---|---|
| Monolithic (single-binary) | Small deployments, <100GB/day | Yes - single node k3s |
| Simple Scalable | Medium deployments | No |
| Microservices | Large scale, HA required | No |
Loki Helm values (monolithic):
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    enabled: true
    storageClass: longhorn
    size: 10Gi

# Disable components not needed in monolithic mode
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0

# Use filesystem storage (not S3/MinIO, for simplicity)
loki:
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h
Alloy DaemonSet Configuration:
# alloy-values.yaml
alloy:
  configMap:
    create: true
    content: |
      // Kubernetes pod discovery
      discovery.kubernetes "pods" {
        role = "pod"
      }

      // Collect pod logs via the Kubernetes API
      loki.source.kubernetes "pods" {
        targets    = discovery.kubernetes.pods.targets
        forward_to = [loki.write.default.receiver]
      }

      // Send to Loki
      loki.write "default" {
        endpoint {
          url = "http://loki.loki.svc.cluster.local:3100/loki/api/v1/push"
        }
      }
4. Traefik Metrics Integration
Traefik already exposes Prometheus metrics. Enable scraping:
Option A: ServiceMonitor (if using kube-prometheus-stack)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
Option B: Verify Traefik metrics are enabled
Check Traefik deployment args include:
--entrypoints.metrics.address=:8888
--metrics.prometheus=true
--metrics.prometheus.entryPoint=metrics
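On k3s specifically, the bundled Traefik is managed by the built-in Helm controller, so these flags should not be edited on the Deployment directly; a HelmChartConfig overlay in kube-system is the supported way to change them. A sketch (exact values depend on the bundled Traefik chart version):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    metrics:
      prometheus:
        entryPoint: metrics
```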
Data Flow Diagrams
Metrics Flow
+------------------+ +------------------+ +------------------+
| TaskPlaner | | Traefik | | k3s core |
| /metrics | | :9100/metrics | | :10249,10257... |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
+------------------------+------------------------+
|
v
+-------------------+
| Prometheus |
| (ServiceMonitors) |
+--------+----------+
|
v
+-------------------+
| Grafana |
| (Dashboards) |
+-------------------+
Log Flow
+------------------+ +------------------+ +------------------+
| TaskPlaner | | Traefik | | Other Pods |
| stdout/stderr | | access logs | | stdout/stderr |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
+------------------------+------------------------+
|
/var/log/pods
|
v
+-------------------+
| Alloy DaemonSet |
| (log collection) |
+--------+----------+
|
v
+-------------------+
| Loki |
| (log storage) |
+--------+----------+
|
v
+-------------------+
| Grafana |
| (log queries) |
+-------------------+
GitOps Flow
+------------+ +------------+ +---------------+ +------------+
| Developer | --> | Gitea | --> | Gitea Actions | --> | Container |
| git push | | Repository | | (build.yaml) | | Registry |
+------------+ +-----+------+ +-------+-------+ +------------+
| |
| (update values.yaml)
| |
v v
+------------+ +------------+
| Webhook | ----> | ArgoCD |
| (notify) | | Server |
+------------+ +-----+------+
|
(sync app)
|
v
+------------+
| Kubernetes |
| (deploy) |
+------------+
Build Order (Dependencies)
Based on component dependencies, recommended installation order:
Phase 1: ArgoCD (no dependencies on observability)
1. Install ArgoCD via Helm
- Creates namespace: argocd
- Verify existing Application manifest works
- Configure Gitea webhook
Dependencies: None (Traefik already running)
Validates: GitOps pipeline end-to-end
Phase 2: kube-prometheus-stack (foundational observability)
2. Configure k3s metrics exposure
- Modify /etc/rancher/k3s/config.yaml
- Restart k3s
3. Install kube-prometheus-stack via Helm
- Creates namespace: monitoring
- Includes: Prometheus, Grafana, Alertmanager
- Includes: Default dashboards and alerts
Dependencies: k3s metrics exposed
Validates: Basic cluster monitoring working
Phase 3: Loki + Alloy (log aggregation)
4. Install Loki via Helm (monolithic mode)
- Creates namespace: loki
- Configure storage with Longhorn
5. Install Alloy via Helm
- DaemonSet in loki namespace
- Configure Kubernetes log discovery
- Point to Loki endpoint
6. Add Loki datasource to Grafana
- URL: http://loki.loki.svc.cluster.local:3100
Dependencies: Grafana from step 3, storage
Validates: Logs visible in Grafana Explore
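Since Grafana comes from kube-prometheus-stack here, the Loki datasource can be provisioned declaratively through its Helm values instead of clicked together in the UI. A sketch using the chart's additionalDataSources value:

```yaml
# kube-prometheus-stack values
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.loki.svc.cluster.local:3100
```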
Phase 4: Application Integration
7. Add TaskPlaner metrics endpoint (if not exists)
- Expose /metrics in app
- Create ServiceMonitor
8. Create application dashboards in Grafana
- TaskPlaner-specific metrics
- Request latency, error rates
Dependencies: All previous phases
Validates: Full observability of application
Resource Requirements
| Component | CPU Request | Memory Request | Storage |
|---|---|---|---|
| ArgoCD (all) | 500m | 512Mi | - |
| Prometheus | 200m | 512Mi | 10Gi (Longhorn) |
| Grafana | 100m | 256Mi | 1Gi (Longhorn) |
| Alertmanager | 50m | 64Mi | 1Gi (Longhorn) |
| Loki | 200m | 256Mi | 10Gi (Longhorn) |
| Alloy (per node) | 100m | 128Mi | - |
Total additional: ~1.2 CPU cores, ~1.7Gi RAM, ~22Gi storage
Security Considerations
Network Policies
Consider network policies to restrict:
- Prometheus scraping only from monitoring namespace
- Loki ingestion only from Alloy
- Grafana access only via Traefik
Secrets Management
| Secret | Location | Purpose |
|---|---|---|
| argocd-initial-admin-secret | argocd ns | Initial admin password |
| argocd-secret | argocd ns | Webhook secrets, repo credentials |
| grafana-admin | monitoring ns | Grafana admin password |
Ingress Authentication
For production, consider:
- ArgoCD: Built-in OIDC/OAuth integration
- Grafana: Built-in auth (local, LDAP, OAuth)
- Prometheus: Traefik BasicAuth middleware (already pattern in use)
Anti-Patterns to Avoid
1. Skipping k3s Metrics Configuration
What happens: Prometheus installs, but most dashboards show "No data"
Prevention: Configure k3s to expose metrics BEFORE installing kube-prometheus-stack
2. Using Promtail Instead of Alloy
What happens: Technical debt; Promtail EOL is March 2026
Prevention: Use Alloy from the start; migration documentation exists
3. Running Loki in Microservices Mode for Small Clusters
What happens: Unnecessary complexity and resource overhead
Prevention: Use monolithic mode for clusters under 100GB/day of log volume
4. Forgetting server.insecure for ArgoCD with Traefik
What happens: Redirect loop (ERR_TOO_MANY_REDIRECTS)
Prevention: Always set configs.params.server.insecure=true when Traefik handles TLS
5. ServiceMonitor Label Mismatch
What happens: Prometheus doesn't discover custom ServiceMonitors
Prevention: Ensure release: <helm-release-name> label matches kube-prometheus-stack release
Sources
ArgoCD:
- ArgoCD Webhook Configuration
- ArgoCD Ingress Configuration
- ArgoCD Installation
- Mastering GitOps: ArgoCD and Gitea on Kubernetes
Prometheus/Grafana:
Loki/Alloy:
- Loki Monolithic Installation
- Loki Deployment Modes
- Migrate from Promtail to Alloy
- Grafana Loki 3.4 Release
- Alloy Replacing Promtail
Traefik Integration:
Last updated: 2026-02-03