docs: complete project research

Files:
- STACK.md - SvelteKit + SQLite + TypeScript stack recommendation
- FEATURES.md - Feature landscape with MVP definition
- ARCHITECTURE.md - Modular monolith architecture with repository pattern
- PITFALLS.md - Critical pitfalls and prevention strategies
- SUMMARY.md - Executive synthesis with roadmap implications

Key findings:
- Stack: SvelteKit 2.50.x + Svelte 5.49.x with SQLite and better-sqlite3 for single-user simplicity
- Architecture: Modular monolith with content-addressable image storage, FTS5 for search
- Critical pitfall: Store images on filesystem (not DB) from Phase 1 to avoid painful migration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 03:38:41 +01:00
parent f7df924719
commit 4e7c20b3ad
5 changed files with 1531 additions and 0 deletions
--- a/.planning/research/PITFALLS.md
+++ b/.planning/research/PITFALLS.md
@@ -0,0 +1,312 @@
+# Pitfalls Research
+
+**Domain:** Personal Task/Notes Web App with Image Attachments
+**Researched:** 2026-01-29
+**Confidence:** MEDIUM (based on training data patterns; WebSearch unavailable for verification)
+
+## Critical Pitfalls
+
+### Pitfall 1: Image Storage Coupled to Database
+
+**What goes wrong:**
+Images are stored as BLOBs in the database (e.g., SQLite, PostgreSQL). The database grows massive, backups become slow, and queries for non-image data slow down. Migrations become painful once the database holds gigabytes of binary data.
+
+**Why it happens:**
+It feels simpler to have "everything in one place." Developers avoid the complexity of file storage + database references.
+
+**How to avoid:**
+Store images on filesystem (or object storage like MinIO for container setups). Store only the file path/reference in the database. Use a consistent naming convention (e.g., `{uuid}.{ext}` or `{timestamp}_{hash}.{ext}`).
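+
+A minimal sketch of the filesystem approach, assuming a content-hash variant of the naming convention (the file name is the SHA-256 of the bytes). The helper names `imageRelativePath` and `storeImage`, and the two-character prefix directory, are illustrative assumptions rather than part of the researched design.
+
+```typescript
+import { createHash } from "node:crypto";
+import { existsSync, mkdirSync, writeFileSync } from "node:fs";
+import { dirname, join } from "node:path";
+
+// Hypothetical helper: derive a relative storage path from the content hash.
+// The two-character prefix directory keeps any single folder from growing huge.
+export function imageRelativePath(data: Uint8Array, ext: string): string {
+  const hash = createHash("sha256").update(data).digest("hex");
+  return join(hash.slice(0, 2), `${hash}.${ext}`);
+}
+
+// Hypothetical helper: write the image under `rootDir` and return the
+// relative path, which is the only value the database row needs to hold.
+export function storeImage(rootDir: string, data: Uint8Array, ext: string): string {
+  const relPath = imageRelativePath(data, ext);
+  const absPath = join(rootDir, relPath);
+  if (!existsSync(absPath)) {
+    // Identical bytes map to the same path, so duplicates are written once.
+    mkdirSync(dirname(absPath), { recursive: true });
+    writeFileSync(absPath, data);
+  }
+  return relPath;
+}
+```
+
+Because the path depends only on the bytes, uploading the same photo twice reuses one file; deleting an entry then needs a reference check before the file is removed.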
+
+**Warning signs:**
+- Database file growing faster than expected
+- Backup times increasing disproportionately
+- "Let me just base64 encode this" appearing in code
+
+**Phase to address:**
+Phase 1 (Core Data Model) — the storage strategy must be correct from the start; migrating images out of database later is painful.
+
+---
+
+### Pitfall 2: No Image Upload Size/Type Validation
+
+**What goes wrong:**
+Users (even yourself) accidentally upload 50MB RAW files or non-image files. Server crashes, storage fills up, or worse — malicious files get stored.
+
+**Why it happens:**
+"It's just for me" thinking leads to skipping validation. Edge cases seem unlikely for personal tools.
+
+**How to avoid:**
+- Server-side validation of file type (magic bytes, not just extension)
+- Reasonable size limits (e.g., 10MB per image)
+- Image normalization on upload (resize large images, convert HEIC to JPEG)
+- Reject non-image MIME types
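+
+The checks above might look roughly like the following sketch. The 10MB limit comes from the example above; the function names and the set of accepted formats (JPEG, PNG, GIF, WebP) are assumptions, and HEIC detection is left out for brevity.
+
+```typescript
+// Assumed limit, taken from the 10MB example above.
+const MAX_IMAGE_BYTES = 10 * 1024 * 1024;
+
+export type ImageKind = "jpeg" | "png" | "gif" | "webp";
+
+// Identify the format from its leading "magic" bytes rather than the filename.
+export function sniffImageKind(bytes: Uint8Array): ImageKind | null {
+  const startsWith = (sig: number[], offset = 0): boolean =>
+    sig.every((b, i) => bytes[offset + i] === b);
+
+  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
+  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
+  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
+  // WebP: "RIFF" at byte 0 and "WEBP" at byte 8.
+  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
+    return "webp";
+  }
+  return null;
+}
+
+export type UploadCheck =
+  | { ok: true; kind: ImageKind }
+  | { ok: false; reason: string };
+
+// Hypothetical entry point for the upload handler: size first, then type.
+export function validateUpload(bytes: Uint8Array): UploadCheck {
+  if (bytes.length === 0) return { ok: false, reason: "empty file" };
+  if (bytes.length > MAX_IMAGE_BYTES) return { ok: false, reason: "file too large" };
+  const kind = sniffImageKind(bytes);
+  if (kind === null) return { ok: false, reason: "not a recognized image format" };
+  return { ok: true, kind };
+}
+```
+
+A magic-byte check only confirms the header; decoding the image during the resize step is what catches truncated or deliberately malformed files.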
+
+**Warning signs:**
+- No file validation code in upload handler
+- Trusting client-side file picker to filter types
+- No max file size configuration
+
+**Phase to address:**
+Phase 2 (Image Handling) — build validation into the upload pipeline from the start.
+
+---
+
+### Pitfall 3: Tagging System Too Complex or Too Rigid
+
+**What goes wrong:**
+Two failure modes:
+1. **Over-engineered:** Hierarchical tags, tag colors, tag descriptions, tag merging, and so on; the system becomes a maintenance burden
+2. **Under-thought:** No autocomplete and no tag normalization, so "work", "Work", and "WORK" end up as separate tags
+
+**Why it happens:**
+Over-engineering: Feature creep before validating basic needs.
+Under-thought: "Tags are simple" — but consistent UX requires attention.
+
+**How to avoid:**
+Start with flat tags (no hierarchy). Implement:
+- Case-insensitive matching (match on a lowercase key, keep the original casing for display)
+- Autocomplete from existing tags
+- Tag renaming capability (update all entries)
+- NO: tag colors, descriptions, nesting, icons until proven needed
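+
+As a rough illustration of flat tags with normalization and autocomplete, the sketch below uses an in-memory map as a stand-in for a tags table. `tagKey` and `TagRegistry` are hypothetical names, and whitespace collapsing is an added assumption beyond the case handling described above.
+
+```typescript
+// Normalized key used for matching; the display name keeps the user's casing.
+export function tagKey(name: string): string {
+  return name.trim().replace(/\s+/g, " ").toLowerCase();
+}
+
+// In-memory stand-in for a tags table: normalized key -> display name.
+export class TagRegistry {
+  private tags = new Map<string, string>();
+
+  // Returns the existing display name if the tag already exists under any casing.
+  add(name: string): string {
+    const key = tagKey(name);
+    if (key === "") throw new Error("tag name is empty");
+    const existing = this.tags.get(key);
+    if (existing !== undefined) return existing;
+    const display = name.trim().replace(/\s+/g, " ");
+    this.tags.set(key, display);
+    return display;
+  }
+
+  // Case-insensitive prefix autocomplete over existing tags.
+  suggest(prefix: string, limit = 5): string[] {
+    const p = tagKey(prefix);
+    const out: string[] = [];
+    for (const [key, display] of this.tags) {
+      if (out.length >= limit) break;
+      if (key.startsWith(p)) out.push(display);
+    }
+    return out;
+  }
+}
+```
+
+In a SQLite schema the same idea would typically be a unique constraint on the normalized key column, so duplicates are rejected by the database rather than by application code.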
+
+**Warning signs:**
+- Planning tag hierarchies before shipping basic tagging
+- No autocomplete in tag input
+- Multiple code paths for "exact match" vs "fuzzy match"
+
+**Phase to address:**
+Phase 3 (Organization/Tags) — resist complexity; ship simple tags first, iterate based on actual usage.
+
+---
+
+### Pitfall 4: Search That Doesn't Search Images' Context
+
+**What goes wrong:**
+User photographs a paper note, tags it "meeting notes", then searches for "budget" — the paper note discussing budget isn't found because search only checks text fields, not the context of why the image was captured.
+
+**Why it happens:**
+Search implementation focuses on structured data (title, description, tags) but images are opaque binary blobs.
+
+**How to avoid:**
+For v1 without OCR:
+- Encourage descriptive titles/notes when capturing images
+- Make it easy to add context to image entries
+- Consider a "description" field specifically for image content
+
+For later:
+- OCR on upload (Tesseract, cloud OCR)
+- Store extracted text for search indexing
+
+**Warning signs:**
+- Image entries have only tags, no description field
+- Search implementation ignores entry body/notes
+- User finds themselves re-photographing notes to find content
+
+**Phase to address:**
+Phase 2 (Image Handling) — ensure data model supports rich metadata for images.
+Phase 4 (Search) — index all text fields including descriptions.
+
+---
+
+### Pitfall 5: Mobile Browser Capture UX Disaster
+
+**What goes wrong:**
+Camera capture works in desktop browser testing but fails or is clunky on mobile:
+- File input doesn't trigger camera
+- Captured images are wrong orientation (EXIF rotation ignored)
+- Upload fails silently on mobile networks
+- UI doesn't fit mobile viewport
+
+**Why it happens:**
+Testing on desktop only. Mobile browser APIs have quirks. EXIF orientation handling is notoriously inconsistent.
+
+**How to avoid:**
+- Use `<input type="file" accept="image/*" capture="environment">` for mobile camera
+- Handle EXIF orientation server-side (normalize on upload)
+- Test on actual mobile devices early
+- Implement upload progress indicator
+- Design mobile-first (small viewport is the constraint)
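+
+For the server-side orientation step, the mapping from the EXIF Orientation tag (values 1 to 8) to the transform that makes the image upright can be sketched as below. Reading the tag and rotating the pixels would be done by whichever image library the project picks; `OrientationFix` and `orientationFix` are hypothetical names used only for this illustration.
+
+```typescript
+// How to transform the stored pixels so the image displays upright.
+export interface OrientationFix {
+  mirror: boolean;              // horizontal mirror, applied before rotating
+  rotateCw: 0 | 90 | 180 | 270; // clockwise rotation in degrees
+}
+
+// EXIF Orientation values 1-8; anything else is treated as "no change".
+export function orientationFix(exifOrientation: number): OrientationFix {
+  switch (exifOrientation) {
+    case 2: return { mirror: true, rotateCw: 0 };
+    case 3: return { mirror: false, rotateCw: 180 };
+    case 4: return { mirror: true, rotateCw: 180 }; // equivalent to a vertical mirror
+    case 5: return { mirror: true, rotateCw: 270 };
+    case 6: return { mirror: false, rotateCw: 90 }; // common for portrait phone photos
+    case 7: return { mirror: true, rotateCw: 90 };
+    case 8: return { mirror: false, rotateCw: 270 };
+    default: return { mirror: false, rotateCw: 0 }; // 1 or missing/invalid tag
+  }
+}
+```
+
+After the transform is applied to the pixels, the Orientation tag should be reset or removed so that browsers do not rotate the image a second time.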
+
+**Warning signs:**
+- No `capture` attribute on file input
+- Images appearing rotated in UI
+- "Works on desktop" but not tested on phone
+- No upload progress feedback
+
+**Phase to address:**
+Phase 2 (Image Handling) — mobile camera capture is a core requirement, not an afterthought.
+
+---
+
+### Pitfall 6: No Data Export/Backup Strategy
+
+**What goes wrong:**
+Months of notes and images, then:
+- Database corruption
+- Accidental deletion
+- Want to migrate to different system
+- Container volume disappears
+
+There is no way to recover because the data exists only in the app's internal format.
+
+**Why it happens:**
+"I'll add export later" — but later never comes. Personal projects lack the forcing function of user complaints.
+
+**How to avoid:**
+- Design export from day one (JSON + image files in a zip)
+- Automated backup script (e.g., a cron job or a scheduled sidecar container)
+- Document the data format so future-you can parse it
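+- If using SQLite, back up the database file itself, but prefer SQLite's online backup API or `VACUUM INTO` over a plain file copy while the app is running; in WAL mode, recent writes may still sit in the `-wal` file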
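+
+One possible shape for the "JSON + image files in a zip" export is a versioned manifest that references images by relative path. The sketch below only builds and serializes the manifest; every field name here is a hypothetical choice, not a decided format, and the zip step is omitted.
+
+```typescript
+// Hypothetical entry shape; field names are illustrative only.
+export interface ExportEntry {
+  id: string;
+  title: string;
+  body: string;
+  tags: string[];
+  imagePaths: string[]; // relative paths inside the archive
+  createdAt: string;    // ISO 8601 timestamp
+}
+
+export interface ExportManifest {
+  formatVersion: 1;     // bump when the layout changes
+  exportedAt: string;
+  entryCount: number;
+  entries: ExportEntry[];
+}
+
+export function buildManifest(entries: ExportEntry[], now: Date = new Date()): ExportManifest {
+  return {
+    formatVersion: 1,
+    exportedAt: now.toISOString(),
+    entryCount: entries.length,
+    // Sort a copy by creation time so repeated exports are stable and diff-friendly.
+    entries: [...entries].sort((a, b) => a.createdAt.localeCompare(b.createdAt)),
+  };
+}
+
+export function serializeManifest(manifest: ExportManifest): string {
+  return JSON.stringify(manifest, null, 2);
+}
+```
+
+Keeping an explicit `formatVersion` and `entryCount` gives a future import script something to validate against before it touches any data.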
+
+**Warning signs:**
+- No export endpoint in API
+- No backup documentation
+- Only way to access data is through the UI
+- No volume mount strategy for container deployment
+
+**Phase to address:**
+Phase 1 (Core Data Model) — export-friendly data model.
+Phase 5 (Polish) — implement actual export functionality.
+
+---
+
+### Pitfall 7: Task/Thought Distinction Becomes Confusing
+
+**What goes wrong:**
+The distinction between "task" (actionable) and "thought" (reference) seems clear initially but breaks down:
+- Is a meeting note a task or thought?
+- Is a reminder a task?
+- User forgets which type they used and can't find things
+- Some entries are both
+
+**Why it happens:**
+Taxonomies that seem obvious become fuzzy with real data. Users don't think in the developer's categories.
+
+**How to avoid:**
+- Keep distinction minimal (maybe just a boolean: "actionable?")
+- Allow changing type after creation
+- Don't create separate "task view" and "thought view" initially — unified view with filter
+- Consider: is this distinction even needed? Tags might be enough ("action-needed" tag)
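+
+A unified entry model with a single flag could be sketched as follows. The `Entry` fields and the `filterEntries` and `setActionable` helpers are assumptions for illustration; the point is that one list serves both views and that changing type is a field update.
+
+```typescript
+// Hypothetical unified entry: tasks and thoughts share one shape.
+export interface Entry {
+  id: string;
+  title: string;
+  actionable: boolean; // true = "task", false = "thought"
+  done: boolean;       // only meaningful when actionable
+}
+
+export type EntryFilter = "all" | "actionable" | "reference";
+
+// One list, filtered on demand, instead of separate task and thought views.
+export function filterEntries(entries: Entry[], filter: EntryFilter): Entry[] {
+  switch (filter) {
+    case "actionable": return entries.filter((e) => e.actionable);
+    case "reference": return entries.filter((e) => !e.actionable);
+    default: return entries;
+  }
+}
+
+// Changing type after creation is a field update, not a move between tables.
+export function setActionable(entry: Entry, actionable: boolean): Entry {
+  return { ...entry, actionable, done: actionable ? entry.done : false };
+}
+```
+
+If the flag later proves unnecessary, it can be replaced by a tag without any data migration beyond one update statement.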
+
+**Warning signs:**
+- Planning elaborate workflows for tasks vs thoughts
+- Separate database tables for tasks and thoughts
+- Users hesitating at "Is this a task or thought?" during capture
+- Building two separate UIs
+
+**Phase to address:**
+Phase 1 (Core Data Model) — model as unified "entries" with a type field, not separate entities.
+
+---
+
+## Technical Debt Patterns
+
+Shortcuts that seem reasonable but create long-term problems.
+
+| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
+|----------|-------------------|----------------|-----------------|
+| Storing images as base64 in JSON | Simple API design | 33% size increase, slow serialization | Never — use multipart/form-data |
+| No pagination on list views | Simpler frontend code | UI freezes with 500+ entries | Only for MVP with <100 entries, add quickly |
+| Hardcoded single-user auth | Skip auth complexity | Can't add users later, security theater | Acceptable for personal tool if network-isolated |
+| SQLite without WAL mode | Default config "just works" | Concurrent access issues | Never — always enable WAL for web apps |
+| No image thumbnails | Skip image processing setup | Slow page loads, excessive bandwidth | Only for MVP, add in first polish phase |
+
+## Integration Gotchas
+
+Common mistakes when connecting to external services.
+
+| Integration | Common Mistake | Correct Approach |
+|-------------|----------------|------------------|
+| Container volumes | Using default Docker volumes | Named volumes with explicit backup strategy |
+| Reverse proxy (nginx/traefik) | Leaving the request body size limit at its default (nginx `client_max_body_size` defaults to 1 MB) | Configure for max image upload size + margin |
+| Mobile camera API | Assuming desktop file input behavior | Test capture attribute, handle EXIF |
+| Browser localStorage | Storing auth tokens without expiry | Use httpOnly cookies or short-lived tokens |
+
+## Performance Traps
+
+Patterns that work at small scale but fail as usage grows.
+
+| Trap | Symptoms | Prevention | When It Breaks |
+|------|----------|------------|----------------|
+| Loading all entries on page load | Page takes seconds to load | Pagination, virtual scrolling | 200+ entries |
+| Full-text search without index | Search takes seconds | FTS5 in SQLite, or search index | 1000+ entries |
+| No image lazy loading | Page loads all images | Intersection Observer, lazy src | 20+ images visible |
+| Synchronous image processing | Upload hangs for large files | Background processing queue | 5MB+ images |
+| Opening a new database connection per request | Connection errors or lock contention under load | Reuse one long-lived connection (or a small pool), even for SQLite | Concurrent requests |
+
+## Security Mistakes
+
+Domain-specific security issues beyond general web security.
+
+| Mistake | Risk | Prevention |
+|---------|------|------------|
+| Predictable image URLs | Anyone with URL can view images | UUID-based paths, auth check on image fetch |
+| No auth on API endpoints | Data exposed to network | At minimum, basic auth or token |
+| Storing original filenames | Path traversal, XSS in filenames | Rename to UUID on upload |
+| EXIF data preserved | Location data leaked in images | Strip EXIF on upload (after applying the orientation to the pixels) |
+| Direct file path in database | Path traversal on retrieval | Store relative path, validate on read |
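+
+The "store relative path, validate on read" row can be illustrated with a small guard that resolves a stored path against the image root and rejects anything that escapes it. `resolveImagePath` is a hypothetical helper name; the approach relies only on Node's `node:path` module.
+
+```typescript
+import { isAbsolute, relative, resolve, sep } from "node:path";
+
+// Hypothetical guard: turn a stored relative path into an absolute one,
+// refusing absolute inputs and anything that climbs out of `rootDir`.
+export function resolveImagePath(rootDir: string, storedPath: string): string {
+  if (isAbsolute(storedPath)) {
+    throw new Error("stored path must be relative");
+  }
+  const root = resolve(rootDir);
+  const target = resolve(root, storedPath);
+  const rel = relative(root, target);
+  const escapes = rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
+  if (rel === "" || escapes) {
+    throw new Error("path escapes image root");
+  }
+  return target;
+}
+```
+
+Running this check on every image fetch means a corrupted or tampered database row cannot be used to read files outside the image directory.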
+
+## UX Pitfalls
+
+Common user experience mistakes in this domain.
+
+| Pitfall | User Impact | Better Approach |
+|---------|-------------|-----------------|
+| No quick capture mode | Friction kills habit formation | One-click/tap to new entry, minimal required fields |
+| Tags require exact typing | Frustrating to remember tag names | Autocomplete, recent tags shown |
+| No undo for delete | Data loss anxiety | Soft delete with "recently deleted" view |
+| Image-only entries need title | Can't capture quickly | Allow entries with just image, no required title |
+| Desktop-first design | Unusable on primary capture device (phone) | Mobile-first, responsive |
+
+## "Looks Done But Isn't" Checklist
+
+Things that appear complete but are missing critical pieces.
+
+- [ ] **Image upload:** Often missing server-side type validation — verify file magic bytes, not just extension
+- [ ] **Image display:** Often missing EXIF rotation handling — verify portrait photos display correctly
+- [ ] **Search:** Often missing full-text indexing — verify search is fast with 1000+ entries
+- [ ] **Tags:** Often missing case normalization — verify "Work" and "work" are same tag
+- [ ] **Mobile capture:** Often missing camera integration — verify can photograph directly in mobile browser
+- [ ] **Data persistence:** Often missing volume mounts — verify data survives container restart
+- [ ] **Delete:** Often missing soft delete — verify deleted items can be recovered
+
+## Recovery Strategies
+
+When pitfalls occur despite prevention, how to recover.
+
+| Pitfall | Recovery Cost | Recovery Steps |
+|---------|---------------|----------------|
+| Images in database | HIGH | Write migration script, extract to filesystem, update references, retest all image features |
+| No export capability | MEDIUM | Add export endpoint, document format, backfill historical data |
+| Broken mobile capture | LOW | Fix input attributes, test on device, may need EXIF handling |
+| Tag inconsistency (case issues) | MEDIUM | Write migration to normalize, update search/filter logic |
+| Missing pagination | MEDIUM | Add pagination to API, update frontend, may need loading states |
+| Corrupted database (no backup) | CRITICAL | Hope for partial recovery; rebuild from any image files; start fresh |
+
+## Pitfall-to-Phase Mapping
+
+How roadmap phases should address these pitfalls.
+
+| Pitfall | Prevention Phase | Verification |
+|---------|------------------|--------------|
+| Images in database | Phase 1: Data Model | Verify images stored on filesystem with DB reference |
+| No upload validation | Phase 2: Image Handling | Test with oversized file, wrong file type |
+| Complex tagging | Phase 3: Tags | Ship simple tags, resist adding features |
+| Search misses image context | Phase 2 + Phase 4 | Search for text that should be in image description |
+| Mobile capture broken | Phase 2: Image Handling | Test on actual phone, verify orientation |
+| No backup/export | Phase 1 (model) + Phase 5 (implement) | Export data, reimport to fresh instance |
+| Task/thought confusion | Phase 1: Data Model | Unified entry model with type field |
+
+## Sources
+
+- Training data patterns from todo/notes app development (MEDIUM confidence)
+- Common SQLite web app patterns (HIGH confidence — well documented)
+- Mobile browser API quirks (MEDIUM confidence — may have changed)
+- Image handling best practices (HIGH confidence — fundamental patterns)
+- Container deployment patterns (HIGH confidence — well documented)
+
+**Note:** WebSearch was unavailable for verification. Pitfalls are based on common patterns observed in training data. Recommend validating specific technical claims (e.g., current mobile browser capture API behavior) during implementation.
+
+---
+*Pitfalls research for: Personal Task/Notes Web App*
+*Researched: 2026-01-29*