docs: complete project research

Files:
- STACK.md - SvelteKit + SQLite + TypeScript stack recommendation
- FEATURES.md - Feature landscape with MVP definition
- ARCHITECTURE.md - Modular monolith architecture with repository pattern
- PITFALLS.md - Critical pitfalls and prevention strategies
- SUMMARY.md - Executive synthesis with roadmap implications

Key findings:
- Stack: SvelteKit 2.50.x + Svelte 5.49.x with SQLite and better-sqlite3 for single-user simplicity
- Architecture: Modular monolith with content-addressable image storage, FTS5 for search
- Critical pitfall: Store images on filesystem (not DB) from Phase 1 to avoid painful migration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thomas Richter
2026-01-29 03:38:41 +01:00
parent f7df924719
commit 4e7c20b3ad
5 changed files with 1531 additions and 0 deletions

@@ -0,0 +1,312 @@
# Pitfalls Research
**Domain:** Personal Task/Notes Web App with Image Attachments
**Researched:** 2026-01-29
**Confidence:** MEDIUM (based on training data patterns; WebSearch unavailable for verification)
## Critical Pitfalls
### Pitfall 1: Image Storage Coupled to Database
**What goes wrong:**
Images are stored as BLOBs in the database (e.g., SQLite, PostgreSQL). The database grows massive, backups become slow, and queries for non-image data slow down. Migrations become painful once the database holds gigabytes of binary data.
**Why it happens:**
It feels simpler to have "everything in one place." Developers avoid the complexity of file storage + database references.
**How to avoid:**
Store images on filesystem (or object storage like MinIO for container setups). Store only the file path/reference in the database. Use a consistent naming convention (e.g., `{uuid}.{ext}` or `{timestamp}_{hash}.{ext}`).
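
A minimal sketch of one possible naming convention, assuming the upload handler has already computed a SHA-256 digest of the file contents (for example with Node's `crypto` module). The helper name `imageStorageKey`, the two-character prefix directory, and the extension allow-list are illustrative assumptions, not part of the researched stack.

```typescript
// Hypothetical helper: turns a content digest into a relative storage key.
// Only this key would be stored in the database; the bytes live on disk.
const ALLOWED_EXTENSIONS = new Set(["jpg", "png", "webp", "gif"]);

function imageStorageKey(sha256Hex: string, ext: string): string {
  const hash = sha256Hex.toLowerCase();
  const extension = ext.toLowerCase().replace(/^\./, "");
  if (!/^[0-9a-f]{64}$/.test(hash)) {
    throw new Error("expected a 64-character hex SHA-256 digest");
  }
  if (!ALLOWED_EXTENSIONS.has(extension)) {
    throw new Error(`unsupported image extension: ${extension}`);
  }
  // The prefix directory keeps any single folder from growing too large.
  return `${hash.slice(0, 2)}/${hash}.${extension}`;
}
```

Because the key is derived from the content, uploading the same image twice yields the same key, so duplicates can share one file.
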
**Warning signs:**
- Database file growing faster than expected
- Backup times increasing disproportionately
- "Let me just base64 encode this" appearing in code
**Phase to address:**
Phase 1 (Core Data Model) — the storage strategy must be correct from the start; migrating images out of database later is painful.
---
### Pitfall 2: No Image Upload Size/Type Validation
**What goes wrong:**
Users (even yourself) accidentally upload 50 MB RAW files or non-image files. The server crashes, storage fills up, or, worse, malicious files get stored.
**Why it happens:**
"It's just for me" thinking leads to skipping validation. Edge cases seem unlikely for personal tools.
**How to avoid:**
- Server-side validation of file type (magic bytes, not just extension)
- Reasonable size limits (e.g., 10MB per image)
- Image format conversion on upload (resize large images, convert HEIC to JPEG)
- Reject non-image MIME types
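
The checks above could look roughly like the following sketch. It covers only JPEG, PNG, GIF, and WebP signatures; HEIC detection and format conversion are left out. The names `sniffImageKind` and `validateUpload` are hypothetical, and the 10 MB limit is simply the example figure from the list.

```typescript
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // example limit, adjust as needed

type ImageKind = "jpeg" | "png" | "gif" | "webp";

type UploadCheck =
  | { ok: true; kind: ImageKind }
  | { ok: false; reason: string };

// True when `data` contains `signature` starting at `offset`.
function hasBytes(data: Uint8Array, signature: number[], offset = 0): boolean {
  return signature.every((byte, i) => data[offset + i] === byte);
}

// Identify the format from leading magic bytes, ignoring name and MIME type.
function sniffImageKind(data: Uint8Array): ImageKind | null {
  if (hasBytes(data, [0xff, 0xd8, 0xff])) return "jpeg";
  if (hasBytes(data, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasBytes(data, [0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  // WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (hasBytes(data, [0x52, 0x49, 0x46, 0x46]) && hasBytes(data, [0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp";
  }
  return null;
}

function validateUpload(data: Uint8Array): UploadCheck {
  if (data.length === 0) return { ok: false, reason: "empty file" };
  if (data.length > MAX_IMAGE_BYTES) return { ok: false, reason: "file too large" };
  const kind = sniffImageKind(data);
  return kind ? { ok: true, kind } : { ok: false, reason: "not a supported image type" };
}
```

A matching signature only shows that the file starts like an image; decoding it with an image library during resize or conversion is still the stronger check.
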
**Warning signs:**
- No file validation code in upload handler
- Trusting client-side file picker to filter types
- No max file size configuration
**Phase to address:**
Phase 2 (Image Handling) — build validation into the upload pipeline from the start.
---
### Pitfall 3: Tagging System Too Complex or Too Rigid
**What goes wrong:**
Two failure modes:
1. **Over-engineered:** Hierarchical tags, tag colors, tag descriptions, tag merging... system becomes a maintenance burden
2. **Under-thought:** No autocomplete, no tag normalization, end up with "work", "Work", "WORK" as separate tags
**Why it happens:**
Over-engineering: Feature creep before validating basic needs.
Under-thought: "Tags are simple" — but consistent UX requires attention.
**How to avoid:**
Start with flat tags (no hierarchy). Implement:
- Case-insensitive matching (store a lowercase key for matching alongside the original spelling for display)
- Autocomplete from existing tags
- Tag renaming capability (update all entries)
- NO: tag colors, descriptions, nesting, icons until proven needed
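
As a rough illustration of flat, normalized tags with autocomplete, the sketch below keeps tags in memory; in the real app the same rule would sit behind a unique column in SQLite. `TagIndex`, `normalizeTag`, and the prefix-based suggestion are assumptions made for this example.

```typescript
interface TagRecord {
  key: string;     // normalized form used for matching and uniqueness
  display: string; // spelling from the first time the tag was entered
}

// Trim, collapse inner whitespace, and lowercase.
function normalizeTag(raw: string): string {
  return raw.trim().replace(/\s+/g, " ").toLowerCase();
}

class TagIndex {
  private tags = new Map<string, TagRecord>();

  // Returns the existing tag when the normalized key is already known.
  add(raw: string): TagRecord {
    const key = normalizeTag(raw);
    if (key === "") throw new Error("tag must not be empty");
    const existing = this.tags.get(key);
    if (existing) return existing;
    const tag: TagRecord = { key, display: raw.trim().replace(/\s+/g, " ") };
    this.tags.set(key, tag);
    return tag;
  }

  // Prefix autocomplete over normalized keys, returned as display names.
  suggest(prefix: string, limit = 5): string[] {
    const needle = normalizeTag(prefix);
    return Array.from(this.tags.values())
      .filter((t) => t.key.startsWith(needle))
      .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
      .slice(0, limit)
      .map((t) => t.display);
  }
}
```

With this shape, "work", "Work", and "WORK" all resolve to one record, and there is a single matching code path rather than separate exact and fuzzy variants.
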
**Warning signs:**
- Planning tag hierarchies before shipping basic tagging
- No autocomplete in tag input
- Multiple code paths for "exact match" vs "fuzzy match"
**Phase to address:**
Phase 3 (Organization/Tags) — resist complexity; ship simple tags first, iterate based on actual usage.
---
### Pitfall 4: Search That Doesn't Search Images' Context
**What goes wrong:**
User photographs a paper note, tags it "meeting notes", then searches for "budget" — the paper note discussing budget isn't found because search only checks text fields, not the context of why the image was captured.
**Why it happens:**
Search implementation focuses on structured data (title, description, tags) but images are opaque binary blobs.
**How to avoid:**
For v1 without OCR:
- Encourage descriptive titles/notes when capturing images
- Make it easy to add context to image entries
- Consider a "description" field specifically for image content
For later:
- OCR on upload (Tesseract, cloud OCR)
- Store extracted text for search indexing
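
To make the v1 approach concrete, here is a small sketch in which an image entry's description takes part in search alongside title, body, and tags. It uses a plain linear scan as a stand-in for a real index such as FTS5, and the field name `imageDescription` is a hypothetical choice.

```typescript
interface SearchableEntry {
  title: string;
  body: string;
  imageDescription?: string; // free-text note about what the photo contains
  tags: string[];
}

// Concatenate every text field that should be searchable.
function searchText(entry: SearchableEntry): string {
  return [entry.title, entry.body, entry.imageDescription ?? "", ...entry.tags]
    .join(" ")
    .toLowerCase();
}

// An entry matches when every query term appears somewhere in its text.
function entryMatches(entry: SearchableEntry, query: string): boolean {
  const haystack = searchText(entry);
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .every((term) => haystack.includes(term));
}
```

In the scenario described earlier, a photo tagged "meeting notes" with the description "Q3 budget discussion" would now be found by a search for "budget".
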
**Warning signs:**
- Image entries have only tags, no description field
- Search implementation ignores entry body/notes
- User finds themselves re-photographing notes to find content
**Phase to address:**
Phase 2 (Image Handling) — ensure data model supports rich metadata for images.
Phase 4 (Search) — index all text fields including descriptions.
---
### Pitfall 5: Mobile Browser Capture UX Disaster
**What goes wrong:**
Camera capture works in desktop browser testing but fails or is clunky on mobile:
- File input doesn't trigger camera
- Captured images are wrong orientation (EXIF rotation ignored)
- Upload fails silently on mobile networks
- UI doesn't fit mobile viewport
**Why it happens:**
Testing on desktop only. Mobile browser APIs have quirks. EXIF orientation handling is notoriously inconsistent.
**How to avoid:**
- Use `<input type="file" accept="image/*" capture="environment">` for mobile camera
- Handle EXIF orientation server-side (normalize on upload)
- Test on actual mobile devices early
- Implement upload progress indicator
- Design mobile-first (small viewport is the constraint)
**Warning signs:**
- No `capture` attribute on file input
- Images appearing rotated in UI
- "Works on desktop" but not tested on phone
- No upload progress feedback
**Phase to address:**
Phase 2 (Image Handling) — mobile camera capture is a core requirement, not an afterthought.
---
### Pitfall 6: No Data Export/Backup Strategy
**What goes wrong:**
Months of notes and images, then:
- Database corruption
- Accidental deletion
- Want to migrate to different system
- Container volume disappears
No way to recover because data only exists in app's internal format.
**Why it happens:**
"I'll add export later" — but later never comes. Personal projects lack the forcing function of user complaints.
**How to avoid:**
- Design export from day one (JSON + image files in a zip)
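- Automated backup script (cron job or scheduled container task)
- Document the data format so future-you can parse it
- If using SQLite, back up with `VACUUM INTO` or the `sqlite3` CLI's `.backup` command rather than copying a live file; in WAL mode a plain copy can miss changes still held in the `-wal` file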
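
One way to picture the "JSON + image files in a zip" export is a manifest like the sketch below. The field names, the `formatVersion` number, and the `buildExportManifest` helper are all assumptions for illustration; actually writing the zip is out of scope here.

```typescript
interface ExportEntry {
  id: string;
  type: "task" | "thought";
  title: string;
  body: string;
  tags: string[];
  imageFiles: string[]; // paths relative to the export root
  createdAt: string;    // ISO 8601 timestamp
}

interface ExportManifest {
  formatVersion: 1;     // bump when the layout changes
  exportedAt: string;
  entries: ExportEntry[];
  imageFiles: string[]; // de-duplicated list of files to include in the zip
}

function buildExportManifest(entries: ExportEntry[], exportedAt: Date): ExportManifest {
  const seen = new Set<string>();
  for (const entry of entries) {
    for (const file of entry.imageFiles) seen.add(file);
  }
  return {
    formatVersion: 1,
    exportedAt: exportedAt.toISOString(),
    entries,
    imageFiles: Array.from(seen).sort(),
  };
}
```

Keeping a version field in the manifest gives future-you a fixed point to parse against, which is the purpose of documenting the format.
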
**Warning signs:**
- No export endpoint in API
- No backup documentation
- Only way to access data is through the UI
- No volume mount strategy for container deployment
**Phase to address:**
Phase 1 (Core Data Model) — export-friendly data model.
Phase 5 (Polish) — implement actual export functionality.
---
### Pitfall 7: Task/Thought Distinction Becomes Confusing
**What goes wrong:**
The distinction between "task" (actionable) and "thought" (reference) seems clear initially but breaks down:
- Is a meeting note a task or thought?
- Is a reminder a task?
- User forgets which type they used and can't find things
- Some entries are both
**Why it happens:**
Taxonomies that seem obvious become fuzzy with real data. Users don't think in the developer's categories.
**How to avoid:**
- Keep distinction minimal (maybe just a boolean: "actionable?")
- Allow changing type after creation
- Don't create separate "task view" and "thought view" initially — unified view with filter
- Consider: is this distinction even needed? Tags might be enough ("action-needed" tag)
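
A unified model with a changeable type might be sketched as follows. The `Entry` shape, the `done` flag, and the rule that converting to a thought clears completion are assumptions chosen to keep the example small.

```typescript
type EntryType = "task" | "thought";

interface Entry {
  id: number;
  type: EntryType;
  title: string;
  done: boolean; // only meaningful while the entry is a task
}

// Changing type after creation is a field update, not a move between tables.
function setType(entry: Entry, type: EntryType): Entry {
  return { ...entry, type, done: type === "task" ? entry.done : false };
}

// One list with an optional filter instead of two separate views.
function filterEntries(entries: Entry[], type?: EntryType): Entry[] {
  return type ? entries.filter((e) => e.type === type) : entries;
}
```

Because both kinds share one shape, search, tagging, and export can treat every entry the same way.
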
**Warning signs:**
- Planning elaborate workflows for tasks vs thoughts
- Separate database tables for tasks and thoughts
- Users hesitating at "Is this a task or thought?" during capture
- Building two separate UIs
**Phase to address:**
Phase 1 (Core Data Model) — model as unified "entries" with a type field, not separate entities.
---
## Technical Debt Patterns
Shortcuts that seem reasonable but create long-term problems.
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Storing images as base64 in JSON | Simple API design | 33% size increase, slow serialization | Never — use multipart/form-data |
| No pagination on list views | Simpler frontend code | UI freezes with 500+ entries | Only for MVP with <100 entries, add quickly |
| Hardcoded single-user auth | Skip auth complexity | Can't add users later, security theater | Acceptable for personal tool if network-isolated |
| SQLite without WAL mode | Default config "just works" | Concurrent access issues | Never — always enable WAL for web apps |
| No image thumbnails | Skip image processing setup | Slow page loads, excessive bandwidth | Only for MVP, add in first polish phase |
## Integration Gotchas
Common mistakes when connecting to external services.
| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| Container volumes | Relying on anonymous Docker volumes | Named volumes with explicit backup strategy |
| Reverse proxy (nginx/traefik) | Leaving the request body limit at its default (nginx's `client_max_body_size` defaults to 1 MB) | Configure for max image upload size + margin |
| Mobile camera API | Assuming desktop file input behavior | Test capture attribute, handle EXIF |
| Browser localStorage | Storing auth tokens without expiry | Use httpOnly cookies or short-lived tokens |
## Performance Traps
Patterns that work at small scale but fail as usage grows.
| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Loading all entries on page load | Page takes seconds to load | Pagination, virtual scrolling | 200+ entries |
| Full-text search without index | Search takes seconds | FTS5 in SQLite, or search index | 1000+ entries |
| No image lazy loading | Page loads all images up front | Native `loading="lazy"` on `<img>`, or Intersection Observer | 20+ images visible |
| Synchronous image processing | Upload hangs for large files | Background processing queue | 5MB+ images |
| Opening a new database connection per request | Slow requests, `SQLITE_BUSY` errors under load | Reuse one long-lived connection (better-sqlite3 is synchronous, so a pool adds nothing) | Concurrent requests |
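
For the pagination row, a keyset (cursor) approach is one option. The sketch below is an in-memory stand-in for a query of the form `WHERE id < ? ORDER BY id DESC LIMIT ?`; the `pageOf` helper and the use of a numeric `id` as the cursor are assumptions.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: number | null; // pass back as `beforeId` to fetch the next page
}

function pageOf<T extends { id: number }>(
  all: T[],
  limit: number,
  beforeId: number | null,
): Page<T> {
  // Newest first; copy so the caller's array is not reordered.
  const sorted = all.slice().sort((a, b) => b.id - a.id);
  const rest = beforeId === null ? sorted : sorted.filter((x) => x.id < beforeId);
  const items = rest.slice(0, limit);
  const last = items[items.length - 1];
  // A cursor is only returned when more rows remain after this page.
  const nextCursor = rest.length > limit && last ? last.id : null;
  return { items, nextCursor };
}
```

Unlike offset-based paging, the cursor stays valid when new entries are added at the top of the list between requests.
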
## Security Mistakes
Domain-specific security issues beyond general web security.
| Mistake | Risk | Prevention |
|---------|------|------------|
| Predictable image URLs | Anyone who guesses or obtains a URL can view images | UUID-based paths, auth check on image fetch |
| No auth on API endpoints | Data exposed to network | At minimum, basic auth or token |
| Storing original filenames | Path traversal, XSS in filenames | Rename to UUID on upload |
| EXIF data preserved | Location data leaked in images | Strip EXIF on upload, keeping only the orientation tag (or baking the rotation into the pixels first) |
| Direct file path in database | Path traversal on retrieval | Store relative path, validate on read |
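
The "rename to UUID" and "validate on read" rows can be combined into an allow-list check like this sketch, which assumes stored names have the form `<uuid>.<ext>`. The `resolveImagePath` name and the extension list are hypothetical.

```typescript
// Accept only names shaped like "<uuid>.<ext>"; everything else is rejected,
// so "../" sequences, absolute paths, and odd characters never reach the disk.
const STORED_NAME_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.(jpg|png|webp|gif)$/;

function resolveImagePath(storageRoot: string, storedName: string): string {
  if (!STORED_NAME_PATTERN.test(storedName)) {
    throw new Error("invalid stored image name");
  }
  // Drop trailing slashes from the root before joining.
  return `${storageRoot.replace(/\/+$/, "")}/${storedName}`;
}
```

Validating against a strict pattern is simpler to reason about than trying to sanitize arbitrary input after the fact.
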
## UX Pitfalls
Common user experience mistakes in this domain.
| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| No quick capture mode | Friction kills habit formation | One-click/tap to new entry, minimal required fields |
| Tags require exact typing | Frustrating to remember tag names | Autocomplete, recent tags shown |
| No undo for delete | Data loss anxiety | Soft delete with "recently deleted" view |
| Image-only entries need title | Can't capture quickly | Allow entries with just image, no required title |
| Desktop-first design | Unusable on primary capture device (phone) | Mobile-first, responsive |
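
The soft-delete row could be modelled with a nullable timestamp, as in the sketch below. The `deletedAt` field and the 30-day retention window are assumptions for illustration.

```typescript
interface DeletableEntry {
  id: number;
  title: string;
  deletedAt: string | null; // ISO timestamp once deleted, otherwise null
}

const DAY_MS = 24 * 60 * 60 * 1000;

function softDelete(entry: DeletableEntry, now: Date): DeletableEntry {
  return { ...entry, deletedAt: now.toISOString() };
}

function restore(entry: DeletableEntry): DeletableEntry {
  return { ...entry, deletedAt: null };
}

// Normal views hide anything that carries a deletion timestamp.
function visibleEntries(all: DeletableEntry[]): DeletableEntry[] {
  return all.filter((e) => e.deletedAt === null);
}

// Entries deleted within the retention window, for a "recently deleted" view.
function recentlyDeleted(all: DeletableEntry[], now: Date, retentionDays = 30): DeletableEntry[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return all.filter((e) => e.deletedAt !== null && Date.parse(e.deletedAt) >= cutoff);
}
```

A periodic cleanup job could then permanently remove entries whose deletion timestamp falls outside the window.
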
## "Looks Done But Isn't" Checklist
Things that appear complete but are missing critical pieces.
- [ ] **Image upload:** Often missing server-side type validation — verify file magic bytes, not just extension
- [ ] **Image display:** Often missing EXIF rotation handling — verify portrait photos display correctly
- [ ] **Search:** Often missing full-text indexing — verify search is fast with 1000+ entries
- [ ] **Tags:** Often missing case normalization — verify "Work" and "work" are same tag
- [ ] **Mobile capture:** Often missing camera integration — verify can photograph directly in mobile browser
- [ ] **Data persistence:** Often missing volume mounts — verify data survives container restart
- [ ] **Delete:** Often missing soft delete — verify deleted items can be recovered
## Recovery Strategies
When pitfalls occur despite prevention, how to recover.
| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Images in database | HIGH | Write migration script, extract to filesystem, update references, retest all image features |
| No export capability | MEDIUM | Add export endpoint, document format, backfill historical data |
| Broken mobile capture | LOW | Fix input attributes, test on device, may need EXIF handling |
| Tag inconsistency (case issues) | MEDIUM | Write migration to normalize, update search/filter logic |
| Missing pagination | MEDIUM | Add pagination to API, update frontend, may need loading states |
| Corrupted database (no backup) | CRITICAL | Attempt partial recovery (e.g., the `sqlite3` CLI's `.recover` command); rebuild from any image files; otherwise start fresh |
## Pitfall-to-Phase Mapping
How roadmap phases should address these pitfalls.
| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| Images in database | Phase 1: Data Model | Verify images stored on filesystem with DB reference |
| No upload validation | Phase 2: Image Handling | Test with oversized file, wrong file type |
| Complex tagging | Phase 3: Tags | Ship simple tags, resist adding features |
| Search misses image context | Phase 2 + Phase 4 | Search for text that should be in image description |
| Mobile capture broken | Phase 2: Image Handling | Test on actual phone, verify orientation |
| No backup/export | Phase 1 (model) + Phase 5 (implement) | Export data, reimport to fresh instance |
| Task/thought confusion | Phase 1: Data Model | Unified entry model with type field |
## Sources
- Training data patterns from todo/notes app development (MEDIUM confidence)
- Common SQLite web app patterns (HIGH confidence — well documented)
- Mobile browser API quirks (MEDIUM confidence — may have changed)
- Image handling best practices (HIGH confidence — fundamental patterns)
- Container deployment patterns (HIGH confidence — well documented)
**Note:** WebSearch was unavailable for verification. Pitfalls are based on common patterns observed in training data. Recommend validating specific technical claims (e.g., current mobile browser capture API behavior) during implementation.
---
*Pitfalls research for: Personal Task/Notes Web App*
*Researched: 2026-01-29*