garmin-warehouse
The largest of Casey's systems. Local-first pipeline at ~/garmin-warehouse/
that ingests training data + research podcasts and produces queryable
analytics, a kb, a UI, and synthesis (findings/).
For the canonical layer-by-layer breakdown, see
~/garmin-warehouse/ARCHITECTURE.md. This page is the orientation
+ system-level "what to know."
Quick orientation
- Location: ~/garmin-warehouse/ (uv project, Python 3.12)
- Sister repo: ~/data-ingestion/ (the podcast pipeline that feeds the kb)
- External data:
  - ~/HealthData/ (GarminDB-managed; 4 SQLite DBs + ICU cache files)
  - ~/.GarminDb/ (GarminDB credentials + session token)
- Run command for UI: kbui zsh alias → scripts/run_ui.sh → uvicorn at localhost:8765 + cloudflared tunnel
- Public URL: https://warehouse.caseymanos.com (gated by Cloudflare Access PIN)
The 8 layers
(Summarized from ARCHITECTURE.md; read that for full detail.)
| # | Layer | Where it lives | What it does |
|---|---|---|---|
| 1 | Sources | ~/HealthData/, podcast RSS | Raw inputs from Garmin, intervals.icu, podcast feeds |
| 2 | Ingest | garmindb_cli.py, data-ingestion/ | Pull, normalize, store as SQLite/JSON |
| 3 | Transcripts | Modal A10G | Whisper transcription for podcasts |
| 4 | Digests | swap.py, swap-podcast/ | Per-episode claim/study extraction via Parallel API |
| 5 | kb | kb/ (DuckDB) | Structured corpus: 9 tables, Voyage embeddings, HNSW |
| 6 | Queries | query.py, kb/query.py | Named queries + semantic search |
| 7 | Triage | kb/{watches,applied,dismissed}.yaml, kb/triage.py, kb/review.py | Application queue: which claims connect to which findings |
| 8 | UI / Synthesis | ui/, findings/, analyses/ | Reading layer + Casey's hand-curated knowledge |
Critical files
| Path | What it does |
|---|---|
| warehouse.py | DuckDB connect(), ATTACHes 4 GarminDB SQLite files, creates icu_activities view from CSV cache |
| query.py | 8 named queries (threshold-sessions, mpw, long-runs, race-compare, etc.) over the training data |
| kb/load.py | Loads ~/data-ingestion/insights/<show>/*.raw.json files into the kb DuckDB |
| kb/embed.py | Voyage 3-large 1024d embeddings, content-hash cache, HNSW index |
| kb/sync.py | Orchestrates load + embed; the "rebuild kb" entry point |
| kb/query.py | Named kb queries (semantic, contradictions, topic, etc.) |
| kb/migrate.py | Schema migrations, applied numerically (kb/migrations/00N_*.sql) |
| kb/paths.py | Central env-overrideable path registry; use this, never hardcode |
| kb/triage.py | TUI for walking the application queue (curses-based) |
| kb/review.py | Builds the application queue from watches.yaml |
| kb/corpus_diff.py | Snapshot/diff/list; answers "did this change actually do anything" |
| kb/stale_prompts.py | Find claims extracted with old prompt versions |
| scripts/daily_sync.sh | 7am launchd: garmindb sync + ICU refresh + kb sync + R2 backup + Worker cache + comment scan |
| scripts/rebuild_kb.sh | Deterministic kb rebuild with manifest |
| scripts/verify_kb.sh | Post-rebuild integrity check |
| scripts/cache_for_worker.py | Writes state/cache/{yesterday,query_cache}.json to R2 for the Worker |
| ui/app.py | FastAPI + HTMX + Jinja UI |
| findings/_active.md | Workstream state (read by coach subagent on session start) |
| findings/_todo.md | Casey's checklist (surfaced in UI at /active) |
| completion_log.jsonl | Strength/strides/sauna log (canonical) |
| corrections.jsonl | Interpretive overlays from Telegram morning-summary replies (see ADR 003) |
| races.yaml | Casey's race history with build summaries |
| runs.jsonl | Stage-level observability for sync/ingest/embed/load |
| tests/ | ~94 pytest tests (44 kb/load + 21 idempotency + 24 freshness + 5 cache) |
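
The "use this, never hardcode" rule for kb/paths.py is easiest to see with a small example. The following is a minimal sketch of an env-overrideable path registry, not the actual contents of kb/paths.py: the key names (WAREHOUSE_ROOT, HEALTHDATA_DIR, INGESTION_ROOT) and the resolve() helper are hypothetical, and only the default locations come from this page.

```python
import os
from pathlib import Path

# Hypothetical registry: key names are illustrative, not taken from kb/paths.py.
# The default locations are the ones listed under "Quick orientation".
_DEFAULTS = {
    "WAREHOUSE_ROOT": "~/garmin-warehouse",
    "HEALTHDATA_DIR": "~/HealthData",
    "INGESTION_ROOT": "~/data-ingestion",
}


def resolve(name: str) -> Path:
    """Return the path for `name`, preferring an environment override."""
    if name not in _DEFAULTS:
        raise KeyError(f"unknown path key: {name}")
    # An unset or empty variable falls back to the registered default.
    raw = os.environ.get(name) or _DEFAULTS[name]
    return Path(raw).expanduser()
```

With a layout like this, a test run can point the whole pipeline at a scratch directory by exporting one variable, which is presumably why callers are told to go through the registry rather than spelling out ~/HealthData/ themselves.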
Schema reality check
The schema is in kb/migrations/*.sql (managed by kb/migrate.py),
NOT kb/schema.sql. Migrations are at v4 in production. load.py
uses DELETE rather than DROP+CREATE. If you read schema.sql expecting
it to be authoritative, you'll be wrong.
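
To illustrate why DELETE rather than DROP+CREATE fits a migration-managed schema, here is a minimal sketch of a reload that empties and refills rows while leaving the table definition untouched. It uses the stdlib sqlite3 module as a stand-in for DuckDB so it runs anywhere; the claims table, its columns, and the per-show delete are assumptions for illustration, not load.py's actual logic.

```python
import sqlite3


def reload_show(conn, show, texts):
    """Replace one show's rows in place; the table definition is never touched."""
    with conn:  # single transaction: the delete and inserts commit together
        conn.execute("DELETE FROM claims WHERE show = ?", (show,))
        conn.executemany(
            "INSERT INTO claims (show, text) VALUES (?, ?)",
            [(show, text) for text in texts],
        )


conn = sqlite3.connect(":memory:")
# Stands in for a migration: in this sketch only migrations create tables.
conn.execute("CREATE TABLE claims (show TEXT, text TEXT)")

reload_show(conn, "swap", ["claim a", "claim b"])
reload_show(conn, "swap", ["claim a", "claim b"])  # re-running does not duplicate rows
row_count = conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
```

Because the loader never issues CREATE TABLE, columns added by a later migration survive a reload, and running the load twice leaves the same row count.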
Check current migration state:
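
A minimal sketch of a disk-side check, assuming only the kb/migrations/00N_*.sql naming described above: it lists the migration files in numeric order. It does not read whatever kb/migrate.py records as applied in the database, so it shows which migrations exist, not which have run. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches names like 001_init.sql; the numeric prefix is the version.
_PREFIX = re.compile(r"^(\d+)_.*\.sql$")


def migrations_on_disk(migrations_dir):
    """Return (version, filename) pairs sorted by numeric prefix."""
    found = []
    for path in Path(migrations_dir).expanduser().glob("*.sql"):
        match = _PREFIX.match(path.name)
        if match:
            found.append((int(match.group(1)), path.name))
    return sorted(found)


if __name__ == "__main__":
    for version, name in migrations_on_disk("~/garmin-warehouse/kb/migrations"):
        print(f"v{version}  {name}")
```

Sorting on the parsed integer rather than the filename keeps 010 after 002 even if the zero padding is ever inconsistent.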
kb scale (as of 2026-05-04)
- 8 podcast feeds: swap (174 main + 118 bonus), real-science-of-sport (97), running-effect (58), letsrun (58), morning-shakeout (23), coffee-club (11), strength-running (5)
- 544 episodes, 5,555 claims, 472 studies
- 19 findings (curated synthesis in findings/)
- ~290 episodes have raw.json on disk (some kb counts differ due to triage filtering at extraction time)
What this system does NOT do
- ❌ Hosted database — everything is local SQLite + DuckDB
- ❌ Multi-user — single-user system, 127.0.0.1 only (UI exposed via tunnel for Casey himself, not multi-tenant)
- ❌ Realtime — daily cron cadence, no websockets
- ❌ Authentication beyond Cloudflare Access PIN — no app-level auth
- ❌ Mobile — desktop Mac only
Related pages
- systems/data-ingestion.md: the podcast pipeline that feeds the kb
- systems/otq-checkin-worker.md: Telegram bot reading/writing from this warehouse via R2
- reference/cron-schedules.md: when daily_sync fires
- runbooks/daily-sync-failures.md: how to recover when sync breaks
- runbooks/kb-rebuild.md: full kb rebuild path
- reference/kb-schema.md: table definitions + migration list