garmin-warehouse

The largest of Casey's systems. A local-first pipeline at ~/garmin-warehouse/ that ingests training data and research podcasts, and produces queryable analytics, a knowledge base (the kb), a UI, and curated synthesis (findings/).

For the canonical layer-by-layer breakdown, see ~/garmin-warehouse/ARCHITECTURE.md. This page is the orientation + system-level "what to know."

Quick orientation

  • Location: ~/garmin-warehouse/ (uv project, Python 3.12)
  • Sister repo: ~/data-ingestion/ (the podcast pipeline that feeds the kb)
  • External data:
      • ~/HealthData/ (GarminDB-managed; 4 SQLite DBs + ICU cache files)
      • ~/.GarminDb/ (GarminDB credentials + session token)
  • Run command for UI: kbui zsh alias → scripts/run_ui.sh → uvicorn at localhost:8765 + cloudflared tunnel
  • Public URL: https://warehouse.caseymanos.com (gated by Cloudflare Access PIN)
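
The kbui → run_ui.sh → uvicorn + cloudflared chain could look roughly like this. This is a sketch of hypothetical contents, not the actual alias or script; the uvicorn app path and the tunnel name are assumptions.

```shell
# ~/.zshrc (sketch): the kbui alias just invokes the launcher script
alias kbui='~/garmin-warehouse/scripts/run_ui.sh'

# scripts/run_ui.sh (plausible shape, not the actual file):
# serve the FastAPI app locally, then front it with a Cloudflare tunnel
uv run uvicorn ui.app:app --host 127.0.0.1 --port 8765 &
cloudflared tunnel run warehouse   # fronts localhost:8765 at warehouse.caseymanos.com
```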

The 8 layers

(Summarized from ARCHITECTURE.md; read that for full detail.)

| # | Layer | Where it lives | What it does |
|---|-------|----------------|--------------|
| 1 | Sources | ~/HealthData/, podcast RSS | Raw inputs from Garmin, intervals.icu, podcast feeds |
| 2 | Ingest | garmindb_cli.py, data-ingestion/ | Pull, normalize, store as SQLite/JSON |
| 3 | Transcripts | Modal A10G | Whisper transcription for podcasts |
| 4 | Digests | swap.py, swap-podcast/ | Per-episode claim/study extraction via Parallel API |
| 5 | kb | kb/ (DuckDB) | Structured corpus: 9 tables, Voyage embeddings, HNSW |
| 6 | Queries | query.py, kb/query.py | Named queries + semantic search |
| 7 | Triage | kb/{watches,applied,dismissed}.yaml, kb/triage.py, kb/review.py | Application queue: which claims connect to which findings |
| 8 | UI / Synthesis | ui/, findings/, analyses/ | Reading layer + Casey's hand-curated knowledge |

Critical files

| Path | What it does |
|------|--------------|
| warehouse.py | DuckDB connect(); ATTACHes 4 GarminDB SQLite files; creates icu_activities view from CSV cache |
| query.py | 8 named queries (threshold-sessions, mpw, long-runs, race-compare, etc.) over the training data |
| kb/load.py | Loads ~/data-ingestion/insights/<show>/*.raw.json files into the kb DuckDB |
| kb/embed.py | Voyage 3-large 1024d embeddings, content-hash cache, HNSW index |
| kb/sync.py | Orchestrates load + embed; the "rebuild kb" entry point |
| kb/query.py | Named kb queries (semantic, contradictions, topic, etc.) |
| kb/migrate.py | Schema migrations, applied numerically (kb/migrations/00N_*.sql) |
| kb/paths.py | Central env-overrideable path registry; use this, never hardcode paths |
| kb/triage.py | TUI for walking the application queue (curses-based) |
| kb/review.py | Builds the application queue from watches.yaml |
| kb/corpus_diff.py | Snapshot/diff/list; answers "did this change actually do anything?" |
| kb/stale_prompts.py | Finds claims extracted with old prompt versions |
| scripts/daily_sync.sh | 7am launchd job: garmindb sync + ICU refresh + kb sync + R2 backup + Worker cache + comment scan |
| scripts/rebuild_kb.sh | Deterministic kb rebuild with manifest |
| scripts/verify_kb.sh | Post-rebuild integrity check |
| scripts/cache_for_worker.py | Writes state/cache/{yesterday,query_cache}.json to R2 for the Worker |
| ui/app.py | FastAPI + HTMX + Jinja UI |
| findings/_active.md | Workstream state (read by the coach subagent on session start) |
| findings/_todo.md | Casey's checklist (surfaced in the UI at /active) |
| completion_log.jsonl | Strength/strides/sauna log (canonical) |
| corrections.jsonl | Interpretive overlays from Telegram morning-summary replies (see ADR 003) |
| races.yaml | Casey's race history with build summaries |
| runs.jsonl | Stage-level observability for sync/ingest/embed/load |
| tests/ | ~94 pytest tests (44 kb/load + 21 idempotency + 24 freshness + 5 cache) |

Schema reality check

The schema is in kb/migrations/*.sql (managed by kb/migrate.py), NOT kb/schema.sql. Migrations are at v4 in production. load.py uses DELETE rather than DROP+CREATE. If you read schema.sql expecting it to be authoritative, you'll be wrong.

Check current migration state:

uv run python kb/migrate.py status
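
The numeric-ordering scheme (kb/migrations/00N_*.sql applied in sequence) can be sketched as follows. This is a minimal illustration, not kb/migrate.py itself; the function names and the way the current version is tracked are assumptions.

```python
import re
import tempfile
from pathlib import Path

MIG_RE = re.compile(r"^(\d+)_.*\.sql$")

def pending_migrations(mig_dir: Path, current: int) -> list[Path]:
    """Return migration files numbered above `current`, in numeric order."""
    numbered = []
    for p in mig_dir.glob("*.sql"):
        m = MIG_RE.match(p.name)
        if m and int(m.group(1)) > current:
            numbered.append((int(m.group(1)), p))
    return [p for _, p in sorted(numbered)]

def apply_migrations(mig_dir: Path, current: int, execute) -> int:
    """Apply each pending file via `execute(sql)`; return the new version."""
    for p in pending_migrations(mig_dir, current):
        execute(p.read_text())
        current = int(MIG_RE.match(p.name).group(1))
    return current

# Demo with a throwaway directory: already at v1, so only 002 and 004 run.
d = Path(tempfile.mkdtemp())
for name in ["002_add_index.sql", "001_init.sql", "004_v4.sql"]:
    (d / name).write_text(f"-- {name}")
ran = []
version = apply_migrations(d, current=1, execute=ran.append)
print(version, [s.split()[1] for s in ran])
# 4 ['002_add_index.sql', '004_v4.sql']
```

Sorting on the parsed integer (not the filename string) keeps ordering correct even if zero-padding is ever inconsistent.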

kb scale (as of 2026-05-04)

  • 8 podcast feeds: swap (174 main + 118 bonus), real-science-of-sport (97), running-effect (58), letsrun (58), morning-shakeout (23), coffee-club (11), strength-running (5)
  • 544 episodes, 5,555 claims, 472 studies
  • 19 findings (curated synthesis in findings/)
  • ~290 episodes have raw.json on disk (some kb counts differ due to triage filtering at extraction time)

What this system does NOT do

  • ❌ Hosted database — everything is local SQLite + DuckDB
  • ❌ Multi-user — single-user system, 127.0.0.1 only (UI exposed via tunnel for Casey himself, not multi-tenant)
  • ❌ Realtime — daily cron cadence, no websockets
  • ❌ Authentication beyond Cloudflare Access PIN — no app-level auth
  • ❌ Mobile — desktop Mac only