garmin-warehouse

The largest of Casey's systems. A local-first pipeline at ~/garmin-warehouse/ that ingests training data and research podcasts, and produces queryable analytics, a knowledge base (the kb), a UI, and curated synthesis (findings/).

For the canonical layer-by-layer breakdown, see ~/garmin-warehouse/ARCHITECTURE.md. This page is the orientation + system-level "what to know."

Quick orientation

  • Location: ~/garmin-warehouse/ (uv project, Python 3.12)
  • Sister repo: ~/data-ingestion/ (the podcast pipeline that feeds the kb)
  • External data:
      • ~/HealthData/ (GarminDB-managed; 4 SQLite DBs + ICU cache files)
      • ~/.GarminDb/ (GarminDB credentials + session token)
  • Run command for UI: kbui zsh alias → scripts/run_ui.sh → uvicorn at localhost:8765 + cloudflared tunnel
  • Public URL: https://warehouse.caseymanos.com (gated by Cloudflare Access PIN)
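
The kbui → run_ui.sh → uvicorn + cloudflared chain could look roughly like this. This is a sketch of hypothetical contents, not the actual alias or script; the uvicorn app path and the tunnel name are assumptions.

```shell
# ~/.zshrc (sketch): the kbui alias just invokes the launcher script
alias kbui='~/garmin-warehouse/scripts/run_ui.sh'

# scripts/run_ui.sh (plausible shape, not the actual file):
# serve the FastAPI app locally, then front it with a Cloudflare tunnel
uv run uvicorn ui.app:app --host 127.0.0.1 --port 8765 &
cloudflared tunnel run warehouse   # fronts localhost:8765 at warehouse.caseymanos.com
```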

The 8 layers

(Summarized from ARCHITECTURE.md; read that for full detail.)

| # | Layer | Where it lives | What it does |
|---|-------|----------------|--------------|
| 1 | Sources | ~/HealthData/, podcast RSS | Raw inputs from Garmin, intervals.icu, podcast feeds |
| 2 | Ingest | garmindb_cli.py, data-ingestion/ | Pull, normalize, store as SQLite/JSON |
| 3 | Transcripts | Modal A10G | Whisper transcription for podcasts |
| 4 | Digests | swap.py, swap-podcast/ | Per-episode claim/study extraction via Parallel API |
| 5 | kb | kb/ (DuckDB) | Structured corpus: 9 tables, Voyage embeddings, HNSW |
| 6 | Queries | query.py, kb/query.py | Named queries + semantic search |
| 7 | Triage | kb/{watches,applied,dismissed}.yaml, kb/triage.py, kb/review.py | Application queue: which claims connect to which findings |
| 8 | UI / Synthesis | ui/, findings/, analyses/ | Reading layer + Casey's hand-curated knowledge |

Critical files

| Path | What it does |
|------|--------------|
| warehouse.py | DuckDB connect(); ATTACHes 4 GarminDB SQLite files; creates icu_activities view from CSV cache |
| query.py | 8 named queries (threshold-sessions, mpw, long-runs, race-compare, etc.) over the training data |
| kb/load.py | Loads ~/data-ingestion/insights/<show>/*.raw.json files into the kb DuckDB |
| kb/embed.py | Voyage 3-large 1024d embeddings, content-hash cache, HNSW index |
| kb/sync.py | Orchestrates load + embed; the "rebuild kb" entry point |
| kb/query.py | Named kb queries (semantic, contradictions, topic, etc.) |
| kb/migrate.py | Schema migrations, applied numerically (kb/migrations/00N_*.sql) |
| kb/paths.py | Central env-overrideable path registry; use this, never hardcode paths |
| kb/triage.py | TUI for walking the application queue (curses-based) |
| kb/review.py | Builds the application queue from watches.yaml |
| kb/corpus_diff.py | Snapshot/diff/list; answers "did this change actually do anything?" |
| kb/stale_prompts.py | Finds claims extracted with old prompt versions |
| scripts/daily_sync.sh | 7am launchd job: garmindb sync + ICU refresh + kb sync + R2 backup + Worker cache + comment scan |
| scripts/rebuild_kb.sh | Deterministic kb rebuild with manifest |
| scripts/verify_kb.sh | Post-rebuild integrity check |
| scripts/cache_for_worker.py | Writes state/cache/{yesterday,query_cache}.json to R2 for the Worker |
| ui/app.py | FastAPI + HTMX + Jinja UI |
| findings/_active.md | Workstream state (read by the coach subagent on session start) |
| findings/_todo.md | Casey's checklist (surfaced in the UI at /active) |
| completion_log.jsonl | Strength/strides/sauna log (canonical) |
| corrections.jsonl | Interpretive overlays from Telegram morning-summary replies (see ADR 003) |
| races.yaml | Casey's race history with build summaries |
| runs.jsonl | Stage-level observability for sync/ingest/embed/load |
| tests/ | ~94 pytest tests (44 kb/load + 21 idempotency + 24 freshness + 5 cache) |

Schema reality check

The schema is in kb/migrations/*.sql (managed by kb/migrate.py), NOT kb/schema.sql. Migrations are at v4 in production. load.py uses DELETE rather than DROP+CREATE. If you read schema.sql expecting it to be authoritative, you'll be wrong.

Check current migration state:

uv run python kb/migrate.py status
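
The numeric-ordering scheme (kb/migrations/00N_*.sql applied in sequence) can be sketched as follows. This is a minimal illustration, not kb/migrate.py itself; the function names and the way the current version is tracked are assumptions.

```python
import re
import tempfile
from pathlib import Path

MIG_RE = re.compile(r"^(\d+)_.*\.sql$")

def pending_migrations(mig_dir: Path, current: int) -> list[Path]:
    """Return migration files numbered above `current`, in numeric order."""
    numbered = []
    for p in mig_dir.glob("*.sql"):
        m = MIG_RE.match(p.name)
        if m and int(m.group(1)) > current:
            numbered.append((int(m.group(1)), p))
    return [p for _, p in sorted(numbered)]

def apply_migrations(mig_dir: Path, current: int, execute) -> int:
    """Apply each pending file via `execute(sql)`; return the new version."""
    for p in pending_migrations(mig_dir, current):
        execute(p.read_text())
        current = int(MIG_RE.match(p.name).group(1))
    return current

# Demo with a throwaway directory: already at v1, so only 002 and 004 run.
d = Path(tempfile.mkdtemp())
for name in ["002_add_index.sql", "001_init.sql", "004_v4.sql"]:
    (d / name).write_text(f"-- {name}")
ran = []
version = apply_migrations(d, current=1, execute=ran.append)
print(version, [s.split()[1] for s in ran])
# 4 ['002_add_index.sql', '004_v4.sql']
```

Sorting on the parsed integer (not the filename string) keeps ordering correct even if zero-padding is ever inconsistent.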

kb scale (as of 2026-05-04)

  • 8 podcast feeds: swap (174 main + 118 bonus), real-science-of-sport (97), running-effect (58), letsrun (58), morning-shakeout (23), coffee-club (11), strength-running (5)
  • 544 episodes, 5,555 claims, 472 studies
  • 19 findings (curated synthesis in findings/)
  • ~290 episodes have raw.json on disk (some kb counts differ due to triage filtering at extraction time)

What this system does NOT do

  • ❌ Hosted database — everything is local SQLite + DuckDB
  • ❌ Multi-user — single-user system, 127.0.0.1 only (UI exposed via tunnel for Casey himself, not multi-tenant)
  • ❌ Realtime — daily cron cadence, no websockets
  • ❌ Authentication beyond Cloudflare Access PIN — no app-level auth
  • ❌ Mobile — desktop Mac only