# ADR 005: kb migrations are the source of truth, not schema.sql
Status: accepted
Date: 2026-04-30 (formalized in this ADR 2026-05-04)
Supersedes: —
## Context

Early in the kb's life, the schema lived in a single `kb/schema.sql` file
containing all `CREATE TABLE` statements. `kb/load.py` would run
`DROP TABLE ... CASCADE; CREATE TABLE ...` from schema.sql at the start of
every load. This worked while:

- Casey was the only user of the kb
- Rebuilds were cheap (small corpus)
- No data state needed preserving across loads

But three things broke that model:

- Triage state matters. As the corpus grew and Casey curated
  applied.yaml/dismissed.yaml, dropping and recreating tables meant
  `kb/load.py` would briefly produce a kb with no rows, then reload from
  raw.json. This was harmless because the YAMLs survived independently, but
  during that brief window the UI would show empty results.
- Embedding cost. Re-embedding 5,555 claims after every load is wasteful
  even with content-hash caching. We want the load to update the structured
  tables without disturbing `embeddings_claim` rows whose content hasn't
  changed.
- Schema evolution. Adding a column (e.g. `claim_prompt_version` in
  migration 002) under the schema.sql + DROP+CREATE pattern means any
  existing per-claim metadata (like ad-hoc notes) gets lost. Migrations
  preserve data; DROP+CREATE doesn't.
## Decision

Schema is defined by kb/migrations/*.sql (managed by kb/migrate.py).

- `kb/schema.sql` stays as a historical/seed-only reference. It is the
  schema as of v1 and is no longer authoritative.
- `kb/migrate.py` reads the `kb/migrations/00N_*.sql` files in order,
  applies the ones not already recorded in the `_migrations` metadata
  table, and records each application.
- `kb/load.py` uses `DELETE` (not `DROP`+`CREATE`) to clear rows for
  re-load. The schema is unchanged across loads.
- All schema changes go through a numbered migration. v4 is current
  (as of 2026-05-04).
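The apply-in-order loop described above can be sketched as follows. This is a hedged sketch of the pattern, not kb/migrate.py itself; sqlite3 stands in for DuckDB (same idea, stdlib-runnable), and the function name is hypothetical.

```python
import re
import sqlite3
from pathlib import Path


def apply_pending(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply numbered *.sql files not yet recorded in _migrations."""
    conn.execute("CREATE TABLE IF NOT EXISTS _migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM _migrations")}
    done = []
    for path in sorted(migrations_dir.glob("*.sql")):
        # Only files matching the 00N_*.sql convention, skipping ones already applied.
        if not re.match(r"\d{3}_", path.name) or path.name in applied:
            continue
        conn.executescript(path.read_text())  # run the migration SQL
        conn.execute("INSERT INTO _migrations (name) VALUES (?)", (path.name,))
        done.append(path.name)
    conn.commit()
    return done
```

Because applied migrations are recorded and skipped, running the loop twice is idempotent, which is what makes `migrate.py up` safe to re-run.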
## Migration history

| # | Description | Why |
|---|---|---|
| 001 | Initial schema | Bootstrap |
| 002 | Add claim_prompt_version column to claims | Track which prompt version produced each claim; enables kb/stale_prompts.py |
| 003 | Add content_hash to embedding tables; backfill | Skip-if-unchanged embedding cache (Voyage cost) |
| 004 | Episode-id normalization (strip leading zeros) | Ep-id swap-ep-007 → swap-ep-7 for cross-show consistency. 60 legacy rows preserved due to FK constraints. |
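Migration 004's normalization rule (strip leading zeros from the numeric episode suffix) can be illustrated in Python. The actual migration is SQL; this hypothetical helper just shows the transformation it applies.

```python
import re


def normalize_episode_id(ep_id: str) -> str:
    # Strip leading zeros from the trailing numeric segment:
    # swap-ep-007 -> swap-ep-7. Ids without a zero-padded suffix pass through.
    return re.sub(r"-0+(\d+)$", r"-\1", ep_id)
```

A normalization like this is why the migration had to touch existing rows (the backfill), and why the 60 legacy rows pinned by FK constraints are called out in the table above.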
## Consequences

Good:

- Data state is preserved across loads; the UI never shows a mid-load empty state
- The embedding cache becomes possible (migration 003 enabled it)
- Schema evolution doesn't lose data
- Migration history is git-tracked and reproducible
- New collaborators (or future Casey) can read the migrations in order to understand how the schema evolved
Tradeoffs:
- One more concept to learn ("schema lives in migrations dir")
- schema.sql is now misleading if you don't know it's historical; hence
  this ADR and the warning in reference/kb-schema.md
- Every new schema change requires writing a migration file (small
friction, but the right kind)
## How to add a new migration

```shell
# 1. Write the migration file:
$EDITOR kb/migrations/005_add_my_feature.sql
# Example contents:
#   ALTER TABLE claims ADD COLUMN reviewed BOOLEAN DEFAULT FALSE;
#   CREATE INDEX claims_reviewed_idx ON claims (reviewed) WHERE reviewed;

# 2. Apply (idempotent):
uv run python kb/migrate.py up

# 3. Check status:
uv run python kb/migrate.py status

# 4. Commit:
git add kb/migrations/005_add_my_feature.sql
git commit -m "kb: migration 005 — add reviewed flag"
```
## Future-proofing

If migrations become sufficiently complex (data-mutating, multi-step, multi-table), graduate to a real migration framework (Alembic-style). For now, raw SQL files plus a metadata table are sufficient.
## What this implies for code

- Never `DROP TABLE` outside a migration. Period. The `DELETE` pattern in
  load.py is the right way to "clear and re-import."
- New analysis code should `SELECT` the columns it needs, not assume the
  schema layout by reading schema.sql.
- When verify_kb.sh checks integrity, it compares against the applied
  migrations, not against schema.sql.
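The "clear and re-import" pattern can be sketched as a single transaction. Again a stand-in sketch, not load.py itself: sqlite3 substitutes for DuckDB, and the table/column names are illustrative.

```python
import sqlite3


def reload_rows(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    # DELETE, never DROP: the table (including columns added by later
    # migrations) survives; only the row contents are replaced.
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM claims")
        conn.executemany("INSERT INTO claims (id, text) VALUES (?, ?)", rows)
```

Wrapping both statements in one transaction also narrows the empty-table window the Context section describes: readers see either the old rows or the new ones, not an intermediate empty state.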
## Tooling reminders

```shell
# Where am I?
uv run python kb/migrate.py status

# Apply pending:
uv run python kb/migrate.py up

# Inspect what's in the kb:
uv tool run harlequin --adapter duckdb ~/garmin-warehouse/kb/kb.duckdb
```
## References

- Code: ~/garmin-warehouse/kb/migrate.py
- Code: ~/garmin-warehouse/kb/migrations/
- Doc: reference/kb-schema.md
- Memory: warehouse_hardening_2026_05.md