kb Schema

The kb is a DuckDB database at ~/garmin-warehouse/kb/kb.duckdb with 9 tables. The schema is managed by kb/migrate.py via numbered migrations in kb/migrations/*.sql, currently at v4 in production.

Critical: schema.sql is NOT authoritative

kb/schema.sql exists but is a historical, seed-only definition. The actual schema is whatever migrations kb/migrate.py has applied from kb/migrations/*.sql. To check the current state:

uv run python kb/migrate.py status

kb/load.py refreshes tables with DELETE rather than DROP+CREATE, so reloads keep the migrated schema intact and don't need migration tags.
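
A minimal sketch of that reload pattern (the column list and values are illustrative, not the real claims DDL):

BEGIN TRANSACTION;
-- Clear the episode's old rows; the migrated table definition stays intact.
DELETE FROM claims WHERE episode_id = 'swap-ep-174';
-- Re-insert freshly extracted rows (hypothetical values):
INSERT INTO claims (episode_id, kind, claim_prompt_version)
VALUES ('swap-ep-174', 'protocol', 4);
COMMIT;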

Tables

Core content

| Table | Rows (2026-05-04) | Purpose |
| --- | --- | --- |
| episodes | 544 | One row per podcast episode. Has feed column for multi-show disambiguation. |
| claims | 5,555 | Extracted claims, linked to episode. Has claim_prompt_version for stale-prompt detection. |
| studies | 472 | Resolved citations (PMID/DOI). Many-to-many with claims via claim_studies. |
| claim_studies | — | Junction table. |
| findings | 19 | Curated synthesis topics (mirrors findings/*.md files). |
| finding_topics | — | Topic chips per finding. |
| xref | — | Cross-references. dst_kind distinguishes finding↔finding from finding→study. |
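
To walk the claims↔studies many-to-many, join through the junction table. A sketch (claim_id, study_id, s.id, and pmid are assumed column names, not confirmed DDL):

-- Claims paired with their resolved citations:
SELECT cs.claim_id, s.pmid
FROM claim_studies cs
JOIN studies s ON s.id = cs.study_id
LIMIT 10;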

Embeddings

| Table | Rows | Purpose |
| --- | --- | --- |
| embeddings_claim | 5,555 | Voyage 3-large 1024d vectors per claim, indexed via HNSW |
| embeddings_finding | 19 | Same for findings |

The embedding cache is keyed by content_hash (migration 003): if a claim's text didn't change, re-embedding is a no-op.
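
One way to see what the cache would re-embed (assumes a content_hash column on both tables; the real cache layout may differ):

-- Claims whose current text has no cached vector yet:
SELECT c.id
FROM claims c
LEFT JOIN embeddings_claim e ON e.content_hash = c.content_hash
WHERE e.content_hash IS NULL;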

Migration history

(Each migration in kb/migrations/ is numbered and applied in order. Current: v4.)

  • 001 — initial schema
  • 002 — added claim_prompt_version column to claims
  • 003 — added content_hash-keyed embedding cache
  • 004 — episode-id normalization (strip leading zeros). The 60 legacy rows are preserved because DuckDB FK constraints prevent an in-place rename.
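
A hypothetical sketch of that normalization (illustrative only; the real 004 migration may differ):

-- Strip leading zeros from the numeric suffix of an episode id:
SELECT regexp_replace('swap-ep-0174', '-0+(\d+)$', '-\1');
-- returns 'swap-ep-174'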

To add a new migration, write kb/migrations/00N_description.sql and apply it:

uv run python kb/migrate.py up

migrate.py records applied versions in a metadata table.
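
As a hypothetical example, a migration file adding a column might look like this (file name and column invented for illustration):

-- kb/migrations/005_add_episode_duration.sql (hypothetical)
ALTER TABLE episodes ADD COLUMN duration_seconds INTEGER;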

Key columns to know

episodes

  • id — like swap-ep-174, letsrun-thread-12345 (multi-source-aware)
  • feed — the source show identifier. Use SELECT DISTINCT feed FROM episodes to populate filter dropdowns
  • digest_path — file path for "open in editor" buttons
  • pub_date — for "show me X from the last month" queries
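
A recency filter along those lines (assumes pub_date is stored as a DATE or TIMESTAMP):

-- Episodes published in the last month:
SELECT id, pub_date
FROM episodes
WHERE pub_date >= current_date - INTERVAL 30 DAY
ORDER BY pub_date DESC;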

claims

  • episode_id — FK to episodes.id
  • kind — claim taxonomy
  • host_stance — endorses / skeptical / debating
  • otq_relevance — 0–5 score (combined with host_stance in the sketch after this list)
  • claim_prompt_version — for finding stale claims after prompt edits
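
host_stance and otq_relevance combine naturally for triage pulls, e.g.:

-- High-relevance claims the host endorsed:
SELECT episode_id, kind, otq_relevance
FROM claims
WHERE host_stance = 'endorses' AND otq_relevance >= 4
ORDER BY otq_relevance DESC;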

findings

  • id — matches the markdown filename stem
  • title — first H1 of the markdown body
  • study_links — cached count of linked studies, used for graph node sizing
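
Because xref records finding→study edges, the cached count can be cross-checked against it (assuming study_links counts exactly those edges):

-- Recompute study-link counts from xref and compare to the cached column:
SELECT f.id, f.study_links, COUNT(x.dst) AS recomputed
FROM findings f
LEFT JOIN xref x ON x.src = f.id AND x.dst_kind = 'study'
GROUP BY f.id, f.study_links;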

xref

  • src — finding ID
  • dst — finding ID or study PMID
  • dst_kind — finding | study

Triage state (NOT in DuckDB)

The triage state is in YAML files at ~/garmin-warehouse/kb/, not in the kb itself:

  • watches.yaml — declarative: "for finding X, watch for claims matching pattern Y"
  • applied.yaml — append-only: "this claim got applied to this finding on this date"
  • dismissed.yaml — append-only: "this claim is not relevant to this finding"

kb/review.py builds the application queue by joining watches.yaml against claims + embeddings_claim and excluding what's in applied.yaml/dismissed.yaml.
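
In spirit, the exclusion step is an anti-join. A sketch, assuming the YAML entries are first loaded into staging tables named candidate_claims, applied, and dismissed (all three names hypothetical, not real kb tables):

-- Queue = watch matches minus anything already applied or dismissed:
SELECT cc.claim_id, cc.finding_id
FROM candidate_claims cc
WHERE NOT EXISTS (
  SELECT 1 FROM applied a
  WHERE a.claim_id = cc.claim_id AND a.finding_id = cc.finding_id)
AND NOT EXISTS (
  SELECT 1 FROM dismissed d
  WHERE d.claim_id = cc.claim_id AND d.finding_id = cc.finding_id);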

This pattern means a kb rebuild doesn't lose triage state — YAML files are the durable journal. The DuckDB is rebuildable; the YAMLs are not.

Useful queries

-- Claims per show:
SELECT e.feed, COUNT(*) FROM claims c JOIN episodes e ON c.episode_id = e.id
GROUP BY e.feed ORDER BY 2 DESC;

-- Stale-prompt claims (claims extracted with old prompts):
SELECT episode_id, COUNT(*)
FROM claims
WHERE claim_prompt_version < (SELECT MAX(claim_prompt_version) FROM claims)
GROUP BY episode_id;

-- Findings with most study links:
SELECT id, title, study_links FROM findings ORDER BY study_links DESC LIMIT 10;

-- Cross-references for a finding:
SELECT dst, dst_kind FROM xref WHERE src = 'bloodwork_baseline';

For ad-hoc exploration, use Harlequin (already installed via uv tool install):

uv tool run harlequin --adapter duckdb ~/garmin-warehouse/kb/kb.duckdb