Runbook: data-ingestion failures¶
~/data-ingestion/scripts/podcast-sync.sh runs Sun 6am PT via launchd.
On failure, you get an email (Resend, podcast@updates.caseymanos.com)
+ Telegram. This page is the offline reference.
Pipeline stages¶
RSS feed pull
↓
triage.py (LLM relevance scoring, 1-5)
↓ reject if < cutoff
modal_transcribe.py (Whisper-large on A10G)
↓
research.py (Parallel API: claims + study resolution)
↓
<ep>.raw.json + <ep>.raw.meta.json sidecar
↓
swap.py (compile per-ep digest)
↓
notify.py (Resend + Telegram)
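The stage chain above can be sketched as a toy loop (stand-in logic only; the real scripts are separate processes, not function calls):

```python
def run_pipeline(episodes, cutoff=3):
    """Toy sketch of the stage chain: triage -> transcribe -> research -> digest.
    Stage bodies are stand-ins; only the control flow mirrors the pipeline."""
    digests = []
    for ep in episodes:
        if ep["score"] < cutoff:                  # triage.py: reject below cutoff
            continue
        transcript = f"transcript:{ep['id']}"     # modal_transcribe.py stand-in
        claims = [f"claim from {transcript}"]     # research.py stand-in
        digests.append({"id": ep["id"], "claims": claims})  # swap.py stand-in
    return digests                                # notify.py would send these

eps = [{"id": "ep1", "score": 4}, {"id": "ep2", "score": 2}]
print(len(run_pipeline(eps)))  # -> 1 (ep2 rejected by triage)
```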
Sun 6am cron silent¶
Symptom: Sunday morning, no podcast-sync email or TG message.
# Was the Mac awake at 6am Sun?
pmset -g log | grep -E "Wake|Sleep" | grep "$(date -v-Sun '+%Y-%m-%d')" | head
# Did launchd think it should fire?
launchctl list | grep com.casey.podcast-sync
# Manual run:
~/data-ingestion/scripts/podcast-sync.sh
If launchd unloaded, reload the agent (assuming the plist lives in the standard per-user location; adjust the path if yours differs):
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.casey.podcast-sync.plist
Modal job stuck or failing¶
Symptom: Pipeline log shows transcription started but never completed. Modal dashboard shows running container.
# Check Modal status:
modal app list | grep transcribe
# Kill stuck container:
modal app stop <app-id>
Common causes:
1. Container OOM on a long episode — Whisper-large can spike past
the 16GB allocation on 2hr+ episodes. Fix: split the ep, or bump
the container memory in modal_transcribe.py
2. Modal quota hit — unlikely at our volume; check dashboard
3. A10G unavailable — Modal will queue; usually resolves within
minutes
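For cause 1, the memory bump is a decorator-option change in modal_transcribe.py. A hedged sketch using Modal's function options (app name, values, and function body are illustrative, not the file's real contents):

```python
import modal

app = modal.App("transcribe")  # illustrative app name

# memory is in MiB; doubling from the ~16GB allocation mentioned above.
@app.function(gpu="A10G", memory=32768, timeout=60 * 60)
def transcribe(episode_url: str) -> str:
    ...
```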
Parallel API task hung¶
Symptom: Logs show Parallel Task started but monitor_setup.py
poll never reports completion.
# What's pending:
uv run python ~/data-ingestion/find_misses.py
# Retry stuck/missed eps:
uv run python ~/data-ingestion/retry_misses.py
Note: find_misses checks <ep>.raw.json is complete (_raw_json_is_complete()
guard) — partial files don't count as success. The retry path is
crash-safe due to atomic-write idempotency (see IDEMPOTENCY.md in
data-ingestion).
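The atomic-write pattern that makes the retry path crash-safe generally looks like this: write to a temp file in the same directory, fsync, then os.replace(), so readers only ever see a missing file or a complete one. A minimal sketch, not the repo's actual helper:

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so a crash mid-write never leaves a partial file at `path`."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())   # durable before the rename
        os.replace(tmp, path)      # atomic on POSIX: all-or-nothing
    except BaseException:
        os.unlink(tmp)             # drop the partial temp file
        raise
```

Because the rename is all-or-nothing, re-running a retry that already succeeded just overwrites the file with identical content, which is what makes repeats safe.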
Triage rejected something I wanted¶
Symptom: An episode you expected to ingest is missing from
insights/<show>/. Check triage logs.
Fix: either lower the show's relevance cutoff in
~/data-ingestion/podcasts/<show>.yaml, or force-include via
uv run python ~/data-ingestion/batch.py --show <show> --episode <ep-id> --force.
raw.json incomplete / partially extracted¶
Symptom: <ep>.raw.json exists but kb/load.py skips it or
loads with 0 claims.
# Check completeness:
python3 -c "
import json
with open('<ep>.raw.json') as f:
d = json.load(f)
print(f'episode: {d.get(\"episode_id\")}')
print(f'claims: {len(d.get(\"claims\", []))}')
print(f'studies: {len(d.get(\"studies\", []))}')
print(f'_complete: {d.get(\"_raw_json_is_complete\", \"NOT SET\")}')
"
If _raw_json_is_complete is False or missing, the research stage didn't finish for that episode. Re-run it via retry_misses.py (above); the retry path is idempotent, so repeating it is safe.
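A guard in this spirit (a sketch; the real _raw_json_is_complete() lives in data-ingestion and may check more) only trusts a raw.json that both parses and carries the completion flag:

```python
import json

def raw_json_is_complete(path: str) -> bool:
    """Sketch of the completeness guard: a raw.json only counts as a success
    if it parses cleanly and its completion flag is literally True."""
    try:
        with open(path) as f:
            d = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False               # missing file, or truncated partial write
    return d.get("_raw_json_is_complete") is True
```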
Letsrun extraction returned 0 claims¶
Symptom: A Letsrun thread gets ingested but produces an empty or near-empty raw.json.
Cause: Default podcast prompt drops Q&A / quoted content silently. Letsrun is forum-shaped (mostly quotes), so the default prompt strips nearly everything.
Fix: Use forum mode.
See feedback memory.
Quota exhausted¶
Symptom: Anthropic 429s, Voyage 429s, or Parallel quota error.
Anthropic: Tier-based; we don't usually hit it. If we do, wait a minute and retry. Check current usage at console.anthropic.com.
Voyage: Free tier 50M tokens/mo. Resets monthly. If exhausted:
- Wait until next month, OR
- Upgrade tier, OR
- Run kb/sync.py with embedding skipped (claims still load, just no
semantic search until next reset)
Parallel: Cost is volume-driven. Check the per-show 25-ep cap is enforced.
Ep transcribed but no claims¶
Symptom: Modal succeeded, transcript exists, but research.py produced 0 claims.
Likely either:
1. Episode genuinely had no extractable claims (Q&A-only ep) → triage cutoff might need raising
2. Prompt is failing for this content shape → run with verbose output
3. Forum mode needed (Letsrun) → see above
kb/sync didn't pick up new eps¶
Symptom: <ep>.raw.json exists with claims but kb/kb.duckdb
doesn't show them after kb/sync.py.
Check that:
1. The show's YAML exists in ~/data-ingestion/podcasts/
2. The episode-id-prefix → show registry includes the show
3. _raw_json_is_complete() is True on the file
If still missing, do a full rebuild: see runbooks/kb-rebuild.md.
Notification didn't arrive¶
If the pipeline ran successfully but no TG/email:
# Check delivery log:
tail ~/garmin-warehouse/logs/notifications.jsonl
# Test send:
# UI at /notifications has "test send" buttons
# Or:
uv run python -c "
from data_ingestion.notify import send_telegram, send_email
send_telegram('test', source='test')
send_email('test', 'body', source='test')
"
Logs¶
- Per-run: ~/data-ingestion/logs/<DATE>-<run-id>.log
- Stage observability: ~/garmin-warehouse/runs.jsonl (cross-pipeline)
- Notification delivery: ~/garmin-warehouse/logs/notifications.jsonl
Related¶
- systems/data-ingestion.md
- reference/cron-schedules.md
- reference/external-apis.md
- runbooks/kb-rebuild.md — for full kb rebuild