Runbook: data-ingestion failures

~/data-ingestion/scripts/podcast-sync.sh runs Sun 6am PT via launchd. On failure, you get an email (Resend, podcast@updates.caseymanos.com) + Telegram. This page is the offline reference.

Pipeline stages

RSS feed pull
    ↓
triage.py (LLM relevance scoring, 1-5)
    ↓ reject if < cutoff
modal_transcribe.py (Whisper-large on A10G)
    ↓
research.py (Parallel API: claims + study resolution)
    ↓
<ep>.raw.json + <ep>.raw.meta.json sidecar
    ↓
swap.py (compile per-ep digest)
    ↓
notify.py (Resend + Telegram)
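The stages run strictly in order, and a failure stops the run — which is why a stuck stage means no notification at all. A minimal driver sketch, assuming each stage is a standalone script invoked per episode (the subprocess invocation is a guess at what podcast-sync.sh does, not its actual contents):

```python
import subprocess

# Stage scripts named on this page, in order. The RSS pull happens before
# these; its script name isn't given here, so it is omitted.
STAGES = [
    "triage.py",            # LLM relevance scoring, 1-5
    "modal_transcribe.py",  # Whisper-large on A10G
    "research.py",          # Parallel API: claims + study resolution
    "swap.py",              # compile per-ep digest
    "notify.py",            # Resend + Telegram
]

def run_pipeline(episode_id: str) -> bool:
    """Run each stage in order; the first failure aborts the run."""
    for stage in STAGES:
        result = subprocess.run(
            ["uv", "run", "python", stage, "--episode", episode_id])
        if result.returncode != 0:
            return False  # later stages never fire, so no digest/notify
    return True
```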

Sun 6am cron silent

Symptom: Sunday morning, no podcast-sync email or TG message.

# Was the Mac awake at 6am Sun?
pmset -g log | grep -E "Wake|Sleep" | grep "$(date -v-Sun '+%Y-%m-%d')" | head

# Did launchd think it should fire?
launchctl list | grep com.casey.podcast-sync

# Manual run:
~/data-ingestion/scripts/podcast-sync.sh

If launchd unloaded:

launchctl load ~/Library/LaunchAgents/com.casey.podcast-sync.plist

Modal transcription stuck

Symptom: Pipeline log shows transcription started but never completed. Modal dashboard shows a running container.

# Check Modal status:
modal app list | grep transcribe

# Kill stuck container:
modal app stop <app-id>

Common causes:

1. Container OOM on a long episode — Whisper-large can spike past the 16GB allocation on 2hr+ episodes. Fix: split the ep, or bump the container memory in modal_transcribe.py.
2. Modal quota hit — unlikely at our volume; check the dashboard.
3. A10G unavailable — Modal will queue; usually resolves within minutes.
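For cause 1, the memory bump is a decorator change. A sketch, assuming Modal's standard function decorator — the function name and other settings in modal_transcribe.py will differ; only the `memory` argument (in MiB) matters here:

```python
# Sketch: raise the container memory for long episodes.
import modal

app = modal.App("transcribe")

@app.function(gpu="A10G", memory=32768)  # was ~16 GB; bump to 32 GB
def transcribe(episode_url: str) -> str:
    ...
```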

Parallel API task hung

Symptom: Logs show Parallel Task started but monitor_setup.py poll never reports completion.

# What's pending:
uv run python ~/data-ingestion/find_misses.py

# Retry stuck/missed eps:
uv run python ~/data-ingestion/retry_misses.py

Note: find_misses checks <ep>.raw.json is complete (_raw_json_is_complete() guard) — partial files don't count as success. The retry path is crash-safe due to atomic-write idempotency (see IDEMPOTENCY.md in data-ingestion).
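The guard logic is roughly: the flag must be present and literally True, so a file truncated mid-write (crash during extraction) fails the check. A sketch of the idea — the real `_raw_json_is_complete()` in find_misses may check more than this:

```python
import json
from pathlib import Path

def raw_json_is_complete(path: Path) -> bool:
    """True only for fully written <ep>.raw.json files.

    A missing, truncated, or flag-less file counts as a miss,
    so the retry path can safely re-run it.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return False  # missing or partially written file
    return data.get("_raw_json_is_complete") is True
```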

Triage rejected something I wanted

Symptom: An episode you expected to ingest is missing from insights/<show>/. Check triage logs.

# Triage scoring history:
grep "<ep-id>" ~/data-ingestion/manifests/triage-history.jsonl

Fix: either lower the show's relevance cutoff in ~/data-ingestion/podcasts/<show>.yaml, or force-include via uv run python ~/data-ingestion/batch.py --show <show> --episode <ep-id> --force.

raw.json incomplete / partially extracted

Symptom: <ep>.raw.json exists but kb/load.py skips it or loads with 0 claims.

# Check completeness:
python3 -c "
import json
with open('<ep>.raw.json') as f:
    d = json.load(f)
print(f'episode: {d.get(\"episode_id\")}')
print(f'claims: {len(d.get(\"claims\", []))}')
print(f'studies: {len(d.get(\"studies\", []))}')
print(f'_complete: {d.get(\"_raw_json_is_complete\", \"NOT SET\")}')
"

If _raw_json_is_complete is False or missing:

# Re-extract the episode:
cd ~/data-ingestion
uv run python research.py --episode <ep-id> --force

Letsrun extraction returned 0 claims

Symptom: A Letsrun thread gets ingested but produces an empty or near-empty raw.json.

Cause: Default podcast prompt drops Q&A / quoted content silently. Letsrun is forum-shaped (mostly quotes), so the default prompt strips nearly everything.

Fix: Use forum mode.

uv run python ~/data-ingestion/research.py \
  --episode <thread-id> --mode forum --force

See feedback memory.

Quota exhausted

Symptom: Anthropic 429s, Voyage 429s, or Parallel quota error.

Anthropic: Tier-based; we don't usually hit the limits. If we do, wait a minute and retry. Check current usage at console.anthropic.com.

Voyage: Free tier 50M tokens/mo. Resets monthly. If exhausted:

  • Wait until next month, OR
  • Upgrade tier, OR
  • Run kb/sync.py with embedding skipped (claims still load, just no semantic search until next reset)

Parallel: Cost is volume-driven. Check that the per-show 25-ep cap is enforced.
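The cap check is simple enough to sketch. A hypothetical version — the helper name and episode dict shape are assumptions, not the actual pipeline code:

```python
PER_SHOW_CAP = 25  # per-show episode cap that bounds Parallel cost

def enforce_cap(episodes: list[dict]) -> list[dict]:
    """Keep at most PER_SHOW_CAP episodes per show, preserving order."""
    kept, counts = [], {}
    for ep in episodes:
        show = ep["show"]
        if counts.get(show, 0) < PER_SHOW_CAP:
            counts[show] = counts.get(show, 0) + 1
            kept.append(ep)
    return kept
```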

Ep transcribed but no claims

Symptom: Modal succeeded, transcript exists, but research.py produced 0 claims.

Likely either:

1. Episode genuinely had no extractable claims (Q&A-only ep) → triage cutoff might need raising
2. Prompt is failing for this content shape → run with verbose:

   uv run python ~/data-ingestion/research.py --episode <ep> --verbose

3. Forum mode needed (Letsrun) → see above

kb/sync didn't pick up new eps

Symptom: <ep>.raw.json exists with claims but kb/kb.duckdb doesn't show them after kb/sync.py.

Check that:

1. The show's YAML exists in ~/data-ingestion/podcasts/
2. The episode-id-prefix → show registry includes the show
3. _raw_json_is_complete() is True on the file

# Force re-load (not full rebuild):
uv run python ~/garmin-warehouse/kb/sync.py --force

If still missing, full rebuild:

~/garmin-warehouse/scripts/rebuild_kb.sh

Notification didn't arrive

If the pipeline ran successfully but no TG/email:

# Check delivery log:
tail ~/garmin-warehouse/logs/notifications.jsonl
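If the tail is noisy, filter for failed rows. A sketch — the jsonl field names ("status", "channel") are assumptions about the log schema, so adjust to what the file actually contains:

```python
import json
from pathlib import Path

def failed_deliveries(log_path: Path) -> list[dict]:
    """Return delivery-log rows whose status isn't 'ok'."""
    rows = []
    for line in log_path.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        if row.get("status") != "ok":
            rows.append(row)
    return rows
```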

# Test send:
# UI at /notifications has "test send" buttons
# Or:
uv run python -c "
from data_ingestion.notify import send_telegram, send_email
send_telegram('test', source='test')
send_email('test', 'body', source='test')
"

Logs

  • Per-run: ~/data-ingestion/logs/<DATE>-<run-id>.log
  • Stage observability: ~/garmin-warehouse/runs.jsonl (cross-pipeline)
  • Notification delivery: ~/garmin-warehouse/logs/notifications.jsonl