Skip to content

Runbook: daily_sync.sh failures

~/garmin-warehouse/scripts/daily_sync.sh runs at 7am daily via launchd. On failure, you get an email (Resend) + sometimes Telegram. The email body includes a per-pattern runbook generated from the failure. This page is the offline reference.

Stages, in order

quiet-hours guard (5am-11pm; FORCE_RUN=1 to override)
garmindb (Garmin Connect download + import)
pace cache backfill (icu_activities update)
intervals.icu refresh (icu_wellness.json)
kb/sync (load + embed; uses Voyage)
R2 backup (kb.duckdb daily, GarminDB Sundays, state daily)
Worker log merge (pulls completion_log_worker_*.jsonl from R2)
Worker corrections merge (pulls corrections_*.jsonl from R2)
cache_for_worker.py (writes yesterday.json + query_cache.json to R2)
comment_scan (Strava → intervals comments → completion_log)
notify (email always, TG on success/first-failure)

FAILURES[] is appended to whenever a stage exits non-zero. Email SUMMARY shows per-stage status (ok / partial / no-key / failed (exit N)).

Recovery by failure pattern

Garmin SSO session expired

Symptom: garmindb stage fails with "garth session expired" or similar.

cd ~/garmin-warehouse
uv run garmindb_cli.py --all --download --import --analyze --latest
# Will prompt for Garmin Connect username/password.
# Verify ~/.GarminDb/garth_session was rewritten:
ls -la ~/.GarminDb/garth_session
# Next daily_sync run resumes automatically.

intervals.icu 401

Symptom: refresh-icu stage fails with 401.

# Verify env var exists:
grep INTERVALS_ICU_KEY ~/.zshrc
# Manual refresh:
cd ~/garmin-warehouse && uv run python warehouse.py refresh-icu
# If still 401:
#   1. https://intervals.icu/settings#api
#   2. Regenerate API key
#   3. Update ~/.zshrc, source it

kb/sync failure (Voyage embedding)

Symptom: kb/sync stage fails. Usually quota or network.

cd ~/garmin-warehouse && uv run python kb/sync.py
# Check Voyage quota: https://dash.voyageai.com/
# Free tier: 50M tokens/mo, resets monthly.
# If quota: wait or upgrade.
# If network: retry tomorrow.
# UI's /search will return stale results until embeddings catch up.

R2 backup partial / failed

Symptom: R2_BACKUP_STATUS=partial in summary email.

Check the daily_sync.log for the specific R2 key that failed. Common causes:

  1. AWS CLI SSL error during multipart on a 1GB+ file:

    SSLV3_ALERT_BAD_RECORD_MAC
    
    Fix: the script already uses rclone for the GarminDB upload. If another stage is using aws s3 cp, switch it to rclone. See ADR 001.

  2. R2 token revoked or expired:

    # Test:
    rclone ls r2:garmin-warehouse-data/state/latest/
    # If 401/403: regenerate token in CF dashboard, update ~/.zshrc
    

  3. rclone: command not found:

    brew install rclone
    

cache_for_worker.py failure

Symptom: worker cache: failed (exit N).

# Run manually with verbose:
cd ~/garmin-warehouse && uv run python scripts/cache_for_worker.py
# Common cause: ICU API hiccup (it's read in the script for today's
# scheduled events). Retry usually works.
# If R2 upload fails: check rclone, see "R2 backup partial" above.

If cache_for_worker fails, the next morning's 8:30am Worker summary will say "no yesterday.json in R2" — annoying but not data-losing.

comment_scan failure

Symptom: comment_scan: failed (exit N).

Low impact — comment_scan extracts strength/strides/sauna mentions from Strava activity descriptions into completion_log.jsonl. Casey's manual logging (and now the Worker check-in) backfills the same data. Skip until next run.

cd ~/garmin-warehouse && uv run python scanners/comment_scan.py --days 3

When the whole sync didn't run

Symptom: no email at 7am, no log file with today's date.

# Was the Mac awake?
pmset -g log | grep -E "Wake|Sleep" | head -20

# Did launchd think it should fire?
launchctl list | grep com.casey.warehouse-sync

# Force a run:
FORCE_RUN=1 ~/garmin-warehouse/scripts/daily_sync.sh

# If launchd unloaded:
launchctl load ~/Library/LaunchAgents/com.casey.warehouse-sync.plist

When it ran but did nothing

daily_sync.sh has a 5am-11pm quiet-hours guard. If launchd did its "missed-wake recovery" and woke the Mac at 1am to fire the job, the script bails immediately (correct behavior — don't kick off Garmin downloads at 1am). Override with FORCE_RUN=1.

Logs

  • Per-run: ~/garmin-warehouse/logs/daily_sync.log (overwritten each run; the email has the relevant excerpts)
  • Stage observability: ~/garmin-warehouse/runs.jsonl (jsonl one row per stage, with cost + duration)
  • Quick summary:
    uv run python ~/garmin-warehouse/scripts/run_log_summary.py