R2 Bucket Layout

Bucket: garmin-warehouse-data (Cloudflare account a20a70bee90d635ffad79328f3edcd5f).

Top-level prefixes

garmin-warehouse-data/
├── kb/
│   ├── <DATE>/                    # daily snapshot (e.g. 2026-05-04/)
│   │   └── kb.duckdb              # 52MB, daily
│   └── latest/
│       └── kb.duckdb              # symlink-pattern; overwritten each day
├── healthdata/
│   └── <DATE>/                    # Sunday-only
│       └── garmin_activities.db   # 1.1GB, weekly
├── state/
│   ├── <DATE>/                    # daily snapshot
│   │   ├── completion_log.jsonl
│   │   ├── applied.yaml
│   │   ├── dismissed.yaml
│   │   ├── watches.yaml
│   │   ├── completion_log_worker_<msgId>.jsonl  # per check-in reply
│   │   └── corrections_<msgId>.jsonl            # per correction reply
│   ├── latest/                    # rolling head
│   │   ├── completion_log.jsonl
│   │   ├── applied.yaml
│   │   ├── dismissed.yaml
│   │   └── watches.yaml
│   └── cache/
│       ├── yesterday.json         # built by cache_for_worker.py at 7am
│       └── query_cache.json       # ditto
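
The snapshot-plus-latest double write in kb/ can be sketched as follows. This is not the actual daily_sync.sh; RUN=echo (a dry-run wrapper), the local file name, and the bucket URL form are illustrative assumptions:

```shell
# Dry-run sketch: RUN=echo prints the commands instead of executing them.
RUN=echo
DATE=$(date +%F)                      # e.g. 2026-05-04
BUCKET=s3://garmin-warehouse-data     # real R2 calls also need --endpoint-url

# Dated snapshot first, then overwrite latest/ so readers always have a head.
$RUN aws s3 cp kb.duckdb "$BUCKET/kb/$DATE/kb.duckdb"
$RUN aws s3 cp kb.duckdb "$BUCKET/kb/latest/kb.duckdb"
```

Writing the dated snapshot before overwriting latest/ means a failed upload never leaves latest/ pointing at a half-written object.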

Who writes what

  • kb/<DATE>/kb.duckdb + kb/latest/kb.duckdb
    Writer: daily_sync.sh (aws s3 cp), 7am daily
  • state/<DATE>/{completion_log,applied,dismissed,watches}.{jsonl,yaml} + state/latest/...
    Writer: daily_sync.sh (aws s3 cp), 7am daily
  • healthdata/<DATE>/garmin_activities.db
    Writer: daily_sync.sh (rclone), 7am Sunday only
  • state/cache/yesterday.json + state/cache/query_cache.json
    Writer: scripts/cache_for_worker.py (rclone), 7am daily, after worker-log merge
  • state/<DATE>/completion_log_worker_<msgId>.jsonl
    Writer: Worker (R2 API), per check-in reply
  • state/<DATE>/corrections_<msgId>.jsonl
    Writer: Worker (R2 API), per correction reply

Who reads what

  • Worker reads state/cache/yesterday.json + state/cache/query_cache.json
    Purpose: morning summary + Q&A
  • Worker reads state/latest/completion_log.jsonl
    Purpose: detect already-logged kinds (skip asking about them at 8pm)
  • daily_sync.sh reads state/*/completion_log_worker_*.jsonl
    Purpose: merge into local completion_log.jsonl
  • daily_sync.sh reads state/*/corrections_*.jsonl
    Purpose: merge into local corrections.jsonl
  • Casey (manual recovery) reads kb/latest/kb.duckdb + state/latest/*
    Purpose: restore after laptop crash
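
The manual-recovery path amounts to pulling both latest/ heads back down. A dry-run sketch; the local target directories are assumptions, not a documented layout:

```shell
# Dry-run sketch: RUN=echo prints the commands instead of executing them.
RUN=echo
$RUN rclone copy r2:garmin-warehouse-data/kb/latest/ ./kb/
$RUN rclone copy r2:garmin-warehouse-data/state/latest/ ./state/
```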

Why per-source files

Per-reply files (completion_log_worker_<msgId>.jsonl) avoid GET-modify-PUT race conditions. If the Worker tried to read the canonical log, append, and write, two concurrent webhooks would clobber each other. Per-message-id files mean the path itself is the unique key; merge happens later, on the laptop, where there's only one writer.

Same pattern for corrections.
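
The laptop-side merge can be sketched in plain shell. The file contents and the sort -u dedupe below are illustrative assumptions, not daily_sync.sh's actual logic:

```shell
# Local sketch of the single-writer merge step.
workdir=$(mktemp -d)
cd "$workdir"

# Per-message files, as the Worker would have written them under state/<DATE>/:
echo '{"kind":"run","msg":"abc"}'  > completion_log_worker_abc.jsonl
echo '{"kind":"lift","msg":"def"}' > completion_log_worker_def.jsonl
# Canonical log already holds the first entry from an earlier merge:
echo '{"kind":"run","msg":"abc"}'  > completion_log.jsonl

# Merge: append every per-message file, then drop exact-duplicate lines.
cat completion_log_worker_*.jsonl >> completion_log.jsonl
sort -u completion_log.jsonl -o completion_log.jsonl
```

Because each Worker reply lands at its own key, no lock is needed on the R2 side; dedup happens once, on the laptop, where there is only one writer.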

Tools

# List a prefix:
rclone ls r2:garmin-warehouse-data/state/

# Read a file:
rclone cat r2:garmin-warehouse-data/state/cache/yesterday.json

# Upload a file:
rclone copy local/file.txt r2:garmin-warehouse-data/path/

# Upload from stdin (used by cache_for_worker.py):
echo "content" | rclone rcat r2:garmin-warehouse-data/path/key.txt

For files >100MB, ALWAYS use rclone, not aws s3 cp. See ADR 001.

Lifecycle / retention

No lifecycle rules configured yet. Cost is ~$0/mo at this scale, so retention is currently "forever." If costs grow, add a rule like:

  • kb/<DATE>/ (not latest/): expire after 30d
  • healthdata/<DATE>/: expire after 4 weeks (keep 4 Sunday snapshots)
  • state/<DATE>/: expire after 90d
  • state/<DATE>/{completion_log_worker,corrections}_*.jsonl: expire after 7d (laptop has merged them by then)
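
Until real lifecycle rules are configured, a scheduled job could approximate the first three rules with rclone's age filters. A dry-run sketch, not a tested cron job:

```shell
# Dry-run sketch: RUN=echo prints the commands instead of executing them.
RUN=echo
$RUN rclone delete --min-age 30d --exclude "latest/**" r2:garmin-warehouse-data/kb/
$RUN rclone delete --min-age 4w r2:garmin-warehouse-data/healthdata/
$RUN rclone delete --min-age 90d --exclude "latest/**" --exclude "cache/**" r2:garmin-warehouse-data/state/
```

The 7d rule for per-message files would need an extra pass with a name filter (e.g. --include "completion_log_worker_*.jsonl"), so it is omitted above.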