# R2 Bucket Layout

Bucket: `garmin-warehouse-data` (Cloudflare account `a20a70bee90d635ffad79328f3edcd5f`).
## Top-level prefixes

```text
garmin-warehouse-data/
├── kb/
│   ├── <DATE>/                                   # daily snapshot (e.g. 2026-05-04/)
│   │   └── kb.duckdb                             # 52 MB, daily
│   └── latest/
│       └── kb.duckdb                             # symlink pattern; overwritten each day
│
├── healthdata/
│   └── <DATE>/                                   # Sunday only
│       └── garmin_activities.db                  # 1.1 GB, weekly
│
└── state/
    ├── <DATE>/                                   # daily snapshot
    │   ├── completion_log.jsonl
    │   ├── applied.yaml
    │   ├── dismissed.yaml
    │   ├── watches.yaml
    │   ├── completion_log_worker_<msgId>.jsonl   # per check-in reply
    │   └── corrections_<msgId>.jsonl             # per correction reply
    ├── latest/                                   # rolling head
    │   ├── completion_log.jsonl
    │   ├── applied.yaml
    │   ├── dismissed.yaml
    │   └── watches.yaml
    └── cache/
        ├── yesterday.json                        # built by cache_for_worker.py at 7am
        └── query_cache.json                      # ditto
```
## Who writes what

| Key pattern | Writer | When |
|---|---|---|
| `kb/<DATE>/kb.duckdb` + `kb/latest/kb.duckdb` | `daily_sync.sh` (`aws s3 cp`) | 7am daily |
| `state/<DATE>/{completion_log,applied,dismissed,watches}.{jsonl,yaml}` + `state/latest/...` | `daily_sync.sh` (`aws s3 cp`) | 7am daily |
| `healthdata/<DATE>/garmin_activities.db` | `daily_sync.sh` (`rclone`) | 7am Sunday only |
| `state/cache/yesterday.json` + `state/cache/query_cache.json` | `scripts/cache_for_worker.py` (`rclone`) | 7am daily, after worker-log merge |
| `state/<DATE>/completion_log_worker_<msgId>.jsonl` | Worker (R2 API) | per check-in reply |
| `state/<DATE>/corrections_<msgId>.jsonl` | Worker (R2 API) | per correction reply |
## Who reads what

| Reader | Keys | Purpose |
|---|---|---|
| Worker | `state/cache/yesterday.json`, `state/cache/query_cache.json` | morning summary + Q&A |
| Worker | `state/latest/completion_log.jsonl` | detect already-logged kinds (skip asking about them at 8pm) |
| `daily_sync.sh` | `state/*/completion_log_worker_*.jsonl` | merge into local `completion_log.jsonl` |
| `daily_sync.sh` | `state/*/corrections_*.jsonl` | merge into local `corrections.jsonl` |
| Casey (manual recovery) | `kb/latest/kb.duckdb`, `state/latest/*` | restore after laptop crash |
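The laptop-side merge that `daily_sync.sh` performs can be sketched in Python. This is a hedged sketch, not the real implementation (the actual merge lives in `daily_sync.sh`); the function name and dedup-by-whole-line strategy are illustrative assumptions.

```python
from pathlib import Path


def merge_worker_logs(state_dir: Path, local_log: Path) -> int:
    """Fold per-message worker files into the canonical local log.

    Scans state/<DATE>/completion_log_worker_<msgId>.jsonl (matching the
    R2 layout) and appends any line not already present in local_log.
    Deduplicating on the whole line makes repeated runs idempotent.
    """
    seen = set(local_log.read_text().splitlines()) if local_log.exists() else set()
    added = 0
    for worker_file in sorted(state_dir.glob("*/completion_log_worker_*.jsonl")):
        for line in worker_file.read_text().splitlines():
            if line and line not in seen:
                with local_log.open("a") as f:
                    f.write(line + "\n")
                seen.add(line)
                added += 1
    return added
```

Because the dedup set is rebuilt from the log itself, re-running the merge after a partial failure is safe.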
## Why per-source files

Per-reply files (`completion_log_worker_<msgId>.jsonl`) avoid GET-modify-PUT race conditions. If the Worker read the canonical log, appended to it, and wrote it back, two concurrent webhooks would clobber each other. With per-message-id files, the path itself is the unique key; the merge happens later, on the laptop, where there is only one writer. The same pattern applies to corrections.
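The two patterns can be contrasted in a minimal sketch, using a plain dict as a stand-in for the R2 bucket (keys and payloads here are illustrative, not real data):

```python
def simulate_race(bucket: dict) -> None:
    """GET-modify-PUT on one canonical key: interleaved webhooks clobber."""
    key = "state/2026-05-04/completion_log.jsonl"
    a = bucket.get(key, "")                  # webhook A reads
    b = bucket.get(key, "")                  # webhook B reads the same snapshot
    bucket[key] = a + '{"kind":"run"}\n'     # A writes its appended copy
    bucket[key] = b + '{"kind":"sleep"}\n'   # B writes, silently dropping A's line


def per_message_writes(bucket: dict) -> None:
    """One key per reply: the msgId in the path is the uniqueness guarantee."""
    for msg_id, line in [("m1", '{"kind":"run"}'), ("m2", '{"kind":"sleep"}')]:
        bucket[f"state/2026-05-04/completion_log_worker_{msg_id}.jsonl"] = line + "\n"
```

In the first function one reply is lost; in the second, both survive under distinct keys and nothing ever needs read-modify-write on the hot path.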
## Tools

```sh
# List a prefix:
rclone ls r2:garmin-warehouse-data/state/

# Read a file:
rclone cat r2:garmin-warehouse-data/state/cache/yesterday.json

# Upload a file:
rclone copy local/file.txt r2:garmin-warehouse-data/path/

# Upload from stdin (used by cache_for_worker.py):
echo "content" | rclone rcat r2:garmin-warehouse-data/path/key.txt
```

For files >100 MB, ALWAYS use rclone, not `aws s3 cp`. See ADR 001.
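The stdin-upload pattern that `cache_for_worker.py` uses can be sketched in Python. This is an assumption-laden sketch, not the script itself: the helper names are hypothetical, and only the `rclone rcat` invocation comes from the commands above.

```python
import json
import subprocess


def rcat_command(key: str) -> list:
    """Build the rclone rcat invocation that streams stdin to an R2 key."""
    return ["rclone", "rcat", f"r2:garmin-warehouse-data/{key}"]


def upload_json(payload: dict, key: str) -> None:
    """Serialize payload and pipe it straight to rclone, avoiding a temp file."""
    data = json.dumps(payload).encode()
    subprocess.run(rcat_command(key), input=data, check=True)
```

Streaming via `rcat` keeps the cache build self-contained: the JSON never touches local disk on its way to `state/cache/`.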
## Lifecycle / retention

No lifecycle rules are configured yet. Cost is ~$0/mo at this scale, so retention is currently "forever." If costs grow, add rules like:

- `kb/<DATE>/` (not `latest/`): expire after 30d
- `healthdata/<DATE>/`: expire after 4 weeks (keep 4 Sunday snapshots)
- `state/<DATE>/`: expire after 90d
- `state/<DATE>/{completion_log_worker,corrections}_*.jsonl`: expire after 7d (the laptop has merged them by then)