# R2 Bucket Layout

Bucket: `garmin-warehouse-data` (Cloudflare account `a20a70bee90d635ffad79328f3edcd5f`).
## Top-level prefixes

```text
garmin-warehouse-data/
├── kb/
│   ├── <DATE>/                                   # daily snapshot (e.g. 2026-05-04/)
│   │   └── kb.duckdb                             # 52 MB, daily
│   └── latest/
│       └── kb.duckdb                             # symlink pattern; overwritten each day
│
├── healthdata/
│   └── <DATE>/                                   # Sunday only
│       └── garmin_activities.db                  # 1.1 GB, weekly
│
└── state/
    ├── <DATE>/                                   # daily snapshot
    │   ├── completion_log.jsonl
    │   ├── applied.yaml
    │   ├── dismissed.yaml
    │   ├── watches.yaml
    │   ├── completion_log_worker_<msgId>.jsonl   # per check-in reply
    │   └── corrections_<msgId>.jsonl             # per correction reply
    ├── latest/                                   # rolling head
    │   ├── completion_log.jsonl
    │   ├── applied.yaml
    │   ├── dismissed.yaml
    │   └── watches.yaml
    └── cache/
        ├── yesterday.json                        # built by cache_for_worker.py at 7am
        └── query_cache.json                      # ditto
```
## Who writes what

| Key pattern | Writer | When |
|---|---|---|
| `kb/<DATE>/kb.duckdb` + `kb/latest/kb.duckdb` | `daily_sync.sh` (`aws s3 cp`) | 7am daily |
| `state/<DATE>/{completion_log,applied,dismissed,watches}.{jsonl,yaml}` + `state/latest/...` | `daily_sync.sh` (`aws s3 cp`) | 7am daily |
| `healthdata/<DATE>/garmin_activities.db` | `daily_sync.sh` (`rclone`) | 7am Sunday only |
| `state/cache/yesterday.json` + `state/cache/query_cache.json` | `scripts/cache_for_worker.py` (`rclone`) | 7am daily, after worker-log merge |
| `state/<DATE>/completion_log_worker_<msgId>.jsonl` | Worker (R2 API) | per check-in reply |
| `state/<DATE>/corrections_<msgId>.jsonl` | Worker (R2 API) | per correction reply |
## Who reads what

| Reader | Keys | Purpose |
|---|---|---|
| Worker | `state/cache/yesterday.json`, `state/cache/query_cache.json` | morning summary + Q&A |
| Worker | `state/latest/completion_log.jsonl` | detect already-logged kinds (skip asking about them at 8pm) |
| `daily_sync.sh` | `state/*/completion_log_worker_*.jsonl` | merge into local `completion_log.jsonl` |
| `daily_sync.sh` | `state/*/corrections_*.jsonl` | merge into local `corrections.jsonl` |
| Casey (manual recovery) | `kb/latest/kb.duckdb`, `state/latest/*` | restore after laptop crash |
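The laptop-side merge that `daily_sync.sh` performs can be sketched in Python. This is a hedged sketch, not the real implementation (the actual merge lives in `daily_sync.sh`); the function name and dedup-by-whole-line strategy are illustrative assumptions.

```python
from pathlib import Path


def merge_worker_logs(state_dir: Path, local_log: Path) -> int:
    """Fold per-message worker files into the canonical local log.

    Scans state/<DATE>/completion_log_worker_<msgId>.jsonl (matching the
    R2 layout) and appends any line not already present in local_log.
    Deduplicating on the whole line makes repeated runs idempotent.
    """
    seen = set(local_log.read_text().splitlines()) if local_log.exists() else set()
    added = 0
    for worker_file in sorted(state_dir.glob("*/completion_log_worker_*.jsonl")):
        for line in worker_file.read_text().splitlines():
            if line and line not in seen:
                with local_log.open("a") as f:
                    f.write(line + "\n")
                seen.add(line)
                added += 1
    return added
```

Because the dedup set is rebuilt from the log itself, re-running the merge after a partial failure is safe.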
## Why per-source files

Per-reply files (`completion_log_worker_<msgId>.jsonl`) avoid GET-modify-PUT race conditions. If the Worker read the canonical log, appended to it, and wrote it back, two concurrent webhooks would clobber each other. With per-message-id files, the path itself is the unique key; the merge happens later, on the laptop, where there is only one writer. The same pattern applies to corrections.
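The two patterns can be contrasted in a minimal sketch, using a plain dict as a stand-in for the R2 bucket (keys and payloads here are illustrative, not real data):

```python
def simulate_race(bucket: dict) -> None:
    """GET-modify-PUT on one canonical key: interleaved webhooks clobber."""
    key = "state/2026-05-04/completion_log.jsonl"
    a = bucket.get(key, "")                  # webhook A reads
    b = bucket.get(key, "")                  # webhook B reads the same snapshot
    bucket[key] = a + '{"kind":"run"}\n'     # A writes its appended copy
    bucket[key] = b + '{"kind":"sleep"}\n'   # B writes, silently dropping A's line


def per_message_writes(bucket: dict) -> None:
    """One key per reply: the msgId in the path is the uniqueness guarantee."""
    for msg_id, line in [("m1", '{"kind":"run"}'), ("m2", '{"kind":"sleep"}')]:
        bucket[f"state/2026-05-04/completion_log_worker_{msg_id}.jsonl"] = line + "\n"
```

In the first function one reply is lost; in the second, both survive under distinct keys and nothing ever needs read-modify-write on the hot path.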
## Tools

```sh
# List a prefix:
rclone ls r2:garmin-warehouse-data/state/

# Read a file:
rclone cat r2:garmin-warehouse-data/state/cache/yesterday.json

# Upload a file:
rclone copy local/file.txt r2:garmin-warehouse-data/path/

# Upload from stdin (used by cache_for_worker.py):
echo "content" | rclone rcat r2:garmin-warehouse-data/path/key.txt
```

For files >100 MB, ALWAYS use rclone, not `aws s3 cp`. See ADR 001.
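The stdin-upload pattern that `cache_for_worker.py` uses can be sketched in Python. This is an assumption-laden sketch, not the script itself: the helper names are hypothetical, and only the `rclone rcat` invocation comes from the commands above.

```python
import json
import subprocess


def rcat_command(key: str) -> list:
    """Build the rclone rcat invocation that streams stdin to an R2 key."""
    return ["rclone", "rcat", f"r2:garmin-warehouse-data/{key}"]


def upload_json(payload: dict, key: str) -> None:
    """Serialize payload and pipe it straight to rclone, avoiding a temp file."""
    data = json.dumps(payload).encode()
    subprocess.run(rcat_command(key), input=data, check=True)
```

Streaming via `rcat` keeps the cache build self-contained: the JSON never touches local disk on its way to `state/cache/`.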
## Lifecycle / retention

No lifecycle rules are configured yet. Cost is ~$0/mo at this scale, so retention is currently "forever." If costs grow, add rules like:

- `kb/<DATE>/` (not `latest/`): expire after 30d
- `healthdata/<DATE>/`: expire after 4 weeks (keep 4 Sunday snapshots)
- `state/<DATE>/`: expire after 90d
- `state/<DATE>/{completion_log_worker,corrections}_*.jsonl`: expire after 7d (the laptop has merged them by then)