# ADR 004: Per-source-file pattern for Worker → R2 → laptop merge

Status: accepted
Date: 2026-05-04
Supersedes: —

## Context
The OTQCheckinAgent Worker writes Casey-state changes (check-in completions + corrections) into R2, where `daily_sync.sh` later merges them into the canonical local files (`completion_log.jsonl`, `corrections.jsonl`).
Two natural designs:

- Canonical file pattern: the Worker GETs `state/latest/completion_log.jsonl` from R2, appends, PUTs back. Same key for everyone.
- Per-source-file pattern: the Worker writes `state/<date>/completion_log_worker_<msgId>.jsonl` (one file per reply); `daily_sync.sh` globs + merges on next run.
The canonical pattern is the obvious one. But it has a race condition:
- Two webhooks fire near-simultaneously (Casey replies twice quickly, or a two-update Telegram batch)
- Worker A reads `completion_log.jsonl` v1
- Worker B reads `completion_log.jsonl` v1 (same version)
- Worker A appends entry A, PUTs `completion_log.jsonl` v2
- Worker B appends entry B to its v1 copy, PUTs `completion_log.jsonl` v3 → overwrites Worker A's append
Worker B's PUT silently clobbered Worker A's data. R2's S3 API doesn't support `If-Match` etag conditional writes (it does support `If-None-Match: *` for create-only, which doesn't help here).
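To make the lost update concrete, here is a minimal sketch of the canonical read-modify-write as Worker-side TypeScript. The binding name `env.BUCKET` and the function are illustrative, not the real `log.ts` code:

```ts
// Sketch of the racy canonical pattern (illustrative names).
// Two concurrent invocations can interleave between get() and put();
// the later put() silently discards the earlier append.
async function appendCanonical(env: { BUCKET: R2Bucket }, line: string): Promise<void> {
  const key = "state/latest/completion_log.jsonl";
  const obj = await env.BUCKET.get(key);             // A and B both read v1 here
  const existing = obj ? await obj.text() : "";
  await env.BUCKET.put(key, existing + line + "\n"); // last writer wins
}
```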
Durable Objects can be used as a serialization point — only one instance handles all webhooks for the same DO id — so a `this.lock` around the R2 read-modify-write would work in theory. But:

- The DO's purpose is holding per-Casey state, not acting as a global mutex
- DO instances can be evicted; the lock pattern is fragile
- It requires the Worker to wait on R2 round-trips inside the lock, blocking other in-flight webhooks
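For completeness, the rejected shape looks roughly like this. It is a hedged sketch: `CheckinDO` is an illustrative name, and the promise chain is one common way to express the `this.lock` idea:

```ts
// Rejected alternative (sketch only): serialize the read-modify-write
// through a Durable Object. Every webhook routed to this DO id queues
// behind two R2 round-trips per write.
export class CheckinDO {
  private lock: Promise<unknown> = Promise.resolve();

  constructor(private state: DurableObjectState, private env: { BUCKET: R2Bucket }) {}

  async fetch(req: Request): Promise<Response> {
    const line = await req.text();
    // Chain onto the previous write so read-modify-writes never interleave.
    const run = this.lock.then(async () => {
      const key = "state/latest/completion_log.jsonl";
      const obj = await this.env.BUCKET.get(key);
      const existing = obj ? await obj.text() : "";
      await this.env.BUCKET.put(key, existing + line + "\n");
    });
    this.lock = run.catch(() => {}); // keep the chain alive after a failure
    await run;
    return new Response("ok");
  }
}
```

Note how unrelated webhooks stall behind the chain; that latency cost is the third bullet above.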
## Decision

Use the per-source-file pattern. Each Worker write produces a unique key that includes the Telegram `message_id` (which is monotonic and unique per chat). Names:

- `state/<date>/completion_log_worker_<msgId>.jsonl` — one per check-in reply
- `state/<date>/corrections_<msgId>.jsonl` — one per correction reply
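A minimal sketch of the Worker-side write; the real implementation is `appendEntries()` in `log.ts`, and `env.BUCKET` plus the entry shape here are assumptions:

```ts
// Sketch: each reply gets its own R2 key, so concurrent webhooks can
// never overwrite each other.
interface Env {
  BUCKET: R2Bucket; // binding name is illustrative
}

async function writeCheckinReply(env: Env, msgId: number, entries: object[]): Promise<void> {
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const key = `state/${date}/completion_log_worker_${msgId}.jsonl`;
  const body = entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
  // A retried delivery of the same Telegram message rewrites the same key
  // with the same content, so the write is idempotent as well as race-free.
  await env.BUCKET.put(key, body);
}
```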
`daily_sync.sh` runs at 7am, globs all `state/*/completion_log_worker_*.jsonl` + all `state/*/corrections_*.jsonl`, dedups by full-line hash, and appends new lines to the local canonical files.
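The merge is set-union over lines. `daily_sync.sh` itself is a shell script; this TypeScript sketch only pins down the intended semantics (paths and helpers are illustrative):

```ts
import { readFileSync, writeFileSync } from "node:fs";

// Sketch of the merge semantics: union the canonical file with every
// per-source file, deduplicating by exact line content (a Set keyed on
// the full line is equivalent to dedup by full-line hash).
function mergeIntoCanonical(canonicalPath: string, sourcePaths: string[]): void {
  const readLines = (p: string): string[] =>
    readFileSync(p, "utf8").split("\n").filter((l) => l.length > 0);

  const canonical = readLines(canonicalPath);
  const seen = new Set(canonical);
  const fresh: string[] = [];
  for (const p of sourcePaths) {
    for (const line of readLines(p)) {
      if (!seen.has(line)) {
        seen.add(line);
        fresh.push(line); // append-only: existing order is preserved
      }
    }
  }
  // Re-running with the same inputs adds nothing, so the merge is idempotent.
  if (fresh.length > 0) {
    writeFileSync(canonicalPath, canonical.concat(fresh).join("\n") + "\n");
  }
}
```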
## Why this works
- No race: each writer has a unique key. Two webhooks → two different keys → no overwrite possible.
- Idempotent merge: dedup by full-line hash means re-running the merge is safe (every PUT is the same content; merge is set-union semantics).
- Replayable: if the local merge ever loses data, the R2 keys are the durable source of truth — re-run `daily_sync.sh` (or its merge step manually) and everything reappears.
- Audit trail: each file is named with the Telegram `message_id` that produced it. Trace from R2 file → original Telegram message → action.
## Consequences

Good:

- Worker code stays simple — no locking, no etag handling, no retry logic for write conflicts
- `daily_sync.sh`'s merge step is easy to test (deterministic from R2 contents)
- Recovery from local-state corruption is trivial (delete + re-merge)

Tradeoffs:

- More R2 keys over time (one per reply). At ~1 reply/day, ~365/year. Trivial in practice. Add a lifecycle rule (expire after 7d) once this matters.
- The bucket prefix grows linearly. Tooling that lists prefixes might slow down at very high volume. Not relevant at our scale.
- Adds a sync-step dependency: until `daily_sync.sh` runs, the local log lags the R2 source of truth by up to 24h. Acceptable — analyses run after sync, not in realtime.
## Where it applies
The per-source-file pattern is appropriate when:
- Multiple writers can fire concurrently (Worker + manual + cron)
- A single canonical file is the eventual destination
- Reads of the canonical file can tolerate up-to-1-day lag
- The merge step has a natural single-writer (laptop's daily_sync)
It's NOT appropriate when:
- Only one writer ever exists (just write the canonical file)
- Reads need realtime consistency (use a database, not files)
- The state is large per-write (thousands of files in a prefix becomes a slow listing)
## What it does NOT solve
- Worker reading the canonical file: when the Worker needs the current `completion_log.jsonl` to decide what to ask about (e.g. `runCheckin()` looks for already-logged kinds), it reads from `state/latest/completion_log.jsonl`, which `daily_sync.sh` writes during R2 backup. This reflects YESTERDAY's state, not real-time. Acceptable: the Worker's "did you do strength today" question doesn't need to know about a strength log entry made 30 minutes ago — it asks at 8pm, and that day's entries are usually still pending.
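A hedged sketch of that read path; `loggedKindsToday` and the entry fields are assumptions for illustration:

```ts
// Sketch: the Worker reads yesterday's merged snapshot, not live state.
async function loggedKindsToday(env: { BUCKET: R2Bucket }, today: string): Promise<Set<string>> {
  const obj = await env.BUCKET.get("state/latest/completion_log.jsonl");
  if (!obj) return new Set(); // no snapshot yet: ask about everything
  const kinds = new Set<string>();
  for (const line of (await obj.text()).split("\n")) {
    if (!line) continue;
    const entry = JSON.parse(line) as { kind: string; date: string };
    // The snapshot lags by up to a day, so entries made earlier today are
    // usually absent, which is what the 8pm question expects.
    if (entry.date === today) kinds.add(entry.kind);
  }
  return kinds;
}
```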
## References

- Code: `~/garmin-warehouse/cloudflare/otq-checkin/src/log.ts` — `appendEntries()` and `appendCorrection()`
- Code: `~/garmin-warehouse/scripts/daily_sync.sh` — Worker log merge + corrections merge steps
- Related: ADR 003 — the corrections flow that uses this pattern