
ADR 004: Per-source-file pattern for Worker → R2 → laptop merge

Status: accepted
Date: 2026-05-04
Supersedes: —

Context

The OTQCheckinAgent Worker writes Casey-state changes (check-in completions + corrections) into R2, where daily_sync.sh later merges them into the canonical local files (completion_log.jsonl, corrections.jsonl).

Two natural designs:

  1. Canonical file pattern: Worker GETs state/latest/completion_log.jsonl from R2, appends, PUTs back. Same key for everyone.
  2. Per-source-file pattern: Worker writes state/<date>/completion_log_worker_<msgId>.jsonl (one file per reply). daily_sync.sh globs + merges on next run.

The canonical pattern is the obvious one. But it has a race condition:

  • Two webhooks fire near-simultaneously (Casey replies twice in quick succession, or Telegram delivers a two-update batch)
  • Worker A reads completion_log.jsonl v1
  • Worker B reads completion_log.jsonl v1 (same version)
  • Worker A appends entry A, PUTs completion_log.jsonl v2
  • Worker B appends entry B to its v1 copy, PUTs completion_log.jsonl v3 → overwrites Worker A's append (v3 contains entry B but not entry A)

Worker B's PUT silently clobbered Worker A's data. R2's S3 API doesn't support If-Match etag conditional writes (it does support If-None-Match: * for create-only, which doesn't help here).
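
To make the race concrete, here is a hypothetical sketch of the rejected canonical-file write. The CHECKIN_BUCKET binding name is assumed, not the project's actual one:

```ts
import type { R2Bucket } from "@cloudflare/workers-types";

// Hypothetical sketch of the REJECTED canonical-file pattern.
// Two concurrent invocations both read the same version in step 1,
// and the later PUT in step 3 silently discards the earlier append.
async function appendCanonical(env: { CHECKIN_BUCKET: R2Bucket }, line: string) {
  const key = "state/latest/completion_log.jsonl";
  const existing = await env.CHECKIN_BUCKET.get(key);    // step 1: read v1
  const body = existing ? await existing.text() : "";
  const updated = body + line + "\n";                    // step 2: append locally
  await env.CHECKIN_BUCKET.put(key, updated);            // step 3: unconditional PUT
  // No If-Match is available here, so the lost update cannot even be detected.
}
```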

Durable Objects can be used as a serialization point (only one instance handles all webhooks for the same DO id), so guarding the R2 read-modify-write with an in-instance lock such as a this.lock promise chain would work in theory; a sketch of that rejected pattern follows the list. But:

  • The DO's purpose is holding Casey-level conversation state, not serving as a global mutex
  • DO instances can be evicted; the lock pattern is fragile
  • It requires the Worker to wait on R2 round-trips inside the lock, blocking other in-flight webhooks
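
For comparison, a sketch of what that rejected lock pattern would look like inside the DO. All names are assumed; nothing like this exists in the codebase:

```ts
import type { DurableObjectState, R2Bucket } from "@cloudflare/workers-types";

// Rejected: DO-as-mutex. Serializing the read-modify-write behind a promise
// chain only holds while this instance stays alive, and every in-flight
// webhook queues behind two R2 round-trips.
export class CheckinAgentSketch {
  private lock: Promise<unknown> = Promise.resolve();

  constructor(private state: DurableObjectState, private env: { CHECKIN_BUCKET: R2Bucket }) {}

  appendSerialized(line: string): Promise<void> {
    const next = this.lock.then(async () => {
      const key = "state/latest/completion_log.jsonl";
      const existing = await this.env.CHECKIN_BUCKET.get(key);
      const body = existing ? await existing.text() : "";
      await this.env.CHECKIN_BUCKET.put(key, body + line + "\n");
    });
    this.lock = next.catch(() => {}); // keep the chain alive after a failure
    return next;
  }
}
```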

Decision

Use the per-source-file pattern. Each Worker write produces a unique key including the Telegram message_id (which is monotonic and unique per chat); a sketch of the write side follows the list. Names:

  • state/<date>/completion_log_worker_<msgId>.jsonl — one per check-in reply
  • state/<date>/corrections_<msgId>.jsonl — one per correction reply
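
A minimal sketch of the write side under the same assumed CHECKIN_BUCKET binding; the real implementation is appendEntries() in src/log.ts and may differ:

```ts
import type { R2Bucket } from "@cloudflare/workers-types";

// Sketch only: the key layout matches the ADR, everything else is assumed.
async function writeCheckinReply(
  env: { CHECKIN_BUCKET: R2Bucket },
  msgId: number,     // Telegram message_id: monotonic, unique per chat
  entries: string[], // JSONL lines produced by this reply
): Promise<void> {
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  // Unique key per reply: concurrent webhooks write different keys,
  // so no PUT can clobber another writer's data.
  const key = `state/${date}/completion_log_worker_${msgId}.jsonl`;
  await env.CHECKIN_BUCKET.put(key, entries.join("\n") + "\n");
}
```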

daily_sync.sh runs at 7am, globs all state/*/completion_log_worker_*.jsonl and all state/*/corrections_*.jsonl, deduplicates by full-line hash, and appends the new lines to the local canonical files.
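
The merge semantics, sketched in TypeScript for clarity. daily_sync.sh implements this as a shell pipeline, so the function below is illustrative, not the actual code:

```ts
// Set-union merge: append only lines not already present locally.
// Re-running with the same inputs is a no-op, which is what makes the
// merge idempotent and replayable.
function mergeLogs(canonicalLines: string[], r2Lines: string[]): string[] {
  const seen = new Set(canonicalLines); // "dedup by full-line hash"
  const appended: string[] = [];
  for (const line of r2Lines) {
    if (line.trim() !== "" && !seen.has(line)) {
      seen.add(line);
      appended.push(line);
    }
  }
  return [...canonicalLines, ...appended];
}
```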

Why this works

  1. No race: each writer has a unique key. Two webhooks → two different keys → no overwrite possible.
  2. Idempotent merge: deduplication by full-line hash makes re-running the merge safe (each R2 object's content never changes, and the merge is a set union).
  3. Replayable: if the local merge ever loses data, the R2 keys are the durable source of truth — re-run daily_sync.sh (or its merge step manually) and everything reappears.
  4. Audit trail: each file is named with the Telegram message_id that produced it. Trace from R2 file → original Telegram message → action.

Consequences

Good:

  • Worker code stays simple: no locking, no etag handling, no retry logic for write conflicts
  • daily_sync.sh's merge step is easy to test (deterministic from R2 contents)
  • Recovery from local-state corruption is trivial (delete + re-merge)

Tradeoffs:

  • More R2 keys over time (one per reply). At ~1 reply/day, that's ~365/year: trivial in practice. Add a lifecycle rule (expire after 7d) once this matters.
  • The bucket prefix grows linearly. Tooling that lists prefixes might slow down at very high volume; not relevant at our scale.
  • Adds a sync-step dependency: until daily_sync.sh runs, the local log lags the R2 source of truth by up to 24h. Acceptable, since analyses run after sync, not in realtime.

Where it applies

The per-source-file pattern is appropriate when:

  • Multiple writers can fire concurrently (Worker + manual + cron)
  • A single canonical file is the eventual destination
  • Reads of the canonical file can tolerate up-to-1-day lag
  • The merge step has a natural single-writer (laptop's daily_sync)

It's NOT appropriate when:

  • Only one writer ever exists (just write the canonical file)
  • Reads need realtime consistency (use a database, not files)
  • Writes are large or high-volume (thousands of files in a prefix makes listing slow)

What it does NOT solve

  • Worker reading the canonical file: when the Worker needs the current completion_log.jsonl to decide what to ask about (e.g. runCheckin() looks for already-logged kinds), it reads state/latest/completion_log.jsonl, which daily_sync.sh writes during the R2 backup (see the sketch below). That snapshot reflects YESTERDAY's state, not real-time state. Acceptable: the Worker's "did you do strength today" question doesn't need to know about a strength entry logged 30 minutes ago; it asks at 8pm, and same-day entries usually haven't synced yet anyway.
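
A sketch of that read path, again with the assumed binding; the `kind` field name is a guess at the log schema:

```ts
import type { R2Bucket } from "@cloudflare/workers-types";

// Reads yesterday's snapshot, not live state: the object is whatever
// daily_sync.sh last uploaded during the R2 backup step.
async function alreadyLoggedKinds(env: { CHECKIN_BUCKET: R2Bucket }): Promise<Set<string>> {
  const snapshot = await env.CHECKIN_BUCKET.get("state/latest/completion_log.jsonl");
  if (!snapshot) return new Set(); // no snapshot yet: ask about everything
  const kinds = new Set<string>();
  for (const line of (await snapshot.text()).split("\n")) {
    if (!line.trim()) continue;
    kinds.add(String(JSON.parse(line).kind)); // field name assumed
  }
  return kinds;
}
```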

References

  • Code: ~/garmin-warehouse/cloudflare/otq-checkin/src/log.ts — appendEntries() and appendCorrection()
  • Code: ~/garmin-warehouse/scripts/daily_sync.sh — Worker log merge + corrections merge steps
  • Related: ADR 003 — the corrections flow that uses this pattern