# ADR-009: Offline store-and-forward messaging (symmetric send-chain + opportunistic hybrid-PQ PCS mix-in)

## Status

Accepted (design) — implementation tracked by SR-2026-06-01-009..013.

## Date

2026-06-01

## Context

The live transport gives forward secrecy (FS) **and** post-compromise security (PCS) via the Double
Ratchet, but only while **both peers are online** over the Tor onion. Users expect WhatsApp-style
behaviour: send a message (or a photo/video) while the peer is offline, have it wait, and have it
**delivered automatically the moment the recipient's app next pings the relay**, in order, one by one.

The relay already provides the substrate: an **oblivious** store that holds opaque, padded blobs under
rotating HMAC tokens, **burns on pull** (single read), preserves deposit order (FIFO per token), and is
gated by PoW + TTL ([`relay.rs`], [`mailbox.rs`]). What is missing is the **async message crypto** that
sits on top — and the user's hard requirement is **maximum encryption**: offline messages must NOT
silently fall back to a static key. Sealing every offline message under the static, per-contact
root-derived `box_key` would mean a single leak of `R` decrypts the entire history — no FS, no PCS.
That baseline is rejected.

Two independent advisors (Codex, Gemini) were consulted and **converged**:

- Reject **(b)** static `box_key` (zero FS/PCS).
- A pure symmetric send-chain **(c)** gives FS + strict ordering + always-available delivery, but **no
  PCS** and no post-quantum re-injection — insufficient alone.
- A full async Double Ratchet seeded by one-time prekeys **(a)** is strongest but heavy and prone to
  state desync across dropped/out-of-order offline messages without synchronous ACKs.
- **(d)** per-message fresh hybrid PQ+DH mix-in gives maximum PCS but burns the recipient's prekeys fast.

The reconciled recommendation is a **hybrid of (c) and (d)**.

## Decision

**A continuous symmetric send-chain with opportunistic, one-time, hybrid-PQ PCS mix-ins**, delivered
through the existing oblivious relay, with a fail-closed "maximum encryption" policy.

### 1. Per-contact key separation (from the onboarding root `R`)

All of the following keys are derived from `R` with labelled HKDF/HMAC and are independent of the live
ratchet and of the directory keys:

- `CK0`  — initial symmetric **chain key** (one per direction; directional binding mirrors
  `directory.rs`, bound to the sender's identity so A→B and B→A never collide).
- `K_prekey` — seed for the one-time-prekey queue token/seal.
- `K_replay`, `K_hdr` — replay-window and header-AEAD separation.
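
A minimal sketch of this separation, assuming HKDF-SHA256 as the labelled KDF. The label strings, the `derive_contact_keys` helper, and the choice to bind only `CK0` to the sender identity are illustrative assumptions, not taken from the codebase:

```python
import hashlib
import hmac


def hkdf(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_contact_keys(root: bytes, sender_id: bytes) -> dict:
    """Derive the per-contact async keys from the onboarding root R."""
    # Hypothetical labels; only CK0 is bound to the sender so A->B and B->A differ.
    labels = {
        "CK0": b"async/ck0/" + sender_id,
        "K_prekey": b"async/prekey",
        "K_replay": b"async/replay",
        "K_hdr": b"async/hdr",
    }
    return {name: hkdf(root, info) for name, info in labels.items()}


root = bytes(32)  # placeholder root, for illustration only
a_to_b = derive_contact_keys(root, b"alice")
b_to_a = derive_contact_keys(root, b"bob")
```

Because every key uses a distinct label, learning one derived key should not reveal the others, and the two directions start from different chain keys.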

### 2. One-time hybrid prekeys, published through the oblivious relay

The **recipient** pre-publishes signed one-time prekey bundles and replenishes them:

- bundle = `{ bundle_id, ot_x25519_pub, ot_mlkem768_ek, expiry, sig_identity }`.
- deposited under a rotating per-contact token `prekey_token(epoch) = HMAC(K_prekey, "prekey" || epoch)`,
  sealed under `prekey_seal = HMAC(K_prekey, "prekey-seal")`. The relay sees only padded opaque blobs;
  different contacts use different seeds → no cross-contact linkability; sealed-sender at the envelope.
- the **sender** pulls one bundle (burn-on-read) before composing a message.
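
The token and seal formulas above can be sketched as follows, assuming HMAC-SHA256 and an 8-byte big-endian encoding of `epoch` (the hash choice and the encoding are assumptions; the ADR only fixes the labels):

```python
import hashlib
import hmac


def prekey_token(k_prekey: bytes, epoch: int) -> bytes:
    """Rotating mailbox token: HMAC(K_prekey, "prekey" || epoch)."""
    # Fixed-width epoch encoding (assumed) keeps the HMAC input unambiguous.
    return hmac.new(k_prekey, b"prekey" + epoch.to_bytes(8, "big"), hashlib.sha256).digest()


def prekey_seal(k_prekey: bytes) -> bytes:
    """Sealing key for prekey bundles: HMAC(K_prekey, "prekey-seal")."""
    return hmac.new(k_prekey, b"prekey-seal", hashlib.sha256).digest()


k_prekey = bytes(32)  # placeholder seed, for illustration only
token_now = prekey_token(k_prekey, 42)
token_next = prekey_token(k_prekey, 43)
seal = prekey_seal(k_prekey)
```

Both peers can compute `token_now` locally from `K_prekey` and the current epoch, so no lookup round trip is needed to find the queue.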

### 3. Sender message step (no round trip)

1. Pull one unused prekey bundle (if any).
2. If a bundle is available — **PCS mix-in**:
   `ss_dh = X25519(eph_sk, ot_x25519_pub)`; `(ct_kem, ss_kem) = MLKEM768.Encap(ot_mlkem768_ek)`;
   `CK = HKDF(CK || ss_dh || ss_kem, "pcs-mixin" || bundle_id)`  → fresh FS **and** PCS, post-quantum.
3. **Message ratchet**: `(MK, CK') = HKDF(CK, "msg-ratchet")`; advance `CK := CK'`, `Index += 1`.
4. **AEAD** (XChaCha20-Poly1305), crash-safe **synthetic** nonce — never caller-random:
   `k_enc = HKDF(MK, "enc")`; `nonce = HKDF(MK, "nonce" || Index)` (24 B).
5. **Header** (sealed): `{ epoch, Index, PrevIndex, bundle_id?, ct_kem?, eph_pub?, prev_hash }`.
6. **Persist chain state (CK, Index) to disk BEFORE network delivery** (single SQLite txn), then deposit.
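
Steps 2 to 4 can be sketched as below. This is a sketch under stated assumptions: HKDF-SHA256, a 64-byte HKDF output split into `(MK, CK')`, an 8-byte big-endian `Index`, and random bytes standing in for the X25519 and ML-KEM-768 shared secrets. The AEAD call itself is omitted, and the function names are hypothetical:

```python
import hashlib
import hmac
import os


def hkdf(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def pcs_mixin(ck: bytes, ss_dh: bytes, ss_kem: bytes, bundle_id: bytes) -> bytes:
    """Step 2: fold both shared secrets into the chain key."""
    return hkdf(ck + ss_dh + ss_kem, b"pcs-mixin" + bundle_id)


def ratchet(ck: bytes) -> tuple:
    """Step 3: derive (MK, CK') from CK; the 32/32 split is an assumption."""
    okm = hkdf(ck, b"msg-ratchet", length=64)
    return okm[:32], okm[32:]


def aead_inputs(mk: bytes, index: int) -> tuple:
    """Step 4: encryption key and 24-byte synthetic nonce, both derived from MK."""
    k_enc = hkdf(mk, b"enc")
    nonce = hkdf(mk, b"nonce" + index.to_bytes(8, "big"), length=24)
    return k_enc, nonce


ck = bytes(32)  # placeholder chain key, for illustration only
ss_dh, ss_kem = os.urandom(32), os.urandom(32)  # stand-ins for X25519 / ML-KEM outputs
ck = pcs_mixin(ck, ss_dh, ss_kem, b"bundle-0001")
mk, ck = ratchet(ck)
k_enc, nonce = aead_inputs(mk, 0)
```

Since the nonce is a deterministic function of `MK` and `Index`, re-running the step after a crash yields the same nonce for the same message rather than a fresh random one.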

### 4. Maximum-encryption policy (fail-closed)

When no fresh prekey is available, the sender does **not** silently downgrade to FS-only symmetric
ratcheting. The message is completed against the symmetric chain, written to the **outbox**, and the app
triggers a prekey refresh; the **Retry/Refresh** action re-fetches prekeys and resends. A reduced-security
("FS-only, no PCS") send is possible only as a **loud, explicit, opt-in** override — never the default.
This is exactly the user's "it stays there, click refresh to try again" behaviour.
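
One way to read this policy is as a three-way decision. The sketch below is an interpretation rather than the implemented logic, and `SendAction` and `decide_send` are hypothetical names:

```python
from enum import Enum


class SendAction(Enum):
    SEND_WITH_PCS = "send with PCS mix-in"
    HOLD_AND_REFRESH = "hold in outbox and refresh prekeys"
    SEND_FS_ONLY = "send FS-only (explicit override)"


def decide_send(prekey_available: bool, fs_only_override: bool = False) -> SendAction:
    """Fail closed: without a prekey, hold the message unless the user opted in."""
    if prekey_available:
        return SendAction.SEND_WITH_PCS
    if fs_only_override:
        return SendAction.SEND_FS_ONLY
    return SendAction.HOLD_AND_REFRESH


default_without_prekey = decide_send(prekey_available=False)
```

The override defaults to `False`, so the reduced-security path is reachable only when a caller passes it explicitly.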

### 5. Receiver: ordering, gaps, replay

- Strict monotonic `Index` per chain. Gap (`Index` ahead of expected) → advance the chain, deriving and
  **caching skipped message keys** up to a bounded limit (e.g. 2000) with expiry.
- Replay: an `Index` ≤ the highest seen and absent from the skipped cache, or a reused `bundle_id`, is
  dropped. Seen-set keyed by `(epoch, Index)` and used-`bundle_id`.
- Relay already burns on pull; the receiver releases messages to the UI **in `Index` order**.
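
The gap and replay rules can be sketched as a small receive-side chain. The sketch assumes HKDF-SHA256 and the same 64-byte `(MK, CK')` split as on the sender; it leaves out cache expiry, `epoch`, `bundle_id` tracking, and in-order release to the UI, and `ReceiveChain` is a hypothetical name:

```python
import hashlib
import hmac
from typing import Optional

MAX_SKIP = 2000  # bound on cached skipped keys (the ADR's example figure)


def hkdf(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def ratchet(ck: bytes) -> tuple:
    """Derive (MK, CK') from CK; the 32/32 split is an assumption."""
    okm = hkdf(ck, b"msg-ratchet", length=64)
    return okm[:32], okm[32:]


class ReceiveChain:
    def __init__(self, ck0: bytes) -> None:
        self.ck = ck0
        self.next_index = 0
        self.skipped = {}  # Index -> MK for messages not yet received

    def message_key(self, index: int) -> Optional[bytes]:
        """Return MK for `index`, or None if the message is a replay."""
        if index < self.next_index:
            # Already passed: valid only if its key is still cached (single use).
            return self.skipped.pop(index, None)
        gap = index - self.next_index
        if len(self.skipped) + gap > MAX_SKIP:
            raise ValueError("gap exceeds skipped-key cache bound")
        for _ in range(gap):
            # Advance over the gap, caching each skipped message key.
            mk, self.ck = ratchet(self.ck)
            self.skipped[self.next_index] = mk
            self.next_index += 1
        mk, self.ck = ratchet(self.ck)
        self.next_index += 1
        return mk


rx = ReceiveChain(bytes(32))  # placeholder CK0, for illustration only
key_for_2 = rx.message_key(2)  # arrives early: keys 0 and 1 are cached
key_for_0 = rx.message_key(0)  # late arrival, served from the cache
replayed = rx.message_key(0)   # second use of Index 0 is rejected
```

Popping a cached key on first use is what makes a repeated `Index` indistinguishable from a replay, which matches the drop rule above.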

### 6. Offline media (photos / video)

Chunking large media through the fixed-size padded mailbox is wasteful. Instead:

- encrypt the media with a random per-file `FileKey` (XChaCha20-Poly1305, chunked with
  `nonce_i = HKDF(FileKey, "chunk" || i)`), store the ciphertext in a **decoupled oblivious bulk lane**
  (object store reached over Tor), addressed by a random `cap_media`.
- send a normal async message whose sealed payload is the **manifest**:
  `{ media_id, cap_media, chunk_count, chunk_size, file_hash, FileKey }`.
- enforce a size cap + TTL; uploads are resumable by chunk index.
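
A rough sketch of the per-chunk nonce derivation and the manifest fields, assuming HKDF-SHA256, an 8-byte big-endian chunk index, and a SHA-256 `file_hash` over the plaintext (all three are assumptions). The XChaCha20-Poly1305 encryption and the upload are omitted, and the helper names are hypothetical:

```python
import hashlib
import hmac
import os


def hkdf(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF-SHA256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def chunk_nonce(file_key: bytes, i: int) -> bytes:
    """nonce_i = HKDF(FileKey, "chunk" || i), 24 bytes for XChaCha20-Poly1305."""
    return hkdf(file_key, b"chunk" + i.to_bytes(8, "big"), length=24)


def build_manifest(media: bytes, chunk_size: int) -> dict:
    """Build the manifest that travels as the sealed payload of a normal message."""
    file_key = os.urandom(32)  # random per-file key
    chunk_count = -(-len(media) // chunk_size)  # ceiling division
    return {
        "media_id": os.urandom(16).hex(),
        "cap_media": os.urandom(32).hex(),  # random capability for the bulk lane
        "chunk_count": chunk_count,
        "chunk_size": chunk_size,
        "file_hash": hashlib.sha256(media).hexdigest(),
        "FileKey": file_key.hex(),
    }


manifest = build_manifest(b"x" * 10, chunk_size=4)
nonces = [chunk_nonce(bytes.fromhex(manifest["FileKey"]), i)
          for i in range(manifest["chunk_count"])]
```

Deriving each nonce from the chunk index means a resumed upload can re-encrypt chunk `i` on its own without tracking nonce state.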

### 7. Sender outbox + retry (incl. media)

- The full crypto ratchet is completed and the **final relay blob is persisted** to a local `outbox`
  table. Retry re-`POST`s the **exact same blob** (never re-encrypts) until the relay returns OK —
  cryptographic state can never desync due to a network flake.
- Pending items render in the UI with their delivery state; **media renders greyed-out with a centred
  "try again" (⟳) button** until its bulk upload + manifest deposit both succeed.

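The persist-then-retry behaviour can be sketched with an in-memory SQLite database. The table layout, the `post` callback (a stand-in for the relay deposit), and the function names are assumptions made for illustration:

```python
import sqlite3


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS outbox ("
               "id INTEGER PRIMARY KEY, blob BLOB NOT NULL, "
               "delivered INTEGER NOT NULL DEFAULT 0)")
    db.execute("CREATE TABLE IF NOT EXISTS chain_state ("
               "id INTEGER PRIMARY KEY CHECK (id = 1), ck BLOB NOT NULL, idx INTEGER NOT NULL)")
    return db


def enqueue(db: sqlite3.Connection, blob: bytes, ck: bytes, index: int) -> None:
    """Persist the final relay blob and the advanced chain state in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO outbox (blob) VALUES (?)", (blob,))
        db.execute("INSERT OR REPLACE INTO chain_state (id, ck, idx) VALUES (1, ?, ?)",
                   (ck, index))


def retry_pending(db: sqlite3.Connection, post) -> int:
    """Re-send stored blobs byte-for-byte, oldest first; stop at the first failure."""
    rows = db.execute("SELECT id, blob FROM outbox WHERE delivered = 0 ORDER BY id").fetchall()
    sent = 0
    for row_id, blob in rows:
        if not post(blob):
            break  # keep deposit order; the same blob is retried next time
        with db:
            db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


db = open_outbox()
enqueue(db, b"blob-0", b"\x01" * 32, 1)
enqueue(db, b"blob-1", b"\x02" * 32, 2)
```
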
## Consequences

- **Security**: FS for every offline message; PCS + post-quantum re-injection whenever a prekey is
  available; fail-closed when not. Unlike the rejected static `box_key` baseline, a single leak of `R`
  no longer decrypts the entire history. The threat boundary now
  includes prekey management and an at-rest bulk media lane — [`THREAT_MODEL.md`] must be updated.
- **Complexity**: new persistent per-contact state (chain key, index, skipped-key cache, outbox, prekey
  pool) and a bulk-object lane. State is small and bounded.
- **Crypto discipline**: reuses audited primitives (X25519, ML-KEM-768, (X)ChaCha20-Poly1305, HKDF,
  HMAC) — no new primitive; this ADR is the required sign-off for the protocol change.
- **No obliviousness regression**: prekeys, messages, and media pointers all ride rotating unlinkable
  tokens with sealed-sender and fixed-size padding; the bulk lane is decoupled and capability-addressed.

## Implementation slices

- SR-…-009 `core`: per-contact async state + key separation + symmetric send-chain + synthetic nonces.
- SR-…-010 `core`: one-time hybrid prekey bundles (gen/sign/verify) + PCS mix-in.
- SR-…-011 `demo`: oblivious prekey publish/replenish + pull; mailbox deposit/poll-release wiring.
- SR-…-012 `demo/ui`: sender outbox + retry; deliver-on-ping; ordering/gap/replay enforcement.
- SR-…-013 `demo/ui`: offline media bulk lane + manifest; greyed media bubble with centred retry button.
