---
last_verified: 2026-05-30
verified_version: 0.1.34
owner: qa
freshness_days: 30
---

# Test Plan — pvtcoms

Full strategy (the quick-start is [`TESTING.md`](./TESTING.md)). Reconciled from Codex + Gemini + research. Guiding
principle (Gemini): **test the *absence* of bad behavior, not just the presence of good behavior.** Attackers in 2026 don't
break ML-KEM — they send malformed bytes over the Tor socket, scrape a key from a log, or exploit a timing leak.

## Test pyramid

| Layer | What | Tools | Coverage target |
|---|---|---|---|
| **Crypto correctness** | Deterministic Known-Answer-Tests for every primitive (hybrid KEM, AEAD, HKDF, signatures, SAS, transcript hash) | **NIST ACVP vectors**, **`wycheproof`** crate, committed `tests/vectors/*.json` | ≥95% line / ≥90% branch |
| **Constant-time** | No timing side-channels (the RUSTSEC-2025-0144 ml-dsa bug class) | **`subtle`/`crypto-bigint`** types + **dudect-style** timing tests | required, not % |
| **Protocol / ratchet** | Property + fuzz: out-of-order / dropped / duplicate / replay / reconnect / clock-skew → **no desync, no message loss, replay never advances state** | **`proptest`**, **`cargo-fuzz`** | ≥90% line |
| **Unsafe / concurrency** | UB in FFI/buffers; data races in session store / token rotation / retry queue | **`miri`**, **`loom`** | required |
| **Wire parsers** | Every byte from an untrusted source (envelopes, mailbox tokens, invite/SAS encoding) | **`cargo-fuzz`** continuous (the #1 attack surface) | fuzz corpus, no panics/OOM |
| **Network / Tor** | Bootstrap, stream failure/retry, circuit rotation, offline→reconnect mailbox poll | **`chutney`** (local Tor net) + **`arti`** in-process; live-Tor pre-release | scenario checklist |
| **Mailbox / rendezvous** | Token issued→redeemed→invalidated; single-use consumed once; expiry/grace; illegal transitions hard-rejected | model-based state tests | ≥90% line |
| **Adversarial / negative** | Replay onboarding packets; stolen token (pre/post legit use); MITM onboarding (altered invite/prekey); SAS-mismatch forces `unverified`; corrupt/truncated/oversized fields | hand-written + `proptest` | checklist (required) |
| **Metadata-leak** | Assert logs/crash dumps contain **no plaintext, keys, IPs, onion addrs, invite secrets**; assert **zero network syscalls outside the Tor SOCKS port** (`strace`/eBPF wrapper); panic paths redact before reporting | snapshot redaction tests + syscall assertion | required |
| **Test quality** | Prove the suite actually catches bugs | **`cargo-mutants`** (mutation testing) | surviving-mutant budget |
| **UI** | Onboarding, add/verify, send, recover, panic flows | **Compose UI test** (semantics), **Tauri WebDriver** | ≥70% critical-journey |
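
The "model-based state tests" in the Mailbox / rendezvous row can be sketched as a pure transition function plus an exhaustive walk over short event sequences. This is a minimal illustration under stated assumptions: `TokenState`, `TokenEvent`, `step`, `run`, and `check_all_sequences` are hypothetical names, not the pvtcoms API, and the expiry grace window is left out to keep the model small.

```rust
/// Hypothetical lifecycle of a single-use mailbox token (illustrative model only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenState {
    Issued,
    Redeemed,
    Invalidated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenEvent {
    Redeem,
    Expire,
    Revoke,
}

#[derive(Debug, PartialEq, Eq)]
pub struct IllegalTransition {
    pub from: TokenState,
    pub event: TokenEvent,
}

/// Every (state, event) pair either yields the next state or is hard-rejected.
pub fn step(state: TokenState, event: TokenEvent) -> Result<TokenState, IllegalTransition> {
    use TokenEvent::*;
    use TokenState::*;
    match (state, event) {
        (Issued, Redeem) => Ok(Redeemed),
        (Issued, Expire) | (Issued, Revoke) => Ok(Invalidated),
        (Redeemed, Expire) | (Redeemed, Revoke) => Ok(Invalidated),
        // A second redeem, or any event after invalidation, is an error.
        (from, event) => Err(IllegalTransition { from, event }),
    }
}

pub struct Outcome {
    pub state: TokenState,
    pub redeems: usize,
    pub rejected: usize,
}

/// Applies events in order; a rejected event leaves the state untouched.
pub fn run(events: &[TokenEvent]) -> Outcome {
    let mut out = Outcome { state: TokenState::Issued, redeems: 0, rejected: 0 };
    for &event in events {
        match step(out.state, event) {
            Ok(next) => {
                if event == TokenEvent::Redeem {
                    out.redeems += 1;
                }
                out.state = next;
            }
            Err(_) => out.rejected += 1,
        }
    }
    out
}

/// Walks every event sequence up to `max_len` and asserts the model invariants.
/// Returns the number of sequences checked.
pub fn check_all_sequences(max_len: u32) -> usize {
    const EVENTS: [TokenEvent; 3] = [TokenEvent::Redeem, TokenEvent::Expire, TokenEvent::Revoke];
    let mut checked = 0;
    for len in 0..=max_len {
        for code in 0..3usize.pow(len) {
            // Decode `code` as a base-3 number, one digit per event.
            let mut c = code;
            let events: Vec<TokenEvent> = (0..len)
                .map(|_| {
                    let e = EVENTS[c % 3];
                    c /= 3;
                    e
                })
                .collect();
            let out = run(&events);
            assert!(out.redeems <= 1, "token redeemed more than once: {events:?}");
            if events.iter().any(|e| *e != TokenEvent::Redeem) {
                assert_eq!(out.state, TokenState::Invalidated, "invalidation must stick: {events:?}");
            }
            checked += 1;
        }
    }
    checked
}

#[test]
fn token_model_holds_for_short_sequences() {
    check_all_sequences(6);
}
```

Keeping `step` pure means the same function can serve as the oracle when a property test drives the real mailbox implementation with the same event sequence.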

## Status (implemented as of 0.1.10)

- ✅ **Crypto KATs** — RFC 8439 (ChaCha20-Poly1305) + RFC 5869 (HKDF) in `core/src/kat.rs`; differential
  FIPS-203 ML-KEM-768 KAT (libcrux vs RustCrypto, byte-identical) in `core/src/pqkem.rs`.
- ✅ **Wycheproof vectors** — `core/tests/wycheproof.rs` runs the full ChaCha20-Poly1305 + Ed25519 +
  X25519 edge-case suites against our AEAD (`seal`/`open`), `identity::verify`, and the pinned X25519
  (incl. forgery/malleability/small-order cases). Runs in `cargo test`.
- ✅ **Property tests** — `core/tests/properties.rs` (13 proptest properties: ratchet desync/dropped/
  out-of-order, parser roundtrips).
- ✅ **cargo-fuzz** — `core/fuzz/` (its own nightly workspace) with libFuzzer targets for every untrusted
  parser: `wire_decode`, `invite_parse`, `prekey_parse`, `rotation_parse`, `media_manifest`. Survived
  ~40M executions with no panics/OOM. CI **builds** every target on each run and runs a 60s coverage-
  guided **smoke per target daily**. (Augments the inline xorshift fuzz already in `core/src`.)
- ✅ **loom** — `concurrency/` crate (`pvtcoms-concurrency`) holds the per-contact lock-registry interning
  pattern (the real `offline_client::contact_lock` now uses it) with sync primitives that switch to loom
  under `--cfg loom`. `loom_tests` exhaustively model-check that the registry interns **exactly once per
  key** under all interleavings (mutual exclusion preserved), distinct keys stay independent, and
  concurrent `fetch_add` ids are unique. CI runs `RUSTFLAGS="--cfg loom" cargo test -p pvtcoms-concurrency`.
- ✅ **cargo-mutants** — mutation testing now covers the **entire `core` crate** (all 23 modules):
  the untrusted-wire parsers, the crypto boundary, the crypto protocol core, the identity/KEM facades,
  the framing/storage helpers, the media + offline-outbox modules, and the relay policy engine
  (`core/src/{wire,contacts,rotation,crypto,handshake,ratchet,sendchain,offline,prekey,invite,directory,identity,pqkem,mailbox,store,keystore,cover,pad,kat,pow,media,outbox,relay}.rs`), with **0
  surviving**. Mutation testing of `media` surfaced a real off-by-one in `decrypt_media`'s chunk-count
  bound (a legitimately-encrypted MAX_MEDIA object failed to decrypt) — now fixed and pinned by an
  (ignored, ~64 MiB) max-size round-trip test. Twelve provably-equivalent or boundary-only mutants are
  documented + excluded in `.cargo/mutants.toml`. Mutants that initially survived drove targeted
  killing tests: wire `AuthReply`/`AuthConfirm` round-trips
  + `MAX_FRAME` pin; `Contacts::from_encrypted` boundary cases; rotation payload/seal-key binding;
  **`handshake::signed_payload`** must bind context+role+transcript (a constant payload broke MAL-BIND/
  anti-reflection yet passed every functional test); **ratchet** KDFs must be input-dependent (a constant
  KDF = AEAD nonce reuse + no forward secrecy, uncaught because the per-message AAD differs by `n`),
  `Header::to_bytes` layout, `can_send`, and the exact `MAX_SKIP` skip bound; **directory** record
  parser bounds (count/length caps, the `||`-overflow OOB guard); **store** at-rest log parser bounds
  (truncated/overrun records) and `pad`/`mailbox` boundary + diagnostic counts. The `identity`, `pqkem`,
  `keystore`, `cover`, `kat` and `pow` facades were already 0-surviving (thin wrappers over audited
  primitives with strong KAT/round-trip tests). Five provably-equivalent mutants documented + excluded
  in `.cargo/mutants.toml`. CI runs it daily (scheduled only — it re-runs the suite once per mutant, so
  it's kept off the PR path).
- ✅ **Metadata-leak (static + at-rest)** — `core/tests/output_hygiene.rs` enforces that the `core`
  library stays silent (no logging/print), no secret-bearing type derives `Debug`/`Serialize` (the
  `{:?}`-into-panic/log leak path), and no demo log sink interpolates a secret identifier; each rule is
  negative-tested. `core/tests/at_rest_no_leak.rs` proves every at-rest seal hides its plaintext and is
  non-deterministic. Backed by zeroize-on-drop of the live ratchet key state (`core/src/ratchet.rs`).
  The **syscall-level** assertion (zero network egress outside the Tor SOCKS port, incl. no plaintext
  DNS) is still pending — it needs a real Tor network the sandbox can't provide.
- ⏳ **Pending** (toolchain/infra not yet wired): `miri` (blocked by libcrux SIMD + C deps under Miri),
  dudect constant-time, `chutney`/`arti` Tor integration in CI (the sandbox blocks Tor
  relay ports — runs manually), metadata-leak **syscall** assertion (strace/eBPF, run-on-real-net),
  Compose/Tauri UI tests.
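
The lock-registry interning property from the loom item above ("exactly once per key", distinct keys independent) can be illustrated with plain `std` primitives. This is a hedged sketch, not the `pvtcoms-concurrency` code: `LockRegistry`, `lock_for`, and `race_same_key` are hypothetical names. A thread stress test like this only samples interleavings; the loom suite explores them exhaustively, which is why the real crate swaps its sync primitives under `--cfg loom`.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical registry handing out one shared lock per contact key.
pub struct LockRegistry {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
    /// Counts how many distinct locks were ever created.
    interned: AtomicU64,
}

impl LockRegistry {
    pub fn new() -> Self {
        LockRegistry { locks: Mutex::new(HashMap::new()), interned: AtomicU64::new(0) }
    }

    /// Returns the lock for `key`, creating it on first use. Lookup and insert
    /// happen under one guard, so two racing callers cannot both create a lock.
    pub fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        if let Some(existing) = map.get(key) {
            return Arc::clone(existing);
        }
        self.interned.fetch_add(1, Ordering::Relaxed);
        let lock = Arc::new(Mutex::new(()));
        map.insert(key.to_owned(), Arc::clone(&lock));
        lock
    }

    pub fn interned(&self) -> u64 {
        self.interned.load(Ordering::Relaxed)
    }
}

/// Races `threads` callers on one key. Returns whether every caller received
/// the same lock, and how many locks the registry created in total.
pub fn race_same_key(threads: usize) -> (bool, u64) {
    let registry = Arc::new(LockRegistry::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let r = Arc::clone(&registry);
            thread::spawn(move || r.lock_for("alice"))
        })
        .collect();
    let locks: Vec<Arc<Mutex<()>>> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let all_same = locks.windows(2).all(|w| Arc::ptr_eq(&w[0], &w[1]));
    (all_same, registry.interned())
}

#[test]
fn interns_exactly_once_per_key() {
    assert_eq!(race_same_key(8), (true, 1));
}
```

The bug class this guards against is a check-then-insert split across two lock acquisitions, where two callers both miss the lookup and each install their own lock for the same contact.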

## Non-negotiable minimum (vs gold-plating)
- **Minimum**: KAT/Wycheproof/ACVP vectors in CI · `proptest` on roundtrips + ratchet invariants · `cargo-fuzz` on every wire
  parser · `miri` on unsafe · metadata-leak assertions · reproducible **Rust core** build.
- **High-ROI adds**: `loom` (P2P concurrency is exactly its domain) · `cargo-mutants` · constant-time/dudect · chutney/arti Tor tests · multi-client integration (SimpleX-style: spin up ≥2 real clients, full invite+SAS+ratchet+mailbox round-trip).
- **Defer (gold-plating)**: full bit-for-bit *Android NDK* reproducibility on day 1 · AFL alongside libFuzzer · formal verification (unless we adopt Cryspen's verified `ml-kem`).
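
As a rough picture of what "fuzz on every wire parser" and "roundtrips" mean in the minimum above, the sketch below drives a toy length-prefixed decoder with an inline xorshift generator and checks two things: the decoder never panics, and any input it accepts re-encodes to the same bytes. The frame layout, `MAX_PAYLOAD`, `Frame`, `encode`, `decode`, and `fuzz_decode` are all assumptions made for illustration; they are not the pvtcoms wire format, and a loop like this complements coverage-guided `cargo-fuzz` rather than replacing it.

```rust
/// Hypothetical frame layout: 1-byte tag, 2-byte big-endian length, payload.
pub const MAX_PAYLOAD: usize = 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated,
    Oversized,
    TrailingBytes,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Frame {
    pub tag: u8,
    pub payload: Vec<u8>,
}

pub fn encode(frame: &Frame) -> Vec<u8> {
    debug_assert!(frame.payload.len() <= MAX_PAYLOAD);
    let len = frame.payload.len() as u16;
    let mut out = Vec::with_capacity(3 + frame.payload.len());
    out.push(frame.tag);
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(&frame.payload);
    out
}

/// Rejects malformed input with an error instead of panicking or over-allocating.
pub fn decode(bytes: &[u8]) -> Result<Frame, DecodeError> {
    if bytes.len() < 3 {
        return Err(DecodeError::Truncated);
    }
    let len = u16::from_be_bytes([bytes[1], bytes[2]]) as usize;
    // Enforce the cap before looking at the body.
    if len > MAX_PAYLOAD {
        return Err(DecodeError::Oversized);
    }
    let body = &bytes[3..];
    if body.len() < len {
        return Err(DecodeError::Truncated);
    }
    if body.len() > len {
        return Err(DecodeError::TrailingBytes);
    }
    Ok(Frame { tag: bytes[0], payload: body.to_vec() })
}

/// Minimal xorshift64 generator: deterministic, so failures reproduce from the seed.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift must not start from zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds pseudo-random buffers to `decode` and returns how many were accepted.
/// Panics if an accepted input does not re-encode to exactly the same bytes.
pub fn fuzz_decode(seed: u64, iterations: usize) -> usize {
    let mut rng = XorShift64::new(seed);
    let mut accepted = 0;
    for _ in 0..iterations {
        let len = (rng.next_u64() % 16) as usize;
        let mut buf: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        // Half the time, patch the length field so well-formed frames also occur.
        if len >= 3 && rng.next_u64() % 2 == 0 {
            let body = (len - 3) as u16;
            buf[1..3].copy_from_slice(&body.to_be_bytes());
        }
        if let Ok(frame) = decode(&buf) {
            assert_eq!(encode(&frame), buf, "accepted input must be canonical");
            accepted += 1;
        }
    }
    accepted
}

#[test]
fn decoder_survives_random_input() {
    assert!(fuzz_decode(0x5EED, 100_000) > 0);
}
```

Rejecting trailing bytes keeps the encoding canonical, which is what lets the round-trip assertion double as a malleability check.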

## Don't mock Tor
Mocked transports assume instant, reliable, in-order delivery, which hides the circuit collapses, latency, and reordering
where real desync bugs live. Integration tests therefore target **chutney/arti** (deterministic local Tor net); until the CI
wiring listed as pending under Status lands, they run manually. Live-Tor smoke runs pre-release.
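
The invariant those integration runs are meant to exercise can be stated in miniature. The sketch below is not a transport mock and not the pvtcoms ratchet: `RecvWindow`, `Reject`, `deliver`, and the small `MAX_SKIP` value are hypothetical, chosen only to show what "no desync, replay never advances state" looks like when messages arrive reordered or duplicated.

```rust
use std::collections::BTreeSet;

/// Illustrative bound on how far ahead of the expected counter a message may be.
pub const MAX_SKIP: u64 = 4;

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    Replay,
    TooFarAhead,
}

/// Hypothetical receive-side state: the next expected counter plus the
/// counters that were skipped over and may still arrive late.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RecvWindow {
    next: u64,
    skipped: BTreeSet<u64>,
}

impl RecvWindow {
    /// Accepts message number `n` at most once. Rejections leave the state untouched.
    pub fn accept(&mut self, n: u64) -> Result<(), Reject> {
        if n < self.next {
            // Late arrival: valid only if it was skipped earlier and not yet consumed.
            return if self.skipped.remove(&n) { Ok(()) } else { Err(Reject::Replay) };
        }
        if n - self.next > MAX_SKIP {
            return Err(Reject::TooFarAhead);
        }
        // Remember every counter jumped over so it can still be delivered later.
        self.skipped.extend(self.next..n);
        self.next = n + 1;
        Ok(())
    }
}

/// Feeds counters in arrival order. Returns the delivered counters, the number
/// of rejected arrivals, and the final window state.
pub fn deliver(arrivals: &[u64]) -> (Vec<u64>, usize, RecvWindow) {
    let mut window = RecvWindow::default();
    let mut delivered = Vec::new();
    let mut rejected = 0;
    for &n in arrivals {
        match window.accept(n) {
            Ok(()) => delivered.push(n),
            Err(_) => rejected += 1,
        }
    }
    (delivered, rejected, window)
}

#[test]
fn reordered_and_duplicated_delivery_converges() {
    let (delivered, rejected, shuffled) = deliver(&[2, 0, 1, 0, 2, 3]);
    let (_, _, in_order) = deliver(&[0, 1, 2, 3]);
    assert_eq!(delivered, vec![2, 0, 1, 3]);
    assert_eq!(rejected, 2);
    assert_eq!(shuffled, in_order);
}
```

A real Tor circuit produces these arrival patterns together with latency and mid-stream failure, which is the part a synthetic arrival list cannot reproduce.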

## CI matrix
- **OS**: Linux, macOS, Windows. **Rust**: stable + pinned nightly (miri/loom/fuzz infra). **Targets**: linux-gnu, Android arm64, desktop triples.
- **Jobs**: lint/format → unit+property (`cargo-nextest`) → coverage (`cargo-llvm-cov`) → fuzz smoke (per-PR) / long fuzz (nightly) → miri+loom (nightly) → Tor integration (scheduled + pre-release) → Compose/Tauri UI → reproducibility check (`diffoscope`).

## Per-milestone (maps to BUILD_PLAN M1–M5)
- **M1** crypto core → KAT harness, unit tests, proptest scaffolding, constant-time tests.
- **M2** session/ratchet → full property suite + first fuzz targets + loom on session store.
- **M3** network/mailbox → chutney/arti Tor tests + mailbox state-machine + replay/stolen-token negatives.
- **M4** apps → Compose/Tauri critical-path + invite/deep-link tests + log-redaction assertions.
- **M5** hardening → long fuzz, chaos-Tor runs, full adversarial checklist, reproducibility gate, security regression suite (mapped to `THREAT_MODEL.md`).
