Distributed Testing, Simulation, and Deterministic Replay: Testing Client Semantics, Idempotency, and Exactly-Once Claims

Lesson 019, 30 min, intermediate

Core Insight

In CheckoutService, the internal replication protocol is now tested with history oracles, but a customer does not care which quorum accepted an entry. The customer cares whether pressing "confirm order" once can charge them twice, whether retrying after a timeout is safe, and what response they should trust after an ambiguous failure.

"Exactly once" is rarely a simple internal property. In distributed systems, it is usually a client-facing contract built from smaller mechanisms: idempotency keys, deduplication windows, durable request records, retry discipline, external-effect guards, and response recovery. A test that checks only the database row can miss the thing the client actually experiences: duplicate money movement, duplicate email, duplicate shipment, or an unknown outcome that forces unsafe retries.

The trade-off is client simplicity versus system complexity. A strong client contract makes retries safe and understandable, but the service must preserve request identity, record enough outcome evidence, and coordinate external effects across crashes and partitions. If the test harness does not model the client's ambiguous view of the world, it can prove a cleaner guarantee than the system really offers.

Name the Client Contract

Before testing idempotency or exactly-once behavior, write the contract in client language.

Weak contracts sound like this:

The server usually deduplicates retries.
The operation should not run twice.
The client can retry on timeout.

Those statements are too vague to test. A stronger contract names the operation, the identity, the observable effect, and the ambiguous cases:

For a given merchant and idempotency key, confirm(order_id, key)
creates at most one external payment capture.

If the client repeats the same request with the same key after a timeout,
the service returns the original outcome or a stable "still unknown" response.

If the request body changes under the same key,
the service rejects the request instead of creating a second effect.

That contract tells the harness what to observe. It also reveals what must be modeled: retries, request identity, request body equality, durable outcome records, and external payment effects.

Idempotency Is Not Just Deduplication

Deduplication often means "drop repeats." Idempotency is stronger and more precise: applying the same operation identity multiple times has the same externally visible effect as applying it once.

A naive deduplication table might store only keys:

seen_keys:
  k1

That is not enough. The service also needs to know what the key meant and what happened:

idempotency_record:
  merchant: m1
  key: k1
  request_hash: h_confirm_order_1_amount_50
  status: captured
  payment_capture_id: p778
  response_body: approved

Without the request hash, a client or adapter bug could reuse k1 for a different request. Without the outcome, a retry after timeout cannot recover the original response. Without a durable record, a crash can forget that the external effect already happened.
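
The record above can be sketched in code. This is a minimal illustration, assuming an in-memory dict stands in for the durable table; the names IdempotencyRecord, request_hash, and begin_or_replay are hypothetical and not taken from CheckoutService.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdempotencyRecord:
    merchant: str
    key: str
    request_hash: str
    status: str = "in_flight"            # in_flight | captured | failed
    payment_capture_id: Optional[str] = None
    response_body: Optional[str] = None


def request_hash(body: dict) -> str:
    """Hash a canonical JSON encoding so equal bodies hash equally."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """In-memory stand-in for a durable table (assumption for illustration)."""

    def __init__(self):
        self._records = {}               # (merchant, key) -> IdempotencyRecord

    def begin_or_replay(self, merchant: str, key: str, body: dict):
        """Return ("new" | "replay" | "conflict", record)."""
        h = request_hash(body)
        existing = self._records.get((merchant, key))
        if existing is None:
            record = IdempotencyRecord(merchant, key, h)
            self._records[(merchant, key)] = record
            return "new", record
        if existing.request_hash != h:
            return "conflict", existing  # same key, different request
        return "replay", existing        # same key, same request


store = IdempotencyStore()
body = {"order_id": "order-1", "amount": 50}
first, rec = store.begin_or_replay("m1", "k1", body)
# Simulate the handler recording the outcome after the capture succeeds.
rec.status, rec.payment_capture_id, rec.response_body = "captured", "p778", "approved"
second, replayed = store.begin_or_replay("m1", "k1", body)
third, _ = store.begin_or_replay("m1", "k1", {"order_id": "order-1", "amount": 75})
```

Because the record keeps the outcome, the second call can return the original capture id instead of running the operation again, and the third call is detected as a conflict rather than silently merged.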

Idempotency also has a scope. A key may be unique per tenant, per account, per operation type, per endpoint, or globally. Tests should include the scope explicitly:

same key, same merchant -> same operation identity
same key, different merchant -> independent operation identity
same key, different request hash -> conflict
same key, expired retention window -> contract-specific behavior
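
The first three scope rules can be written as a small table-driven check. This is a sketch under the assumption that the scope is (merchant, key); the classify function and its return labels are hypothetical. The expired-retention row is left out because its expected result depends on the contract.

```python
from typing import Dict, Tuple

# Assumption: scope is (merchant, key). Other systems may add the endpoint
# or operation type to this tuple.
Scope = Tuple[str, str]


def classify(seen: Dict[Scope, str], merchant: str, key: str, req_hash: str) -> str:
    """Classify an incoming request against previously seen identities."""
    scope = (merchant, key)
    if scope not in seen:
        seen[scope] = req_hash
        return "new_operation"
    return "same_operation" if seen[scope] == req_hash else "conflict"


seen: Dict[Scope, str] = {}
results = [
    classify(seen, "m1", "k1", "h1"),  # first use of k1 by m1
    classify(seen, "m1", "k1", "h1"),  # same key, same merchant
    classify(seen, "m2", "k1", "h1"),  # same key, different merchant
    classify(seen, "m1", "k1", "h2"),  # same key, different request hash
]
```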

Exactly-Once Claims Need Effect Oracles

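Exactly-once language is dangerous because it can mean several different things: a message delivered exactly once, a handler executed exactly once, a database row written exactly once, or an external effect applied exactly once.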

Those are different properties. A handler can run twice while the external effect happens once. A queue can deliver twice while an idempotent consumer writes once. A database can hold one row while the payment provider saw two captures.

The oracle must match the promise. If the client promise is "at most one external capture," the test should count provider-observed captures, not only local database rows.

bad oracle:
  assert orders[order_id].status == "confirmed"

better oracle:
  assert payment_provider.captures_for(idempotency_key).count <= 1
  assert every retry returns the same terminal outcome or a safe unknown response
  assert conflicting request bodies under the same key are rejected
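
One way to make the better oracle executable is to run it over a recorded history. The sketch below assumes the harness collects provider captures and client-visible responses into a History object; that shape, and the outcome labels "unknown" and "conflict", are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class History:
    """What the harness observed during one run (hypothetical shape)."""
    # (idempotency key, capture id) as seen by the provider
    provider_captures: List[Tuple[str, str]] = field(default_factory=list)
    # (idempotency key, request hash, client-visible outcome)
    responses: List[Tuple[str, str, str]] = field(default_factory=list)


def check_at_most_one_capture(history: History, key: str, original_hash: str) -> List[str]:
    """Return a list of violations; an empty list means the oracle passed."""
    violations = []
    captures = [c for k, c in history.provider_captures if k == key]
    if len(captures) > 1:
        violations.append(f"duplicate captures for {key}: {captures}")

    # Same-hash retries may say "unknown", but terminal outcomes must agree.
    same_hash = [o for k, h, o in history.responses if k == key and h == original_hash]
    terminal = {o for o in same_hash if o != "unknown"}
    if len(terminal) > 1:
        violations.append(f"retries disagree on terminal outcome: {sorted(terminal)}")

    # A changed body under the same key must be rejected.
    for k, h, o in history.responses:
        if k == key and h != original_hash and o != "conflict":
            violations.append(f"body change under {key} returned {o}")
    return violations


good = History(
    provider_captures=[("k1", "p778")],
    responses=[("k1", "h1", "unknown"), ("k1", "h1", "approved"), ("k1", "h2", "conflict")],
)
bad = History(
    provider_captures=[("k1", "p778"), ("k1", "p779")],
    responses=[("k1", "h1", "approved"), ("k1", "h1", "approved")],
)
good_violations = check_at_most_one_capture(good, "k1", "h1")
bad_violations = check_at_most_one_capture(bad, "k1", "h1")
```

The bad history would pass a check that looks only at the order row, since every response says approved. It fails here because the oracle counts what the provider saw.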

For some systems, the honest contract is not exactly once. It may be:

at-least-once delivery with idempotent consumer
at-most-once external effect
effectively-once within a retention window
exactly-once state transition inside one transactional log
retryable until a stable outcome is returned

Testing gets much easier when the claim is honest.

Ambiguous Outcomes Are the Hard Case

The most important client tests are often not the clean success path. They are the unknown outcome path.

Suppose the client sends a request:

1  client -> service: confirm(order-1, key k1)
2  service records in-flight key k1
3  service sends capture(k1) to payment provider
4  payment provider captures funds
5  service crashes before recording provider response
6  client times out

From the client's perspective, the operation is unknown. It might have failed before the external effect, succeeded before the crash, or still be in progress.

Unsafe retry behavior looks like this:

7  client retries confirm(order-1, key k1)
8  service finds no durable outcome
9  service sends a second capture
10 provider records duplicate capture

Safer behavior uses durable intent and recovery evidence:

2  service durably records key k1 as in-flight
3  service sends capture with provider idempotency key k1
4  provider captures funds
5  service crashes before recording response
6  client retries with k1
7  service sees in-flight k1
8  service asks provider or recovery log for outcome
9  service records captured p778
10 service returns stable approved response

The test harness must create the crash between the external effect and local outcome recording. If it only crashes before or after the whole handler, it misses the dangerous boundary.
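
A harness can enumerate crash points instead of guessing which one matters. The sketch below is a toy model, not CheckoutService: Provider, Service, and the crash-point names are hypothetical, the provider is assumed to deduplicate only when it is given a key, and the safe service recovers by resending the capture with the same provider key rather than by querying a recovery log.

```python
class Crash(Exception):
    """Raised by the harness to simulate a process crash."""


class Provider:
    """Payment provider model. Assumption: it dedupes only when given a key."""

    def __init__(self):
        self.captures = []          # every capture that moved funds
        self._by_key = {}

    def capture(self, idem_key=None):
        if idem_key is not None and idem_key in self._by_key:
            return self._by_key[idem_key]       # replay, no new capture
        capture_id = f"p{len(self.captures) + 1}"
        self.captures.append(capture_id)
        if idem_key is not None:
            self._by_key[idem_key] = capture_id
        return capture_id


class Service:
    def __init__(self, provider, safe):
        self.provider = provider
        self.safe = safe
        self.durable = {}           # survives crashes: key -> status or capture id

    def confirm(self, key, crash_after=None):
        """Handle confirm; crash_after names the step after which to crash."""
        def step(name):
            if crash_after == name:
                raise Crash(name)

        if self.durable.get(key, "in_flight") != "in_flight":
            return self.durable[key]            # replay the recorded outcome
        if self.safe:
            self.durable[key] = "in_flight"     # durable intent
        step("recorded_intent")
        capture_id = self.provider.capture(key if self.safe else None)
        step("provider_captured")               # the dangerous boundary
        self.durable[key] = capture_id
        step("recorded_outcome")
        return capture_id


def run(safe, crash_after):
    provider = Provider()
    service = Service(provider, safe)
    try:
        service.confirm("k1", crash_after=crash_after)
    except Crash:
        pass                                    # the client sees a timeout
    outcome = service.confirm("k1")             # the client retries, no crash
    return len(provider.captures), outcome


CRASH_POINTS = ["recorded_intent", "provider_captured", "recorded_outcome"]
unsafe_counts = {p: run(False, p)[0] for p in CRASH_POINTS}
safe_counts = {p: run(True, p)[0] for p in CRASH_POINTS}
```

In this model only the provider_captured crash point produces two captures for the unsafe service. Crashing before or after the whole handler leaves the count at one, which is why those crash points alone would miss the bug.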

Worked Example

The deterministic harness models a client, two service replicas, a replicated idempotency table, and a payment provider stub:

actors:
  C  client
  A  service replica
  B  service replica
  P  payment provider model

state:
  idempotency table replicated asynchronously
  provider records every capture attempt

The property is:

For merchant m1 and idempotency key k1,
all retries of the same confirm request cause at most one provider capture.

The harness explores this schedule:

1   C sends confirm(order-1, k1, amount=50) to A
2   A durably records in-flight(k1, hash=h1)
3   A sends replication message r1 -> B
4   network delays r1
5   A sends capture(k1, h1) to P
6   P records capture p1 and returns approved
7   A crashes before recording outcome p1
8   C times out
9   C retries confirm(order-1, k1, amount=50) to B
10  B has not seen r1
11  B sends capture(k1, h1) to P
12  invariant checks provider captures for k1

If the provider model has correct idempotency by key and request hash, step 11 should return the original outcome or reject the duplicate without a second capture. If the provider model accepts the second capture, the service must have a stronger local or shared guard before calling the provider.
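
The schedule can be replayed against both kinds of provider model. This is a simplified sketch with hypothetical names (ProviderModel, Replica, run_schedule): replication is modeled as a queue that the harness never drains, which stands in for the delayed message, and the crash on A is modeled as returning before the outcome is recorded.

```python
class ProviderModel:
    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.captures = {}          # capture_id -> (key, request_hash)

    def capture(self, key, req_hash):
        if self.idempotent:
            for capture_id, (k, h) in self.captures.items():
                if k == key:
                    # Same key: replay if the hash matches, otherwise reject.
                    return capture_id if h == req_hash else "conflict"
        capture_id = f"p{len(self.captures) + 1}"
        self.captures[capture_id] = (key, req_hash)
        return capture_id


class Replica:
    def __init__(self, name, provider, network):
        self.name, self.provider, self.network = name, provider, network
        self.table = {}             # this replica's copy of the idempotency table

    def confirm(self, key, req_hash, crash_before_outcome=False):
        record = self.table.get(key)
        if record is not None and record["status"] != "in_flight":
            return record["status"]             # replay a known outcome
        if record is None:
            self.table[key] = {"hash": req_hash, "status": "in_flight"}
            # Queue replication; the harness decides if and when to deliver.
            self.network.append((self.name, key, req_hash))
        result = self.provider.capture(key, req_hash)
        if crash_before_outcome:
            return None             # crash: the client only sees a timeout
        self.table[key]["status"] = result
        return result


def run_schedule(provider_idempotent):
    network = []                    # delayed messages, never delivered here
    provider = ProviderModel(provider_idempotent)
    a = Replica("A", provider, network)
    b = Replica("B", provider, network)
    a.confirm("k1", "h1", crash_before_outcome=True)    # steps 1-8
    retry_result = b.confirm("k1", "h1")                # steps 9-11
    captures_for_k1 = [c for c, (k, _) in provider.captures.items() if k == "k1"]
    return retry_result, captures_for_k1                # input to step 12


with_idempotent_provider = run_schedule(True)
with_naive_provider = run_schedule(False)
```

With the idempotent provider model the retry through B returns the original capture. With the naive model the same schedule yields two captures, so the invariant at step 12 fails and the guard has to move into the service.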

Now add a second generated retry:

13  C retries confirm(order-1, k1, amount=75) to B

That is not the same operation. The service should reject it as an idempotency conflict:

409 conflict:
  same idempotency key
  different request hash

The final database state is not enough. The oracle checks:

provider_captures(m1, k1).count <= 1
all same-hash retries return compatible outcome
all different-hash retries under k1 are rejected
if outcome is unknown, response is stable and safe to retry

Those checks express the client semantics directly.

What the Harness Must Control

Client semantics tests need the same deterministic controls as earlier lessons, plus a few client-specific ones.

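The harness should control request identity: the idempotency key, its scope (such as the merchant), and the request body or hash sent on each retry.

It should control ambiguous failure points: client timeouts, delayed replication messages, and crashes between the external effect and local outcome recording.

It should record external effects: every capture attempt the provider observes, not only local database rows.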

It should distinguish client-visible responses:

success: terminal outcome known
conflict: same key with different request
in_progress: safe retry later
unknown: service cannot yet prove terminal outcome
failure: operation definitely did not happen

The response vocabulary matters because clients automate retries based on it.
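
A client retry policy built on that vocabulary might look like the sketch below. The Outcome enum mirrors the five states above; the choice to retry only on in_progress and unknown, and the client_loop helper with its attempt limit, are assumptions for illustration.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    CONFLICT = "conflict"
    IN_PROGRESS = "in_progress"
    UNKNOWN = "unknown"
    FAILURE = "failure"


# Assumption: retrying means resending the same request with the same key.
RETRY_SAME_KEY = {Outcome.IN_PROGRESS, Outcome.UNKNOWN}


def client_loop(responses, max_attempts=5):
    """Drive a scripted sequence of responses; return (final, attempts)."""
    attempts = 0
    final = None
    for final in responses:
        attempts += 1
        # Stop on any terminal response, or when the attempt budget runs out.
        if final not in RETRY_SAME_KEY or attempts >= max_attempts:
            break
    return final, attempts


recovered = client_loop([Outcome.UNKNOWN, Outcome.IN_PROGRESS, Outcome.SUCCESS])
rejected = client_loop([Outcome.CONFLICT, Outcome.SUCCESS])
gave_up = client_loop([Outcome.UNKNOWN] * 10)
```

A harness can feed scripted response sequences like these to the real client code and assert that it never changes the key or the body while the outcome is still unknown.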

Common Failure Modes

One mistake is asserting only internal state. If the client promise concerns an external effect, the test must observe that effect.

Another mistake is treating timeout as failure. A timeout means the client does not know. The service must make retries safe under that uncertainty.

A third mistake is storing idempotency keys without request hashes or scopes. That can merge unrelated operations or allow conflicting requests to reuse the same identity.

A fourth mistake is letting retention windows disappear from tests. If idempotency records expire after 24 hours, the contract must say what happens after 24 hours and the harness should test that boundary.

A fifth mistake is using exactly-once language when the real contract is narrower. "At most one external capture for the same scoped idempotency key" is often more testable and more honest than "exactly once."
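
The retention-window boundary from the fourth mistake is easy to test once the harness owns the clock. The sketch below assumes one possible contract, in which an expired key is treated as a new operation; FakeClock and ExpiringStore are hypothetical names.

```python
RETENTION_SECONDS = 24 * 60 * 60   # the 24-hour window from the text


class FakeClock:
    """Harness-controlled time, so the boundary can be hit exactly."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class ExpiringStore:
    """Assumed contract: after expiry, the key is treated as a new operation."""

    def __init__(self, clock):
        self.clock = clock
        self._records = {}          # key -> (request_hash, created_at)

    def submit(self, key, req_hash):
        record = self._records.get(key)
        if record is not None and self.clock.now - record[1] >= RETENTION_SECONDS:
            record = None           # expired: forget the old identity
        if record is None:
            self._records[key] = (req_hash, self.clock.now)
            return "new"
        return "replay" if record[0] == req_hash else "conflict"


clock = FakeClock()
store = ExpiringStore(clock)
observed = [store.submit("k1", "h1")]          # t = 0
clock.advance(RETENTION_SECONDS - 1)
observed.append(store.submit("k1", "h1"))      # one second before expiry
clock.advance(1)
observed.append(store.submit("k1", "h1"))      # exactly at expiry
```

Under this assumed contract the third submission is a new operation, so a second capture becomes possible. Whether that is acceptable is a contract decision, and the test makes it visible.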

Practice

Take one client-facing operation that supports retries and answer these questions:

  1. What is the exact operation identity?
  2. What is the idempotency scope?
  3. Which request fields must match on retry?
  4. Which external effects should be counted by the oracle?
  5. What response should the client get after a timeout?
  6. What happens if the service crashes after the external effect but before local recording?
  7. What is the retention window for the idempotency record?
  8. Which response states are safe for automated retry?

Then write one test schedule that crashes at the most inconvenient point: after the side effect may have happened, before the client receives a stable outcome. That is where client semantics become real.
