Distributed Testing, Simulation, and Deterministic Replay: Testing Client Semantics, Idempotency, and Exactly-Once Claims
Core Insight
In CheckoutService, the internal replication protocol is now tested with history oracles, but a customer does not care which quorum accepted an entry. The customer cares whether pressing "confirm order" once can charge them twice, whether retrying after a timeout is safe, and what response they should trust after an ambiguous failure.
"Exactly once" is rarely a simple internal property. In distributed systems, it is usually a client-facing contract built from smaller mechanisms: idempotency keys, deduplication windows, durable request records, retry discipline, external-effect guards, and response recovery. A test that checks only the database row can miss the thing the client actually experiences: duplicate money movement, duplicate email, duplicate shipment, or an unknown outcome that forces unsafe retries.
The trade-off is client simplicity versus system complexity. A strong client contract makes retries safe and understandable, but the service must preserve request identity, record enough outcome evidence, and coordinate external effects across crashes and partitions. If the test harness does not model the client's ambiguous view of the world, its passing tests can suggest a stronger guarantee than the system really offers.
Name the Client Contract
Before testing idempotency or exactly-once behavior, write the contract in client language.
Weak contracts sound like this:
The server usually deduplicates retries.
The operation should not run twice.
The client can retry on timeout.
Those statements are not testable enough. A stronger contract names the operation, the identity, the observable effect, and the ambiguous cases:
For a given merchant and idempotency key, confirm(order_id, key)
creates at most one external payment capture.
If the client repeats the same request with the same key after a timeout,
the service returns the original outcome or a stable "still unknown" response.
If the request body changes under the same key,
the service rejects the request instead of creating a second effect.
That contract tells the harness what to observe. It also reveals what must be modeled: retries, request identity, request body equality, durable outcome records, and external payment effects.
Idempotency Is Not Just Deduplication
Deduplication often means "drop repeats." Idempotency is stronger and more precise: applying the same operation identity multiple times has the same externally visible effect as applying it once.
A naive deduplication table might store only keys:
seen_keys:
k1
That is not enough. The service also needs to know what the key meant and what happened:
idempotency_record:
merchant: m1
key: k1
request_hash: h_confirm_order_1_amount_50
status: captured
payment_capture_id: p778
response_body: approved
Without the request hash, a client or adapter bug could reuse k1 for a different request. Without the outcome, a retry after timeout cannot recover the original response. Without a durable record, a crash can forget that the external effect already happened.
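The record above can be sketched as a small data structure. This is a minimal illustration, not the service's actual schema: `request_hash`, `IdempotencyRecord`, and `IdempotencyStore` are hypothetical names, and the in-memory dictionary stands in for a table that a real service would have to make durable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def request_hash(body: dict) -> str:
    """Hash a canonical JSON encoding so that field order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class IdempotencyRecord:
    merchant: str
    key: str
    request_hash: str
    status: str = "in_flight"              # "in_flight" or "captured"
    payment_capture_id: Optional[str] = None
    response_body: Optional[str] = None


class IdempotencyStore:
    """In-memory stand-in for a durable table (assumption: the real one survives crashes)."""

    def __init__(self) -> None:
        self._records = {}                 # (merchant, key) -> IdempotencyRecord

    def begin(self, merchant: str, key: str, body: dict) -> IdempotencyRecord:
        """Return the existing record for a repeat, or create an in-flight one."""
        h = request_hash(body)
        existing = self._records.get((merchant, key))
        if existing is not None:
            if existing.request_hash != h:
                raise ValueError("idempotency conflict: same key, different request")
            return existing
        record = IdempotencyRecord(merchant, key, h)
        self._records[(merchant, key)] = record
        return record


store = IdempotencyStore()
first = store.begin("m1", "k1", {"order_id": "order-1", "amount": 50})
first.status, first.payment_capture_id, first.response_body = "captured", "p778", "approved"

# A retry with the same body (even with reordered fields) recovers the original outcome.
retry = store.begin("m1", "k1", {"amount": 50, "order_id": "order-1"})
```

Storing the hash next to the outcome is what lets the retry path distinguish "same request, return the stored response" from "same key, different request, reject."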
Idempotency also has a scope. A key may be unique per tenant, per account, per operation type, per endpoint, or globally. Tests should include the scope explicitly:
same key, same merchant -> same operation identity
same key, different merchant -> independent operation identity
same key, different request hash -> conflict
same key, expired retention window -> contract-specific behavior
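The four scope cases above can be turned into a small classifier that a test can drive with a harness-controlled clock. This is a sketch under assumptions: `Stored`, `classify`, and the 24-hour `RETENTION_SECONDS` value are illustrative, and what the service does with an "expired" result remains contract-specific.

```python
from typing import NamedTuple


class Stored(NamedTuple):
    request_hash: str
    created_at: float                      # seconds, from a harness-controlled clock


RETENTION_SECONDS = 24 * 60 * 60           # hypothetical retention window


def classify(table: dict, merchant: str, key: str, req_hash: str, now: float) -> str:
    """Classify an incoming request against records scoped by (merchant, key)."""
    stored = table.get((merchant, key))
    if stored is None:
        return "new"
    if now - stored.created_at >= RETENTION_SECONDS:
        return "expired"                   # what happens next is contract-specific
    if stored.request_hash != req_hash:
        return "conflict"
    return "same_operation"


table = {("m1", "k1"): Stored(request_hash="h1", created_at=0.0)}
```

Passing `now` as an argument instead of reading the wall clock keeps the retention boundary reproducible in a deterministic test.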
Exactly-Once Claims Need Effect Oracles
Exactly-once language is dangerous because it can mean several different things:
- the server processes the request handler once
- the database stores one command record
- the message queue delivers one message
- the external provider sees one capture
- the client observes one successful outcome
Those are different properties. A handler can run twice while the external effect happens once. A queue can deliver twice while an idempotent consumer writes once. A database can hold one row while the payment provider saw two captures.
The oracle must match the promise. If the client promise is "at most one external capture," the test should count provider-observed captures, not only local database rows.
bad oracle:
assert orders[order_id].status == "confirmed"
better oracle:
assert payment_provider.captures_for(merchant, idempotency_key).count <= 1
assert every retry returns the same terminal outcome or a safe unknown response
assert conflicting request bodies under the same key are rejected
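The better oracle can be written as a function over a recorded history. The sketch below is one possible shape, assuming the harness records provider-side captures and client-visible responses for a single merchant and key; `Capture`, `Response`, and `check_effect_oracle` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    merchant: str
    idempotency_key: str
    capture_id: str


@dataclass(frozen=True)
class Response:
    idempotency_key: str
    request_hash: str
    status: str                            # "success", "conflict", "unknown", ...
    body: str


def check_effect_oracle(captures, responses, original_hash):
    """Return a list of violations; an empty list means the history passes."""
    violations = []
    # 1. At most one provider-observed capture per (merchant, key).
    per_scope = Counter((c.merchant, c.idempotency_key) for c in captures)
    for scope, n in per_scope.items():
        if n > 1:
            violations.append(f"{n} captures for {scope}")
    # 2. Same-hash retries that reach a terminal success must agree with each other.
    terminal = {r.body for r in responses
                if r.request_hash == original_hash and r.status == "success"}
    if len(terminal) > 1:
        violations.append(f"retries disagree on outcome: {sorted(terminal)}")
    # 3. A different body under the same key must be rejected.
    for r in responses:
        if r.request_hash != original_hash and r.status != "conflict":
            violations.append(f"conflicting body accepted with status {r.status}")
    return violations
```

Returning a list of violations rather than asserting on the first failure makes a failing schedule easier to diagnose, because one replay shows every broken promise.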
For some systems, the honest contract is not exactly once. It may be:
at-least-once delivery with idempotent consumer
at-most-once external effect
effectively-once within a retention window
exactly-once state transition inside one transactional log
retryable until a stable outcome is returned
Testing gets much easier when the claim is honest.
Ambiguous Outcomes Are the Hard Case
The most important client tests often exercise the unknown-outcome path rather than the clean success path.
Suppose the client sends a request:
1 client -> service: confirm(order-1, key k1)
2 service records in-flight key k1
3 service sends capture(k1) to payment provider
4 payment provider captures funds
5 service crashes before recording provider response
6 client times out
From the client's perspective, the operation is unknown. It might have failed before the external effect, succeeded before the crash, or still be in progress.
Unsafe retry behavior looks like this:
7 client retries confirm(order-1, key k1)
8 service finds no durable outcome
9 service sends a second capture
10 provider records duplicate capture
Safer behavior uses durable intent and recovery evidence:
2 service durably records key k1 as in-flight
3 service sends capture with provider idempotency key k1
4 provider captures funds
5 service crashes before recording response
6 client retries with k1
7 service sees in-flight k1
8 service asks provider or recovery log for outcome
9 service records captured p778
10 service returns stable approved response
The test harness must create the crash between the external effect and local outcome recording. If it only crashes before or after the whole handler, it misses the dangerous boundary.
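A crash at that boundary can be injected deterministically. The sketch below is a toy model, not a real payment integration: `Provider`, `Service`, and the `crash_after_effect` flag are hypothetical, the provider is assumed to deduplicate by idempotency key, and a plain dictionary plays the role of durable storage that outlives the crashed process.

```python
class Crash(Exception):
    """Raised by the harness to simulate a process crash."""


class Provider:
    """Model provider that deduplicates by idempotency key (an assumption)."""

    def __init__(self):
        self.captures = {}                 # key -> capture id
        self.attempts = 0                  # every call, including repeats

    def capture(self, key):
        self.attempts += 1
        if key not in self.captures:
            self.captures[key] = f"p{len(self.captures) + 1}"
        return self.captures[key]


class Service:
    def __init__(self, durable, provider):
        self.durable = durable             # survives crashes: key -> record
        self.provider = provider

    def confirm(self, key, crash_after_effect=False):
        record = self.durable.setdefault(key, {"status": "in_flight"})
        if record["status"] == "captured":
            return record["capture_id"]
        # First attempt or recovery: the provider key makes this call safe to repeat.
        capture_id = self.provider.capture(key)
        if crash_after_effect:
            raise Crash("after external effect, before local outcome")
        record.update(status="captured", capture_id=capture_id)
        return capture_id


durable, provider = {}, Provider()
try:
    Service(durable, provider).confirm("k1", crash_after_effect=True)
except Crash:
    pass                                   # the client only sees a timeout

# A restarted service instance handles the retry using the same durable state.
result = Service(durable, provider).confirm("k1")
```

The provider is called twice but holds one capture, which is the distinction between handler executions and external effects that the oracle has to respect.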
Worked Example
The deterministic harness models a client, two service replicas, a replicated idempotency table, and a payment provider stub:
actors:
C client
A service replica
B service replica
P payment provider model
state:
idempotency table replicated asynchronously
provider records every capture attempt
The property is:
For merchant m1 and idempotency key k1,
all retries of the same confirm request cause at most one provider capture.
The harness explores this schedule:
1 C sends confirm(order-1, k1, amount=50) to A
2 A durably records in-flight(k1, hash=h1)
3 A sends replication message r1 -> B
4 network delays r1
5 A sends capture(k1, h1) to P
6 P records capture p1 and returns approved
7 A crashes before recording outcome p1
8 C times out
9 C retries confirm(order-1, k1, amount=50) to B
10 B has not seen r1
11 B sends capture(k1, h1) to P
12 invariant checks provider captures for k1
If the provider model correctly enforces idempotency by key and request hash, step 11 should return the original outcome or reject the duplicate without a second capture. If the provider model accepts the second capture, the invariant fails, and the service needs a stronger local or shared guard before calling the provider.
Now add a second generated retry:
13 C retries confirm(order-1, k1, amount=75) to B
That is not the same operation. The service should reject it as an idempotency conflict:
409 conflict:
same idempotency key
different request hash
The final database state is not enough. The oracle checks:
provider_captures(m1, k1).count <= 1
all same-hash retries return compatible outcome
all different-hash retries under k1 are rejected
if outcome is unknown, response is stable and safe to retry
Those checks express the client semantics directly.
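The schedule and its oracle can be replayed in a few lines. This is a simplified model under stated assumptions: `ProviderModel`, `Replica`, and `run_schedule` are hypothetical names, the delayed replication is modeled by giving each replica its own table that is never synchronized, and the crash is modeled by returning before the outcome is recorded.

```python
class ProviderModel:
    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.captures = []                 # every capture the provider recorded

    def capture(self, merchant, key, req_hash):
        if self.idempotent:
            for c in self.captures:
                if (c["merchant"], c["key"]) == (merchant, key):
                    return c["id"] if c["hash"] == req_hash else "rejected"
        capture_id = f"p{len(self.captures) + 1}"
        self.captures.append(
            {"merchant": merchant, "key": key, "hash": req_hash, "id": capture_id})
        return capture_id


class Replica:
    def __init__(self, provider):
        self.table = {}                    # this replica's view of the idempotency table
        self.provider = provider

    def confirm(self, merchant, key, req_hash, crash_before_outcome=False):
        seen = self.table.get((merchant, key))
        if seen is not None and seen["hash"] != req_hash:
            return "409 conflict"
        if seen is not None and seen["outcome"] is not None:
            return seen["outcome"]
        self.table.setdefault((merchant, key), {"hash": req_hash, "outcome": None})
        capture_id = self.provider.capture(merchant, key, req_hash)
        if crash_before_outcome:
            return None                    # replica dies; the client times out
        self.table[(merchant, key)]["outcome"] = capture_id
        return capture_id


def run_schedule(provider_idempotent):
    provider = ProviderModel(provider_idempotent)
    a, b = Replica(provider), Replica(provider)
    a.confirm("m1", "k1", "h1", crash_before_outcome=True)   # steps 1-8
    retry = b.confirm("m1", "k1", "h1")                       # steps 9-11
    changed = b.confirm("m1", "k1", "h2")                     # step 13, amount=75
    count = sum(1 for c in provider.captures
                if (c["merchant"], c["key"]) == ("m1", "k1"))
    return retry, changed, count
```

Running `run_schedule(True)` keeps the capture count at one, while `run_schedule(False)` yields two captures for the same key: the same schedule passes or fails depending only on where the idempotency guard lives.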
What the Harness Must Control
Client semantics tests need the same deterministic controls as earlier lessons, plus a few client-specific ones.
The harness should control request identity:
- idempotency key
- tenant or merchant scope
- operation type
- request body hash
- client retry attempt id
It should control ambiguous failure points:
- timeout before request reaches service
- crash after durable intent but before external effect
- crash after external effect but before local outcome
- lost response after successful commit
- retry routed to a stale replica
- provider returns unknown or retryable error
It should record external effects:
- payment captures
- shipments
- emails
- queue publishes
- ledger entries
- third-party API calls
It should distinguish client-visible responses:
success: terminal outcome known
conflict: same key with different request
in_progress: safe retry later
unknown: service cannot yet prove terminal outcome
failure: operation definitely did not happen
The response vocabulary matters because clients automate retries based on it.
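One way to make that vocabulary testable is to encode it as an enumeration and state the client's retry policy as a pure function. The policy below is an illustrative assumption, not a rule from any particular API: `Outcome`, `RETRY_SAME_KEY`, and `next_action` are hypothetical, and a timeout is modeled as `None` because the client received no response at all.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"            # terminal outcome known
    CONFLICT = "conflict"          # same key with a different request
    IN_PROGRESS = "in_progress"    # safe to retry later
    UNKNOWN = "unknown"            # service cannot yet prove a terminal outcome
    FAILURE = "failure"            # operation definitely did not happen


# Hypothetical policy: responses that may be retried with the SAME idempotency key.
RETRY_SAME_KEY = {Outcome.IN_PROGRESS, Outcome.UNKNOWN}


def next_action(outcome: Optional[Outcome]) -> str:
    """Decide what an automated client does next; None models a timeout."""
    if outcome is None or outcome in RETRY_SAME_KEY:
        return "retry_same_key"    # never mint a new key while the outcome is unknown
    if outcome is Outcome.FAILURE:
        return "report_failure"
    return "stop"                  # SUCCESS and CONFLICT are terminal for this key
```

Treating a timeout exactly like `UNKNOWN` keeps the client from minting a fresh key, which is the step that would turn an ambiguous result into a second capture.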
Common Failure Modes
One mistake is asserting only internal state. If the client promise concerns an external effect, the test must observe that effect.
Another mistake is treating timeout as failure. A timeout means the client does not know. The service must make retries safe under that uncertainty.
A third mistake is storing idempotency keys without request hashes or scopes. That can merge unrelated operations or allow conflicting requests to reuse the same identity.
A fourth mistake is leaving retention windows out of tests. If idempotency records expire after 24 hours, the contract must say what happens to a retry that arrives after that window, and the harness should test the boundary on both sides.
A fifth mistake is using exactly-once language when the real contract is narrower. "At most one external capture for the same scoped idempotency key" is often more testable and more honest than "exactly once."
Practice
Take one client-facing operation that supports retries.
- What is the exact operation identity?
- What is the idempotency scope?
- Which request fields must match on retry?
- Which external effects should be counted by the oracle?
- What response should the client get after a timeout?
- What happens if the service crashes after the external effect but before local recording?
- What is the retention window for the idempotency record?
- Which response states are safe for automated retry?
Then write one test schedule that crashes at the most inconvenient point: after the side effect may have happened, before the client receives a stable outcome. That is where client semantics become real.
Connections
- Builds on Testing Consensus, Replication, and Membership Protocols, because internal protocol evidence supports the service's ability to preserve client-visible guarantees.
- Prepares for Observability for Reproducible Distributed Bugs, because client-semantics failures require traces that connect retries, idempotency records, and external effects.
- Connects to reliability practice because post-incident evidence often starts from a customer-visible duplicate, not from an internal invariant failure.
Resources
- [BOOK] Designing Data-Intensive Applications
- [DOC] Stripe API: Idempotent Requests
- [DOC] Apache Kafka Documentation: Message Delivery Semantics
- [DOC] Jepsen Analyses
Key Takeaways
- Client semantics should be tested as observable contracts: request identity, retry behavior, external effects, and stable outcomes.
- Idempotency requires scope, request equality, durable outcome evidence, and a defined retention boundary.
- Exactly-once claims must name exactly what is counted: handler execution, database state, message delivery, external side effect, or client-visible outcome.
- The hardest tests crash or partition the system when the side effect may have happened but the client still sees an unknown result.