Exactly-Once Semantics, Idempotency, and Deduplication

The core idea: "exactly once" is usually a scoped processing guarantee, not a universal delivery promise. Reliable systems therefore combine bounded transactional commits with idempotent effects and deduplication, and they pay for the stronger safety with more state kept at the boundaries.

Core Insight

Imagine a payment worker that reads ChargeCustomer from a queue, calls a payment provider, writes the result to a database, and emits PaymentCaptured. The worker successfully charges the card, then crashes before recording the result. When it restarts, the queue redelivers the message because the previous attempt was never acknowledged.

The system is now in the uncomfortable space where distributed systems spend a lot of time: the operation may have happened, but the local service cannot prove exactly what completed before the crash. If the charge had failed, a retry would repair the miss; because it actually succeeded, a blind retry double-charges the customer, and the worker cannot tell the two cases apart. Avoiding all retries would hurt availability, but retrying blindly creates duplicate side effects.

This is why "exactly once" must be treated carefully. Some platforms can provide exactly-once semantics inside a controlled boundary, such as reading from one log, updating managed state, and committing to another log transactionally. That is real and useful. It does not automatically make the whole business workflow exactly once, especially when external APIs, emails, payments, or human-visible side effects sit outside that boundary.

The practical design goal is safe repetition. Idempotency makes repeated operations harmless. Deduplication recognizes repeated work using stable identity and memory. Exactly-once semantics, idempotency, and deduplication are related, but they solve different parts of the failure story.

Where Duplicates Come From

Duplicates are not rare bugs. They are a normal result of systems trying to recover from uncertainty.

A client may retry an HTTP request after a timeout even though the server completed the first attempt. A broker may redeliver a message after a consumer crashes before acknowledgement. A producer may retry after an ambiguous write acknowledgement. A stream processor may replay records after a rebalance or checkpoint restore.

consumer receives message
consumer performs side effect
consumer crashes before ack
broker redelivers message

The broker is not doing something wrong. From its perspective, redelivery is the safe move because it cannot assume the consumer completed the work. The application must decide whether the repeated work should be applied, suppressed, or transformed into a safe no-op.
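
The redelivery sequence above can be reproduced with a toy in-memory broker. This is a minimal sketch, not a real client library: Broker, run_consumer, and the crash_before_ack flag are hypothetical names invented for illustration, and the "crash" is simulated by returning before the acknowledgement.

```python
class Broker:
    """Toy at-least-once broker: a message stays pending until it is acked."""

    def __init__(self, messages):
        self.pending = list(messages)

    def deliver(self):
        # Hand out the oldest unacknowledged message without removing it.
        return self.pending[0] if self.pending else None

    def ack(self, message):
        self.pending.remove(message)


def run_consumer(broker, side_effects, crash_before_ack=False):
    """Process one message; optionally simulate a crash before the ack."""
    message = broker.deliver()
    if message is None:
        return
    side_effects.append(message)   # the side effect happens first
    if crash_before_ack:
        return                     # simulated crash: the ack is never sent
    broker.ack(message)


broker = Broker(["charge-order-123"])
effects = []
run_consumer(broker, effects, crash_before_ack=True)  # first attempt crashes
run_consumer(broker, effects)                         # redelivery after restart
print(effects)  # the same message was applied twice
```

The broker behaved correctly in both attempts. The duplicate effect comes entirely from the consumer applying an unconditional side effect.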

Exactly-Once Semantics: A Bounded Contract

Exactly-once semantics means different things in different systems, so the first question is always: exactly once across which boundary?

A stream processor may coordinate source offsets, processing state, and output writes so each input record contributes to managed output once, even across crash and retry:

input log -> processor state -> output log
        one coordinated commit boundary

Inside that boundary, the platform controls the relevant pieces. It can commit input progress and output atomically, or restore them together from a checkpoint. That can prevent a record from being double-applied to the managed output.
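
One way to picture a coordinated commit boundary is a single database transaction that holds both the input position and the output. The sketch below uses Python's standard sqlite3 module as a stand-in for a platform-managed store; the table names, input_log, and the crash_before_commit flag are assumptions made for illustration, and real stream processors use their own commit protocols.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE progress (id INTEGER PRIMARY KEY CHECK (id = 1), "
    "next_offset INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE output (input_offset INTEGER PRIMARY KEY, value TEXT NOT NULL)"
)
conn.execute("INSERT INTO progress VALUES (1, 0)")
conn.commit()

input_log = ["a", "b", "c"]


def process_next(conn, crash_before_commit=False):
    """Apply one input record and advance the offset in one transaction."""
    (offset,) = conn.execute(
        "SELECT next_offset FROM progress WHERE id = 1"
    ).fetchone()
    if offset >= len(input_log):
        return False
    conn.execute(
        "INSERT INTO output VALUES (?, ?)", (offset, input_log[offset].upper())
    )
    conn.execute(
        "UPDATE progress SET next_offset = ? WHERE id = 1", (offset + 1,)
    )
    if crash_before_commit:
        conn.rollback()  # stands in for a crash: uncommitted work is lost
        return True
    conn.commit()        # output and input progress become visible together
    return True


process_next(conn, crash_before_commit=True)
rows_after_crash = conn.execute("SELECT COUNT(*) FROM output").fetchone()[0]

while process_next(conn):
    pass
rows = conn.execute(
    "SELECT input_offset, value FROM output ORDER BY input_offset"
).fetchall()
print(rows_after_crash, rows)
```

Because the output row and the offset move in the same transaction, the simulated crash leaves neither behind, and the replay applies each record once.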

The guarantee weakens when the workflow crosses a boundary the platform does not control:

input log -> processor -> payment API
input log -> processor -> email provider
input log -> processor -> warehouse with separate commits

The processor can retry its own work, but it may not know whether the external side effect already happened. The lesson is not that exactly-once systems are fake. The lesson is that the guarantee is scoped. A precise exactly-once claim names the source, state, sink, commit protocol, and failure assumptions.
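
The weakening can be shown with a small sketch in which managed state is staged and discarded on a crash, while an external call is not. ExternalEmailProvider, process, and the crash_before_commit flag are hypothetical names for illustration only; no real email API is implied.

```python
class ExternalEmailProvider:
    """Stand-in for a system outside the processor's commit boundary."""

    def __init__(self):
        self.sent = []

    def send(self, to):
        self.sent.append(to)


def process(record, state, provider, crash_before_commit=False):
    """Stage managed state, call the external API, then commit."""
    staged = dict(state)                 # uncommitted copy of managed state
    staged["offset"] = state["offset"] + 1
    provider.send(record)                # external effect: cannot be rolled back
    if crash_before_commit:
        return state                     # staged changes are discarded
    return staged


provider = ExternalEmailProvider()
state = {"offset": 0}
state = process("user@example.com", state, provider, crash_before_commit=True)
state = process("user@example.com", state, provider)  # replay after recovery
print(state, provider.sent)
```

The managed offset advances exactly once, yet the provider records two sends. The processor's guarantee held inside its boundary and said nothing about the effect outside it.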

Idempotency: Making Repetition Harmless

An operation is idempotent when repeating it has the same effect as applying it once. Idempotency works at the effect boundary, where the business consequence happens.

Some operations are naturally idempotent:

set order status = shipped
replace profile email with x@example
record result for request_id = abc if absent

Other operations are not idempotent by default:

increment balance by 10
send another welcome email
charge card 25 USD
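
The difference between the two lists can be checked directly by applying an operation twice and comparing the results. The functions below are illustrative assumptions, written as pure functions over dictionaries rather than as calls to any real system.

```python
def set_status(order, status):
    """Idempotent: the final state depends only on the argument."""
    return {**order, "status": status}


def increment_balance(account, amount):
    """Not idempotent: every repeat changes the state again."""
    return {**account, "balance": account["balance"] + amount}


order = {"id": 123, "status": "paid"}
shipped_once = set_status(order, "shipped")
shipped_twice = set_status(shipped_once, "shipped")

account = {"id": 7, "balance": 100}
credited_once = increment_balance(account, 10)
credited_twice = increment_balance(credited_once, 10)

print(shipped_once == shipped_twice)      # repeating the set changes nothing
print(credited_once == credited_twice)    # repeating the increment does
```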

The non-idempotent operations can often be redesigned. A payment request can carry an idempotency key. An inventory reservation can use a stable reservation ID. A workflow step can record step_id completion before exposing the next transition.

request_id=abc, amount=25
first attempt: apply charge and remember abc
retry: return stored result for abc

The trade-off is that idempotency usually requires explicit operation identity, durable records, and careful response semantics. The reward is large: retries become part of the normal recovery path instead of a source of accidental multiplication.
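
The request_id flow above might look like the following in code. This is a minimal sketch under stated assumptions: ChargeService is a hypothetical class, the results dictionary stands in for a durable store, and rejecting a reused key with different parameters is one possible response policy among several.

```python
class ChargeService:
    """Toy service that remembers the result of each request_id."""

    def __init__(self):
        self.results = {}        # request_id -> stored response (durable in a real system)
        self.total_charged = 0

    def charge(self, request_id, amount):
        if request_id in self.results:
            stored = self.results[request_id]
            if stored["amount"] != amount:
                # Assumed policy: a key identifies one specific operation.
                raise ValueError("request_id reused with different parameters")
            return stored        # retry: return the stored result, no new effect
        self.total_charged += amount
        result = {"request_id": request_id, "amount": amount, "status": "captured"}
        self.results[request_id] = result
        return result


service = ChargeService()
first = service.charge("abc", 25)
retry = service.charge("abc", 25)
print(first == retry, service.total_charged)
```

In this toy version the effect and the stored result are updated in the same process without a crash in between. A real implementation would need both writes to be atomic, or the ambiguity described earlier returns.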

Deduplication: Identity Plus Memory

Deduplication is a mechanism for detecting repeated work. It asks, "have I seen this operation before?"

To answer that, the system needs three things:

a stable identity, such as a request ID, message ID, producer sequence number, or business key
remembered history, such as a table, cache, compacted topic, sequence tracker, or time window
a rule for whether to suppress, replay a stored result, or accept the event

event(id=abc) -> unseen -> apply and remember abc
event(id=abc) -> seen   -> suppress or return stored result

Deduplication is powerful, but it is only as strong as its identity and retention model. If IDs change on retry, dedupe cannot recognize the repeat. If the remembered window expires too early, an old duplicate may be accepted. If the dedupe store is not updated atomically with the effect, a crash can still leave ambiguity.

That is the core trade-off: stronger dedupe needs more durable memory, clearer identities, and more careful transaction boundaries.
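
The retention risk is easy to demonstrate with a time-windowed dedupe store. The sketch below is illustrative: WindowedDeduplicator is a hypothetical class, time is passed in explicitly as a number of seconds so the behavior is deterministic, and the in-memory dictionary stands in for whatever durable store a real system would use.

```python
class WindowedDeduplicator:
    """Remembers event IDs for a fixed retention window."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.seen = {}   # event_id -> time the ID was first remembered

    def is_duplicate(self, event_id, now):
        # Forget identities that have aged out of the retention window.
        cutoff = now - self.retention_seconds
        self.seen = {eid: t for eid, t in self.seen.items() if t > cutoff}
        if event_id in self.seen:
            return True              # seen: suppress or return stored result
        self.seen[event_id] = now    # unseen: remember and let the caller apply
        return False


dedupe = WindowedDeduplicator(retention_seconds=60)
first = dedupe.is_duplicate("abc", now=0)     # unseen
repeat = dedupe.is_duplicate("abc", now=30)   # inside the window
late = dedupe.is_duplicate("abc", now=120)    # after the window expired
print(first, repeat, late)
```

The late duplicate is accepted as new work because the store no longer remembers it. Lengthening the window reduces that risk and increases the memory the system must keep.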

Worked Example: Charging a Customer Safely

Start with a fragile workflow:

message: charge order 123
worker: call payment provider
worker: write result
worker: ack message

If the worker crashes after the provider call but before writing the result, retry is ambiguous.

A safer design gives the operation a stable identity:

idempotency_key = charge:order-123

The worker sends that key to the payment provider if the provider supports idempotent requests. It also records the key and result in its own database. On retry, the worker does not create a new charge intent; it asks for the result associated with the same key or returns the stored result.

This still requires engineering discipline. The key must be stable across retries. The stored result must survive crashes. The code must handle "request in progress" and "provider result unknown" states. But the workflow has moved from "hope delivery happened once" to "make repeated attempts converge on one business effect."
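
A compact sketch of the safer design follows. Everything in it is an assumption made for illustration: FakeProvider models a provider that honors idempotency keys, Worker.db stands in for the worker's database, the charge_id format is invented, and the crash is simulated with an exception. It is not the API of any real payment provider.

```python
class FakeProvider:
    """Hypothetical provider that honors idempotency keys."""

    def __init__(self):
        self.charges = {}   # idempotency_key -> charge record

    def charge(self, idempotency_key, amount):
        # The same key always maps to the original charge.
        if idempotency_key not in self.charges:
            charge_id = f"ch_{len(self.charges) + 1}"
            self.charges[idempotency_key] = {"charge_id": charge_id, "amount": amount}
        return self.charges[idempotency_key]


class Worker:
    def __init__(self, provider):
        self.provider = provider
        self.db = {}        # idempotency_key -> {"state": ..., "result": ...}

    def handle(self, order_id, amount, crash_after_provider=False):
        key = f"charge:order-{order_id}"     # stable across retries
        record = self.db.get(key)
        if record and record["state"] == "completed":
            return record["result"]          # duplicate after success
        # New request, or an earlier attempt whose outcome is unknown.
        self.db[key] = {"state": "in_progress", "result": None}
        result = self.provider.charge(key, amount)
        if crash_after_provider:
            raise RuntimeError("simulated crash before writing result")
        self.db[key] = {"state": "completed", "result": result}
        return result


provider = FakeProvider()
worker = Worker(provider)
try:
    worker.handle(123, 25, crash_after_provider=True)
except RuntimeError:
    pass                                 # no ack, so the message is redelivered
result = worker.handle(123, 25)          # retry reuses the same key
again = worker.handle(123, 25)           # later duplicate gets the stored result
print(len(provider.charges), result == again)
```

After the simulated crash the local record is left in the in_progress state, which is the "provider result unknown" case. The retry resolves it by asking the provider again with the same key instead of creating a new charge intent.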

Choosing the Tool

Concept	Main Question	Typical Mechanism	Main Trade-off
Exactly-once semantics	Can this bounded pipeline avoid double-apply?	Coordinated commits, transactions, checkpointed source and sink state	Strong but scoped; external side effects often sit outside
Idempotency	If the operation repeats, is the effect still safe?	Stable request keys, state transitions, stored results	Requires business-level identity and response design
Deduplication	Can the system recognize repeated work?	Message IDs, sequence numbers, dedupe tables, windows	Requires memory, retention policy, and atomicity with effects

Use exactly-once semantics when the platform controls the source, state, and sink well enough to commit them together. Use idempotency wherever a request may be repeated and the business effect must not multiply. Use deduplication when repeated messages or requests need to be recognized and suppressed.

Most robust systems use more than one. A stream job may rely on exactly-once semantics for its managed logs, idempotent writes for an external sink, and dedupe tables for requests that cross service boundaries.

Common Misreadings

Exactly once does not mean no duplicate message can ever appear anywhere. It usually means a bounded pipeline can avoid applying the same input twice to a managed output under defined assumptions.

Deduplication does not remove the need for idempotency. Dedupe windows can expire, IDs can be wrong, and external systems can retry independently. Idempotent effects are the stronger last line of defense.

Retries are not the enemy. Unsafe retries are the enemy. Available systems need retries, so the design work is to give repeated attempts stable identity and safe effects.

Connections

The previous lesson on checkpointing showed how systems resume from durable boundaries. Exactly-once claims depend on those boundaries being aligned: input position, processing state, and output commit must agree after recovery.

The next lesson turns to production coordination systems such as etcd, Consul, and ZooKeeper. Their leases, watches, and compare-and-swap primitives are often used to build the identities, ownership boundaries, and state transitions that make retry-heavy workflows safer.

Resources

[DOC] Apache Kafka Documentation: Exactly Once Semantics
- Focus: How Kafka scopes exactly-once behavior to producer, transaction, and stream-processing boundaries.
[DOC] Stripe Docs: Idempotent Requests
- Focus: How stable request keys make payment-style retries safe at an external API boundary.
[DOC] Apache Flink Documentation: Fault Tolerance
- Focus: How checkpointing supports consistent recovery in stateful stream processing.
[BOOK] Designing Data-Intensive Applications
- Focus: The broader relationship between at-least-once delivery, idempotency, transactions, and stream processing.

Key Takeaways

Exactly-once semantics is usually scoped to a controlled source-state-sink boundary, not an end-to-end promise for every side effect.
Idempotency makes repeated requests converge on one effect, which is often the practical foundation for safe retries.
Deduplication needs stable identity and remembered history; without both, duplicate suppression becomes guesswork.
