LESSON
Day 265: Delivery Semantics: At-Most-Once, At-Least-Once, Exactly-Once
Delivery semantics are not broker slogans. They are statements about what can be lost, what can repeat, and at which boundary the system is actually making that promise.
Today's "Aha!" Moment
The insight: The difference between at-most-once, at-least-once, and exactly-once is mostly about when a system decides work is done relative to crashes and retries.
Why this matters: Teams often talk about these semantics as if they were product labels attached to a queue or broker. That is how expensive misunderstandings start. The real question is always:
- when do we acknowledge or commit?
- what happens if we crash right before or right after that point?
- does the guarantee cover only the broker, the consumer pipeline, or the final side effect too?
The universal pattern:
- acknowledge early -> risk loss, avoid duplicates
- acknowledge late -> avoid loss, risk duplicates
- coordinate processing and commit atomically -> stronger guarantees, but only inside a bounded system
Concrete anchor: An order event is consumed from Kafka and triggers a confirmation email. If the consumer commits the offset before sending the email, a crash between the commit and the send loses that email permanently, because the broker already considers the event handled. If it commits the offset only after sending, a crash between the send and the commit causes the event to be redelivered and the email to be sent twice. The semantics are created by that timing.
How to recognize when this applies:
- A broker or queue promises reliability, but downstream side effects still duplicate or disappear.
- Crashes during processing create arguments about whether the message was "already handled."
- Rebalances or retries are exposing gaps between commit timing and real business completion.
Common misconceptions:
- [INCORRECT] "Exactly-once means no duplicate side effect can ever happen anywhere."
- [INCORRECT] "At-least-once is always safer, so it is always better."
- [CORRECT] The truth: Each semantic is a different trade-off between loss, duplication, coordination cost, and scope of guarantee.
Real-world examples:
- Metrics pipeline: Duplicate increments may be acceptable, so at-least-once plus aggregation tolerance is often enough.
- Billing or email delivery: Reprocessing may be expensive or user-visible, so idempotency keys or transactional boundaries become much more important.
Why This Matters
The problem: Delivery semantics are where nice diagrams meet crash reality. Messages can be read, processed, retried, rebalanced, re-sent, or committed at awkward moments. If the system boundary is vague, teams think they bought stronger guarantees than they actually have.
Before:
- Acknowledgements and offset commits are treated as routine plumbing.
- Consumers are called "exactly-once" because the broker supports transactions.
- Duplicate side effects appear in production and nobody knows which layer lied.
After:
- Delivery semantics are treated as crash-boundary design decisions.
- Teams distinguish broker guarantees from end-to-end business guarantees.
- Idempotency, transactional writes, and commit timing are chosen deliberately.
Real-world impact: This avoids lost work, reduces duplicate side effects, and makes incident response much faster because the team can say exactly which boundary was guaranteed and which one was not.
Learning Objectives
By the end of this session, you will be able to:
- Explain what delivery semantics actually describe - Understand them as crash and commit contracts, not marketing labels.
- Describe how at-most-once and at-least-once are created mechanically - Reason from ack and commit timing to loss or duplication outcomes.
- Evaluate what exactly-once really means in practice - Distinguish bounded transactional guarantees from true end-to-end business effects.
Core Concepts Explained
Concept 1: Delivery Semantics Are About the "Done" Boundary
The key question is not:
- "does the broker store messages durably?"
The key question is:
- "when does the system declare this message fully handled?"
That declaration may happen at several different layers:
- broker acknowledges producer write
- consumer commits offset
- business logic updates a database
- external side effect happens, such as charging a card or sending an email
Those are not the same boundary.
This is why delivery semantics are tricky. A pipeline can be:
- durable at the broker layer
- replayable at the consumer layer
- still duplicate or lose effects at the business layer
So the mature mental model is:
- delivery semantics are scoped promises
Whenever someone says "this is exactly-once," the immediate follow-up should be:
- exactly once between which boundaries?
If the answer is vague, the guarantee is probably being overstated.
Concept 2: At-Most-Once and At-Least-Once Come From Commit Timing
The cleanest way to understand these semantics is to imagine one consumer processing one message.
At-most-once
The consumer acknowledges or commits first, then does the work.
Shape:
1. read message
2. mark it done
3. process side effect
If the process crashes after step 2 but before step 3:
- the message is considered consumed
- the side effect may never happen
So at-most-once means:
- a message will not be processed more than once
- but it may be processed zero times in reality
This is appropriate when:
- occasional loss is acceptable
- duplicates are more dangerous than drops
- the consumer work is cheap or non-critical
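The commit-first shape above can be simulated in a few lines. This is a minimal sketch using an in-memory list as a stand-in for the log, not a real broker client; the names consume_at_most_once, SimulatedCrash, and crash_at are hypothetical and exist only for illustration.

```python
class SimulatedCrash(Exception):
    """Stands in for the consumer process dying."""


def consume_at_most_once(log, state, effects, crash_at=None):
    """Commit first, then do the work (hypothetical in-memory consumer)."""
    while state["committed"] < len(log):
        offset = state["committed"]
        message = log[offset]              # 1. read message
        state["committed"] = offset + 1    # 2. mark it done (early commit)
        if offset == crash_at:
            raise SimulatedCrash(offset)   # crash between commit and work
        effects.append(message)            # 3. process side effect


log = ["order-1", "order-2", "order-3"]
state = {"committed": 0}   # survives the crash, like a committed offset
effects = []

try:
    consume_at_most_once(log, state, effects, crash_at=1)
except SimulatedCrash:
    pass

# Restart: the consumer resumes from the committed offset.
consume_at_most_once(log, state, effects)
print(effects)  # ['order-1', 'order-3']
```

The side effect for order-2 never happens, and nothing in the committed state records that it was skipped.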
At-least-once
The consumer does the work first, then acknowledges or commits.
Shape:
1. read message
2. perform side effect
3. mark it done
If the process crashes after step 2 but before step 3:
- the system may retry the message
- the side effect may happen again
So at-least-once means:
- the work should eventually happen
- but duplicates are part of the contract
This is often the default practical choice because losing data is usually worse than replaying it. But it only works cleanly when downstream processing is:
- idempotent
- deduplicated
- or tolerant of repetition
That is why at-least-once is not "safe by itself." It pushes responsibility onto the consumer boundary.
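The work-first shape can be simulated the same way. Again, this is a sketch with an in-memory list standing in for the log; consume_at_least_once, SimulatedCrash, and crash_at are hypothetical names, and the handler here is deliberately not idempotent so the duplicate is visible.

```python
class SimulatedCrash(Exception):
    """Stands in for the consumer process dying."""


def consume_at_least_once(log, state, effects, crash_at=None):
    """Do the work first, then commit (hypothetical in-memory consumer)."""
    while state["committed"] < len(log):
        offset = state["committed"]
        message = log[offset]              # 1. read message
        effects.append(message)            # 2. perform side effect
        if offset == crash_at:
            raise SimulatedCrash(offset)   # crash between work and commit
        state["committed"] = offset + 1    # 3. mark it done (late commit)


log = ["order-1", "order-2", "order-3"]
state = {"committed": 0}   # survives the crash, like a committed offset
effects = []

try:
    consume_at_least_once(log, state, effects, crash_at=1)
except SimulatedCrash:
    pass

# Restart: order-2 was never committed, so it is processed again.
consume_at_least_once(log, state, effects)
print(effects)  # ['order-1', 'order-2', 'order-2', 'order-3']
```

No message is lost, but order-2 produces its side effect twice. Whether that is acceptable depends entirely on what the side effect is.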
Concept 3: Exactly-Once Is Usually a Bounded Transactional Guarantee
Exactly-once sounds absolute, but in practice it usually means:
- inside a specific pipeline boundary, the system coordinates writes and progress markers so replays do not produce duplicate logical output
In Kafka-style systems, this commonly involves some combination of:
- idempotent producers
- transactions
- atomic commit of output records and consumed offsets
- stateful processing that can roll forward consistently after recovery
That is powerful, but it is not magic.
If your pipeline consumes from Kafka and writes back to Kafka transactionally, you can get a strong guarantee inside that loop.
But if the consumer also:
- sends an email
- calls a payment gateway
- triggers a webhook
- invokes a third-party API
then that external side effect is usually outside the broker's transactional boundary.
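The difference between work inside and outside the atomic boundary can be sketched with a local database. This example uses SQLite as a stand-in for a transactional sink, not Kafka transactions themselves; the table names, the emails_sent list (playing the external system), and the crash_at parameter are all assumptions made for illustration.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for the consumer process dying."""


def make_store():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE output (offset_no INTEGER PRIMARY KEY, value TEXT)")
    db.execute("CREATE TABLE progress (id INTEGER PRIMARY KEY, next_offset INTEGER)")
    db.execute("INSERT INTO progress VALUES (1, 0)")
    db.commit()
    return db


def process(db, log, emails_sent, crash_at=None):
    while True:
        (offset,) = db.execute("SELECT next_offset FROM progress").fetchone()
        if offset >= len(log):
            return
        # `with db` commits on success and rolls back on an exception,
        # so the output row and the progress marker move together.
        with db:
            db.execute("INSERT INTO output VALUES (?, ?)",
                       (offset, log[offset].upper()))
            db.execute("UPDATE progress SET next_offset = ?", (offset + 1,))
            emails_sent.append(log[offset])   # external effect: NOT rolled back
            if offset == crash_at:
                raise SimulatedCrash(offset)


db = make_store()
log = ["a", "b"]
emails_sent = []
try:
    process(db, log, emails_sent, crash_at=1)
except SimulatedCrash:
    pass
process(db, log, emails_sent)   # restart after the crash

print(db.execute("SELECT value FROM output ORDER BY offset_no").fetchall())
# [('A',), ('B',)]
print(emails_sent)
# ['a', 'b', 'b']
```

The rows inside the transaction appear exactly once after recovery, while the email for "b" is sent twice because the rollback cannot reach it.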
So the real lesson is:
- exactly-once is strongest when the whole workflow participates in the same atomic boundary
- once you cross into external systems, you often fall back to at-least-once + idempotency
This is why strong event systems still rely heavily on:
- idempotency keys
- deduplication tables
- transactional outbox patterns
- carefully designed side-effect boundaries
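As one illustration of these patterns, here is a minimal transactional outbox sketch. It assumes a single SQLite database holding both the business table and the outbox; the table names, place_order, relay_once, and the publish callback are hypothetical, and a real relay would publish to a broker rather than call a Python function.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
    db.execute(
        "CREATE TABLE outbox ("
        "event_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT, published INTEGER DEFAULT 0)"
    )
    db.commit()
    return db


def place_order(db, order_id):
    # The business write and the outbox row commit or roll back together.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (f"order_placed:{order_id}",))


def relay_once(db, publish):
    """Publish unpublished outbox rows. Returns how many rows it handled."""
    rows = db.execute(
        "SELECT event_id, payload FROM outbox "
        "WHERE published = 0 ORDER BY event_id"
    ).fetchall()
    for event_id, payload in rows:
        # A crash after publish() but before the UPDATE means this row is
        # published again next time: the relay itself is at-least-once.
        publish(event_id, payload)
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?",
                       (event_id,))
    return len(rows)


db = make_db()
place_order(db, "o-1")
published = []
relay_once(db, lambda event_id, payload: published.append((event_id, payload)))
print(published)  # [(1, 'order_placed:o-1')]
```

The outbox removes the gap between "order saved" and "event recorded", but the relay can still publish a row twice. The event_id gives downstream consumers a stable key to deduplicate on.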
And it sets up the next lessons naturally:
- schema contracts define what the stream means
- stream processing lessons later will explain how stronger semantics are maintained through state, windows, and transactional coordination
Troubleshooting
Issue: "We enabled exactly-once, but users still received duplicate emails."
Why it happens / is confusing: The team assumed the broker or stream processor's transactional guarantee extended to the email provider.
Clarification / Fix: Treat the external call as a separate side-effect boundary. Use idempotency keys, deduplication, or a delivery record the email layer can check safely.
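One way to picture the idempotency-key fix is a provider that deduplicates on a key supplied by the caller. The sketch below uses a fake in-memory provider; FakeEmailProvider, its send signature, and the key format are assumptions for illustration and do not describe any real email service's API.

```python
class FakeEmailProvider:
    """Hypothetical provider that ignores repeated idempotency keys."""

    def __init__(self):
        self.seen_keys = set()
        self.delivered = []

    def send(self, idempotency_key, to, body):
        if idempotency_key in self.seen_keys:
            return "duplicate_ignored"
        self.seen_keys.add(idempotency_key)
        self.delivered.append((to, body))
        return "sent"


def confirmation_key(order_id):
    # The key must be derived from the event, not generated per attempt,
    # so that a replayed message produces the same key.
    return f"order-confirmation:{order_id}"


provider = FakeEmailProvider()
key = confirmation_key("o-42")

first = provider.send(key, "user@example.com", "Your order is confirmed")
# The consumer crashes before committing, the event is redelivered,
# and the handler runs again with the same key.
second = provider.send(key, "user@example.com", "Your order is confirmed")

print(first, second, len(provider.delivered))  # sent duplicate_ignored 1
```

The consumer stays at-least-once, and the deduplication happens at the boundary that actually owns the side effect. If the provider offers no such key, a local delivery record can approximate it, but a crash between writing the record and sending still leaves a window for loss or duplication depending on the order of those two steps.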
Issue: "We never see duplicates, but sometimes events seem to disappear."
Why it happens / is confusing: Offsets or acknowledgements are being recorded before the real work is durably finished.
Clarification / Fix: Check whether the consumer is committing too early. That usually means you are operating in at-most-once territory, whether you intended to or not.
Issue: "After a rebalance or crash, some records are processed again."
Why it happens / is confusing: Teams interpret replay as broker failure rather than normal recovery behavior.
Clarification / Fix: This is standard at-least-once behavior when commit and processing are separated by a crash window. Make the consumer idempotent, or bring the processing result and the progress marker into the same transactional boundary so they commit together.
Advanced Connections
Connection 1: Delivery Semantics <-> Consumer Groups and Rebalancing
The parallel: The previous lesson showed that consumer groups constantly renegotiate partition ownership. Delivery semantics determine what happens when ownership changes while some work is in flight but not yet committed.
Real-world case: Rebalances expose the gap between "message read" and "message durably completed," which is exactly where duplicates or losses appear.
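The gap can be made concrete with a small calculation. This is a sketch over plain offsets, not a real consumer-group client; resume_after_rebalance and its parameters are hypothetical. Here, committed is the next offset the group has recorded as the resume point, and processed_up_to is the next offset the old owner had not yet actually finished.

```python
def resume_after_rebalance(log, committed, processed_up_to):
    """Show what a new partition owner sees after taking over.

    The new owner knows only the committed offset. It has no knowledge of
    work the old owner finished or skipped without a matching commit.
    """
    to_process = log[committed:]
    # Finished by the old owner but not committed: done again.
    replayed = log[committed:processed_up_to]
    # Committed by the old owner but never finished: silently dropped.
    skipped = log[processed_up_to:committed]
    return to_process, replayed, skipped


log = ["e0", "e1", "e2", "e3", "e4"]

# Late commit: old owner finished e0..e3 but had committed only through e1.
print(resume_after_rebalance(log, committed=2, processed_up_to=4))
# (['e2', 'e3', 'e4'], ['e2', 'e3'], [])

# Early commit: old owner committed through e3 but finished only e0..e1.
print(resume_after_rebalance(log, committed=4, processed_up_to=2))
# (['e4'], [], ['e2', 'e3'])
```

The same ownership change produces duplicates in the first case and losses in the second. The only thing that differs is where the commit sat relative to the finished work.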
Connection 2: Delivery Semantics <-> End-to-End Stream Processing
The parallel: Later lessons on state stores, event time, and exactly-once pipelines depend on this one. Stateful stream processors are valuable partly because they can coordinate state updates, output writes, and progress tracking more tightly than ad hoc consumers.
Real-world case: A stream job that atomically updates state and output topic records can provide much stronger behavior than a hand-rolled consumer calling arbitrary external services.
Resources
Optional Deepening Resources
- [DOCS] Apache Kafka Documentation
- Link: https://kafka.apache.org/documentation/
- Focus: Use it as the main official reference for producer guarantees, consumer commits, transactions, and delivery semantics.
- [DOCS] Apache Kafka Documentation: Semantics
- Link: https://kafka.apache.org/documentation/#semantics
- Focus: Read it for the official project framing of producer and consumer delivery guarantees.
- [DOCS] Confluent Documentation: Kafka Message Delivery Guarantees
- Link: https://docs.confluent.io/kafka/design/delivery-semantics.html
- Focus: Use it for a practical explanation of at-most-once, at-least-once, and transactional exactly-once in Kafka ecosystems.
- [DOCS] Confluent Documentation: Exactly-Once Semantics
- Link: https://docs.confluent.io/platform/current/streams/concepts.html
- Focus: Read the transactions and exactly-once sections to understand where stronger guarantees are real and where they stop.
Key Insights
- Delivery semantics are scoped promises - Always ask which boundary is actually covered: broker write, consumer commit, state update, or final external side effect.
- Crash timing creates the guarantee - At-most-once and at-least-once differ mostly in whether you commit before or after work becomes durable.
- Exactly-once is usually bounded, not universal - Strong transactional pipelines exist, but external side effects still usually need idempotency and deduplication.