Change Data Capture as Integration Backbone


The core idea: CDC turns the source database's commit log into a durable feed of committed changes, which removes many dual-write races but replaces them with log-position management, schema governance, and replay discipline.

Today's "Aha!" Moment

In 062.md, Harbor Point made overload visible in manifest-projector by keeping backlog in Kafka instead of in process memory. That helped the consumer survive bursts, but it did not solve a more basic integration problem. The booking platform still updated PostgreSQL, published a Kafka event, and called the CRM API through three different code paths. When the API server crashed after committing the booking row but before publishing booking-confirmed, the manifest stayed correct because it read the database directly, while finance and CRM stayed stale because their trigger lived in application code that never finished.

CDC changes the shape of that failure. Instead of asking every write path to remember which downstream systems need to hear about the change, Harbor Point lets PostgreSQL's write-ahead log become the one place that records "this transaction really committed." A connector tails that log, translates committed row changes into ordered records, and ships them to Kafka for downstream consumers. The non-obvious point is that CDC is not magic event-driven architecture; it is a disciplined way to reuse an existing durability boundary as an integration boundary.

That distinction matters because CDC gives you facts about committed state, not automatically perfect business events. Harbor Point can trust that booking_id=8841 changed from pending to confirmed because that happened in the transaction log. It cannot assume the raw row change explains why the change happened, whether the CRM should notify the guest, or which downstream systems are allowed to react synchronously. The trade-off is attractive but real: CDC removes the classic "database write succeeded, event publish failed" race, while pushing more responsibility into schema compatibility, downstream enrichment, and replay-aware operations.

Why This Matters

Once a company has more than one downstream consumer, integration patterns become a reliability issue rather than a code-organization preference. Harbor Point's booking database feeds an embarkation dashboard, a revenue warehouse, a loyalty CRM, and a cabin-search index. If each consumer gets updates through a different path, every new feature quietly creates another dual write, another polling job, or another webhook retry loop. Each mechanism can work in isolation. Together they produce a system where nobody can answer a simple incident question: "Which committed booking changes are downstream systems still missing?"

CDC makes that question answerable because it centralizes change propagation around a durable sequence of commits. A lagging connector shows up as log lag. A broken sink shows up as consumer lag. A schema change shows up as a decode failure or incompatible payload. The architecture becomes easier to operate because the source of truth for propagation is explicit.

The production consequence is not just cleaner code. It is fewer silent data drifts between systems, simpler backfills, and a reusable stream that multiple consumers can replay independently. The price is that the database schema now matters outside the service boundary, transaction-log retention becomes part of capacity planning, and teams must decide when a row-level change is sufficient versus when they need a higher-level domain event. CDC is strong precisely because it narrows one class of failure while exposing others that used to be hidden.

Core Walkthrough

Part 1: Grounded Situation

Keep one Harbor Point flow in view:

booking-api -> PostgreSQL primary
            -> WAL / logical replication stream
            -> cdc-relay
            -> booking-changes topic
            -> manifest service / finance warehouse / CRM sync / search index

Suppose guest #8841 confirms cabin S12 for the July 14 sailing. The booking transaction updates three tables in one commit:

UPDATE bookings
SET status = 'confirmed', confirmed_at = NOW()
WHERE booking_id = 8841;

INSERT INTO payments (booking_id, amount_cents, status)
VALUES (8841, 220000, 'authorized');

UPDATE cabin_inventory
SET reserved = true
WHERE sailing_id = '2026-07-14' AND cabin_id = 'S12';

If Harbor Point implements integration as synchronous fan-out from booking-api, the write path now has to update PostgreSQL, publish a Kafka event, and maybe call the CRM before it can claim success. That increases latency and still leaves a crash window after the database commit. If it implements integration as table polling, the warehouse and CRM repeatedly query for updated_at > last_seen, which adds load, misses deletes unless extra conventions exist, and can observe rows in a different order than the original commit stream.
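
For contrast, here is a minimal sketch of the polling approach, assuming the warehouse tracks its own watermark and that bookings carries an updated_at column (neither appears in the schema above):

-- Hypothetical polling job run by the warehouse or CRM sync.
-- :last_seen_watermark is whatever timestamp the poller persisted last run.
SELECT booking_id, status, confirmed_at, updated_at
FROM bookings
WHERE updated_at > :last_seen_watermark
ORDER BY updated_at;
-- Deleted rows never match this predicate, and a long-running transaction
-- can commit a row whose updated_at is already behind the watermark,
-- so the poller silently skips it.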

CDC uses a different lever. The application performs its normal database transaction and stops there. After commit, PostgreSQL advances the WAL and logical decoding exposes the committed changes as a replayable stream. The integration contract becomes: if the source transaction commits, the change will eventually appear in the CDC stream; if it does not commit, downstream consumers should never see it.
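
That contract can be seen directly with PostgreSQL's built-in logical decoding. A minimal sketch, using the test_decoding output plugin for readable output (a production connector would typically use pgoutput with a publication; the slot name is illustrative):

-- Create a logical replication slot. From this point on, PostgreSQL retains
-- the WAL needed to decode committed changes for this slot.
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot('cdc_relay_slot', 'test_decoding');

-- Peek at decoded changes without consuming them. Only committed
-- transactions appear, in commit order, each tagged with its LSN.
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('cdc_relay_slot', NULL, NULL);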

Part 2: Mechanism

The internal mechanism matters more than the label "CDC." The connector is not watching tables with ad hoc queries. It is reading the database's own durability machinery.

  1. Harbor Point's transaction commits on PostgreSQL.
  2. The commit generates WAL records at a specific log sequence number, or LSN.
  3. A logical decoding slot lets cdc-relay read those committed changes in commit order without re-querying the tables.
  4. The relay converts each committed row change into a Kafka record that carries table identity, key columns, operation type, and a source position (the LSN plus transaction metadata).
  5. Downstream consumers checkpoint their own offsets independently, so finance can replay a week of history without forcing the manifest service to do the same.

That gives Harbor Point an integration backbone because the connector can survive restarts by resuming from the last durable source position it emitted. A simplified emitted record might look like this:

{
  "source_table": "bookings",
  "op": "u",
  "key": {"booking_id": 8841},
  "before": {"status": "pending"},
  "after": {"status": "confirmed", "confirmed_at": "2026-04-06T09:14:22Z"},
  "source_position": {"lsn": "0/16B6A90", "tx_id": 731992}
}
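
To make the restart behaviour concrete, the difference between peeking and consuming from the slot looks like this at the SQL level (a real relay uses the streaming replication protocol and acknowledges a position only after the corresponding records are durably in Kafka):

-- Consuming, rather than peeking, advances the slot's confirmed position,
-- so a restarted reader resumes after the last change it consumed instead
-- of re-reading the whole history.
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('cdc_relay_slot', NULL, NULL);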

Two details are easy to miss in production. First, commit ordering is only as strong as the boundary you preserve. The WAL gives Harbor Point transaction order at the source database. Once the relay partitions records by key in Kafka, consumers get a stable order per partition or per key, not a single global order across the whole estate. Second, the stream contains committed state transitions, not the application-level meaning of those transitions. A row update can tell finance that the booking became confirmed; it may not tell the CRM whether the guest was auto-upgraded because of loyalty status or manually upgraded by an agent unless Harbor Point stores that meaning explicitly.

Snapshots are the other half of the mechanism. When Harbor Point adds a new consumer, it usually cannot start from "events after now" because the consumer needs historical state too. CDC systems therefore combine an initial snapshot with log streaming. The hard part is avoiding a torn view where the snapshot sees one version of a row but the log stream starts too late to capture the change that happened during the snapshot. Mature connectors solve this with snapshot markers, locking strategies, or low/high watermark techniques tied to source positions. That operational detail is why CDC is more than "read some rows and then tail updates."
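
A simplified version of that handoff, assuming the new consumer applies changes idempotently so duplicates between the snapshot and the first streamed changes are harmless (mature connectors replace the plain read in step 2 with exported snapshots or watermark tracking):

-- 1. Create the slot first. The returned lsn is the point from which the
--    stream is guaranteed to include every later commit.
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot('warehouse_backfill_slot', 'test_decoding');

-- 2. Bulk-copy current state into the new consumer. Changes that commit
--    while this runs are not lost; they accumulate in the slot.
SELECT booking_id, status, confirmed_at FROM bookings;

-- 3. Stream from the slot. Rows touched during the snapshot can appear
--    twice (once in the copy, once in the stream), which is why the
--    consumer's apply step must be idempotent.
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('warehouse_backfill_slot', NULL, NULL);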

Part 3: Implications and Trade-offs

Used well, CDC removes several painful design constraints. Harbor Point no longer needs the booking service to know every consumer that cares about booking state. New integrations can subscribe to booking-changes without modifying the write path. Replays become normal because the source position is durable. Audit questions get easier because the team can compare source-log lag, topic lag, and sink lag instead of stitching together API logs and cron timestamps.
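
Replays stay cheap only if sinks tolerate seeing the same change more than once. A sketch of a replay-tolerant warehouse table, assuming each applied record carries the source LSN from the CDC payload (table and column names are illustrative):

CREATE TABLE warehouse_bookings (
    booking_id   bigint PRIMARY KEY,
    status       text        NOT NULL,
    confirmed_at timestamptz,
    source_lsn   pg_lsn      NOT NULL
);

-- Applying a CDC record is an upsert guarded by source position:
-- re-delivered or replayed records at or behind the stored LSN become no-ops.
INSERT INTO warehouse_bookings (booking_id, status, confirmed_at, source_lsn)
VALUES (8841, 'confirmed', '2026-04-06T09:14:22Z', '0/16B6A90')
ON CONFLICT (booking_id) DO UPDATE
SET status       = EXCLUDED.status,
    confirmed_at = EXCLUDED.confirmed_at,
    source_lsn   = EXCLUDED.source_lsn
WHERE warehouse_bookings.source_lsn < EXCLUDED.source_lsn;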

The trade-off is that CDC also widens the blast radius of source-schema decisions. Renaming bookings.status to booking_state, splitting one table into three, or changing a column type is no longer purely local if downstream consumers deserialize those fields directly. Harbor Point needs schema versioning, compatibility checks, and consumer contracts even though the source of truth is "just the database." This is why many teams pair CDC with the outbox pattern: row-level CDC is excellent for committed fact propagation, but semantic business events often belong in an outbox table whose schema is designed for consumers rather than leaked from internal tables.
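
A sketch of that pairing, assuming a dedicated outbox table written in the same transaction as the state change and captured by the same CDC pipeline (table, columns, and payload are illustrative; gen_random_uuid() assumes PostgreSQL 13 or newer):

-- Outbox schema designed for consumers, not leaked from internal tables.
CREATE TABLE booking_outbox (
    event_id    uuid        PRIMARY KEY,
    booking_id  bigint      NOT NULL,
    event_type  text        NOT NULL,
    payload     jsonb       NOT NULL,
    occurred_at timestamptz NOT NULL DEFAULT now()
);

-- The state change and the business event commit or fail together,
-- so CDC propagates both or neither.
BEGIN;
UPDATE bookings
SET status = 'confirmed', confirmed_at = NOW()
WHERE booking_id = 8841;

INSERT INTO booking_outbox (event_id, booking_id, event_type, payload)
VALUES (gen_random_uuid(), 8841, 'booking-confirmed',
        '{"cabin_id": "S12", "reason": "loyalty_auto_upgrade"}');
COMMIT;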

Backpressure from 062.md also reappears here in a more dangerous place. If cdc-relay falls behind for hours because Kafka is unavailable or sink throughput drops, PostgreSQL may have to retain WAL segments or replication-slot state much longer than planned. The result is not merely stale downstream data; it can become source-database disk pressure. CDC therefore turns flow control into a source-side operational concern. The same mechanism that makes replays and recovery easy also means lag has to be monitored where the log is produced, not just where it is consumed.
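
Monitoring for that risk therefore lives on the primary, not only in Kafka. A sketch of a source-side check using PostgreSQL's pg_replication_slots view (alert thresholds are deployment-specific):

-- How much WAL each slot is forcing the primary to retain right now.
SELECT slot_name,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;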

This is where CDC earns the word "backbone." It is not the only integration pattern Harbor Point needs, but it is the one that can reliably fan committed facts out to many consumers without re-embedding the same publish logic into every service. The team should choose it when the source database is authoritative and downstream systems need to react to committed state. It should reach for explicit domain events or an outbox when downstream consumers need meaning that raw table mutations do not express cleanly.


Connections

Connection 1: 061.md defined the replay boundary; CDC reuses it for integration

Exactly-once myths disappear once you name the durable boundary. CDC works for the same reason replay-safe consumers work: the source database already has a commit log, so Harbor Point can resume from a precise position instead of guessing what changed.

Connection 2: 062.md explained overload as a control problem; CDC pushes that problem closer to the source

When a projector falls behind, Harbor Point risks stale read models. When a CDC connector falls behind, Harbor Point may also retain WAL and pressure the primary database. The mechanism is still backpressure, but the failure surface is broader.

Connection 3: 064.md turns captured changes into projections and stream tables

Once CDC provides a trustworthy change stream, the next question is how to materialize it into read models without losing ordering, dedupe discipline, or replayability. That is the job of projections and stream tables.


Key Takeaways

  1. CDC works because it reuses the database's own commit log as the authoritative record of committed change, which removes many application-level dual-write races.
  2. The stream carries durable facts about state transitions, not automatically the full business meaning of those transitions, so row-level CDC and domain events solve different problems.
  3. Replayability is a feature only if source positions, snapshots, and backpressure are operated deliberately; otherwise connector lag can become source-database risk.
  4. CDC becomes an integration backbone when multiple consumers need the same committed facts, but it still requires schema governance, idempotent sinks, and clear boundaries around what the stream promises.