Plumtree - Epidemic Broadcast Trees

LESSON

Gossip, Membership, and Epidemic Systems

009 30 min intermediate

Plumtree - Epidemic Broadcast Trees

The core idea: Plumtree combines eager tree broadcast with lazy epidemic repair, trading a little control complexity for common-case efficiency without losing recovery paths when the tree breaks.

Core Insight

Suppose a cluster has to broadcast every published event to thousands of subscribers. A simple tree is attractive: each node forwards the full payload to its children, and the message reaches the cluster with little duplication. But if an internal link is stale or a forwarding node misses the message, an entire branch can be left behind.

Pure epidemic broadcast has the opposite shape. Nodes forward to multiple peers, so alternate paths naturally repair missed deliveries. The cost is duplication: the same payload may arrive several times, consume bandwidth several times, and force receivers to deduplicate constantly.

Plumtree's non-obvious move is to stop choosing between those extremes. It sends full messages along an efficient eager tree in the common case, while lazy gossip sends lightweight notifications along backup edges. If a node hears that a message exists but did not receive the payload, it can request repair and rebuild the tree around working paths.

The trade-off is efficiency versus resilience. A pure tree is cheap but brittle. Pure gossip is resilient but noisy. Plumtree adds protocol machinery so the expensive repair behavior is available when needed without paying full epidemic cost for every successful broadcast.

Why Trees and Gossip Each Fall Short

A broadcast tree gives a clean mental model:

        A
      / | \
     B  C  D
    / \    \
   E   F    G

If A publishes a message and every edge works, each node receives the payload once. Forwarding is predictable, bandwidth is controlled, and duplicate delivery is low.

The weakness is that the tree is a single preferred structure. If B fails to receive or forward the message, E and F may never learn that the message existed:

        A
      / | \
     B  C  D
    / \    \
   E   F    G

B misses message
  -> E and F miss message too

Epidemic broadcast handles that failure better because messages travel through several paths. A missed path is not fatal if another neighbor eventually forwards the payload. But carrying the full payload through many redundant paths creates extra traffic and repeated deduplication work.

The useful design question is not "tree or gossip?" It is "which path should carry the full payload most of the time, and which path should exist mainly to notice and repair gaps?"

Eager and Lazy Links

Plumtree keeps two kinds of peer relationships for broadcast.

Eager links carry the full message immediately. Together, they form a tree-like fast path:

A --full message--> B --full message--> E
 \                 \
  \                 -> F
   -> C --full message--> G

Lazy links carry only a small announcement that a message exists. The announcement is often described as an IHAVE style notification: "I have message m123." It does not include the full payload.

A --IHAVE m123--> D
C --IHAVE m123--> F

That split matters. Eager forwarding optimizes the normal case. Lazy notification keeps extra paths alive without paying full payload cost on each one.

A simplified receive path looks like this:

on full message m:
  if m is new:
    deliver m
    forward full m to eager peers
    announce m to lazy peers
  else:
    tell sender to prune this eager edge

on lazy notification for m:
  if m is missing after a short delay:
    graft toward notifier and request full m

The delay before requesting repair is important. It gives the eager tree a chance to deliver the message first. If the payload arrives normally, the lazy notification costs little. If the payload never arrives, the lazy edge reveals the gap.

Prune and Graft

Plumtree adapts its broadcast structure through two corrective messages.

PRUNE says, in effect, "this eager edge is creating duplicates; stop sending me full payloads on it." When a node receives a duplicate full message from a peer, that peer is probably not needed as an eager parent for that broadcast path. Demoting the edge to lazy reduces future duplication.

GRAFT says, "I heard that a message exists but did not receive it; promote this lazy edge and send me the payload." When the eager tree fails to reach a node, the lazy path becomes a repair path and may become eager afterward.

The adaptive loop looks like this:

duplicate full payload arrives
  -> send PRUNE
  -> eager edge becomes lazy
  -> less duplicate traffic next time

lazy notice arrives but payload is missing
  -> send GRAFT
  -> lazy edge becomes eager
  -> missing payload is repaired

This is the mechanism that makes Plumtree more than "tree plus gossip." The tree is not fixed. It is continuously shaped by delivery evidence. Edges that waste bandwidth are demoted. Edges that repair missed delivery are promoted.

Worked Example

Imagine A broadcasts message m123. The current eager tree should send the full message through B to E, while D has only a lazy link to E.

A -> B -> E    eager full-message path
D -> E         lazy notification path

Now B is slow or the B -> E edge is broken. E does not receive the full payload from the eager tree. Later, E receives a lazy notification from D:

D -> E: IHAVE m123

E checks its message cache and sees that m123 is missing. It sends a repair request:

E -> D: GRAFT m123
D -> E: full message m123

After repair, the D -> E edge may become eager because it proved useful. The protocol has learned from failure and adjusted the fast path.

The opposite adjustment happens when a node receives duplicates. If E gets m123 from both B and D, it can keep one eager path and prune the other:

E -> D: PRUNE
D -> E: future messages are lazy notifications, not full payloads

That is the core balance: repair missing branches, but trim redundant payload paths.

Implications and Trade-offs

Plumtree works well when most broadcasts follow a stable tree most of the time, but the system still needs a way to detect broken branches. The eager path keeps steady-state cost low. The lazy path keeps the system from silently losing messages when the eager path is stale.

The costs are real:

The main trade-off is common-case bandwidth versus repair responsiveness. Longer lazy delays reduce unnecessary repair requests but make misses slower to heal. Shorter delays repair faster but may request payloads that were already about to arrive.

Plumtree is therefore not a guarantee of zero duplicates or zero misses. It is a structured compromise: pay near-tree cost when things work, and keep epidemic-style evidence available when the tree stops working.

Operational Failure Modes

Assuming the tree is authoritative

The eager tree is a preferred fast path, not the only path. If lazy repair is disabled or too weak, broken branches can silently miss messages.

Keeping too many eager edges

If duplicate payloads are not pruned, Plumtree drifts toward expensive epidemic broadcast. Deduplication still protects correctness, but bandwidth cost rises.

Repairing too aggressively

If lazy notifications trigger immediate repair, nodes may request payloads that the eager path would have delivered moments later. A short delay prevents unnecessary graft churn.

Ignoring the underlying overlay

Lazy repair only helps if alternate paths exist. A poor membership or partial-view overlay can leave the protocol without useful backup edges.

Connections

SWIM and Lifeguard harden membership and failure detection. Plumtree uses the same broad lesson for broadcast: the fast path needs a repair path because real distributed systems spend time in partial failure and churn.

HyParView is a natural companion because Plumtree needs a resilient peer overlay. The quality of eager and lazy links depends on the membership topology underneath.

Anti-entropy, next, applies a similar fast-path/repair-path idea to replicated data. Broadcast tries to deliver messages now; anti-entropy repairs state later when delivery or replication was incomplete.

Resources

Key Takeaways

  1. Pure trees minimize duplicate traffic but can lose whole branches when an edge fails.
  2. Pure epidemic broadcast is resilient but pays for that resilience with duplicate payload traffic.
  3. Plumtree sends full messages on eager tree links and uses lazy notifications to detect and repair missed deliveries.
  4. PRUNE and GRAFT let the broadcast tree adapt by trimming wasteful edges and promoting useful repair paths.
PREVIOUS SWIM Improvements - Infection Dampening & Health Multipliers NEXT Anti-entropy with Merkle Trees