Partitioning Strategies and Shard Keys

LESSON

Consistency and Replication

013 30 min advanced

Partitioning Strategies and Shard Keys

The core idea: A shard key is a workload argument encoded in data layout: it decides which operations stay local, which invariants remain cheap, and where real traffic will concentrate.

Core Insight

Harbor Point has accepted the lesson from sharding: the reservation system needs multiple authority domains. The next question is sharper. Which key should decide those domains?

The tempting answer is "pick the field we query most." For Harbor Point, that might be desk_id, because traders often view their own reservations, or issuer_id, because risk teams inspect exposure by issuer. But the strongest path is not a dashboard. It is the decision that one scarce allocation bucket can move from held to confirmed exactly once. If two desks race for the same bucket, the system wants one shard to decide that invariant, not a distributed workflow by accident.

That is the non-obvious role of a shard key. It is not just a spreading function. It says which facts belong under the same owner. A key that makes convenient reads local can make decisive writes cross-shard. A key that keeps the invariant local may force search, reporting, and history screens into derived indexes or projections.

For Harbor Point, the strongest candidate is an allocation-centric key such as allocation_bucket_id, derived from the issuer, instrument, settlement window, and reservation bucket. That does not make every query perfect. It makes the most important consistency boundary explicit.

Logical Partitions Before Physical Shards

Harbor Point separates three concepts that teams often blur together:

That distinction lets the data model stay stable while placement changes. Harbor Point does not want application code to say "send this issuer to machine 7." It wants application code to compute a stable logical partition, then let the partition map decide which physical shard owns that partition today.

def allocation_partition(issuer_id: str, instrument_id: str, window: str) -> int:
    key = f"{issuer_id}:{instrument_id}:{window}"
    return murmur3(key) % 4096


def route_confirmation(command):
    partition = allocation_partition(
        command.issuer_id,
        command.instrument_id,
        command.settlement_window,
    )
    return partition_map.owner_for(partition)

With 4096 logical partitions on 12 physical shard groups, Harbor Point can later move partition 173 from shard B to shard F without redefining the application key. That is why mature sharded systems usually keep more logical partitions than machines. The key defines ownership; placement is allowed to evolve.

This also keeps the next lesson possible. Rebalancing can move logical partitions under live traffic only if those partitions already exist as stable units of ownership.

Choosing the Boundary

Harbor Point evaluates candidate keys by asking: "Which state must be decided together on the strongest path?"

Candidate key What becomes local What breaks first
desk_id Trader dashboards and desk-scoped history Two desks competing for the same allocation cross shards
issuer_id Issuer exposure checks and issuer-scoped reports A hot issuer can overload one shard during volatile markets
allocation_bucket_id Hold, release, extend, and confirm for one scarce allocation bucket Desk history and issuer-wide reports need derived views

The last option is usually the safest for the decisive write path. If bucket B-173 is the unit that can be reserved exactly once, then the rows that enforce that invariant should live with the owner of B-173. The confirmation path can stay single-shard:

def confirm_allocation(command):
    shard = route_confirmation(command)
    return shard.transaction(
        assert_bucket_state(
            bucket_id=command.bucket_id,
            status="held",
            hold_id=command.hold_id,
        ),
        mark_bucket_confirmed(command.bucket_id),
        append_audit_event(command.hold_id, "confirmed"),
    )

That local transaction is the payoff. Harbor Point still replicates the shard for durability and failover, but it does not need cross-shard coordination merely to decide whether one allocation bucket is available.

The cost is that not every read is local. A dashboard that asks "show every reservation for desk 12" may need a desk-index projection. A risk report that asks "show issuer-wide exposure" may need an issuer-index projection. Those read models are not failures. They are the price of keeping the decisive write invariant cheap.

Strategy Trade-offs

Different partitioning strategies emphasize different risks.

Strategy                 Helps with                         Hurts with
-----------------------  ---------------------------------  -----------------------------------
hash by stable id         even spread, simple ownership      range scans and locality
range by business field   locality, ordered scans            hot ranges and monotonic writes
directory lookup          custom placement, tenant control   metadata complexity and routing bugs
compound key              matching a domain invariant        schema/API discipline

Hashing allocation_bucket_id spreads writes across many partitions, which is useful when no single issuer dominates. It also destroys natural range locality. If the risk team wants all buckets for CA-MUNI in one scan, the system may need an issuer projection or a scatter-gather query with a clear latency budget.

Range partitioning by issuer_id keeps issuer reports local, but it can create a hot shard exactly when the market focuses on one issuer. Splitting by issuer_id + instrument_id may help, but the design still has to ask whether the key has enough cardinality and whether the hot path carries the key at request time.

A directory strategy can place specific tenants, issuers, or buckets wherever the control plane chooses. That is flexible, especially for large customers or uneven workloads, but it turns routing metadata into a larger operational surface. The directory must be cached, versioned, monitored, and kept consistent during rebalancing.

No strategy wins universally. The shard key should be chosen with the system's strongest operations, hottest traffic, and most painful migrations in view.

Evaluating a Shard Key

A practical shard-key review for Harbor Point asks five questions:

  1. Is the key present at the start of every authoritative write?
  2. Does the key keep the strongest invariant inside one shard?
  3. Does the key have enough cardinality to spread real production traffic?
  4. Is the key stable, or would normal business edits move ownership?
  5. What read models, indexes, or projections are required because of this choice?

The answers should become metrics after launch. Average cluster QPS is not enough. Harbor Point watches per-shard write rate, hottest logical partition concentration, cross-shard transaction count, scatter-gather read count, routing-cache misses, and partition-size skew. Those metrics show whether the chosen key still matches the workload.

The lesson is not that allocation_bucket_id is magic. It is that the primary shard key should encode a deliberate claim: "This is the smallest state we need to decide together." Everything that does not fit that claim should be handled explicitly instead of smuggled into the strongest path.

Failure Modes

Choosing the friendliest read query as the primary key. This often makes dashboards pleasant while pushing critical writes across shards. Start with the strongest invariant, then build projections for convenience reads.

Using mutable business fields as shard keys. Region, desk ownership, priority tier, or routing label may be meaningful, but if the value changes, ownership changes. Use stable identifiers for primary ownership and expose mutable fields through indexes or derived views.

Assuming hashing eliminates hotspots. Hashing smooths load only when the input key has enough cardinality and traffic is not dominated by a tiny set of keys. Monitor hot logical partitions, not just physical shard averages.

Baking physical placement into the data model. If records know they live on shard_7, rebalancing becomes a schema problem. Keep logical partitions stable and map them to physical shards through versioned metadata.

Letting derived views pretend to be authoritative. Desk and issuer projections are useful, but the owner of the allocation bucket should remain the source of truth for hold, release, extend, and confirm decisions.

Connections

Resources

Key Takeaways

  1. A shard key decides which work stays local, not merely how rows are distributed.
  2. The primary shard key should usually match the smallest invariant boundary that must be decided together.
  3. Hash, range, directory, and compound-key strategies all trade locality, spread, routing complexity, and hotspot risk differently.
  4. Many logical partitions on fewer physical shards make future rebalancing possible without redefining the data model.
PREVIOUS Sharding and Authority Boundaries NEXT Rebalancing Partitions Under Live Traffic