LESSON
Day 421: Sharding Strategies: Range, Hash, and Hybrid
The core idea: A sharding strategy decides which reads and writes stay inside one replica group and which ones turn into distributed work; range sharding buys locality, hash sharding buys spread, and hybrid schemes spend extra routing complexity to get both where the workload actually needs them.
Today's "Aha!" Moment
In 04.md, Harbor Point saw that quorum math only tells you what happens inside one replica set. That leaves the scaling question that replication alone cannot answer: which records should share a replica set in the first place? Once the municipal-bond platform outgrew a single group, the team had to choose how issuer_limits, issuer_exposure, and recent reservations for issuers like MUNI-77 would be partitioned across shards.
The pressure came from a very specific production pattern. When California released a budget update at 09:30, reservation traffic for CA issuers surged, and the risk dashboard started polling every few seconds for "all California issuers above 80% of their intraday limit." If Harbor Point kept all California issuers together because that made the dashboard simple, one shard absorbed the whole event. If it hashed every issuer blindly, the write path became smoother but the dashboard turned into a scatter-gather query across the cluster.
That is the mental shift for sharding strategy design. A shard key is not only a data-layout choice. It is a statement about which access patterns deserve locality and which ones are allowed to pay fan-out cost. Range sharding, hash sharding, and hybrid schemes all work, but they optimize different failure stories. The next lesson, 06.md, follows naturally from this one: once Harbor Point picks a strategy, it still needs a safe way to move data when shard sizes and traffic stop matching the original plan.
Why This Matters
Harbor Point cannot treat sharding as a storage-only concern because the partition boundary shapes application behavior. The reservation API needs predictable low-latency reads and writes for one issuer at a time. The risk dashboard wants bounded scans over a business grouping such as one state. Compliance wants exports and backfills that do not surprise operators by touching every node in the cluster. A poor shard layout turns one of those local operations into cross-shard coordination even when the product requirement never asked for distributed work.
That is why teams get burned by apparently reasonable choices. A pure range layout often looks elegant in design review because the routing table is easy to explain and range queries stay local. Then a single hot range dominates p99 latency. A pure hash layout often looks safer because load spreads more evenly, but now every grouping query needs a coordinator, partial-failure handling, and tolerance for a response time set by the slowest shard. The right strategy is the one that matches the real workload shape, not the one that sounds most generally scalable.
This lesson sits between replication and rebalancing for a reason. Replication tells Harbor Point how a shard stays durable. Sharding tells Harbor Point which data belongs in that shard. Rebalancing, which comes next, only becomes meaningful after those boundaries are chosen.
Learning Objectives
By the end of this session, you will be able to:
- Explain how range, hash, and hybrid sharding route the same workload differently - Show how the shard map changes locality, hotspot risk, and fan-out.
- Match a shard strategy to concrete access patterns - Reason from Harbor Point's issuer updates, state-level scans, and growth profile instead of from generic rules.
- Evaluate the operational trade-offs of a shard layout - Predict when metadata churn, rebalancing cost, and cross-shard queries will dominate the design.
Core Concepts Explained
Concept 1: Range sharding keeps neighboring keys together, which is powerful until the workload skews toward one range
Harbor Point's first instinct was to shard by business locality. The routing service kept a table of key intervals such as CA/A..CA/M -> shard-3, CA/N..CA/Z -> shard-4, NY/* -> shard-5. A request for issuer CA-MUNI-77, stored under a state-prefixed key such as CA/MUNI-77, was routed by comparing that key to those boundaries and sending the read or write to the shard that owned the interval. The appeal is immediate: the risk dashboard query "show all California issuers above 80% of limit" touches only the California shards, and the result already arrives in key order.
The internal mechanism is simple and that simplicity matters operationally. A metadata service stores the current range boundaries and their owning replica groups. The router performs a boundary lookup, then the chosen shard handles the request with whatever replication policy Harbor Point uses inside that group. When one shard grows too large, the system can split the range at a new boundary and move one half elsewhere. Because the keys are contiguous, backfills, scans, and retention jobs are predictable.
Range map
CA/A..CA/M -> shard-3
CA/N..CA/Z -> shard-4
NY/* -> shard-5
TX/* -> shard-6
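The boundary lookup itself is small enough to sketch. Here is a minimal Python illustration, assuming an in-memory copy of the range map above with each range stored by its inclusive lower bound; the table contents come from the lesson, while the helper and key encoding are hypothetical:
import bisect

# (inclusive lower bound, owning shard), sorted by bound; keys compare lexically.
RANGES = [
    ("CA/A", "shard-3"),   # CA/A..CA/M
    ("CA/N", "shard-4"),   # CA/N..CA/Z
    ("NY/",  "shard-5"),   # NY/*
    ("TX/",  "shard-6"),   # TX/*
]
LOWER_BOUNDS = [low for low, _ in RANGES]

def route_range(key):
    # Binary-search for the last range whose lower bound is <= key.
    i = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    if i < 0:
        raise KeyError(f"no shard owns key {key!r}")
    return RANGES[i][1]

print(route_range("CA/MUNI-77"))  # shard-3: one boundary lookup, no fan-out
The whole router is a sorted list and a binary search, which is why range-sharded systems are easy to reason about operationally: splitting a hot range is just inserting one new boundary.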
The weakness appears as soon as traffic is uneven across the keyspace. The California budget event did not produce uniform pressure on every state. It concentrated traffic on the CA range, so the CA shard's CPU, WAL volume, and quorum latency all climbed together. Range sharding is excellent when the busiest queries are also range-oriented and the ranges are not wildly imbalanced. It is much less forgiving when one range becomes the product's center of gravity or when new writes always land at one end of the keyspace.
That is the real trade-off. Range sharding preserves locality for ordered scans and for business groupings that line up with the key order, but it makes hotspot management a first-class operational problem. The design is not wrong; it is only wrong when the business grouping that makes reads elegant also becomes the grouping that concentrates most writes.
Concept 2: Hash sharding spreads many keys well, but it destroys the natural neighborhood that range queries rely on
To escape the California hotspot, Harbor Point tried hashing issuer_id instead of routing on its lexical range. The router computed something like bucket = hash(issuer_id) % 16, then used that bucket to choose the shard; the illustration below shrinks this to 4 shards for readability. CA-MUNI-77 and CA-MUNI-78 no longer had to live together, which meant broad traffic across many issuers was much less likely to overload a single shard.
Hash sharding changes the router from "find the interval" to "compute the placement function." That removes the need for carefully chosen range boundaries, and it usually produces a much flatter distribution when the key cardinality is high enough. For Harbor Point's reservation API, which mostly touches one issuer at a time, that can be a major improvement because the write load from thousands of issuers is now spread across the fleet rather than pooled by state.
hash(issuer_id) % 4
CA-MUNI-77 -> shard-1
CA-MUNI-78 -> shard-3
CA-MUNI-79 -> shard-0
NY-MUNI-04 -> shard-1
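In code, the placement function replaces the boundary table entirely. A minimal sketch, using a deterministic digest because Python's built-in hash() is randomized per process; the function and shard naming are illustrative, not Harbor Point's actual router:
import hashlib

NUM_SHARDS = 4

def route_hash(issuer_id):
    # A stable digest so every router, restart after restart,
    # maps the same issuer to the same shard.
    digest = hashlib.md5(issuer_id.encode()).digest()
    return f"shard-{int.from_bytes(digest[:8], 'big') % NUM_SHARDS}"

# Lexical neighbors scatter: CA-MUNI-77 and CA-MUNI-78 will usually
# land on different shards, which removes the state-level hotspot
# and, with it, the state-level locality.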
But the dashboard problem did not disappear; it moved. The query "all California issuers above 80% of limit" no longer aligned with placement. Harbor Point now needed a coordinator to ask every shard for its CA issuers, merge the partial results, and wait for the slowest shard that still mattered. A cluster-wide fan-out is not automatically wrong, but it changes the system contract: partial failures, uneven shard lag, and tail latency now shape a query that used to be local.
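That contract change is easiest to see in the coordinator itself. A minimal sketch, assuming each shard client exposes a hypothetical query_ca_issuers(threshold) call:
from concurrent.futures import ThreadPoolExecutor

def dashboard_scatter_gather(shard_clients, threshold=0.80):
    # Fan out: under pure hashing, any shard may hold CA issuers,
    # so every shard must be asked.
    with ThreadPoolExecutor(max_workers=len(shard_clients)) as pool:
        futures = [pool.submit(c.query_ca_issuers, threshold) for c in shard_clients]
        # Gather: the refresh is only as fast as the slowest shard, and one
        # failed future forces a retry-or-partial-result decision.
        partials = [f.result(timeout=2.0) for f in futures]
    return sorted(row for partial in partials for row in partial)
Every line of this coordinator is new operational surface that the range-sharded design never needed for this query.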
Hash sharding also has an important limit that teams often miss. It smooths load across many keys, but it does not split one extremely hot key. If CA-MUNI-77 alone becomes the bottleneck because every reservation update hits that issuer, hashing issuer_id still routes all of that issuer's writes to one place. Hashing solves skew between keys more easily than heat inside one key. That is why hash sharding is best for workloads dominated by point operations over many independent keys, not as a universal cure for every hotspot.
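When one key is the hotspot, the usual escape hatch lives above the shard layer. A sketch of write bucketing, an application-level pattern rather than anything Harbor Point's router provides: writes for a hot issuer are salted across sub-keys, and reads fan in over those sub-keys and aggregate:
import random

HOT_KEY_SALTS = 8  # sub-keys for one extremely hot issuer

def salted_write_key(issuer_id):
    # Spread writes for a single issuer across sub-keys so that
    # no one shard absorbs every CA-MUNI-77 update.
    return f"{issuer_id}#{random.randrange(HOT_KEY_SALTS)}"

def salted_read_keys(issuer_id):
    # The price: reads must touch every sub-key and merge the results.
    return [f"{issuer_id}#{salt}" for salt in range(HOT_KEY_SALTS)]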
Concept 3: Hybrid sharding narrows the fan-out surface by keeping one dimension local and spreading load inside that boundary
Harbor Point's eventual design was hybrid rather than pure. The team kept the state boundary because dashboards, alerting, and operational ownership were state-scoped, but it stopped putting all issuers from one state on one shard. Instead, the router first chose a state partition and then hashed issuer_id inside that partition. California therefore became "CA, bucket 0" through "CA, bucket 7" instead of one giant CA shard.
That two-stage route looked like this:
def route_issuer(state_code, issuer_id):
    # Stage 1: locality -- look up which partition owns this state.
    partition = metadata.partition_for_state(state_code)
    # Stage 2: spread -- hash the issuer into one of that partition's buckets.
    bucket = murmur3(issuer_id) % partition.active_buckets
    return partition.bucket_owner(bucket)
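The helper names here (metadata.partition_for_state, partition.bucket_owner) stand in for the metadata service's lookup calls rather than a specific client library. The property that matters is that the second stage can never leave the partition the first stage chose: route_issuer("CA", "CA-MUNI-77") only ever resolves to one of California's buckets.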
The operational effect was exactly what Harbor Point needed. A single-issuer reservation write stayed local to one bucket in one state partition. The CA risk dashboard still fanned out, but only across California's buckets rather than across the whole cluster. New York activity no longer shaped the tail latency of a California dashboard refresh, and a hot state could be split or rebalanced independently from the rest of the country.
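The bounded dashboard query is the same scatter-gather shape with a smaller blast radius. A sketch, assuming the partition object from route_issuer can enumerate its bucket owners and that each owner exposes a hypothetical query_issuers_above(threshold) call:
def dashboard_state_scoped(partition, threshold=0.80):
    # Fan-out is bounded by one state's bucket count, not by cluster size.
    owners = [partition.bucket_owner(b) for b in range(partition.active_buckets)]
    partials = [owner.query_issuers_above(threshold) for owner in owners]
    return sorted(row for partial in partials for row in partial)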
Hybrid schemes work because they admit that production workloads usually have more than one important dimension. Harbor Point needed business locality by state and load distribution by issuer. A time-series system might need range partitioning by day and hashing inside the current day. A SaaS database might partition first by tenant and then hash inside its largest tenants. The benefit is bounded fan-out instead of all-or-nothing locality.
The cost is metadata and migration complexity. Harbor Point now has to version both the state partition map and the per-state bucket layout. A query that ignores the chosen local dimension can still touch many shards. Resharding one state must not race with clients using stale metadata. Hybrid is therefore not "best of both worlds" for free. It is a deliberate decision to spend more control-plane complexity so the hottest business dimension stays manageable without turning every analytical grouping into a full-cluster broadcast.
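One concrete shape for that control-plane discipline is an epoch check on every request. A minimal sketch, assuming the router stamps each request with the map version it routed under; the field names are illustrative:
class StaleRoutingEpoch(Exception):
    """Raised when a request was routed with an outdated shard map."""

def handle_request(shard, request):
    # A shard that has taken part in a reshard rejects requests routed
    # under an older map instead of silently serving keys it may no
    # longer own; the client refreshes its metadata and retries.
    if request.routing_epoch < shard.current_epoch:
        raise StaleRoutingEpoch(
            f"request epoch {request.routing_epoch} < shard epoch {shard.current_epoch}"
        )
    return shard.execute(request)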
Troubleshooting
Issue: The California shards are saturated even though total cluster utilization looks healthy.
Why it happens / is confusing: Range sharding preserved the exact business grouping the dashboard uses, so the market event concentrated both writes and reads on the same narrow part of the keyspace.
Clarification / Fix: Split the hot range sooner, or move to a hash or hybrid design that breaks the busiest state or range into smaller independently placeable units.
Issue: After switching to hash sharding, the risk dashboard became slower even though writes got faster.
Why it happens / is confusing: The write path benefited from better spread, but the dashboard lost locality and became a scatter-gather query whose latency is now set by coordinator overhead and the slowest participating shard.
Clarification / Fix: Accept the fan-out explicitly, build a derived state-scoped view, or choose a hybrid layout that keeps state-level queries bounded instead of cluster-wide.
Issue: One issuer is still a hotspot after hashing by issuer_id.
Why it happens / is confusing: Hashing distributes different issuers across shards, but every write for the same issuer still hashes to the same target.
Clarification / Fix: Treat this as a single-key hotspot problem. That may require a finer-grained key, write bucketing, or an application-level redesign rather than a different hash function.
Issue: During shard moves, some requests briefly hit the wrong owner.
Why it happens / is confusing: The data move may have completed on the storage side before every router refreshed its metadata view.
Clarification / Fix: Version the shard map, reject stale routing epochs, and use forwarding or dual-read guards during cutover. The next lesson, 06.md, focuses on that migration path in detail.
Advanced Connections
Connection 1: 04.md makes consistency a per-shard question; this lesson decides what a shard even contains
Quorum reads and writes help Harbor Point reason about freshness for one replica set. Sharding strategy decides which keys share that replica set and therefore which operations can rely on one quorum instead of cross-shard coordination. A clean quorum design can still feel bad in production if the shard boundary puts the wrong workload together.
Connection 2: 06.md turns static placement into live movement
Range boundaries, hash buckets, and hybrid partitions are only useful while they still match the traffic shape. Rebalancing is the mechanism that updates those placements under load. The difference between moving a contiguous range and moving a hash bucket is one of the main operational consequences of the strategy chosen here.
Resources
Optional Deepening Resources
- [BOOK] Designing Data-Intensive Applications
- Focus: The partitioning chapter explains why range partitioning, hash partitioning, and secondary-index layout create different locality and fan-out behavior.
- [DOC] MongoDB Manual: Choose a Shard Key
- Focus: Practical criteria for cardinality, monotonic keys, and query isolation when selecting a shard key in production.
- [DOC] Vitess Docs: Vindexes
- Focus: How a production sharded SQL system routes rows with hash, lookup, and application-aware strategies instead of relying on one universal partitioning rule.
- [DOC] Citus Docs: Data Modeling with Distributed Tables
- Focus: Co-location, distribution-column choice, and the cost of cross-shard joins in a PostgreSQL-based distributed system.
Key Insights
- Range sharding preserves neighborhood - It makes ordered scans and business-group queries efficient, but it turns skewed ranges into operational hotspots.
- Hash sharding preserves spread - It evens out many independent keys, but it breaks the locality that range queries and grouped reports depend on.
- Hybrid sharding is a bounded-fan-out strategy - It spends extra metadata and routing logic so the system can keep one important dimension local while still distributing load inside that boundary.