Day 446: Multi-Tenant Isolation in Shared Clusters

Database Engine Internals and Implementation | Lesson 045 | 30 min | Advanced

The core idea: In a shared database cluster, "tenant" has to be a real internal scheduling and accounting unit, not just an authentication label. Separate schemas protect names, but tenant-aware keyspaces, queues, memory budgets, and maintenance limits are what stop one customer's workload from silently spending another customer's latency budget.

Today's "Aha!" Moment

In 044.md, Harbor Point taught its shared SQL cluster to treat trader commits, desk reports, and overnight backfills as different workload classes. That improved life inside one organization. The next step is harsher: Harbor Point now offers the same reservations engine to three municipal-bond dealers, NorthPier, CedarHarbor, and BeaconStreet, on one physical cluster. At 09:31, NorthPier is submitting latency-sensitive reservation commits while CedarHarbor launches a historical reconciliation scan and BeaconStreet starts importing a month of corrections. If the engine only separates tenants at login time, the cluster is still effectively one shared pool once CPU, memory grants, WAL bandwidth, cache residency, and compaction workers come under pressure.

That is the key realization for multi-tenant isolation. It is not mainly about whether SELECT * FROM reservations resolves to the right schema. It is about whether tenant identity survives all the way down through the engine so the system can answer questions like "whose memory grant is this?", "which tenant should absorb this compaction debt?", and "which requests may borrow idle capacity without endangering another tenant's commit SLO?" A shared cluster with weak answers to those questions is just a noisy-neighbor machine wearing a nicer catalog.

This corrects a common misconception. Multi-tenancy is often described as if row filters, separate databases, or API keys solve the hard part. They solve visibility. They do not solve interference. Harbor Point's problem is not merely keeping CedarHarbor from reading NorthPier data. It is preventing CedarHarbor's perfectly legal workload from turning NorthPier's reservation commits into a timeout incident.

Why This Matters

Harbor Point wants the economics of a shared cluster for a good reason. Three tenants with different time-of-day demand curves fit more efficiently on one fleet than on three mostly idle fleets. Shared replicas simplify upgrades, capacity planning, and disaster recovery. The product promise is appealing: each dealer gets its own logical database, but Harbor Point operates one engine.

The danger is that shared economics can collapse into shared pain. Suppose NorthPier pays for a 15 ms p99 commit SLO on its booking path, CedarHarbor runs a large end-of-day correction job, and BeaconStreet is experimenting with broad ad hoc analytics. If all three ultimately compete in one undifferentiated set of run queues, memory grants, buffer pages, and maintenance workers, then Harbor Point has not built tenant isolation. It has only hidden the shared cluster behind per-tenant credentials.

Good multi-tenant isolation changes the shape of failure. A burst from one tenant may still be slowed, queued, or rejected, but that enforcement happens against the bursting tenant's own entitlement and burst policy rather than by unexpectedly degrading everyone else's service. That makes SLOs, pricing tiers, and operational debugging line up with the engine's actual mechanics. It also prepares the ground for the next lesson: once the system can isolate tenants during ordinary traffic, it still has to preserve that isolation during heavy maintenance such as online reindexing.

Learning Objectives

By the end of this session, you will be able to:

  1. Explain why visibility isolation is not enough in a shared database cluster - Distinguish catalog separation from the deeper resource and maintenance controls that protect tenant SLOs.
  2. Trace how tenant identity propagates through the engine - Follow a request from authentication to key routing, admission control, execution, and background maintenance accounting.
  3. Evaluate multi-tenant isolation trade-offs in production - Compare hard reservations, burst borrowing, and tenant-aware maintenance policies for a cluster serving multiple customers at once.

Core Concepts Explained

Concept 1: Tenant identity has to survive from SQL session down to keys, locks, and background state

At Harbor Point, each dealer has a reservations table, an issuer_limits table, and its own reporting queries. A beginner-friendly design would stop at catalog isolation: authenticate the session, attach it to the right database, and prevent direct reads across tenant boundaries. That is necessary, but it is not the full engine design. The cluster still needs to know, internally, which tenant owns each piece of state it is touching.

In a real shared engine, tenant identity usually becomes part of multiple internal structures at once. Catalog resolution is scoped by tenant so NorthPier.reservations and CedarHarbor.reservations never collide. Storage keys are logically prefixed or partitioned so ranges, SSTables, or pages can be attributed to one tenant instead of living in an undifferentiated global heap. Lock table entries, table statistics, temporary spill files, query fingerprints, and quota ledgers also need that same tenant handle. If those structures are global without ownership, Harbor Point can tell who logged in but not who is consuming the cluster.

One useful way to visualize the path is:

session token
  -> tenant descriptor
  -> tenant-scoped catalog lookup
  -> plan + tenant_id
  -> key/range selection with tenant prefix
  -> execution, locks, temp files, metrics tagged by tenant_id
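
To make that path concrete, here is a minimal Python sketch. Every name in it (TenantDescriptor, encode_key, the lock-table shape) is an illustrative assumption for this lesson, not any real engine's API; the point is that one tenant handle tags keys, locks, and other internal state alike.

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class TenantDescriptor:
      tenant_id: int   # stable internal id resolved at authentication
      tier: str        # service tier, e.g. "gold" or "standard"

  def encode_key(t: TenantDescriptor, table_id: int, pk: bytes) -> bytes:
      # Prefixing every storage key with its owner means ranges, SSTables,
      # and pages sort by tenant first and can always be attributed.
      return t.tenant_id.to_bytes(4, "big") + table_id.to_bytes(4, "big") + pk

  class LockTable:
      def __init__(self):
          self.entries = {}  # key -> (tenant_id, txn_id)

      def acquire(self, t: TenantDescriptor, key: bytes, txn_id: int):
          # Lock entries, spill files, cache pages, and metrics all carry
          # the same tenant handle, so consumption always has an owner.
          self.entries[key] = (t.tenant_id, txn_id)

  north_pier = TenantDescriptor(tenant_id=1, tier="gold")
  key = encode_key(north_pier, table_id=7, pk=b"resv:2024-05-01:0001")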

The practical payoff is not just security hygiene. Once the engine can attribute internal state correctly, it can make correct policy decisions later. A cache entry can be charged to CedarHarbor. A large spill file can count against BeaconStreet's temp-space budget. A range split can be explained as NorthPier growth rather than mysterious cluster churn. The trade-off is that shared-cluster simplicity disappears quickly: metadata becomes richer, some global scans become harder, and moving a tenant between service tiers is no longer just a billing change. But without this internal ownership model, the rest of multi-tenant isolation has nothing solid to attach to.

Concept 2: Real isolation is a hierarchy of budgets across CPU, memory, I/O, and background work

Once tenant identity is explicit, Harbor Point still has to decide how much of the cluster each tenant may consume and how borrowing works when capacity is idle. This is the same governance problem as in 044.md, but with a stronger outer boundary. The cluster now classifies work twice: first by tenant entitlement, then by workload class inside that tenant. NorthPier might have a larger reserved share for foreground commits than BeaconStreet, while each tenant separately distinguishes booking traffic from analytical scans and maintenance.

That produces a path more like this:

tenant request
  -> classify tenant and workload class
  -> check reserved share + burst credits
  -> reserve CPU slots, memory grant, and IO/WAL tokens
  -> admit now | queue briefly | shed or defer
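
A hedged sketch of that two-level decision, using made-up names and policies (TenantBudget, reserved_slots, burst_credits) rather than any specific engine's admission controller:

  class TenantBudget:
      def __init__(self, reserved_slots: int, burst_credits: int):
          self.reserved_slots = reserved_slots  # hard per-tenant reservation
          self.burst_credits = burst_credits    # refilled only from idle capacity
          self.in_use = 0

  class AdmissionController:
      def __init__(self, budgets: dict):
          self.budgets = budgets  # tenant name -> TenantBudget

      def admit(self, tenant: str, workload_class: str) -> str:
          b = self.budgets[tenant]
          if b.in_use < b.reserved_slots:
              b.in_use += 1
              return "admit"            # inside the tenant's own reservation
          if workload_class == "foreground" and b.burst_credits > 0:
              b.burst_credits -= 1      # borrow idle capacity, revocably
              b.in_use += 1
              return "admit-borrowed"
          if workload_class == "foreground":
              return "queue"            # bounded wait against the tenant's SLO
          return "defer"                # batch and maintenance shed first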

The key design detail is that the budgets must cover more than query concurrency. CPU slots matter, but so do memory grants for joins and sorts, temp spill space, buffer-pool residency, WAL flush bandwidth, compaction or vacuum workers, and storage queue depth. Harbor Point can set a per-tenant query limit and still fail badly if one tenant is allowed to accumulate huge hash tables, flood the WAL with dual writes, or monopolize compaction workers cleaning up its own backfill. Strong isolation therefore mixes engine-level scheduling with lower-level controls such as cgroups or container limits. Host controls stop the node from collapsing. Database controls stop one tenant from turning shared engine internals into a private resource pool.
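
On the host side, a minimal sketch of cgroup v2 backstops, assuming the v2 hierarchy is mounted at /sys/fs/cgroup and a per-tenant group already exists:

  from pathlib import Path

  def set_host_backstop(group: str, cpu_pct: int, mem_bytes: int) -> None:
      base = Path("/sys/fs/cgroup") / group
      period_us = 100_000
      quota_us = period_us * cpu_pct // 100
      # cpu.max takes "<quota> <period>" in microseconds; memory.max takes bytes.
      (base / "cpu.max").write_text(f"{quota_us} {period_us}\n")
      (base / "memory.max").write_text(f"{mem_bytes}\n")

  # Host limits keep the node alive, but they cannot see engine internals
  # such as memory grants or compaction queues. They complement, rather
  # than replace, tenant-aware scheduling inside the database.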

The trade-off is utilization versus predictability. Hard reservations make Harbor Point's gold-tier tenants safer, but idle reservations can strand capacity. Soft borrowing improves cluster efficiency, but only if the engine can reclaim borrowed budget quickly when NorthPier's market-open burst arrives. Borrowing without fast clawback is just delayed interference. That is why the observability surface has to be tenant-specific: admission wait time, memory granted, spill bytes, WAL bytes, cache hit rate, compaction debt, and rejection rate need tenant labels, or operators will only discover bleed-over after a customer reports it.
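
Extending the admission sketch above, fast clawback might look like the following; in a real engine, taking back a slot means preempting or requeueing the borrowed work, not just decrementing a counter:

  def reclaim_for_burst(budgets: dict, burst_tenant: str, needed_slots: int) -> int:
      # Revoke capacity held beyond other tenants' reservations so the
      # bursting tenant's entitlement is honored quickly.
      reclaimed = 0
      for name, b in budgets.items():
          if name == burst_tenant:
              continue
          borrowed = max(0, b.in_use - b.reserved_slots)
          take = min(borrowed, needed_slots - reclaimed)
          b.in_use -= take          # in practice: pause or requeue those queries
          reclaimed += take
          if reclaimed >= needed_slots:
              break
      return reclaimed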

Concept 3: Maintenance and failure handling are where fake isolation is exposed

Foreground query governance is the easy part to demo. The harder test arrives when a tenant starts doing maintenance-heavy work. Suppose CedarHarbor adds a new secondary index to support an audit query, while BeaconStreet bulk-loads corrected trade history from the previous month. The cluster now faces a multi-tenant isolation question that is not visible in simple query benchmarks: who pays for the extra WAL traffic, range splits, compaction debt, cache churn, and validation scans that these operations create?

If Harbor Point's maintenance subsystems are global and oblivious to tenant ownership, background work becomes a loophole through every foreground budget the team just designed. NorthPier may have protected commit slots and fair CPU scheduling, yet still suffer because another tenant's reindex or bulk ingest saturates storage queues and pushes compaction far behind. That is why mature shared systems make maintenance tenant-aware too. Index builds, backfills, vacuum debt, checkpoint pressure, and repair traffic should be pausable, rate-limited, and attributable per tenant rather than treated as one benevolent cluster background.
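
One shape this can take, sketched with assumed names (MaintenanceLedger, per-tenant drain rates) rather than any particular engine's subsystem:

  class MaintenanceLedger:
      def __init__(self, drain_rate_bps: dict):
          self.drain_rate_bps = drain_rate_bps   # tenant -> background bytes/sec cap
          self.debt_bytes = {t: 0 for t in drain_rate_bps}
          self.paused = set()

      def charge(self, tenant: str, added_bytes: int) -> None:
          # A bulk load or index build grows the owning tenant's debt,
          # never an anonymous cluster-wide queue.
          self.debt_bytes[tenant] += added_bytes

      def next_batch(self, tenant: str, elapsed_s: float) -> int:
          # Drain compaction/vacuum debt only within the owner's rate limit,
          # and let operators pause one tenant's maintenance entirely.
          if tenant in self.paused:
              return 0
          budget = int(self.drain_rate_bps[tenant] * elapsed_s)
          work = min(budget, self.debt_bytes[tenant])
          self.debt_bytes[tenant] -= work
          return work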

This is also where the limits of shared clusters become obvious. Some failure domains remain shared no matter how elegant the logical isolation is: the control plane, node kernel, storage devices, and sometimes metadata services or replication queues. For many SaaS-style workloads that is acceptable because the economics are strong and the SLOs can be enforced statistically. It is not always acceptable for tenants with strict regulatory isolation, custom extensions, hard real-time latency commitments, or highly skewed working sets. Harbor Point therefore needs an explicit graduation rule: when a tenant's steady-state demand, maintenance profile, or risk posture stops fitting the shared-cluster contract, move that tenant to dedicated hardware or a separate cluster instead of pretending policies can fix every mismatch.

That decision is the bridge to 046.md. Online reindexing sounds like a maintenance problem, but in a multi-tenant cluster it is also an isolation problem. A shadow index build and catch-up phase must consume the owning tenant's budget, not free cluster magic. Otherwise one customer's schema change becomes another customer's outage.

Troubleshooting

Issue: NorthPier's p99 commit latency spikes even though its own query volume is flat.

Why it happens / is confusing: The bottleneck may be a shared subsystem that Harbor Point is not measuring per tenant, such as WAL flush bandwidth, compaction workers, or buffer-pool residency. Tenant-local dashboards can look calm while another tenant is consuming the actual scarce resource.

Clarification / Fix: Break down shared bottlenecks by tenant ownership, not just by node totals. If commit latency rises with another tenant's WAL or maintenance surge, the isolation policy is incomplete even if SQL-level access control is correct.
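
A small sketch of that attribution, assuming the engine can tag each WAL flush with its owning tenant at the moment it is issued:

  from collections import defaultdict

  wal_bytes_by_tenant = defaultdict(int)   # tenant -> bytes flushed this window

  def on_wal_flush(tenant: str, nbytes: int) -> None:
      # Tag WAL traffic with its owner when it happens, so "who is eating
      # flush bandwidth right now?" has a direct, per-tenant answer.
      wal_bytes_by_tenant[tenant] += nbytes

  def top_wal_consumers(n: int = 3):
      return sorted(wal_bytes_by_tenant.items(), key=lambda kv: kv[1], reverse=True)[:n]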

Issue: Per-tenant CPU quotas look healthy, but one tenant can still trigger memory pressure and temp spill explosions.

Why it happens / is confusing: CPU fairness does not automatically imply memory fairness. Large joins, sorts, and spill files often bypass simple concurrency caps unless the engine tracks memory grants and temporary storage by tenant.

Clarification / Fix: Add tenant-scoped memory accounting, spill quotas, and queue admission tied to predicted memory cost. A shared cluster without tenant-aware memory control will usually fail in the allocator before it fails in the scheduler.
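
A minimal sketch of tenant-scoped grant admission; predicted_bytes stands in for the planner's memory estimate, and all names are illustrative:

  class MemoryGovernor:
      def __init__(self, budget_bytes: dict):
          self.budget_bytes = budget_bytes      # tenant -> total grant budget
          self.granted = {t: 0 for t in budget_bytes}

      def try_grant(self, tenant: str, predicted_bytes: int) -> bool:
          # Admit a join or sort only if its predicted grant fits the tenant's
          # remaining budget; otherwise it queues or runs under a spill cap.
          if self.granted[tenant] + predicted_bytes > self.budget_bytes[tenant]:
              return False
          self.granted[tenant] += predicted_bytes
          return True

      def release(self, tenant: str, granted_bytes: int) -> None:
          self.granted[tenant] -= granted_bytes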

Issue: Foreground workloads are isolated correctly, yet schema changes or cleanup jobs still create noisy-neighbor incidents.

Why it happens / is confusing: Background work is often scheduled "for the cluster" rather than "for the tenant that caused it." That makes reindexing, backfills, vacuum, or compaction an isolation escape hatch.

Clarification / Fix: Charge maintenance to the owning tenant, make it pausable and rate-limited, and graph tenant-specific maintenance debt. If Harbor Point cannot explain who generated the background load, it cannot isolate it reliably.

Advanced Connections

Connection 1: 044.md teaches workload governance inside one business; this lesson turns it into an explicit customer boundary

The previous lesson protected commits from reports and backfills by classifying work inside Harbor Point's own workload mix. Multi-tenant isolation adds a stronger outer ring: first protect each tenant's entitlement, then schedule workload classes within that entitlement. The mechanics are similar, but the contract is sharper because a customer-facing SLO and pricing tier now depend on it.

Connection 2: 046.md asks whether online maintenance obeys the same tenant boundaries as foreground traffic

Online reindexing introduces shadow structures, catch-up writes, and cutover work that can generate large bursts of read and write amplification. In a shared cluster, those costs are not just maintenance details. They are another test of whether the engine can keep one tenant's operational debt from spilling into another tenant's serving path.


Key Insights

  1. Tenant isolation is deeper than catalog isolation - Separate schemas and auth protect visibility, but shared-cluster safety depends on whether tenant ownership reaches keys, locks, caches, queues, and maintenance subsystems.
  2. Isolation only works when budgets cover every scarce resource that matters - CPU fairness alone is insufficient if memory grants, WAL bandwidth, temp spill space, or compaction workers remain globally first-come, first-served.
  3. Background work is the real honesty test - If index builds, bulk ingest, vacuum, or compaction are not charged and limited per tenant, the system still has a noisy-neighbor escape hatch.