Distributed Schedulers and Control Planes: Admission, Policy, and API Control Surfaces

Lesson 013, 35 min, advanced

The core idea: Admission control is the boundary where requests become desired state, so the design trade-off is between rejecting unsafe work early and keeping the control-plane front door available, understandable, and fast.

Core Insight

Suppose a team submits a new risk-api deployment while the scheduler policy canary is still active. The request asks for four GPU-backed replicas, a high-priority class, a zone preference near the fraud database, and a label that opts into scheduler-policy-v5. If the API server simply stores the object, every later controller must discover whether the request is allowed, well formed, within quota, compatible with tenant policy, and safe for the canary scope.

Admission control exists because some decisions should happen before an object enters desired state. The API surface is not just a data mailbox. It is the front door to the control plane: it authenticates callers, authorizes actions, fills defaults, rejects invalid combinations, enforces policy, and records objects in a form that downstream controllers can safely reconcile.

The tempting mistake is to put every rule into admission because early rejection feels cleaner. But admission is on the write path. If it becomes slow, unavailable, inconsistent, or too clever, the whole platform becomes harder to operate. Good admission design separates rules that must block writes from rules that can be handled later by reconciliation, scheduling, backpressure, or human review.

The API Write Path

A simplified write path looks like this:

request
  -> authenticate caller
  -> authorize verb and resource
  -> decode and schema-check object
  -> apply defaults and allowed mutations
  -> validate invariants and policy
  -> reserve or check quota when needed
  -> persist desired state
  -> controllers observe and reconcile

Each step answers a different question:

Authentication: who is making the request?
Authorization: may this caller perform this action on this resource?
Defaulting and mutation: what fields should be filled or normalized before storage?
Validation: is the object structurally and semantically acceptable?
Policy: does this request satisfy platform, tenant, security, or rollout rules?
Quota and admission accounting: is the caller allowed to claim the requested capacity?
Persistence: what desired state becomes authoritative for controllers?

For risk-api, defaulting might add a standard topology spread rule. Validation might reject an invalid priority class. Policy might allow scheduler-policy-v5 only in the canary namespace. Quota might reject a GPU request that would exceed the team's reservation. These checks happen before the scheduler ever sees a pending pod.
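
The ordering of the write path above can be sketched as a chain of small steps that each either pass or raise a rejection. This is a minimal illustration under stated assumptions, not how any particular API server is implemented: the `Request` shape, the step functions, and the in-memory `KNOWN_CALLERS`, `GRANTS`, and `GPU_QUOTA_REMAINING` tables are all hypothetical.

```python
from dataclasses import dataclass, field


class Rejected(Exception):
    """Raised when a write-path step refuses the request."""


@dataclass
class Request:
    caller: str
    verb: str
    resource: str
    namespace: str
    obj: dict
    decisions: list = field(default_factory=list)  # names of steps that passed


# Hypothetical policy data; a real control plane would read its own stores.
KNOWN_CALLERS = {"payments-deployer"}
GRANTS = {("payments-deployer", "create", "deployments", "payments-prod")}
GPU_QUOTA_REMAINING = {"payments-prod": 4}


def authenticate(req):
    if req.caller not in KNOWN_CALLERS:
        raise Rejected(f"authentication: unknown caller {req.caller}")


def authorize(req):
    if (req.caller, req.verb, req.resource, req.namespace) not in GRANTS:
        raise Rejected(
            f"authorization: {req.caller} may not {req.verb} "
            f"{req.resource} in {req.namespace}"
        )


def apply_defaults(req):
    # Mechanical normalization only: fill a field the caller left out.
    req.obj.setdefault("topologySpread", "zone")


def validate(req):
    if req.obj.get("replicas", 0) < 1:
        raise Rejected("validation: replicas must be at least 1")


def check_quota(req):
    # Checks only; a real system might also reserve capacity here.
    wanted = req.obj["replicas"] * req.obj.get("gpusPerReplica", 0)
    remaining = GPU_QUOTA_REMAINING.get(req.namespace, 0)
    if wanted > remaining:
        raise Rejected(
            f"quota: requested {wanted} GPUs, {remaining} remaining in {req.namespace}"
        )


WRITE_PATH = [authenticate, authorize, apply_defaults, validate, check_quota]


def admit(req, store):
    """Run every step in order; persist only if none of them rejects."""
    for step in WRITE_PATH:
        step(req)
        req.decisions.append(step.__name__)
    store[(req.namespace, req.obj["name"])] = req.obj
    return req.obj
```

Because persistence is the last line of `admit`, a rejection at any earlier step leaves the store untouched, which is the property the write path is meant to guarantee.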

Defaulting, Mutation, and Validation

Admission systems often support both mutation and validation. Mutation changes the request before storage. Validation accepts or rejects the final object.

Mutation is useful for mechanical, predictable normalization:

fill missing resource requests from a tenant default
add an owner label or audit annotation
choose a default scheduler profile
inject a sidecar required by platform policy
normalize deprecated fields into the current shape

Validation is useful for invariants:

reject a high-priority class outside approved namespaces
require GPU workloads to declare memory and topology constraints
prevent tenants from selecting protected nodes directly
block mutually incompatible fields
require rollout opt-in labels to match the active canary scope

The order matters. A validator should usually inspect the final object after defaulting and mutation. Otherwise a request can be rejected for a missing field that the platform would have filled, or accepted before a later mutation creates a conflicting state.
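
A small sketch of why the order matters, assuming a hypothetical tenant default of `4Gi` memory and a validator that requires a memory request. The function names and the object shape are illustrative only.

```python
import copy


def default_resources(obj):
    """Mutation: fill a missing memory request from a (hypothetical) tenant default."""
    out = copy.deepcopy(obj)  # never modify the caller's submitted object in place
    out.setdefault("resources", {}).setdefault("memory", "4Gi")
    return out


def validate_resources(obj):
    """Validation: return a list of problems; an empty list means acceptable."""
    problems = []
    if "memory" not in obj.get("resources", {}):
        problems.append("resources.memory is required")
    return problems


def mutate_then_validate(obj):
    final = default_resources(obj)          # mutate first
    problems = validate_resources(final)    # then validate the final object
    return final, problems
```

Calling `validate_resources` directly on a submitted object without a memory request reports a problem, while `mutate_then_validate` on the same object reports none, because the validator sees the object as it would be stored.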

Mutation also has a trust cost. If a request enters as one thing and gets stored as another, users and controllers need clear visibility into what changed. Hidden mutation makes debugging difficult. For important fields, explicit defaults, audit records, and stable API conventions are easier to operate than surprising rewrites.
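
One way to make mutation visible is to record which fields admission filled. The sketch below assumes top-level fields only and uses a made-up annotation key, `admission.example/defaulted-fields`; a real platform would choose its own convention.

```python
import copy


def apply_defaults_with_audit(obj, defaults):
    """Fill missing top-level fields and record which ones were defaulted."""
    out = copy.deepcopy(obj)
    defaulted = []
    for key, value in defaults.items():
        if key not in out:
            out[key] = copy.deepcopy(value)
            defaulted.append(key)
    if defaulted:
        # Hypothetical annotation key, used here only for illustration.
        annotations = out.setdefault("annotations", {})
        annotations["admission.example/defaulted-fields"] = ",".join(sorted(defaulted))
    return out
```

A user who later inspects the stored object can then tell which values they supplied and which the platform chose.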

Policy Placement

Not every policy belongs in the same place. A useful design asks where the decision has the best information and the safest failure mode.

Admission is a good home for rules that:

are cheap to evaluate on the write path
depend mostly on the submitted object and stable policy state
prevent unsafe desired state from existing at all
can produce a clear rejection reason for the caller
need to reserve quota or enforce ownership before other controllers act

Reconciliation is a better home for rules that:

need live cluster state that may be stale or expensive to read
can be repaired idempotently after storage
produce a pending or degraded state rather than a hard rejection
require long-running checks or external systems
should continue making progress when policy services are temporarily unavailable

Scheduling is a better home for placement choices:

which node or zone should run the work
whether current capacity satisfies constraints
whether preemption or backpressure is needed
which pending request should go first

For example, admission can reject risk-api if it claims a protected priority class without permission. The scheduler should decide which valid node gets the replica. The quota controller may own durable accounting. The rollout controller may decide whether the canary scope should expand. A clean API control surface keeps those responsibilities separate enough that each failure has a visible owner.

Staleness and Failure Policy

Admission often wants context: tenant quota, active rollout phase, allowed image registries, policy versions, or namespace labels. Some of that context may come from local caches like the ones in the previous lesson. That creates a hard question: what happens if the context is stale or the policy engine is unavailable?

There are two broad choices:

Fail closed: reject or hold requests when policy cannot be evaluated.
Fail open: allow requests and rely on later reconciliation, audit, or cleanup.

Fail closed protects the platform from unsafe writes, but it can turn a policy service outage into a platform-wide write outage. Fail open keeps users moving, but it may admit work that violates isolation, quota, or security expectations. The right answer depends on which is worse for a given rule: admitting a request that should have been rejected, or blocking a request that should have been allowed.

Security and ownership boundaries usually fail closed. A request that might run privileged code, consume protected GPU capacity, or bypass tenant isolation should not be admitted from uncertain policy state. Low-risk hints, labels, or non-critical defaults may fail open with an audit event and later repair.
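
Class-specific failure policy can be sketched as a lookup that is consulted only when a check cannot be evaluated. The check names, the `PolicyUnavailable` exception, and the classification table are assumptions made for this example.

```python
from enum import Enum


class FailurePolicy(Enum):
    FAIL_CLOSED = "fail_closed"
    FAIL_OPEN = "fail_open"


class PolicyUnavailable(Exception):
    """Raised by a check when its policy source cannot be reached."""


# Hypothetical classification, decided before any outage happens.
FAILURE_POLICY = {
    "tenant-isolation": FailurePolicy.FAIL_CLOSED,
    "protected-gpu-quota": FailurePolicy.FAIL_CLOSED,
    "cost-center-label": FailurePolicy.FAIL_OPEN,
}


def run_check(name, check, request, audit_log):
    """Return True to admit, False to reject."""
    try:
        return check(request)
    except PolicyUnavailable:
        # Checks that nobody classified are treated as fail closed.
        policy = FAILURE_POLICY.get(name, FailurePolicy.FAIL_CLOSED)
        audit_log.append((name, policy.value))  # leave a trail for later repair
        return policy is FailurePolicy.FAIL_OPEN
```

Defaulting unclassified checks to fail closed is itself a design choice; a platform that values write availability more might choose the opposite default and accept the audit burden.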

Freshness should be explicit. If admission uses cached namespace labels to decide whether scheduler-policy-v5 is allowed, the decision should record which policy version and cache version it used. If a canary gate changed ten seconds ago, admission should not silently use yesterday's view.
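
A sketch of an admission decision that carries its own freshness evidence. The five-second bound, the `CachedPolicy` and `Decision` shapes, and the choice to reject on a stale cache are all assumptions for illustration; a low-risk rule might instead fail open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedPolicy:
    policy_version: str
    cache_revision: int
    fetched_at: float            # seconds, on the same clock as `now`
    canary_namespaces: frozenset


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str
    policy_version: str
    cache_revision: int
    cache_age_seconds: float


MAX_CACHE_AGE_SECONDS = 5.0  # hypothetical freshness bound


def decide_canary(namespace, cached, now):
    """Decide canary eligibility and record which view of policy was used."""
    age = now - cached.fetched_at
    if age > MAX_CACHE_AGE_SECONDS:
        allowed, reason = False, "policy cache too stale to evaluate canary scope"
    elif namespace in cached.canary_namespaces:
        allowed, reason = True, "namespace is in active canary scope"
    else:
        allowed, reason = False, "namespace is not in active canary scope"
    return Decision(allowed, reason, cached.policy_version, cached.cache_revision, age)
```

Storing the returned `Decision` alongside the object lets an operator later answer which policy version and cache revision admitted a given write.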

Worked Example: Admitting a GPU Workload

Imagine a tenant submits:

service: risk-api
namespace: payments-prod
replicas: 4
resources: 1 GPU, 8 CPU, 32 GiB memory per replica
priorityClass: recovery-critical
schedulerPolicy: v5-canary
zonePreference: zone-b

A disciplined admission path can produce this result:

1. Authenticate the caller as payments-deployer.
2. Authorize create on deployments in payments-prod.
3. Default missing topology-spread and scheduler profile fields.
4. Validate that GPU, memory, and CPU requests are declared.
5. Check that payments-prod may use recovery-critical priority.
6. Check that v5-canary is active for this namespace and service.
7. Reserve or verify quota for the requested GPUs.
8. Store the normalized object with policy and quota decision annotations.

If the request fails, the rejection reason should be actionable:

rejected: schedulerPolicy v5-canary is not active for namespace payments-prod

That is better than storing the object and letting it sit pending with a vague scheduling failure. It is also better than a generic "forbidden" message that forces the user to guess which policy blocked them.
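
An actionable rejection is easier to produce when the check returns structured parts rather than a bare boolean. The `Rejection` type and `check_canary_scope` function below are hypothetical names for a minimal sketch of that idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rejection:
    field: str        # which field of the request was the problem
    value: str        # the value the caller submitted
    problem: str      # what the policy found wrong with it
    scope_kind: str   # the kind of scope the policy applies to
    scope_name: str   # the specific scope that failed

    def message(self):
        return (
            f"rejected: {self.field} {self.value} {self.problem} "
            f"for {self.scope_kind} {self.scope_name}"
        )


def check_canary_scope(policy_name, namespace, active_namespaces):
    """Return None when allowed, otherwise a Rejection naming field, value, and scope."""
    if namespace in active_namespaces:
        return None
    return Rejection("schedulerPolicy", policy_name, "is not active", "namespace", namespace)
```

Keeping the parts separate also lets the same rejection be logged, counted per policy, or rendered differently for a CLI and a web console.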

Now imagine the quota service is unavailable. If the workload claims protected GPU capacity, fail closed may be appropriate because admitting it could overcommit a recovery lane. If the request only adds a non-critical annotation, fail open with audit may be enough. Admission design is partly about classifying these consequences before the outage happens.

Operational Failure Modes

Policy in the wrong layer: admission tries to make live placement decisions from stale state. The fix is to validate eligibility early and leave placement to the scheduler.
Slow admission chain: every write waits on many external policy calls. The fix is bounded latency, cached policy, clear ordering, and small policy surfaces.
Hidden mutation: users submit one object and controllers see another with no explanation. The fix is explicit defaults, audit annotations, and documented API behavior.
Fail-closed outage: a policy dependency blocks unrelated platform writes. The fix is class-specific failure policy and narrow admission scopes.
Fail-open boundary breach: unsafe work enters desired state during policy uncertainty. The fix is fail-closed for security, ownership, quota, and protected capacity boundaries.
Unclear rejection reasons: users cannot repair requests. The fix is precise messages that name the violated field, policy, and scope.
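
For the slow admission chain, bounded latency can be sketched with a per-check deadline. This example uses Python's standard `concurrent.futures` thread pool; the `call_with_deadline` helper is a hypothetical name, and what to do on a timeout (reject or admit with audit) is left to the class-specific failure policy.

```python
import concurrent.futures
import time


def call_with_deadline(check, request, timeout_seconds, executor):
    """Run one policy check; return (result, timed_out)."""
    future = executor.submit(check, request)
    try:
        return future.result(timeout=timeout_seconds), False
    except concurrent.futures.TimeoutError:
        # Best effort only: a check that is already running in a thread
        # cannot be interrupted, so it finishes in the background.
        future.cancel()
        return None, True
```

The deadline bounds how long the write waits, not how long the check runs, so a slow dependency still consumes a worker thread until it returns.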

Connections

The previous lesson, 012.md, explained cache staleness. Admission often reads cached context, but must decide when freshness is required before accepting a write.
The next lesson, 014.md, applies these boundaries to multi-tenant isolation and noisy neighbor control.
identity-authorization-and-policy-systems goes deeper on authorization, policy languages, and enforcement models.

Resources

[DOC] Kubernetes Admission Controllers
- Focus: Study where admission fits in the Kubernetes request path and which built-in controls exist.
[DOC] Dynamic Admission Control
- Focus: Look at mutating and validating webhooks, ordering, failure policy, and operational risks.
[DOC] Validating Admission Policy
- Focus: Use declarative validation as an example of policy that stays close to the API server.
[DOC] Kubernetes RBAC Authorization
- Focus: Separate authorization decisions from admission and downstream scheduling decisions.
[DOC] OPA Gatekeeper
- Focus: Compare policy-as-code admission with constraint templates, audit, and enforcement modes.

Key Takeaways

Admission control is the write-path boundary where requests become desired state.
Good API control surfaces separate authentication, authorization, mutation, validation, policy, quota, and later reconciliation.
Admission should reject unsafe or unauthorized work early, but avoid becoming a slow, opaque, all-purpose control plane.
The central trade-off is early enforcement versus front-door availability, debuggability, and clear ownership of decisions.
