Day 057: Validation at Backend Boundaries
Validation gets much clearer once you stop asking "where does validation belong?" and start asking "what can this boundary actually know and reject with confidence?"
Today's "Aha!" Moment
Validation becomes easier to place once you stop treating it as one giant bucket. A backend does not perform one generic act called "validation." It crosses several trust boundaries, and each boundary can answer different questions. At the HTTP edge you can ask, "Is this request shaped correctly?" In the domain you can ask, "Does this action make sense according to business rules?" At the database you can ask, "Can this state be stored without violating integrity?"
Take a concrete example: POST /courses/{course_id}/reviews. A request arrives with rating, comment, and the caller identity. Several things can be wrong, but they are wrong in different ways. The JSON may be malformed. The caller may not be authenticated. The caller may be authenticated but never enrolled in the course. Or two valid requests may race and try to create a duplicate review. Those are not the same failure, so they should not all live in the same validation rule.
That is the real aha. Good validation is not about choosing one winning layer. It is about making each layer reject the class of mistake it is best positioned to detect. If you do that well, the backend becomes easier to reason about because each layer receives data that already satisfies the assumptions that layer is supposed to rely on.
This also resolves a common team argument. Controllers, services, and databases are not competing places to put one universal validation system. They are cooperating boundaries with different knowledge. The question is never "Which layer owns validation?" The question is "Which layer knows enough to reject this problem correctly and early?"
Why This Matters
The problem: Invalid input often travels too far before it is rejected, which makes failures harder to explain and encourages layers to depend on assumptions that were never made explicit.
Before:
- Malformed requests leak into services that were written for structured data.
- Business rules get mixed with parsing, auth, and storage errors.
- Teams either duplicate the same check everywhere or trust the first check too much.
After:
- Each boundary rejects the class of invalidity it can actually know.
- Error handling becomes clearer because malformed, forbidden, domain-invalid, and integrity failures stay distinct.
- Persistence constraints backstop the system where races or bypasses still exist.
Real-world impact: Better security posture, clearer API behavior, fewer surprise failures deep in the stack, and stronger protection against concurrency and integration edge cases.
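One way to keep these failure classes distinct is to name them explicitly and map each to a transport-level response in one place. The sketch below is illustrative, not canonical: the class names and status-code choices are assumptions, and real frameworks expose this mapping differently.

```python
# A minimal sketch of keeping failure classes distinct at the HTTP edge.
# The class names and status choices here are illustrative assumptions.
FAILURE_STATUS = {
    "malformed": 400,           # edge: bad JSON, wrong types, missing fields
    "unauthenticated": 401,     # edge: no valid identity at all
    "forbidden": 403,           # edge/domain: identity known, action not allowed
    "domain_invalid": 422,      # domain: well-formed but violates business rules
    "integrity_conflict": 409,  # persistence: a constraint rejected the commit
}

def status_for(failure_class: str) -> int:
    # Unknown classes fall back to a generic server error rather than leaking
    # an internal failure as something client-correctable.
    return FAILURE_STATUS.get(failure_class, 500)
```

The point is not the specific numbers; it is that "malformed", "forbidden", "domain-invalid", and "integrity failure" stay separate all the way to the response.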
Learning Objectives
By the end of this session, you will be able to:
- Explain boundary-specific validation - Distinguish structural, authorization/context, domain, and persistence validation.
- Place checks where they belong - Identify what each backend layer can know and should reject.
- Reason about overlap without confusion - Understand why multiple layers can validate related things without pointless duplication.
Core Concepts Explained
Concept 1: Edge Validation Protects Structure and Basic Trust Assumptions
Start with the outermost boundary: the request enters the backend. At this point the system can know a few things with confidence. Is the JSON valid? Are required fields present? Are types correct? Does the path parameter parse? Is the caller authenticated strongly enough for the route to proceed at all?
For the review endpoint, a payload like { "rating": "great" } or {} should be rejected immediately. So should a request with no valid authentication token if the route requires one. These are edge failures because the system does not need domain state to reject them.
That makes edge validation valuable for two reasons. First, it keeps deeper layers from having to interpret nonsense. Second, it narrows the assumptions for the next layer. By the time the use case runs, it should be dealing with parsed, typed, minimally trustworthy input, not raw transport noise.
raw request
-> parse / schema check
-> auth context extraction
-> typed command for the use case
The trade-off is that edge validation should stay focused on what the edge actually knows. If you push deep business policy into request schemas, the transport boundary starts pretending it understands domain state it has not even loaded yet.
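A minimal sketch of that edge step for the review endpoint, using only the standard library. The function and field names are illustrative assumptions; a real backend would likely use a schema library such as Pydantic for the same job.

```python
import json
from dataclasses import dataclass

@dataclass
class SubmitReviewCommand:
    """Typed command handed to the use case once the edge is satisfied."""
    user_id: str
    course_id: str
    rating: int
    comment: str

def parse_review_request(raw_body: str, course_id: str, user_id: str) -> SubmitReviewCommand:
    # Edge validation: reject transport-level problems using only what the
    # edge can know. No domain state is loaded here.
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        raise ValueError("malformed_json")
    if not isinstance(payload, dict):
        raise ValueError("malformed_json")
    rating = payload.get("rating")
    # type(...) is int deliberately excludes bools, which are ints in Python.
    if type(rating) is not int or not 1 <= rating <= 5:
        raise ValueError("invalid_rating")
    comment = payload.get("comment", "")
    if not isinstance(comment, str):
        raise ValueError("invalid_comment")
    return SubmitReviewCommand(user_id=user_id, course_id=course_id,
                               rating=rating, comment=comment)
```

Note what this function never asks: whether the user is enrolled or the course is open for reviews. Those questions need state the edge has not loaded.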
Concept 2: Domain Validation Protects Meaning, Eligibility, and Workflow Rules
Once the request is structurally valid, the domain layer can ask a different class of question: "Should this action be allowed in the world of the product?" A review may have the right shape and come from an authenticated user, yet still be invalid because the learner never enrolled, the course is archived, or the review window is closed.
That is domain validation. It is not about syntax. It is about meaning.
def submit_review(command, enrollment_repo, review_repo):
    # Domain validation: these checks need business state, not request shape.
    if not enrollment_repo.exists(command.user_id, command.course_id):
        raise ValueError("not_enrolled")
    if review_repo.already_submitted(command.user_id, command.course_id):
        raise ValueError("duplicate_review")
The example is intentionally small, but it shows the key distinction: the use case is validating against domain facts and invariants, not checking whether rating was a JSON number. The domain layer can do this because it has access to business state and business policy.
This is also where authorization sometimes meets validation. "Is this caller allowed to do this action?" can be partly an edge concern when it is just about basic identity, and partly a domain concern when it depends on ownership, enrollment status, role within a course, or product-specific rules. That is not duplication. It is layered knowledge.
The trade-off is that domain validation usually requires reads, policy code, and sometimes more latency than edge checks. That is fine. It is the right place for decisions that depend on the state of the business, not just the shape of the request.
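To see those domain checks fire, here is a self-contained sketch with in-memory stand-ins for the repositories. The repository classes and their methods are illustrative assumptions, not a prescribed interface.

```python
from types import SimpleNamespace

class InMemoryEnrollmentRepo:
    """Illustrative stand-in for a real enrollment store."""
    def __init__(self, enrollments):
        self._enrollments = set(enrollments)  # {(user_id, course_id)}
    def exists(self, user_id, course_id):
        return (user_id, course_id) in self._enrollments

class InMemoryReviewRepo:
    """Illustrative stand-in for a real review store."""
    def __init__(self):
        self._reviews = set()
    def already_submitted(self, user_id, course_id):
        return (user_id, course_id) in self._reviews
    def save(self, user_id, course_id, rating):
        self._reviews.add((user_id, course_id))

def submit_review(command, enrollment_repo, review_repo):
    # Domain validation: decisions that depend on business state.
    if not enrollment_repo.exists(command.user_id, command.course_id):
        raise ValueError("not_enrolled")
    if review_repo.already_submitted(command.user_id, command.course_id):
        raise ValueError("duplicate_review")
    review_repo.save(command.user_id, command.course_id, command.rating)

# Usage: an enrolled learner's first review passes both checks.
cmd = SimpleNamespace(user_id="u1", course_id="c1", rating=5)
enrollments = InMemoryEnrollmentRepo({("u1", "c1")})
reviews = InMemoryReviewRepo()
submit_review(cmd, enrollments, reviews)
```

Resubmitting the same command now raises "duplicate_review", and a command from a user with no enrollment raises "not_enrolled": same shape of request, different domain verdicts.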
Concept 3: Persistence Validation Is the Last Line, Not the Only Line
Some invariants can only be enforced reliably at the storage boundary. Imagine two review submissions racing from different devices. The use case may check already_submitted == false twice and still end up with a duplicate unless the database enforces uniqueness on (user_id, course_id).
That is why persistence validation exists. It protects committed state where application-level checks can still race, be bypassed by another integration path, or simply fail under concurrency.
request valid?
  yes -> domain action allowed?
    yes -> state still commit-safe under concurrency?
      the database decides here
This matters pedagogically because many students swing too far in one direction:
- "The database will catch it anyway, so app validation is unnecessary."
- "We already checked it in the service, so constraints are redundant."
Both are wrong. Database constraints are excellent for integrity, but they are a poor place to express user-facing business guidance on their own. If the only way the user learns a review is duplicate is via a raw unique-constraint explosion, the backend has done the minimum to protect state but not the maximum to explain behavior.
The trade-off is that persistence validation is strong but late. It is the best backstop for integrity and the wrong place to carry the whole user experience by itself.
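The backstop-plus-translation pattern can be sketched with SQLite and the standard library. The schema and error string are illustrative assumptions; in PostgreSQL the same idea is a UNIQUE constraint plus a driver-specific integrity error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reviews (
        user_id   TEXT NOT NULL,
        course_id TEXT NOT NULL,
        rating    INTEGER NOT NULL,
        UNIQUE (user_id, course_id)  -- last line of defense against duplicates
    )
""")

def save_review(conn, user_id, course_id, rating):
    # Persistence validation: the constraint decides under concurrency, but we
    # translate the raw violation into the same domain error the service uses,
    # so clients never see a bare constraint explosion.
    try:
        with conn:
            conn.execute(
                "INSERT INTO reviews (user_id, course_id, rating) VALUES (?, ?, ?)",
                (user_id, course_id, rating),
            )
    except sqlite3.IntegrityError:
        raise ValueError("duplicate_review")

save_review(conn, "u1", "c1", 5)  # first review commits cleanly
```

Even if two racing requests both pass the service-layer already_submitted check, only one insert can commit; the loser is rejected by the constraint and surfaces as a clear domain error instead of a raw database failure.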
Troubleshooting
Issue: Repeating the same check in several layers and calling it "defense in depth."
Why it happens / is confusing: Teams hear that validation should be layered, then copy the same rule mechanically instead of clarifying which boundary owns which knowledge.
Clarification / Fix: Write down the exact question each layer is answering. If two layers ask the same question for the same reason, that is probably duplication. If they protect different failure modes, the overlap may be justified.
Issue: Treating the database as the only serious validator.
Why it happens / is confusing: Database constraints feel authoritative, so teams sometimes stop designing clear application-level failures.
Clarification / Fix: Keep constraints for integrity, but reject obvious malformed and domain-invalid requests earlier so the system fails more clearly and cheaply.
Advanced Connections
Connection 1: Validation ↔ API Security
The parallel: Validation is part of the backend trust model because every boundary decides what assumptions may safely hold after crossing it.
Real-world case: Authentication and authorization checks narrow who may proceed, while schema checks narrow what shape of request may proceed. Together they reduce the uncertainty the domain layer must deal with.
Connection 2: Validation ↔ Data Flow Design
The parallel: Good validation turns a messy request into progressively better-formed internal data as it moves inward.
Real-world case: DTO mapping, use-case execution, and persistence become easier to reason about when each stage receives data that already satisfies the previous stage's assumptions.
Resources
Optional Deepening Resources
- These resources are optional and are not required for the core 30-minute path.
- [ARTICLE] OWASP Input Validation Cheat Sheet
- Link: https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html
- Focus: Review layered validation as a security and correctness practice.
- [DOC] Pydantic Documentation
- Link: https://docs.pydantic.dev/latest/
- Focus: See one practical approach to request-shape validation and typed boundary models in Python backends.
- [DOC] PostgreSQL Constraints
- Link: https://www.postgresql.org/docs/current/ddl-constraints.html
- Focus: Review uniqueness, checks, and referential integrity as persistence-level safeguards.
- [BOOK] Release It!
- Link: https://pragprog.com/titles/mnee2/release-it-second-edition/
- Focus: Connect defensive validation to broader production hardening and failure-handling practices.
Key Insights
- Validation follows knowledge - Each layer should reject only what it can actually know with confidence.
- "Valid" means different things at different boundaries - Good shape, valid identity, allowed action, and commit-safe state are distinct concepts.
- Integrity and clarity both matter - Application validation gives better behavior; persistence validation gives stronger backstop guarantees.
Knowledge Check (Test Questions)
1. What is the best rule for deciding where a validation check belongs?
- A) Put it at the boundary that has enough knowledge to reject the problem correctly and early.
- B) Put all checks in controllers so requests fail as soon as possible.
- C) Put all checks in the database because it is the final source of truth.
2. Why is schema validation not enough for a review submission endpoint?
- A) Because a well-formed request can still violate domain rules like "only enrolled learners may review."
- B) Because schemas are incompatible with JSON APIs.
- C) Because schema validation already guarantees database integrity.
3. Why should a unique database constraint still exist even if the service checks for duplicates first?
- A) Because concurrent requests or bypassed application paths can still violate the invariant at commit time.
- B) Because unique constraints are the best place to explain business rules to API clients.
- C) Because service-layer checks should never exist.
Answers
1. A: Validation placement should follow knowledge and timing, not dogma about one universally correct layer.
2. A: Schema validation protects structure; it cannot decide business eligibility on its own.
3. A: Application checks improve clarity, but database constraints still protect integrity under concurrency and alternate write paths.