DTOs, Entities, and Data Mapping

LESSON

010 30 min intermediate

The core idea: DTOs and entities are boundary-shape tools. They keep accepted input, internal truth, persistence details, and public output from pretending to be the same object.

Core Insight

Continue with the course review flow from the previous lesson. A learner submits a review with rating and comment. The backend validates the request, checks that the learner may review the course, stores the result, and returns a response for the client.

It is tempting to pass one object through that whole path. The handler receives a review object, the service mutates it, the repository stores it, and the handler returns it. Early on, that feels efficient. Later, it becomes a coupling trap because each boundary cares about a different shape.

The input boundary needs only the fields the client is allowed to send. The use case needs trusted caller context and the course id. The internal model may need review id, moderation status, timestamps, and invariants. The persistence layer may need foreign keys and storage-specific fields. The response contract may need reviewer display name and a formatted timestamp while hiding moderation internals.

That is the reason DTOs and mapping exist. A data transfer object, or DTO, is a deliberately shaped object for crossing a boundary. An entity is an internal representation with identity and meaning inside the backend. Mapping is the translation step that says, "This is what may cross this boundary, and this is what must stay inside."

The misconception to avoid is that DTOs are either always boilerplate or always mandatory. They earn their cost when boundaries have different concerns. If the API is public, security-sensitive, versioned, or likely to evolve, explicit shapes are usually worth the extra code.

One Review, Four Shapes

Use the same review submission all the way through:

client input
  -> request DTO
  -> application command
  -> domain or persistence model
  -> response DTO

The request DTO defines what the client may send:

from dataclasses import dataclass

@dataclass
class CreateReviewRequest:
    rating: int   # supplied by the client
    comment: str  # supplied by the client

Notice what is absent. The client does not send reviewer_id, created_at, is_moderated, or internal_spam_score, and it does not send course_id when that value comes from the route. That absence is design, not omission. The request DTO narrows the input contract so the client cannot accidentally or maliciously control fields the backend owns.
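One way to enforce that narrowing is to reject unknown keys while building the DTO. The sketch below assumes the body has already been parsed into a plain dict. The helper name parse_create_review and the ALLOWED_FIELDS constant are hypothetical, not part of any framework.

```python
from dataclasses import dataclass

# Fields the client is allowed to send; anything else is refused.
ALLOWED_FIELDS = {"rating", "comment"}


@dataclass(frozen=True)
class CreateReviewRequest:
    rating: int
    comment: str


def parse_create_review(body: dict) -> CreateReviewRequest:
    """Build the request DTO, refusing fields the backend owns (hypothetical helper)."""
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = ALLOWED_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    rating, comment = body["rating"], body["comment"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rating, int) or isinstance(rating, bool):
        raise ValueError("rating must be an integer")
    if not isinstance(comment, str):
        raise ValueError("comment must be a string")
    return CreateReviewRequest(rating=rating, comment=comment)
```

With this in place, a body that carries reviewer_id or moderation fields fails at the boundary instead of being silently copied inward. Whether to reject or quietly ignore unknown fields is a design choice; rejection makes the contract more visible to client developers.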

The application command is a different shape because the use case needs more than the raw body:

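from dataclasses import dataclass

@dataclass
class CreateReviewCommand:
    user_id: str    # from authentication
    course_id: str  # from the route
    rating: int     # from the request DTO
    comment: str    # from the request DTO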

This object combines accepted input with trusted context. The client supplied the rating and comment. The route supplied the course id. Authentication supplied the user id. The use case can now work with a command that means "this known learner is attempting to review this known course."

The internal review model may include facts the client should never control:

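from dataclasses import dataclass
from datetime import datetime

@dataclass
class Review:
    id: str
    course_id: str
    reviewer_id: str
    rating: int
    comment: str
    moderation_status: str  # internal workflow state
    created_at: datetime    # set by the backend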
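The backend-owned fields can be filled in by the use case rather than by the client. The following is a minimal sketch, assuming a create_review function that only builds the model. Permission checks and persistence are omitted, and the "pending" initial moderation status is an illustrative assumption.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CreateReviewCommand:
    user_id: str
    course_id: str
    rating: int
    comment: str


@dataclass
class Review:
    id: str
    course_id: str
    reviewer_id: str
    rating: int
    comment: str
    moderation_status: str
    created_at: datetime


def create_review(command: CreateReviewCommand) -> Review:
    """Build the internal model; permission checks and storage are left out."""
    return Review(
        id=str(uuid.uuid4()),                   # generated by the backend
        course_id=command.course_id,            # trusted: came from the route
        reviewer_id=command.user_id,            # trusted: came from authentication
        rating=command.rating,                  # accepted client input
        comment=command.comment,                # accepted client input
        moderation_status="pending",            # assumed initial workflow state
        created_at=datetime.now(timezone.utc),  # backend clock, not client clock
    )
```

Every field the client must not control is assigned in exactly one place, which makes the trust boundary easy to review.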

The response DTO is another deliberate shape:

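from dataclasses import dataclass

@dataclass
class ReviewResponse:
    id: str
    rating: int
    comment: str
    reviewer_name: str  # display information, not the internal reviewer_id
    created_at: str     # formatted for the client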

It may include display information that is useful to the client and omit internal workflow state. The point is not the class names. The point is that each boundary gets a shape that matches its job.

Entities Are Internal, Not Public Promises

The word "entity" is overloaded. In domain modeling, an entity is often an object with identity and business meaning. In many frameworks, an entity also means an ORM-backed persistence object. Those may overlap, but they are not automatically the same thing.

That matters because a persistence entity is usually shaped by storage needs:

- database primary keys
- foreign keys
- lazy-loaded relationships
- audit fields
- moderation columns
- internal flags
- timestamps in storage format

All of those can be correct internally and still wrong as an API response. Returning an ORM object directly from a handler silently turns internal structure into public contract:

def create_review_handler(request):
    review = service.create_review(request)
    return review  # risky: internal shape leaks outward

Now a database rename can break clients. A moderation flag can leak. A relationship can serialize too much data. A field added for internal workflow can become hard to remove because a mobile client started depending on it.
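The leak is easy to demonstrate. The sketch below uses a hypothetical ReviewRow dataclass as a stand-in for an ORM object and compares automatic serialization with an explicit mapper. The field names are illustrative.

```python
from dataclasses import asdict, dataclass


@dataclass
class ReviewRow:
    # Hypothetical persistence shape, including backend-only columns.
    id: str
    reviewer_id: str
    rating: int
    comment: str
    moderation_status: str
    internal_spam_score: float


def serialize_directly(row: ReviewRow) -> dict:
    """Risky: every stored field becomes part of the public response."""
    return asdict(row)


def serialize_explicitly(row: ReviewRow) -> dict:
    """Safer: only the fields named here can reach the client."""
    return {"id": row.id, "rating": row.rating, "comment": row.comment}


row = ReviewRow("r-1", "u-1", 4, "Solid course", "flagged", 0.82)
leaked = set(serialize_directly(row)) - set(serialize_explicitly(row))
print(sorted(leaked))
# prints ['internal_spam_score', 'moderation_status', 'reviewer_id']
```

Adding a column to ReviewRow changes the output of serialize_directly without anyone touching the handler, while serialize_explicitly stays the same until someone edits it on purpose.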

Keeping entities internal is not about ceremony. It preserves the backend's ability to evolve. The public API can stay stable while the database schema, domain model, or persistence strategy changes behind a mapping boundary.

The trade-off is extra translation code. That translation is worth it when the boundary protects security, versioning, or long-lived client contracts. It may be unnecessary for tiny internal tools where the shape is genuinely private and short-lived.

Worked Mapping Path

Here is the course review flow as a concrete mapping path:

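# parse_json and create_review are assumed to be defined elsewhere.
def create_review_handler(http_request, auth_context):
    # 1. Accepted input: only the fields the client may send.
    body = parse_json(http_request.body)
    dto = CreateReviewRequest(
        rating=body["rating"],
        comment=body["comment"],
    )

    # 2. Trusted command: client input plus route and authentication context.
    command = CreateReviewCommand(
        user_id=auth_context.user_id,
        course_id=http_request.path_params["course_id"],
        rating=dto.rating,
        comment=dto.comment,
    )

    # 3. Internal model in, public response out.
    review = create_review(command)
    return review_to_response(review)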

The mapper from internal review to response can make exposure rules explicit:

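def review_to_response(review):
    # Assumes the review was loaded with the reviewer's display name attached;
    # the Review model shown earlier stores only reviewer_id.
    # Keys are the public JSON field names of the response contract.
    return {
        "id": review.id,
        "rating": review.rating,
        "comment": review.comment,
        "reviewerName": review.reviewer_display_name,
        "createdAt": review.created_at.isoformat(),
    }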

This mapping decides what the client sees. It also gives the backend a stable place to handle API evolution. Suppose the product later adds moderation status, internal spam scoring, or reviewer reputation. Those fields can exist internally without automatically crossing the public boundary.

Versioning becomes easier too. A v1 response may keep reviewerName, while a later response adds a richer reviewer object. The backend can support both response mappers without forcing the internal entity to imitate either public shape.

internal review model
     |
     +--> v1 response mapper
     |
     +--> v2 response mapper
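The two mappers in the diagram could look like the sketch below. ReviewView is a hypothetical read model that carries the reviewer's display name, and the fields of the v2 reviewer object are an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReviewView:
    # Hypothetical read model: the internal review plus reviewer display data.
    id: str
    rating: int
    comment: str
    reviewer_id: str
    reviewer_display_name: str
    moderation_status: str
    created_at: datetime


def review_to_response_v1(review: ReviewView) -> dict:
    """v1 contract: flat reviewerName field."""
    return {
        "id": review.id,
        "rating": review.rating,
        "comment": review.comment,
        "reviewerName": review.reviewer_display_name,
        "createdAt": review.created_at.isoformat(),
    }


def review_to_response_v2(review: ReviewView) -> dict:
    """v2 contract: nested reviewer object (its fields are assumed here)."""
    return {
        "id": review.id,
        "rating": review.rating,
        "comment": review.comment,
        "reviewer": {
            "id": review.reviewer_id,
            "displayName": review.reviewer_display_name,
        },
        "createdAt": review.created_at.isoformat(),
    }
```

Both mappers read from the same internal object, and neither exposes moderation_status. Retiring v1 later means deleting one function rather than reshaping the internal model.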

The practical rule is: map explicitly when the boundary has a different audience, trust level, lifecycle, or compatibility promise.

Operational Failure Modes

Issue: Creating DTOs that mirror entities forever.

Clarification / Fix: Start from the boundary question. What is this shape restricting, stabilizing, or hiding? If the answer is nothing, the DTO may be ceremony. If the answer is input trust, public compatibility, or security, the mapper is doing real work.

Issue: Letting clients set internal fields through broad request models.

Clarification / Fix: Keep request DTOs smaller than internal models. Fields such as reviewer_id, created_at, moderation status, and internal scores should come from trusted backend context or internal workflow.

Issue: Returning persistence models directly from handlers.

Clarification / Fix: Treat database and ORM shapes as internal implementation details. Use response mapping to choose the public contract deliberately.

Issue: Mapping so late that domain code only sees transport vocabulary.

Clarification / Fix: Translate HTTP input into an application command before the use case. Domain code should receive meaningful business input, not raw web framework objects.

Close the lesson and draw the review flow from memory. Label which shape is accepted input, which shape is trusted command, which shape is internal truth, and which shape is public output. If one object crosses every boundary unchanged, name exactly what coupling you are accepting.

Connections

The previous lesson showed that validation follows boundary knowledge. This lesson shows the companion idea: data shapes should follow boundary responsibility.

The next lesson follows the full request lifecycle. DTOs and mapping are the visible shape changes inside that lifecycle: raw HTTP becomes accepted input, accepted input becomes a command, internal results become a public response.

This also connects back to API versioning. Stable public contracts are much easier to preserve when response mapping is deliberate instead of an accidental serialization of internal objects.

Resources

[ARTICLE] Data Transfer Object
- Focus: Use DTOs as boundary-crossing objects, not as generic extra classes.
[ARTICLE] Data Mapper
- Focus: Compare boundary mapping with the broader separation between object models and persistence structures.
[DOC] Pydantic Documentation
- Focus: See a practical tool for explicit request and response models in Python backends.
[BOOK] API Design Patterns
- Focus: Connect response shape, resource contracts, and API evolution decisions.

Key Takeaways

- One data shape rarely serves input acceptance, business workflow, persistence, and public output equally well.
- Request DTOs restrict what clients may send; response DTOs define what the API promises to expose.
- Entities and persistence models are internal representations, not automatic public contracts.
- Mapping is worthwhile when it protects trust, versioning, compatibility, or security across a boundary.
