Pagination, Filtering, Sorting, and List API Semantics
LESSON
Pagination, Filtering, Sorting, and List API Semantics
The core idea: list endpoints are contracts for traversing changing data, so pagination, filters, and sort order must define stable evidence rather than just convenient query parameters.
Core Insight
Imagine a course page that shows reviews with GET /courses/42/reviews?sort=newest&limit=20. At first, the endpoint is simple. The database has a few hundred reviews, the web client asks for page 1, and the response looks correct.
Then the product grows. New reviews arrive while a mobile client is scrolling. A moderation job hides some reviews. A partner integration asks for only five-star reviews. The product team wants sorting by "most helpful." Suddenly the list endpoint is not just returning rows. It is defining how clients move through a changing set of data.
The common mistake is to treat pagination as presentation: page number, page size, next button. For a backend API, pagination is a consistency and contract problem. The client needs to know what order it is traversing, what filters were applied, whether the next page continues the same view, and what kind of duplicates or gaps are possible when the underlying data changes.
The central trade-off is convenience versus stability and cost. Offset pagination is easy to understand and simple for small lists, but it can become expensive and unstable when rows are inserted or removed. Cursor pagination requires a clearer contract, but it gives the server and client a better way to continue from a known position in a stable order.
The List Contract
A list endpoint has more public semantics than its URL suggests. For GET /courses/{course_id}/reviews, the contract should answer at least these questions:
- Which rows are eligible for the list?
- What filters can narrow those rows?
- What sort orders are supported?
- Is the order stable enough to page through?
- How does the client ask for the next page?
- Does the response include an exact total, an estimated total, or no total?
- What happens if data changes while the client is traversing?
Those details are not implementation trivia. A client may cache the first page, keep a cursor for offline continuation, display "showing 20 of 1,247", or retry a request after a network failure. If the API does not define the semantics, every client invents assumptions.
One clean response shape is:
{
"data": [
{
"id": "rev_901",
"course_id": "42",
"rating": 5,
"created_at": "2026-06-15T10:03:21Z"
}
],
"page": {
"limit": 20,
"next_cursor": "eyJjcmVhdGVkX2F0IjoiMjAyNi0wNi0xNVQxMDowMzoyMVoiLCJpZCI6InJldl85MDEifQ",
"has_more": true
}
}
The cursor is an opaque token. The client should not parse it or construct it. It is evidence from the server that says, "continue after this position in the same traversal contract." That distinction matters. If clients build cursors themselves, internal sort keys and database choices leak into the public API.
Offset and Cursor Pagination
Offset pagination usually looks like this:
GET /courses/42/reviews?limit=20&offset=40
It means "skip 40 matching rows and return the next 20." This is convenient for human page numbers and small administrative lists. It also maps cleanly to many SQL queries:
SELECT *
FROM reviews
WHERE course_id = '42'
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;
The problem appears when the list changes. Suppose the client fetches offset 0, then three new reviews arrive at the top, then the client fetches offset 20. The second request now skips three rows the client never saw and may repeat rows that moved across the boundary. Large offsets can also become costly because the database may still need to scan or count many rows before returning the requested window.
Cursor pagination uses a position instead of a count:
GET /courses/42/reviews?sort=newest&limit=20
GET /courses/42/reviews?sort=newest&limit=20&cursor=<opaque-token>
Internally, the cursor might encode the last item's sort key:
{
"created_at": "2026-06-15T10:03:21Z",
"id": "rev_901"
}
The next query can continue from that point:
SELECT *
FROM reviews
WHERE course_id = '42'
AND (created_at, id) < ('2026-06-15T10:03:21Z', 'rev_901')
ORDER BY created_at DESC, id DESC
LIMIT 20;
The extra id matters. Sorting only by created_at is not stable if multiple reviews share the same timestamp. A stable traversal needs a deterministic tie-breaker. The client does not need to know that tie-breaker, but the API design does.
Cursor pagination is not perfect. It is less friendly for "jump to page 10." It usually requires indexes that match the filter and sort. It can be harder to combine with arbitrary user-selected sorting. But for feeds, timelines, audit logs, review lists, notifications, and most infinite-scroll APIs, it usually gives a better contract under change.
Filters, Sorting, and Totals
Filters and sorting are part of the same traversal contract. If the first request is:
GET /courses/42/reviews?rating=5&sort=newest&limit=20
the next_cursor should continue that exact filtered and sorted view. A cursor produced for rating=5&sort=newest should not be accepted silently for rating=1&sort=oldest. The server can reject mismatched cursor/query combinations with a clear 400, or encode enough signed context in the cursor to detect misuse.
Sorting also needs product discipline. A stable sort such as newest can be backed by (created_at, id). A sort such as most_helpful may depend on votes that change over time. If helpfulness changes while a client pages, items may move between pages. That can still be acceptable, but the contract should not pretend the traversal is a fixed snapshot unless the backend can actually provide one.
Total counts deserve special care. Exact totals can be expensive on large filtered datasets. They can also become stale immediately. Some APIs should return no total. Some should return an estimate. Some administrative screens need an exact count and can pay the cost. The mistake is returning a field named total without defining whether it is exact, approximate, cached, or scoped to the current filter.
OpenAPI can document these semantics directly:
parameters:
- name: rating
in: query
schema:
type: integer
minimum: 1
maximum: 5
- name: sort
in: query
schema:
type: string
enum: [newest, oldest, most_helpful]
- name: cursor
in: query
schema:
type: string
responses:
"200":
description: Page of reviews
The schema cannot explain every operational nuance by itself, but it can make supported parameters, enum values, cursor fields, and response shapes visible enough for client code and tests.
Operational Failure Modes
Issue: Offset pagination creates duplicates or gaps while data changes.
Why it is tempting: Page numbers and offsets are easy to understand and quick to implement.
Correction: Use cursor pagination for lists that mutate frequently or are consumed by infinite scroll, background sync, or integrations. Define the stable sort keys and tie-breaker.
Issue: Sorting is not deterministic.
Why it is tempting: ORDER BY created_at DESC looks sufficient until two rows share the same timestamp.
Correction: Add a unique tie-breaker such as id. The traversal order should be deterministic even when many rows have the same visible sort value.
Issue: Cursors leak database structure.
Why it is tempting: A raw timestamp or integer ID is easy to pass around.
Correction: Treat cursors as opaque public tokens. Encode or sign internal details so clients cannot depend on private storage choices.
Issue: Filters and cursors drift apart.
Why it is tempting: The server accepts any combination of query parameters because each one is valid alone.
Correction: Bind the cursor to the filter and sort context, or reject incompatible combinations. A cursor should continue one traversal, not become a universal pointer.
Issue: Exact totals are promised accidentally.
Why it is tempting: Product UIs often ask for "total results" without considering cost or freshness.
Correction: Decide whether the endpoint returns no count, an estimate, or an exact count. Name and document the field honestly.
Close the lesson and design the smallest list contract for course reviews. Choose allowed filters, one default sort, the tie-breaker, the cursor response shape, and whether you will expose a total. Then ask which of those promises a client test should protect.
Connections
The previous lesson showed that OpenAPI turns API assumptions into a checkable contract. List APIs make that concrete: query parameters, cursor tokens, sort enums, and page metadata are all client-visible promises.
The next lesson on contract testing asks how to prevent those promises from breaking. A compatibility gate should catch changes such as removing a sort value, changing cursor shape, or turning an optional next_cursor into a required field.
This lesson also connects back to error semantics. Invalid cursors, unsupported filters, and impossible sort/filter combinations should fail with stable, documented client errors rather than vague server failures.
Resources
- [DOC] Stripe API Pagination
- Focus: Study a production API that uses cursor-like list traversal with
starting_afterandending_before.
- Focus: Study a production API that uses cursor-like list traversal with
- [DOC] GitHub REST API Pagination
- Focus: Compare link-based pagination and how traversal metadata is exposed to clients.
- [SPEC] GraphQL Cursor Connections Specification
- Focus: Review the cursor, edge, node, and page info model for stable graph traversal.
- [SPEC] OpenAPI Specification
- Focus: Document query parameters, response schemas, enums, and error shapes as part of the list endpoint contract.
Key Takeaways
- A list endpoint is a traversal contract over changing data, not just a route that returns an array.
- Offset pagination is simple, but cursor pagination usually gives better stability for large or frequently changing lists.
- Stable pagination requires deterministic sort keys, usually including a unique tie-breaker.
- Filters, sort order, cursor tokens, totals, and error responses should be documented as client-visible API promises.
← Back to Backend and API Architecture