Request and Response Bodies: Streaming, Compression, and Range Delivery

HTTP Protocol and Content Delivery

Lesson 012, 25 min, intermediate

The core idea: HTTP body headers turn a stream of bytes into an inspectable transfer contract, so clients and servers can trade off memory, bandwidth, CPU, caching, and recovery after partial failure.

Core Insight

Imagine a mobile app downloading a 600 MB training video for offline viewing. The user starts on Wi-Fi, walks into an elevator, and the connection drops after 180 MB. A naive client starts over from byte zero. The user waits, the origin pays for duplicate bandwidth, and support sees "downloads randomly fail" even though every individual HTTP request looked ordinary.

HTTP gives you better tools than "send the whole thing and hope." A response can say how many bytes are coming. A server can stream bytes without buffering the whole representation first. A client can ask for a byte range after an interruption. A server can compress text responses when bandwidth is scarce, while avoiding wasted CPU on media that is already compressed.

The hard part is that these are not separate tricks. They are one body-transfer contract. Content-Length, Transfer-Encoding, Content-Encoding, Accept-Encoding, Range, Content-Range, validators, and cache variation all describe what bytes are being transferred, how the receiver knows where the body begins and ends, and whether a partial transfer can be joined with another one safely.

The trade-off is bandwidth and recovery versus CPU, memory, and cache complexity. Buffering everything gives simple sizes but poor memory behavior. Streaming lowers latency and memory pressure but makes progress and failure handling more explicit. Compression saves bandwidth for text but costs CPU and creates cache variants. Ranges make resume possible but only when the client and server agree about exactly which representation the byte offsets refer to.

What a Body Contract Names

An HTTP message has metadata and, sometimes, a body. The body is not self-explanatory. The surrounding headers tell the receiver how to interpret and delimit it.

For a simple JSON response:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 28
Cache-Control: private, no-store

{"id":"842","state":"ready"}

Content-Type says what kind of representation the body contains. Content-Length says exactly how many octets are in the body. Once the client has read 28 bytes after the header section, the response body is complete. This is easy to reason about, easy to measure, and efficient when the server already knows the size.
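A minimal Python sketch of that delimiting rule, assuming the compact JSON body shown above. The helper names build_response and read_body are illustrative and not taken from any library; a real server or client would rely on its HTTP stack for this.

```python
import io
import json

def build_response(payload: dict) -> bytes:
    """Serialize a JSON body and frame it with an exact Content-Length."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # counts octets, not characters
        "\r\n"
    ).encode("ascii")
    return head + body

def read_body(stream: io.BufferedIOBase) -> bytes:
    """Read the header section, then exactly Content-Length octets."""
    length = None
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line ends the header section
            break
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("this sketch only handles Content-Length framing")
    body = stream.read(length)
    if len(body) != length:
        raise ConnectionError("body ended before Content-Length octets arrived")
    return body

raw = build_response({"id": "842", "state": "ready"})
# Extra bytes after the body belong to whatever comes next on the connection.
body = read_body(io.BytesIO(raw + b"bytes of the next message"))
```

The reader stops at the declared length and leaves the trailing bytes alone, which is the property that makes a known length easy to reason about.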

For a generated report, the server may not know the final size before it starts sending. In HTTP/1.1, it can stream with chunked transfer coding:

HTTP/1.1 200 OK
Content-Type: text/csv
Transfer-Encoding: chunked

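8
id,name

8
842,Ada

0

Chunked transfer coding is message framing. Each chunk begins with its size in hexadecimal, counting only the data bytes; here each CSV row plus its trailing line feed is 8 bytes, and the zero-size chunk ends the body. The chunks tell the HTTP recipient where each piece starts and where the body ends. They are not the application's business records, and application code should not treat chunk boundaries as meaningful CSV row boundaries. A proxy might rechunk the body differently.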
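The following Python sketch illustrates why chunk boundaries carry no application meaning: the same CSV payload is framed two different ways and decodes to identical bytes. The functions encode_chunked and decode_chunked are hypothetical helpers written for this example, and the decoder skips chunk extensions and trailer fields for brevity.

```python
def encode_chunked(pieces):
    """Frame each non-empty piece as: hex size, CRLF, data, CRLF."""
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early
            out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, with no trailer fields
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the payload from a complete chunked body."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Drop any chunk extension after ';' and parse the hex size.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2

rows = b"id,name\n842,Ada\n"
by_row = encode_chunked([b"id,name\n", b"842,Ada\n"])
# A proxy could split the same bytes at arbitrary offsets instead.
rechunked = encode_chunked([rows[:3], rows[3:11], rows[11:]])
```

Both by_row and rechunked differ on the wire but decode to the same rows, so a CSV parser should always run on the reassembled byte stream.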

The plain-language distinction is:

representation metadata -> what the bytes mean
message framing         -> where this HTTP body starts and ends
application structure   -> records, JSON fields, rows, frames, or media segments inside the bytes

Keeping those layers separate prevents bugs. Content-Type: application/json says how to parse the bytes. Content-Length or chunked transfer says how many bytes belong to this message. The JSON parser, CSV parser, image decoder, or video player handles the structure inside the representation.

Streaming Is a Backpressure Contract

Streaming means the sender can produce the body incrementally and the receiver can consume it incrementally. That can improve time-to-first-byte and memory behavior. It also exposes backpressure: if the receiver is slow, the sender eventually has to slow down, buffer, or fail.

Consider a service that exports a customer's order history as CSV:

request arrives
-> authorize user
-> query rows in pages
-> write CSV header
-> stream rows as they are fetched
-> finish response when the final page is written

This avoids building a 2 GB string in memory. It lets the user start receiving data quickly. It also changes what "success" means operationally. The server might have sent the first 100 MB before the database cursor fails. The client sees a broken download, not a clean JSON error body. Metrics need to distinguish "headers sent" from "body completed."
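The export path above can be sketched as a Python generator. This is a minimal illustration, not a production exporter: fetch_page is a hypothetical callback standing in for a paged database query, and the in-memory PAGES list exists only so the example runs on its own.

```python
import csv
import io
from typing import Callable, Iterator, List, Sequence

Page = List[Sequence[object]]

def stream_csv(fetch_page: Callable[[int], Page],
               header: Sequence[str]) -> Iterator[bytes]:
    """Yield CSV bytes page by page so memory stays bounded by one page."""
    buf = io.StringIO()
    writer = csv.writer(buf)

    def flush() -> bytes:
        data = buf.getvalue().encode("utf-8")
        buf.seek(0)
        buf.truncate(0)
        return data

    writer.writerow(header)
    yield flush()  # first bytes leave before any rows are queried
    page_no = 0
    while True:
        rows = fetch_page(page_no)  # may raise after earlier pages were sent
        if not rows:
            return  # an empty page marks the end of the export
        writer.writerows(rows)
        yield flush()
        page_no += 1

# Hypothetical in-memory pages standing in for a database cursor.
PAGES = [[(842, "Ada"), (843, "Grace")], [(844, "Edsger")]]

def fake_fetch(page_no: int) -> Page:
    return PAGES[page_no] if page_no < len(PAGES) else []

chunks = list(stream_csv(fake_fetch, ["id", "name"]))
```

If fetch_page raises on a later page, the pieces already yielded have left the process, which is why "headers sent" and "body completed" need separate counters.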

For request bodies, the same pressure appears in the other direction. A user uploads a 2 GB video. The API should authenticate and authorize the request before spending work on the body. It should enforce size limits, content type expectations, timeouts, and disk or object-storage backpressure. If the server reads the entire body into memory before checking policy, one upload can become a reliability problem.

A useful upload path is:

headers received
-> authenticate and authorize
-> reject early if content type or size is not acceptable
-> stream body to storage while hashing or scanning
-> commit metadata only after storage confirms completion
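The streaming step of that path might look like the Python sketch below. It assumes authentication and header checks have already passed; store_upload, UploadRejected, and the tiny MAX_UPLOAD_BYTES limit are hypothetical names and values chosen so the example is easy to run.

```python
import hashlib
import io
from typing import BinaryIO, Optional

MAX_UPLOAD_BYTES = 1024  # deliberately tiny, hypothetical policy limit

class UploadRejected(Exception):
    def __init__(self, status: int, reason: str):
        super().__init__(reason)
        self.status = status

def store_upload(body: BinaryIO, sink: BinaryIO, declared_length: Optional[int],
                 limit: int = MAX_UPLOAD_BYTES, block: int = 256) -> str:
    """Copy body to sink in fixed-size blocks, hashing as it goes.

    Returns the SHA-256 hex digest. The caller would commit metadata
    only after this function returns without raising.
    """
    if declared_length is not None and declared_length > limit:
        # Rejected before a single body byte is read.
        raise UploadRejected(413, "declared size over limit")
    digest = hashlib.sha256()
    received = 0
    while True:
        data = body.read(block)
        if not data:
            break
        received += len(data)
        if received > limit:  # the declared length may be absent or untrue
            raise UploadRejected(413, "body exceeded limit while streaming")
        digest.update(data)
        sink.write(data)
    if declared_length is not None and received != declared_length:
        raise UploadRejected(400, "body length did not match Content-Length")
    return digest.hexdigest()

payload = b"x" * 600
sink = io.BytesIO()
checksum = store_upload(io.BytesIO(payload), sink, declared_length=len(payload))
```

Memory use is bounded by the block size rather than the upload size, and the limit is enforced on bytes actually received, not only on the declared length.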

Sometimes a client uses Expect: 100-continue to avoid sending a large body until the server confirms that the request headers are acceptable:

PUT /videos/abc HTTP/1.1
Host: api.shop.test
Authorization: Bearer token_for_user_123
Content-Type: video/mp4
Content-Length: 629145600
Expect: 100-continue

If the token is expired or the size is over the limit, the server can reject before the client transmits the whole body. Not every client and intermediary path handles this perfectly, but the design point is clear: expensive bodies should have an early boundary decision when possible.
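One way to picture the early boundary decision is a function that looks only at request headers and returns either 100 (send the body) or a final error status. This is a sketch under stated assumptions: early_decision, the token set, and the 1 GiB MAX_VIDEO_BYTES limit are hypothetical, and header names are assumed to be lowercased already.

```python
from typing import Dict, Set

MAX_VIDEO_BYTES = 1024 * 1024 * 1024  # hypothetical 1 GiB policy limit

def early_decision(headers: Dict[str, str], valid_tokens: Set[str]) -> int:
    """Return the status to send before any body bytes are read.

    100 means "go ahead and send the body"; anything else is a final response.
    """
    auth = headers.get("authorization", "")
    if not auth.startswith("Bearer ") or auth[len("Bearer "):] not in valid_tokens:
        return 401  # missing, malformed, or unknown token
    if headers.get("content-type") != "video/mp4":
        return 415  # this endpoint only accepts MP4 in the sketch
    length = headers.get("content-length")
    if length is None or not length.isdigit():
        return 411  # the sketch requires a declared length
    if int(length) > MAX_VIDEO_BYTES:
        return 413
    return 100

request_headers = {
    "authorization": "Bearer token_for_user_123",
    "content-type": "video/mp4",
    "content-length": "629145600",
    "expect": "100-continue",
}
status = early_decision(request_headers, {"token_for_user_123"})
```

The same checks apply when the client does not send Expect: 100-continue; the difference is only whether the client waits for the answer before transmitting.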

Compression Changes the Representation Variant

Compression is negotiated. The client advertises what encodings it can accept:

GET /orders/842/receipt HTTP/1.1
Host: api.shop.test
Accept-Encoding: br, gzip

The server can choose a compressed representation:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: br
Vary: Accept-Encoding
Cache-Control: private

Content-Encoding says the representation was encoded with Brotli before transfer. The client decodes it to recover the usable HTML. Vary: Accept-Encoding tells caches that a response for a client accepting Brotli is not necessarily reusable for a client that only accepts gzip or no compression.

Compression is usually excellent for text: HTML, CSS, JavaScript, JSON, CSV, and logs. It is usually wasteful for already-compressed formats such as JPEG, PNG, MP4, ZIP, and many PDFs. Compressing those can spend CPU without saving meaningful bandwidth, and sometimes increases size.
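A quick way to see the difference is to compare gzip ratios for repetitive text and for bytes that have no redundancy left. In this Python sketch, random bytes stand in for already-compressed media; that substitution is an assumption made for illustration, since real MP4 or JPEG files are not literally random.

```python
import gzip
import os

def gzip_ratio(data: bytes) -> float:
    """Encoded size divided by original size; below 1.0 means bytes were saved."""
    return len(gzip.compress(data)) / len(data)

# Repetitive JSON-like text compresses well.
text = b'{"id":"842","state":"ready"}\n' * 2000

# Random bytes stand in for already-compressed media payloads.
media_like = os.urandom(len(text))

text_ratio = gzip_ratio(text)
media_ratio = gzip_ratio(media_like)
```

The text ratio comes out as a small fraction of 1.0, while the media-like ratio stays close to or slightly above 1.0 because the gzip container adds overhead without finding anything to remove.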

Compression also affects observability. A dashboard that compares "body bytes sent" across endpoints needs to know whether it is counting encoded bytes on the wire or decoded representation bytes. A latency spike after enabling Brotli might be CPU time, not network time. A cache hit ratio can drop when Vary: Accept-Encoding creates multiple variants for the same URL.

The safe mental model is:

URL + request headers select a representation variant
-> optional content coding transforms the bytes
-> caches must keep variants separate
-> clients decode before application parsing

Do not compress everything just because it looks like a free win. Compression is a negotiation with CPU, memory, cache keys, and sometimes security considerations. For authenticated private API responses, the bandwidth win may still be worth it, but measure encoded size, CPU, and tail latency together.
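A compression policy along these lines can be sketched in Python. The function names, the COMPRESSIBLE prefixes, and the server preference order are all assumptions for illustration. The sketch honors q=0 exclusions but otherwise follows the server's order rather than ranking by q-value, and cache_key is a simplification: real caches match variants on the request's Accept-Encoding value, not on the coding chosen.

```python
from typing import Dict, Optional, Tuple

COMPRESSIBLE = ("text/", "application/json", "application/javascript")
SERVER_PREFERENCE = ("br", "gzip")  # hypothetical server-side ordering

def parse_accept_encoding(value: str) -> Dict[str, float]:
    """Map each listed coding to its q-value (default 1.0)."""
    weights: Dict[str, float] = {}
    for item in value.split(","):
        token, _, params = item.strip().partition(";")
        if not token:
            continue
        q = 1.0
        name, _, qvalue = params.strip().partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(qvalue)
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        weights[token.strip().lower()] = q
    return weights

def choose_encoding(accept_encoding: str, content_type: str) -> Optional[str]:
    """Return a content coding to apply, or None to send the bytes unencoded."""
    if not content_type.startswith(COMPRESSIBLE):
        return None  # skip media that is already compressed
    weights = parse_accept_encoding(accept_encoding)
    for coding in SERVER_PREFERENCE:
        # Fall back to the wildcard weight when the coding is not listed.
        if weights.get(coding, weights.get("*", 0.0)) > 0:
            return coding
    return None

def cache_key(url: str, encoding: Optional[str]) -> Tuple[str, str]:
    """Simplified variant key: one cache entry per (URL, coding) pair."""
    return (url, encoding or "identity")
```

For example, choose_encoding("br, gzip", "video/mp4") returns None, so the MP4 is sent as stored, while the same header with "text/html" selects Brotli and produces a separate cache variant.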

Range Requests Make Resume Possible

Now return to the interrupted 600 MB video. The first successful response might be:

HTTP/1.1 200 OK
Content-Type: video/mp4
Content-Length: 629145600
Accept-Ranges: bytes
ETag: "video-abc-v3"
Cache-Control: private

Accept-Ranges: bytes advertises that the server supports byte ranges. Content-Length gives the full representation size. ETag identifies the selected representation version. The client stores the first 188743680 bytes (180 × 1,048,576, the 180 MB from the scenario) before the connection drops.

On retry, the client asks for the rest:

GET /videos/abc HTTP/1.1
Host: media.shop.test
Range: bytes=188743680-
If-Range: "video-abc-v3"

Range says "send bytes from this offset to the end." If-Range says "only do that if the representation is still the one identified by this validator; if it changed, send the whole current representation instead." That prevents stitching the first 180 MB of version 3 to the remaining bytes of version 4.

A good partial response looks like:

HTTP/1.1 206 Partial Content
Content-Type: video/mp4
Content-Range: bytes 188743680-629145599/629145600
Content-Length: 440401920
ETag: "video-abc-v3"

...remaining bytes...

The client can now append the new bytes after the previously stored bytes because the offsets and validator line up. If the requested range cannot be satisfied, for example because it starts at or beyond the end of the representation, the server can return 416 Range Not Satisfiable with a Content-Range of the form bytes */629145600, which reports the current total size.
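The server-side decision between 200, 206, and 416 can be sketched as a pure function over the total size, the current ETag, and the two request headers. This is a minimal illustration: plan_range_response is a hypothetical helper, it handles only the single-range forms bytes=start- and bytes=start-end, and it compares If-Range to the ETag by simple string equality.

```python
import re
from typing import Dict, Optional, Tuple

def plan_range_response(total: int, etag: str, range_header: Optional[str],
                        if_range: Optional[str]) -> Tuple[int, Dict[str, str], int, int]:
    """Return (status, headers, first_byte, last_byte) for one request."""
    full = (200,
            {"Content-Length": str(total), "Accept-Ranges": "bytes", "ETag": etag},
            0, total - 1)
    if range_header is None:
        return full
    if if_range is not None and if_range != etag:
        return full  # representation changed: send all of the current one
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        return full  # unsupported range syntax is ignored in this sketch
    first = int(match.group(1))
    if first >= total:
        # Nothing to send; report the current size so the client can adjust.
        return (416, {"Content-Range": f"bytes */{total}"}, 0, -1)
    last = min(int(match.group(2)), total - 1) if match.group(2) else total - 1
    if last < first:
        return full  # invalid spec such as bytes=10-5 is ignored
    headers = {
        "Content-Range": f"bytes {first}-{last}/{total}",
        "Content-Length": str(last - first + 1),
        "ETag": etag,
    }
    return (206, headers, first, last)

status, headers, first, last = plan_range_response(
    629145600, '"video-abc-v3"', "bytes=188743680-", '"video-abc-v3"')
```

Passing a stale If-Range value such as "video-abc-v2" makes the function fall back to the full 200 response, which is the behavior that prevents stitching two versions together.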

Range support is most useful for large stable representations: videos, archives, large PDFs, model files, and static assets. It is less useful for highly dynamic JSON where the representation changes often, or for responses that are generated differently on every request. The more stable the representation and the stronger the validator, the safer resume becomes.

Compression and ranges require care together. Byte ranges apply to the selected representation, and when a content coding is applied the offsets count the encoded bytes, not the decoded ones. If one client receives a compressed variant and another receives an uncompressed variant, byte offsets are not interchangeable, and the two variants should carry different strong validators. For large media, servers often avoid content compression and serve the exact media bytes. For text downloads where range support matters, decide deliberately whether to serve ranges over the encoded bytes or to leave ranges unsupported for compressed variants.

Worked Path: A Resumable Download

A robust video download path has visible intermediate states.

Initial request:

GET /videos/abc HTTP/1.1
Host: media.shop.test
Authorization: Bearer token_for_user_123
Accept-Encoding: identity

The server checks authorization before opening the large object. It chooses the identity encoding because the file is already compressed media and range offsets should match stored object bytes. It replies:

HTTP/1.1 200 OK
Content-Type: video/mp4
Content-Length: 629145600
Accept-Ranges: bytes
ETag: "video-abc-v3"
Cache-Control: private

The client records:

url: /videos/abc
etag: "video-abc-v3"
total size: 629145600
bytes stored: 188743680
complete? no
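That record maps naturally onto a small data structure. The Python sketch below is one possible shape, with DownloadState and resume_headers as hypothetical names; it shows how the stored fields are enough to build the headers for a retry.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DownloadState:
    url: str
    etag: str
    total_size: int
    bytes_stored: int

    @property
    def complete(self) -> bool:
        return self.bytes_stored == self.total_size

    def resume_headers(self) -> Dict[str, str]:
        """Headers for a retry: ask for the missing suffix of the same version."""
        return {
            "Range": f"bytes={self.bytes_stored}-",
            "If-Range": self.etag,
            "Accept-Encoding": "identity",  # keep offsets tied to stored bytes
        }

state = DownloadState(
    url="/videos/abc",
    etag='"video-abc-v3"',
    total_size=629145600,
    bytes_stored=188743680,
)
retry_headers = state.resume_headers()
```

Persisting this record alongside the partial file is what lets the client resume after an app restart, not only after a dropped connection.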

After interruption, the client retries:

GET /videos/abc HTTP/1.1
Host: media.shop.test
Authorization: Bearer token_for_user_123
Range: bytes=188743680-
If-Range: "video-abc-v3"
Accept-Encoding: identity

The server confirms the validator, then returns 206 Partial Content with the remaining byte range. The client verifies that Content-Range starts exactly where its stored file ends. If it does, it appends. If it does not, it discards the partial state and starts over.
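The client-side check can be sketched as a small decision function. This is an illustrative sketch: next_action is a hypothetical name, header names are assumed to arrive in the canonical casing shown in this lesson, and treating a missing or different ETag on the 206 as a reason to restart is a conservative choice rather than a requirement.

```python
import re
from typing import Dict

def next_action(status: int, headers: Dict[str, str],
                bytes_stored: int, etag: str) -> str:
    """Decide what to do with a retry response: "append", "restart", or "fail"."""
    if status == 200:
        return "restart"  # server sent the whole current representation
    if status != 206:
        return "fail"
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)",
                         headers.get("Content-Range", ""))
    if match is None:
        return "restart"  # cannot prove where these bytes belong
    first = int(match.group(1))
    if first != bytes_stored:
        return "restart"  # offsets do not line up with the stored prefix
    if headers.get("ETag") != etag:
        return "restart"  # validator changed or missing
    return "append"

partial_headers = {
    "Content-Range": "bytes 188743680-629145599/629145600",
    "Content-Length": "440401920",
    "ETag": '"video-abc-v3"',
}
action = next_action(206, partial_headers, 188743680, '"video-abc-v3"')
```

Any mismatch leads to discarding the partial state, which costs bandwidth but never produces a file assembled from two different versions.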

The naive approach was "retry the same GET and overwrite everything." The better approach records the representation identity and byte position, then asks for exactly the missing suffix. That saves bandwidth and improves user experience, but it requires the server to keep range, validators, encoding, and authorization consistent.

Operational Failure Modes

Failure: buffering large bodies in memory. A server that builds a full export before sending it can run out of memory under normal user demand. Stream large generated responses, but measure body completion and handle mid-stream failures honestly.

Failure: trusting Content-Length without limits. A client can lie, omit the length, or stream slowly. Enforce maximum body sizes, read deadlines, and upload storage limits. Treat request body handling as part of your reliability boundary.

Failure: compressing the wrong thing. Brotli on HTML may be excellent. Gzip on MP4 is wasted CPU. Compression policy should depend on media type, size, CPU budget, client support, and cache behavior.

Failure: missing Vary: Accept-Encoding. If a cache stores a Brotli response and serves it to a client that cannot decode Brotli, the response is broken. Every negotiated representation variant needs a cache key that includes the deciding request headers.

Failure: resuming against a changed representation. A range retry without a validator can produce a corrupted file if the object changed between requests. Use ETag or Last-Modified, and prefer If-Range for safe resume.

Useful signals include response body completion rate, upload rejection before body read, encoded versus decoded byte counts, compression CPU, range request rates, 206 and 416 counts, client resume success, and cache hit ratio by encoding variant. These metrics show whether the body contract is saving work or hiding partial failures.

Connections

The previous lesson put identity and authorization at the boundary. That still matters here: reject unauthorized large uploads or downloads before spending network, CPU, or storage work on the body.

The next lesson moves from body transfer to connection reuse. Body streaming interacts with connection behavior: one slow response body can occupy a connection for a long time, and HTTP/1.1 has limits that later protocols try to improve.

Close the lesson and trace one large file response from memory: representation type, known or unknown length, compression decision, cache variation, range support, validator, and what the client records after an interrupted transfer.
