Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: FileServer: Content-Length is not set when Content-Encoding was set #66735

Open
paralin opened this issue Apr 9, 2024 · 18 comments
Open
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@paralin
Copy link
Contributor

paralin commented Apr 9, 2024

Go version

go version go1.22.2 darwin/arm64

Output of go env in your module/workspace:

N/A

What did you do?

I am trying to set Content-Encoding for .br brotli compressed files with http.FileServer.

This code then skips setting Content-Length because Content-Encoding is set:

if len(ranges) > 0 || w.Header().Get("Content-Encoding") == "" {

This causes my httptest to fail with an empty Content-Length, as well as other cases where I call ServeHTTP without using the full http client: https://go.dev/play/p/2UvgBSH8FWx

When using http.Get it works correctly, so the client must be filling in the Content-Length: https://go.dev/play/p/xp08V07UrtQ

What did you see happen?

The content-length header is not set when the content-encoding header is set.

What did you expect to see?

I expect the content-length header to be set even if the content-encoding header is set.

@seankhliao
Copy link
Member

I believe the comment preceding the line sufficiently explains why it isn't set.

@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Apr 9, 2024
@paralin
Copy link
Contributor Author

paralin commented Apr 9, 2024

@seankhliao It does not sufficiently cover it. The comment even says within that this is a hacky fix.

I am raising a situation where this needs to be set and there should be some kind of way to force that code branch to not be taken.

@paralin
Copy link
Contributor Author

paralin commented Apr 9, 2024

CC @neild who introduced this check & the comment: fdc21f3

@Jorropo
Copy link
Member

Jorropo commented Apr 15, 2024

I don't find the comment above this if to be clear.


@paralin your code is incorrect, Range gives offsets in the plaintext, not compressed archive, however because you give the already compressed archive to net/http's ranging code, it will interpret them as offsets into the archive and answer with invalid truncated brotli streams no valid client would know to do anything with.

Your best bet is to not use http.FS and use a custom handler instead,
short as an if check on Accept-Encoding, then grab the compressed archive, set Content-Length to the compressed size, write status and finally io.Copy.
If you also want to support Range, then you can use the same thing as above, but also send Accept-Ranges: bytes and then add one more if which instead call http.ServeFile with the plaintext if Range header is set.

@paralin
Copy link
Contributor Author

paralin commented Apr 15, 2024

@Jorropo This is not correct. The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.

@neild
Copy link
Contributor

neild commented Apr 15, 2024

When Content-Encoding is set, the Content-Length refers to the size of the encoded content. So, Content-Length: 100, Content-Encoding: gzip refers to 100 bytes after compression.

(In contrast, Content-Length: 100, Transfer-Encoding: gzip refers to 100 bytes before compression.)

So what @paralin is doing here is correct.

Unfortunately, as the comment in fs.go says, it is the unfortunate reality that there is code out there in the wild that:

  • PassesserveContent a Content-Length referring to the pre-compressed content size; and
  • Compresses the response in the ResponseWriter.

This is wrong, and breaks Range requests, but some amount of code does it and it will break for all requests if we set Content-Length unconditionally here. It's unfortunate.

As the comment in fs.go says, we could possibly check the type of the ResponseWriter and set Content-Length unconditionally if we recognize it as one that we control.

@neild neild reopened this Apr 15, 2024
@paralin
Copy link
Contributor Author

paralin commented Apr 15, 2024

Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).

At the moment I'm working around this by opening the file and getting the size in advance before calling the http ServeHTTP. But this is opening the file twice and seems inherently inefficient / hacky and also bypasses the other logic in the fs http handler

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 15, 2024
@cherrymui cherrymui added this to the Backlog milestone Apr 15, 2024
@Jorropo
Copy link
Member

Jorropo commented Apr 16, 2024

The file is already compressed with .br in advance. The decompression step is happening on the client side after the Range request is complete. So, I am actually asking for that specific range of the compressed file.

Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding is meant to be transparent, so you get offsets into the plain text:
I modified your code to expose a localhost HTTP server: https://go.dev/play/p/UB1en6ZlrRB

> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Type: application/octet-stream
Date: Mon, 15 Apr 2024 23:27:35 GMT
Content-Length: 16

hello world

Accept-Ranges: bytes tell me I can range the file, so let me try:

> curl --compressed -H "Range: bytes=1-" http://localhost:55555/hello-world.br
curl: (61) Unrecognized or bad HTTP Content or Transfer-Encoding

And it fails with error code 61 because it received a corrupted brotli stream.

Original Response:              8f05 8068 656c 6c6f 2077 6f72 6c64 0a03  ...hello world..
What your code send with Range:   05 8068 656c 6c6f 2077 6f72 6c64 0a03  ..hello world..
Here it is fixed:               0f05 8065 6c6c 6f20 776f 726c 640a 03    ...ello world..

The difference is that your code truncates the brotli header, while Content-Encoding must be transparent, so it need to answer with a complete brotli stream of the partial plaintext.

Here I modified the code to skip http.FileServer if the request is simple to serve: https://go.dev/play/p/_OR1Gx3rg_l

> curl --compressed -i http://localhost:55555/hello-world.br
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Expose-Headers: Content-Encoding, Content-Length
Content-Encoding: br
Content-Length: 16
Date: Tue, 16 Apr 2024 00:03:20 GMT

hello world
> curl --compressed -i -H "Range: bytes=1-" http://localhost:55555/hello-world.br
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 11
Content-Range: bytes 1-11/12
Content-Type: text/plain; charset=utf-8
Date: Tue, 16 Apr 2024 00:06:08 GMT

ello world

This also solve your original issue and sets Content-Length.

@Jorropo

This comment was marked as outdated.

@gopherbot
Copy link

Change https://go.dev/cl/579095 mentions this issue: net/http: correct workaround comment for Content-Encoding in serveContent

@paralin
Copy link
Contributor Author

paralin commented Apr 16, 2024

In the case when requesting a Range and have set Content-Encoding I agree that this would not work because the browser will try to decompress the result and get invalid compressed data.

My particular case is serving the entire .br file with content encoding set (not using Range). To be clear my specific use case is serving a .wasm file brotli compressed.

@Jorropo I got a bit confused with the specifics of this issue when you mentioned the Range requests because I'm using Range requests to access an encoded file, but without Content-Encoding set, decompressing on the client side, in a different context. With respect to this issue it was about setting the Content-Length and Content-Encoding when requesting the entire file.

@Jorropo

This comment was marked as outdated.

@paralin
Copy link
Contributor Author

paralin commented Apr 16, 2024

The problem is that Content-Encoding nor content-type are set properly for wasm and wasm.br. I am otherwise using the filesystem as designed. Just for this file type it's a special case.

@Jorropo

This comment was marked as outdated.

@neild
Copy link
Contributor

neild commented Apr 16, 2024

Exactly, so we agree on what the code do, and that disagree with the HTTP RFC, Content-Encoding is meant to be transparent, so you get offsets into the plain text:

I'm sorry, but this is not accurate.

Unlike Transfer-Encoding (Section 6.1 of [HTTP/1.1]), the codings listed in Content-Encoding are a characteristic of the representation; the representation is defined in terms of the coded form, and all other metadata about the representation is about the coded form unless otherwise noted in the metadata definition.
-- https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4-7

The "Range" header field on a GET request modifies the method semantics to request transfer of only one or more subranges of the selected representation data (Section 8.1), rather than the entire selected representation.
-- https://www.rfc-editor.org/rfc/rfc9110.html#section-14.2-1

The Range and Content-Range headers refer to ranges of the "representation", and the representation is the coded form. When Content-Encoding: br, ranges are defined in terms of the compressed bytes, not the uncompressed bytes.

This is in contrast to Transfer-Encoding.

Your curl example (curl --compressed -H "Range: bytes=1-") does encounter a decompression failure, but that's just demonstrating that it's possible to construct a curl command that fails.

@neild
Copy link
Contributor

neild commented Apr 16, 2024

Is there any way the fs http handler could be configured to always set this header and ignore content-encoding? It's a decent heuristic to check if the response writer is the standard one, but what if it's not? (Like in my case, unfortunately).

The best I can come up with is to have an optional ResponseWriter method that lets the FileServer know that the ResponseWriter isn't going to misbehave. SetContentLength, maybe, where implementing it implies that the ResponseWriter understands that it isn't allowed to mess around with the data?

New magic ResponseWriter methods are generally exposed via ResponseController, but doing so in this case might be a problem: ResponseController provides a way for middleware to wrap an underlying ResponseWriter without losing its methods, but if a misbehaving ResponseWriter implements that unwrapping it would subvert the check intended to verify that it doesn't misbehave. Bit of a mess there, not sure how much of a concern it would be.

Or I suppose we could go the other way, and have a magic method on the Handler returned by FileServer that lets you tell it that the ResponseWriter isn't broken. That's also kind of terrible.

Or add a new FileServerFSOptions function that takes a struct of options to configure the file server with. Maybe there are other options we'd like to put in there?

I don't like any of the options I can think of.

@paralin
Copy link
Contributor Author

paralin commented Apr 16, 2024

It's not pretty, but passing the options at construct time as a new optional parameter with perhaps a NewFileServerWithOptions or so, probably is the least hacky way. Then there are probably other flags that could be set too. Like disabling the index page, adding additional file extension to content type mappings, setting Content-Encoding, etc.

@Jorropo
Copy link
Member

Jorropo commented Apr 17, 2024

I'm sorry, but this is not accurate.

Whoa mb thx.
I guess that useful to resume compressed downloads but I tried multiple webservers and they don't handle Range when sending back compressed responses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants