Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: net/http: delete inappropriate headers in func Error #66343

Open
rsc opened this issue Mar 15, 2024 · 18 comments
Open

proposal: net/http: delete inappropriate headers in func Error #66343

rsc opened this issue Mar 15, 2024 · 18 comments

Comments

@rsc
Copy link
Contributor

rsc commented Mar 15, 2024

#50905 reported a bug about ServerContent serving bad headers when the request range is invalid. An example usage is:

w.Header().Set("Content-Type", "application/json")
w.Header().Set("Content-Length", strconv.Itoa(len(jsonBody)))
w.Header().Set("Etag", etag)
http.ServeContent(w, req, "", time.Time{}, bytes.NewReader(jsonBody))

In this case, if http.ServeContent calls http.Error, it already forces Content-Type back to text/plain, but it leaves the Content-Length and Etag headers in place. This ends up being wrong for the error, though they would have been correct for a successful response.

CL 554216 added a deletion of Content-Length in http.Error. That's obviously correct, since Error is writing the content: the caller cannot predict how large it is. There was no problem with that CL.

CL 544109 did more: it deletes Content-Encoding, Etag, and Last-Modified, and it forces Cache-Control to no-cache.

Deleting Content-Encoding is correct for the same reasons as Content-Length.

I am not convinced that deleting Etag and Last-Modified is correct. Those describe the resource at the URL, and it seems sensible to me to say that a response can be an error that says “something about your request was invalid” but still also send back information about the underlying resource. It's not obviously wrong in the way that using the underlying resource's Content-Encoding or Content-Length is obviously wrong.

I am also not convinced that forcing Cache-Control to no-cache is correct. Error results can absolutely be cached. https://cloud.google.com/media-cdn/docs/caching, for example, describes how Google's CDN handles cached errors. An argument might be made that in situations like http.ServeContent, the caller can only set Cache-Control for the successful response, and so Error should assume that's not the expiry for an error. Perhaps that is true.

CL 544109 forced Cache-Control: no-cache instead of deleting the header. That broke many tests inside and outside Google.

CL 569815 changed the logic to only force Cache-Control to no-cache when it was already set to something else. This still breaks some tests inside Google; I am not sure about outside. It still seems incorrect to me. If we believe that the content being returned by Error is not accurately described by w.Header() on entry in certain ways, then why should these two code snippets produce different headers?

w.Header().Set("Cache-Control", "public, max-age=86400")
http.ServeFile(w, r, "/tmp/x.txt")

versus

http.ServeFile(w, r, "/tmp/x.txt")

I cannot justify why one error should have “Cache-Control: no-cache” while the other should have no header at all. Always or never is easier to justify than “depends on a header we already established shouldn't apply to the error response”.

I rolled back CLs 544109 and CL 569815 because of the Cache-Control breakage, and I am making this a proposal because of the compatibility implications (we've already had many broken tests).

It seems plenty defensible to me to change Error to do:

  • Delete Content-Length (already done, obviously inappropriate)
  • Delete Content-Encoding
  • Delete Etag
  • Delete Last-Modified

I propose we do those. I don't expect any objections here but won't be surprised if I am missing something.

For Cache-Control specifically, it seems like there are four options:

  1. Leave Cache-Control unmodified (historical Go behavior)
  2. Force Cache-Control to no-cache unconditionally (CL 544109)
  3. Replace extant Cache-Control with no-cache but leave unset unset (CL 569815)
  4. Delete Cache-Control unconditionally.

I propose we do (4), because:

  • We have already identified problems with (1), (2), and (3).
  • (4) aligns with the other deletions we already do.
  • (4) matches (1) for the many Go programs in the world that never think about Cache-Control at all, so it creates the least amount of churn (tied with (3), but (3) has the contradiction noted above).

If people are in favor of that proposal, then the next question is whether to gate these with a GODEBUG setting. The first two in the list seem like obvious bug fixes, because they are about the actual content form, and that's the job of func Error; the caller does not control that content. The last two are less defensible, because they are about the semantics of the response, and the caller may well want to be describing the error by setting those headers. For example consider code that tries to create a cacheable error today:

w.Header().Set("Cache-Control", "public, max-age=3600")
http.Error(w, "you're a teapot", 410)

Perhaps we should have a GODEBUG that controls the last three deletions?
Let's say httpcleanerrorheaders=0 preserves Etag, Last-Modified, and Cache-Control.

Then the complete proposal is

  • Delete Content-Length (already done, obviously correct)
  • Delete Content-Encoding (obviously correct)
  • Delete Etag
  • Delete Last-Modified
  • Delete Cache-Control
  • GODEBUG=httpcleanerrorheaders=0 disables the Etag, Last-Modified, and Cache-Control deletions
@ianlancetaylor
Copy link
Contributor

CC @neild @bradfitz

@mitar
Copy link
Contributor

mitar commented Mar 15, 2024

I am not convinced that deleting Etag and Last-Modified is correct. Those describe the resource at the URL, and it seems sensible to me to say that a response can be an error that says “something about your request was invalid” but still also send back information about the underlying resource. It's not obviously wrong in the way that using the underlying resource's Content-Encoding or Content-Length is obviously wrong.

It is not always wrong, but it can be wrong. In my own use case, I set immutable caching header with etag for the contents I am planing to serve (which I have in metadata in db) and then I go to serve the file. If that file serving fails for some reason (Read returns an error, or if range request is invalid, etc.) an error is returned. That error might be permanent or transient. For transient errors those caching headers and etag and last-modified should be removed. But it is hard to know if it is transient, so removing those headers is reasonable in my view.

Another perspective on this is that etag and last-modified is about content to be served or returned, but Error serves some completely different content. So they are probably misleading.

I propose we do (4), because:

I am OK with that.

GODEBUG=httpcleanerrorheaders=0 disables the Etag, Last-Modified, and Cache-Control deletions

One approach done in one iteration of past CLs was to extend Error with varargs list of headers to keep. So by default a set of headers is deleted, but one can pass Etag string as the varag argument to make sure it is kept. httpcleanerrorheaders could still control the default list of headers to keep.

@rittneje
Copy link

Perhaps we should have a GODEBUG that controls the last three deletions?
Let's say httpcleanerrorheaders=0 preserves Etag, Last-Modified, and Cache-Control.

That would mean that if, say, a library wants to call http.Error with Cache-Control showing up in the response, they now have to require all consumers to set httpcleanerrorheaders=0 instead of it "just working".

One approach done in one iteration of past CLs was to extend Error with varargs list of headers to keep.

This is a breaking change, since today I can do:

var f func(http.ResponseWriter, string, int) = http.Error

@mitar
Copy link
Contributor

mitar commented Mar 16, 2024

This is a breaking change, since today I can do:

True. That was an internal error function used just by ServeContent.

Maybe one solution could be that http.Error remove all headers listed above except for Cache-Control and that ServeContent additionally removes Cache-Control on errors (by using a small internal wrapper around http.Error).

External callers who want to remove Cache-Control as well on errors, could use a similar wrapper they make, or we could make some additional http.ErrorBis or something which does that as well and users can slowly migrate to it if they want. But the issue is that this should be then also called from most other places in stdlib and elsewhere.

My personal take is that I like current proposal that http.Error does remove all those headers because in almost all cases this is what improves the response and httpcleanerrorheaders seems like a good alternative to switch this off.

@apparentlymart
Copy link

apparentlymart commented Mar 18, 2024

ETag and Last-Modified are Validator Header Fields, for which there is some specific language in the RFC relating to this question:

Note that, depending on the status code semantics, the selected representation for a given response is not necessarily the same as the representation enclosed as response payload.

The part "depending on the status code semantics" suggests that whether it is correct to delete these depends on which status code is being returned.

However, I wasn't able to determine which part of this spec (or any other spec) explicitly describes "the status code semantics" for the purpose of this rule. Some of the status codes are described as "cacheable", which seems at least plausibly related to whether the response payload is "the selected representation", because otherwise it would not make sense to use If-None-Match/If-Modified-Since/etc to check if that cached representation is still current, but I can't find an explicit connection between those two ideas in the specification text.

With all of that said, it does seem like unsetting them is a good conservative default. It seems to me that doing anything "better" than that would require figuring out which status codes the above rule applies to and making Error behave differently depending on the status code, which seems unnecessarily complicated and perhaps a little too clever/surprising.

@neild
Copy link
Contributor

neild commented Mar 18, 2024

(Apologies for not being around to deal with rollbacks; four years of dodging COVID-19 have come to an end, and I was out sick last week.)

Error can't be all things to all users: It's reasonable to send an error with a Cache-Control header. It's also reasonable to want to send an error with no Cache-Control. We can satisfy some users by clearing Cache-Control and others by not, but either way someone is going to get behavior they don't want.

Error is also a very simple function; it isn't difficult for users to write their own version that does exactly what they want.

I think, therefore, that we should err on the side of preserving the current behavior, except in cases where the current behavior is objectively wrong in all cases. I'd really like to avoid introducing a GODEBUG and the concomitant cognitive complexity. "Error does this simple thing, which might not be perfect in all circumstances" is better than "Error does this or that depending on a global variable".

  • Content-Type: Since its introduction, Error has always set Content-Type: text/plain; charset=utf-8.
  • Content-Length: This obviously shouldn't be set, and so Error should delete it. (As it does now.)
  • Content-Encoding: I can't come up with a scenario where this can reasonably have a value. You theoretically could provide gzipped content to Error and set Content-Type: gzip, but that seems unlikely in practice. Error could probably delete it.
  • Etag, Last-Modified, Cache-Control, X-Content-Type-Options: There are scenarios where a user might want to delete these. There are also scenarios where a user might be setting one of them deliberately. Error should leave them alone.

@mitar
Copy link
Contributor

mitar commented Mar 18, 2024

(Apologies for not being around to deal with rollbacks; four years of dodging COVID-19 have come to an end, and I was out sick last week.)

I hope it was not a rough case.

Error is also a very simple function; it isn't difficult for users to write their own version that does exactly what they want.

Yes, unless Error is called from inside the standard library directly. In that case it should do the right thing and the result should be correct. The motivation behind all this was that http.ServeContent was returning invalid headers and that it has a contract that you have to set some headers (like Etag) before calling it, but it was not clearing them.

My first CL to fix http.ServeContent had internal "error" function doing the correct thing (wrapping http.Error). Then we moved its logic to Error and now there is push back on that.

Maybe we should reintroduce this new "error" function which does more header cleanup and call it from standard library (at least from http.ServeContent) while keeping changes to http.Error limited to only those which work in all (reasonable) cases.

The question might then be: should this new "error" function be made public or should it be just internal? In a way, it is easy also for others to wrap http.Error in a similar way? But maybe it is useful to make it public?

@neild
Copy link
Contributor

neild commented Mar 18, 2024

I agree that we should add an internal error function for ServeContent. I think it's fine for it to stay internal; it would be easy for us to provide something that is useful for some users some of the time, but the bar for adding a new API is higher than that. (If we can do better than Error for most users most of the time, that'd probably meet the bar, but I'm not sure that any of the suggested options would satisfy that.)

@mitar
Copy link
Contributor

mitar commented Mar 19, 2024

I also noticed in our own code, that we do set Content-Length before calling http.Error. We in fact have such a wrapper:

func Error(w http.ResponseWriter, _ *http.Request, code int) {
	body := http.StatusText(code)
	w.Header().Set("Cache-Control", "no-cache")
	// http.Error appends a newline so we have to add 1.
	w.Header().Set("Content-Length", strconv.Itoa(len(body)+1))
	http.Error(w, body, code)
}

So not sure even about removing Content-Length. Maybe Content-Length should be removed if the value does not match the length of the content to be written (+1)?

@rsc
Copy link
Contributor Author

rsc commented Mar 26, 2024

Removing Content-Length is at best important and at worst harmless, since the protocol will figure out the content length either way.

The GODEBUG was only meant as a transitional mechanism so that people control when the change happens, not a long-term knob. If we did the extra deletions, code that wanted to send those headers would inline the pieces they wanted of http.Error, not set the GODEBUG.

That said I am also happy to keep deleting Content-Length, delete Content-Encoding, and stop there.

@rsc
Copy link
Contributor Author

rsc commented Mar 27, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor Author

rsc commented Apr 3, 2024

Following @neild's suggestion, I revise this proposal to:

  • Delete Content-Length (already done, obviously correct)
  • Delete Content-Encoding (obviously correct)

and nothing else.

@mitar
Copy link
Contributor

mitar commented Apr 3, 2024

But we can still delete the other headers in the response from ServeContent, yes? So the proposal is just to not delete them in general from Error?

@rsc
Copy link
Contributor Author

rsc commented Apr 10, 2024

@mitar what specific headers would ServeContent delete before calling Error? From https://pkg.go.dev/net/http#ServeContent it looks like just Etag?

@mitar
Copy link
Contributor

mitar commented Apr 10, 2024

@rsc: Etag, Last-Modified, Cache-Control (or override it to no-cache).

@rsc
Copy link
Contributor Author

rsc commented Apr 24, 2024

Having ServeContent delete Etag, Last-Modified, and Cache-Control on its error path sounds fine.

@mitar
Copy link
Contributor

mitar commented Apr 24, 2024

Great.

What about setting Cache-Control to no-cache?

@rsc
Copy link
Contributor Author

rsc commented Apr 24, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is that http.Error will delete these headers:

  • Content-Length (already done, obviously correct)
  • Content-Encoding (obviously correct)

and http.ServeContent will delete these headers before calling http.Error:

  • Etag, Last-Modified, Cache-Control

And these would be documented in the doc comments for these functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Likely Accept
Development

No branches or pull requests

7 participants