Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: UTF in HTTP header field value #49627

Closed
JonasJasas opened this issue Nov 17, 2021 · 8 comments
Closed

net/http: UTF in HTTP header field value #49627

JonasJasas opened this issue Nov 17, 2021 · 8 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@JonasJasas
Copy link

What version of Go are you using (go version)?

1.17.3

Does this issue reproduce with the latest release?

Yes

What did you do?

Using this: https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/net/http/header.go;l=28

w.Header().Add("utf-test", "ąčęėįšųū„“ž")

What did you expect to see?

utf-test: %C4%85%C4%8D%C4%99%C4%97%C4%AF%C5%A1%C5%B3%C5%AB%E2%80%9E%E2%80%9C%C5%BE

What did you see instead?

utf-test: ����įšųū��ž

Current http/header implementation is automatically producing invalid HTTP requests if UTF is used in the header field value. So I guess it would be way better to use percent-encoding even though it is not in RFC.

@dr2chase
Copy link
Contributor

@neild, any idea what to do here?

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 17, 2021
@dr2chase
Copy link
Contributor

This looks like there must be a duplicate, or perhaps there is some other reason.
@FiloSottile is this something relevant to you?

@neild
Copy link
Contributor

neild commented Nov 17, 2021

Percent-encoding the header is clearly not correct. If you want to encode a header value you should do that yourself, since there is (so far as I know) no universal specification for the encoding of header values.

We do perform some validation of header values, but it is currently limited to changing newlines to whitespace. This does prevent an invalid header value from completely changing the content of a request, but it permits invalid header values.

If we do anything here, I think it should be to drop invalid header values entirely. I suspect someone has managed to depend on the current behavior, however, and sending a bad header and receiving an error back is arguably more informative to the user than dropping the header altogether.

@seankhliao seankhliao changed the title UTF in HTTP header field value net/http: UTF in HTTP header field value Nov 17, 2021
@JonasJasas
Copy link
Author

https://datatracker.ietf.org/doc/html/rfc2616
*TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047

https://datatracker.ietf.org/doc/html/rfc2047

As it is documented standard would be nice to have it in implementation.

@neild
Copy link
Contributor

neild commented Nov 18, 2021

RFC 7230 supersedes 2616, and says:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

I also note that the example above doesn't appear to be an invalid header value, since it contains only characters which match the VCHAR and obs-text rules.

I still think that if you've got a header field that should be RFC-2047 encoded, then you should apply that encoding yourself.

@FiloSottile
Copy link
Contributor

That does not look like an invalid header field value. It would be an invalid header field name, but as @neild pointed out it's valid to have arbitrary data in header fields according to RFC 7230.

The reason it renders as Ä�Ä�Ä�Ä�įšųūâ��â��ž instead of ąčęėįšųū„“ž must lie in whatever was used to print it. (If the bytes on the wire are not the valid UTF-8 representation of ąčęėįšųū„“ž, that would be a bug, but we'd need a reproducer to look at that.)

@neild
Copy link
Contributor

neild commented Nov 18, 2021

Note that we will send an invalid header field value if you put actually invalid data in the value. e.g., w.Header().Add("x", "\x00"). Perhaps we should not do that.

We don't have a good path to return errors in header value validation to the user, however, so our options are limited to eliding invalid data, dropping the header, sending the bad value and letting the server sort it out, or (not a good idea) panicking.

@JonasJasas
Copy link
Author

  1. I didn't know that almost anything is valid for header field value (made wrong assumptions about UTF).
  2. The issue caused string was not the one I provided in this bug.
  3. Fore some reason Chinese string started with the \x03 character that made Envoy upset.

If there are characters that should never go to the header field value they should be escaped or removed. When I use a library I expect that it will do these validations for me.

Thank you for your patience. Golang has the best community!

@golang golang locked and limited conversation to collaborators Nov 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants