net/http: Transport doesn't support deflate (and Twitter is broken) #18779

Closed
ThomasHabets opened this issue Jan 24, 2017 · 16 comments

Comments

@ThomasHabets
Contributor

ThomasHabets commented Jan 24, 2017

What version of Go are you using (go version)?

go version go1.7.4 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build044368586=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

What did you do?

https://play.golang.org/p/ArzIv0uBlQ

What did you expect to see?

I expected the body to be decompressed, since I didn't request compression.

net/http doc says "If the Transport requests gzip on its own and gets a gzipped response, it's transparently decoded in the Response.Body. However, if the user explicitly requested gzip it is not automatically uncompressed".

I'm not requesting it, so it's requested on its own.

What did you see instead?

I see binary pre-decompression crap.

I don't know if this is because blog.twitter.com does something out of spec.

@bradfitz
Contributor

Go is telling Twitter "Accept-Encoding: gzip", and Twitter is replying "Content-Encoding: deflate".

Seems like a bug on their side.

2017/01/24 21:59:23 http2: Transport failed to get client conn for blog.twitter.com:443: http2: no cached connection was available
2017/01/24 21:59:23 http2: Transport creating client conn 0xc4200016c0 to 199.59.150.42:443
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2017/01/24 21:59:23 http2: Transport encoding header ":authority" = "blog.twitter.com"
2017/01/24 21:59:23 http2: Transport encoding header ":method" = "GET"
2017/01/24 21:59:23 http2: Transport encoding header ":path" = "/2017/the-infrastructure-behind-twitter-scale"
2017/01/24 21:59:23 http2: Transport encoding header ":scheme" = "https"
2017/01/24 21:59:23 http2: Transport encoding header "accept-encoding" = "gzip"
2017/01/24 21:59:23 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=1 len=69
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: read SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=65536
2017/01/24 21:59:23 http2: Transport received SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=65536
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote SETTINGS flags=ACK len=0
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: read SETTINGS flags=ACK len=0
2017/01/24 21:59:23 http2: Transport received SETTINGS flags=ACK len=0
2017/01/24 21:59:24 http2: Framer 0xc4201181a0: read HEADERS flags=END_HEADERS stream=1 len=1030
2017/01/24 21:59:24 http2: decoded hpack field header field ":status" = "200"
2017/01/24 21:59:24 http2: decoded hpack field header field "age" = "0"
2017/01/24 21:59:24 http2: decoded hpack field header field "cache-control" = "public, max-age=60"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-encoding" = "deflate"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-language" = "en"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-security-policy" = "default-src https: data:; report-uri https://twitter.com/i/csp_report?a=M5QXUZLCN4%3D%3D%3D%3D%3D%3D&ro=false; img-src https: data: ; script-src https://*.twitter.com https://*.twimg.com https://*.vine.co https://ssl.google-analytics.com https://bat.bing.com 'unsafe-eval' ; font-src https: data: ; frame-src https://* chrome-extension: about: javascript: ; connect-src https: ; media-src https: ; object-src https: ; style-src https:"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-type" = "text/html; charset=utf-8"
2017/01/24 21:59:24 http2: decoded hpack field header field "date" = "Tue, 24 Jan 2017 21:59:24 GMT"
2017/01/24 21:59:24 http2: decoded hpack field header field "expires" = "Tue, 24 Jan 2017 22:00:23 +0000"
2017/01/24 21:59:24 http2: decoded hpack field header field "last-modified" = "Tue, 24 Jan 2017 21:59:23 GMT"
2017/01/24 21:59:24 http2: decoded hpack field header field "link" = "</node/8676>; rel=\"shortlink\",<https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale>; rel=\"canonical\",<https://blog.twitter.com/sites/all/themes/gazebo/img/twitter-bird-white-on-blue.png>; rel=\"image_src\""
2017/01/24 21:59:24 http2: decoded hpack field header field "server" = "tsa_a"
2017/01/24 21:59:24 http2: decoded hpack field header field "set-cookie" = "guest_id=v1%3A148529516384605422; Domain=.twitter.com; Path=/; Expires=Thu, 24-Jan-2019 21:59:24 UTC"
2017/01/24 21:59:24 http2: decoded hpack field header field "strict-transport-security" = "max-age=631138519"
2017/01/24 21:59:24 http2: decoded hpack field header field "vary" = "Cookie"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-connection-hash" = "1d0990af0b5cbb39d969c1d7c4a5c7b2"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-content-type-options" = "nosniff"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-drupal-cache" = "MISS"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-frame-options" = "sameorigin"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-app-rev" = "v414"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-git-rev" = "097c9a635c93adea404a02bc2abd3578ab76d43d"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-host" = "s14"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-response-time" = "1067"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-ua-compatible" = "IE=edge,chrome=1"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish" = "1759374077"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish-cache" = "MISS"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish-l-curl" = "0"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-xss-protection" = "1; mode=block"
2017/01/24 21:59:24 http2: Transport received HEADERS flags=END_HEADERS stream=1 len=1030
2017/01/24 21:59:24 http2: Framer 0xc4201181a0: read DATA stream=1 len=1832 data="x\x9c\xec\x18\xdbR\xe38\xf6\x9d\xafк\xab橅s#\x17\x9aP\x9b\x04\x1a\x02M\b\x04:\x84\xed\xa9\x94l+\xb6\x88,\x19K\xb9\x98٩\xda\xdf\xd8\xdf\xdb/\xd9#9!\xa1\x87\xde\xedڢ{_\xe6%9>ҹ\xe8\xe8\\\xb5s\xf0\x97\xa3\xcb\xceͨ\u007f\x8c\"\x1d\xf3Ý\x03\xf3\x87\x02\x966\x1d\xaeSg\a\xa1e̅ڗa\xf3\x97Ǚ\xd4\x1f\"\xad\x93}וa\xb2\x1bSW\xa8w9\x1aq\"¦C\x85\x83|N\x94j:\xce\xe1\x0eP\x1fD\x94\x04\x87\x00\x00\x18SM\x90\xa1\xc7\xf4q\xc6\xe6M\xa7#\x85\xa6B\xe3\x9b,\xa1@\x97\u007f5\x1dM\x97\xda5z|@~DREus\xa6'\xb8\xee \x17\x14\xb4\\\x04\x89)l\\0\xadi\xba?K\xf9\x16\xb9\x91\xa0@E\x8f\xcbpw\xb5eח\xb1[*\x14k\xae\x8e(fb\x92\x12\xa5ә\xafg)\xc5\x1e\x8d\x98\b\xf0j+V>\xe1\xf4\x9b\xb2\x14" (1576 bytes omitted)

See http://stackoverflow.com/questions/388595/why-use-deflate-instead-of-gzip-for-text-files-served-by-apache for why gzip is preferred over deflate.

Not sure there's anything for us to do here. I suppose we could attempt to un-deflate things.

/cc @tombergan @dsnet

@bradfitz bradfitz added this to the Go1.9Maybe milestone Jan 24, 2017
@bradfitz bradfitz changed the title from "net/http: Results not decompressed sometimes" to "net/http: Transport doesn't support deflate (and Twitter is broken)" Jan 24, 2017
@dsnet
Member

dsnet commented Jan 24, 2017

This is definitely a bug on twitter's end; we asked for "gzip", and they gave us "deflate".

I don't think we should transparently decompress things with the "deflate" TE since there is some confusion about what the actual format of "deflate" is. RFC 2616 specifies that "deflate" should actually be the zlib format (RFC 1950), but there are some implementations that accidentally treat it as raw DEFLATE (RFC 1951). If we tried to transparently decompress, we could run into decompression errors because of the wrong format. Worse yet, raw DEFLATE doesn't have any magic values, so you cannot reliably differentiate it from the zlib format. The internet ended up avoiding "deflate" as a TE and moved to "gzip" to avoid this confusion.

I wouldn't want the Go implementation to get mixed up in all that confusion.

@bradfitz
Contributor

Copying https://github.com/twitter/netty-http2 folk who can maybe help: @yschimke, @atollena

Twitter appears to be returning "deflate"-compressed responses when a client only asks for "gzip".

@tombergan
Contributor

> This is definitely a bug on twitter's end; we asked for "gzip", and they gave us "deflate".

If we want to language-lawyer RFC 7231, I believe that servers are technically not required to obey Accept-Encoding; that header is just a recommendation. For example, note the use of SHOULD instead of MUST:

https://tools.ietf.org/html/rfc7231#section-5.3.4

> If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding.

That said, I would still consider this a bug in Twitter's server ... any other interpretation invites insanity.

Some clients will decode any response with a known Content-Encoding, independent of whether or not the encoding was explicitly allowed via the request's Accept-Encoding. One such client is Chrome. It would not be entirely unreasonable for a Go program to do the same; however, I probably would not bake that behavior into the standard library.

> I don't think we should transparently decompress things with the "deflate" TE since there is some confusion about what the actual format of "deflate" is. RFC 2616 specifies that "deflate" should actually be the zlib format (RFC 1950), but there are some implementations that accidentally treat it as raw DEFLATE (RFC 1951).

I think you're saying that the meaning of "Content-Encoding: deflate" is ambiguous, due to the existence of some broken implementations? This problem seems orthogonal to the Accept-Encoding issue, and in any case, I'd be inclined to consider an implementation broken if it does not follow RFC 2616. The updated text from RFC 7230/7231 agrees with the old text from RFC 2616:
https://tools.ietf.org/html/rfc7231#section-8.4
https://tools.ietf.org/html/rfc7230#section-4.2

@luciferous

@bradfitz I doubt Yuri or Antoine are listening in to this nowadays. @mosesn

@mattn
Member

mattn commented Jan 27, 2017

@mosesn

mosesn commented Jan 27, 2017

Cannot reproduce:

$ curl -s -D - -H 'Accept-Encoding: gzip' https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale -o /dev/null
HTTP/1.1 200 OK
age: 0
cache-control: public, max-age=60
content-encoding: gzip
content-language: en
content-security-policy: default-src https: data:; report-uri https://twitter.com/i/csp_report?a=M5QXUZLCN4%3D%3D%3D%3D%3D%3D&ro=false; img-src https: data: ; script-src https://*.twitter.com https://*.twimg.com https://*.vine.co https://ssl.google-analytics.com https://bat.bing.com 'unsafe-eval' ; font-src https: data: ; frame-src https://* chrome-extension: about: javascript: ; connect-src https: ; media-src https: ; object-src https: ; style-src https:
content-type: text/html; charset=utf-8
date: Fri, 27 Jan 2017 07:43:28 GMT
expires: Fri, 27 Jan 2017 07:44:27 +0000
last-modified: Fri, 27 Jan 2017 07:43:27 GMT
link: </node/8676>; rel="shortlink",<https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale>; rel="canonical",<https://blog.twitter.com/sites/all/themes/gazebo/img/twitter-bird-white-on-blue.png>; rel="image_src"
server: tsa_a
set-cookie: guest_id=v1%3A148550300722182200; Domain=.twitter.com; Path=/; Expires=Sun, 27-Jan-2019 07:43:28 UTC
strict-transport-security: max-age=631138519
transfer-encoding: chunked
vary: Cookie
x-connection-hash: 49936bf73c44abf5429426fc8cbc66bd
x-content-type-options: nosniff
x-drupal-cache: MISS
x-frame-options: sameorigin
x-gazebo-app-rev: v414
x-gazebo-git-rev: 097c9a635c93adea404a02bc2abd3578ab76d43d
x-gazebo-host: s4
x-response-time: 1072
x-ua-compatible: IE=edge,chrome=1
x-varnish: 2125593008
x-varnish-cache: MISS
x-varnish-l-curl: 0
x-xss-protection: 1; mode=block

@mosesn

mosesn commented Jan 27, 2017

Ah, I wrote too soon. Let me try with http/2. Getting a curl which can do http/2, will report back soon.

@mosesn

mosesn commented Jan 27, 2017

As far as I can tell, on http/2, twitter.com always gzips, and blog.twitter.com always deflates. Seems odd to me, I'll poke around tomorrow.

@yschimke

I worked around this twitter bug 6 months ago. Apologies, I should have actually worked with you guys to fix it.

https://github.com/yschimke/oksocial/blob/master/src/main/java/com/baulsupp/oksocial/services/twitter/TwitterDeflatedResponseInterceptor.java

@bradfitz
Contributor

@mosesn, thanks!

gaul added a commit to gaul/anaconda that referenced this issue Apr 18, 2017
Twitter returns deflate data despite the client only requesting gzip
data.  net/http automatically handles the latter but not the former:
golang/go#18779

Fixes ChimeraCoder#170.
gaul added a commit to gaul/anaconda that referenced this issue May 18, 2017
Twitter returns deflate data despite the client only requesting gzip
data.  net/http automatically handles the latter but not the former:
golang/go#18779

Fixes ChimeraCoder#170.
@bradfitz bradfitz modified the milestones: Go1.10 (was Go1.9Maybe) Jun 7, 2017
gaul added a commit to gaul/anaconda that referenced this issue Aug 31, 2017
Twitter returns deflate data despite the client only requesting gzip
data.  net/http automatically handles the latter but not the former:
golang/go#18779

Fixes ChimeraCoder#170.
ChimeraCoder pushed a commit to gaul/anaconda that referenced this issue Sep 3, 2017
Twitter returns deflate data despite the client only requesting gzip
data.  net/http automatically handles the latter but not the former:
golang/go#18779

Fixes ChimeraCoder#170.
ChimeraCoder pushed a commit to ChimeraCoder/anaconda that referenced this issue Sep 3, 2017
Twitter returns deflate data despite the client only requesting gzip
data.  net/http automatically handles the latter but not the former:
golang/go#18779

Fixes #170.
@rsc
Contributor

rsc commented Nov 22, 2017

This is still happening but it still seems like it must be Twitter's fault, unless HTTP/2 requires all clients to support deflate decoding. Moving to Go 1.11.

@rsc rsc modified the milestones: Go1.11 (was Go1.10) Nov 22, 2017
@yschimke

@rsc see my comment above. Twitter may fix it, but it is a known Twitter bug that you will probably need to work around.

@mosesn

mosesn commented Dec 20, 2017

@rsc @bradfitz this is in flight, targeted for early January. sorry for the long delay, it fell off my plate for a bit. https://twittercommunity.com/t/improving-the-twitter-api-support-for-http-2/98728

@mosesn

mosesn commented Jan 4, 2018

@rsc @bradfitz this has been fixed, we now handle accept-encoding correctly.

@bradfitz
Contributor

bradfitz commented Jan 4, 2018

@mosesn, thanks for the fix and update! Will close.

@bradfitz bradfitz closed this as completed Jan 4, 2018
@golang golang locked and limited conversation to collaborators Jan 4, 2019
10 participants