net/http: http.Get unnecessary data transfer #36242

Open
johnfitzz opened this issue Dec 21, 2019 · 8 comments
Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
Milestone: Unplanned

Comments


johnfitzz commented Dec 21, 2019

http roundtrippers transfer an unnecessary amount of data even when nothing is read from the response body.

In my production code (not shown here) I'm reading the first 10-15 bytes of a large file over http. However, the http roundtrippers transfer far more data than that. In the real code I limit what I read with io.LimitedReader and/or io.CopyN. However, this issue is not about how to transfer data efficiently; it's about the roundtrippers' inefficient behavior.
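
For reference, the limiting pattern in the real code looks roughly like this sketch (the URL and the 15-byte count are illustrative placeholders, not the production values):

package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

func main() {
	// Placeholder URL; any large file served over HTTPS shows the effect.
	resp, err := http.Get("https://example.com/large-file.png")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Read only the first 15 bytes of the body, then stop. Even so, the
	// roundtripper may already have buffered far more behind the scenes.
	var head bytes.Buffer
	if _, err := io.CopyN(&head, resp.Body, 15); err != nil {
		log.Fatal(err)
	}
	log.Printf("first bytes: %x", head.Bytes())
}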

What version of Go are you using (go version)?

go version go1.13 darwin/amd64

What version of OS X are you using?

ProductName:	Mac OS X
ProductVersion:	10.15.1
BuildVersion:	19B88

What did you do?

See the code on the playground.
Note: I created this code only to demonstrate the issue.

resp, err := http.Get("https://raw.githubusercontent.com/IBM/MAX-Image-Resolution-Enhancer/master/samples/test_examples/original/airplane.png")
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()

// At this point, before anything has been read from resp.Body, the http2
// roundtripper has already buffered roughly 150KB to 2MB of airplane.png.
// See the playground code for details.
// ...

What did you expect to see?

The 10 KiB here is arbitrary, but I wanted to see far less data being transferred:

$ nettop -m tcp -p $PID
                                                  bytes_in   bytes_out
gettest.2241                                      10 KiB     610 B
   tcp4 192.168.1.36:50082<->151.101.12.133:443   10 KiB     610 B

What did you see instead?

This is what I saw; it transferred an unnecessary amount of data:

# when the transport is pconn.RoundTrip(req)
$ nettop -m tcp -p $PID
                                                 bytes_in   bytes_out
gettest.2241                                     178 KiB    610 B
   tcp4 192.168.1.36:50082<->151.101.12.133:443  178 KiB    610 B

# when the transport is pconn.alt.RoundTrip(req)
$ nettop -m tcp -p $PID
                                                 bytes_in   bytes_out
gettest.2241                                     2200 KiB   610 B
   tcp4 192.168.1.36:50082<->151.101.12.133:443  2200 KiB   610 B

Investigation:

transport.go#L530-L536: here, when the transport selects the alternative roundtripper (pconn.alt.RoundTrip(req)), it buffers up a lot of data even before I read from the response body.
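
For anyone reproducing this, the two code paths can be compared by disabling HTTP/2 on the client. Setting Transport.TLSNextProto to a non-nil empty map is the documented way to disable HTTP/2 in net/http, which forces the plain pconn.RoundTrip path (a sketch):

package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	// A non-nil, empty TLSNextProto map disables HTTP/2, so the request
	// goes through the HTTP/1.1 path (pconn.RoundTrip) instead of the
	// alternate HTTP/2 roundtripper (pconn.alt.RoundTrip).
	client := &http.Client{
		Transport: &http.Transport{
			TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
		},
	}

	resp, err := client.Get("https://raw.githubusercontent.com/IBM/MAX-Image-Resolution-Enhancer/master/samples/test_examples/original/airplane.png")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Measure bytes_in (e.g. with nettop) here, before reading resp.Body.
}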


networkimprov commented Dec 21, 2019

Did you try a Range header? https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
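
For example, something like this (a sketch; it assumes the server honors range requests, which raw.githubusercontent.com generally does):

package main

import (
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://raw.githubusercontent.com/IBM/MAX-Image-Resolution-Enhancer/master/samples/test_examples/original/airplane.png", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Ask the server for only the first 15 bytes.
	req.Header.Set("Range", "bytes=0-14")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// A server that honors the header replies 206 Partial Content and
	// sends only the requested bytes, so nothing extra crosses the wire.
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("status=%s, got %d bytes", resp.Status, len(body))
}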

Re the playground link: it has a for {...} loop that reads the whole body, and it uses io.Copy() instead of io.CopyN().

PS: You'll get faster help with questions by posting to golang-nuts, reddit, etc...


johnfitzz commented Dec 21, 2019

@networkimprov This issue isn't a question; it reports the inefficient behavior of the http roundtrippers, which read a lot of data before anything is read from the response body.

The playground code tests the behavior of the http roundtrippers at a higher level. I'm not using that code in production; I created it just to demonstrate the issue.

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 22, 2019
@agnivade agnivade added this to the Unplanned milestone Dec 22, 2019

RodionGork commented Dec 28, 2019

@johnfitzz sorry for daring to look into this too. Can you please specify which version of nettop you use? (I see several on the web and none in the default Ubuntu repo, for example.) I can't yet verify exactly the same behavior with other tools.

Also, as a side note, I'm not sure "inefficient behavior" is the correct term. The connection first fetches a chunk of data, and only then do we parse where the headers end. It's rather a question of what prefetch size we think is better and whether we can control it. Since you use GET instead of HEAD, it's normal to expect that you want the body, so prefetching part of it doesn't look wrong. In that sense, the LimitedReader approach you describe looks quite proper for altering the normal behavior; see the sketch below.
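
For concreteness, a sketch of that approach (the 15-byte cap and the URL are placeholders):

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("https://example.com/large-file.png") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Cap application-level reads at 15 bytes. Note that this only limits
	// what the caller reads; it does not stop the transport from prefetching.
	limited := &io.LimitedReader{R: resp.Body, N: 15}
	head, err := ioutil.ReadAll(limited)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(head))
}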


johnfitzz commented Dec 30, 2019

@RodionGork In http/2, the prefetched chunk of data is about 50% of the whole body, which consumes network and memory resources.

To me, inefficiency is the correct term here, because the package hands us a reader but prefetches a lot of data without asking us. It's uncontrollable: you never know how many bytes you'll receive when you use the http package. Another problem is that there is no documentation stating that it prefetches a huge amount of body data. It's a black box.

I've debugged and analyzed the roundtripping code several times. It fires up goroutines and continuously fetches data from the server before you read a single byte from the body reader.

It may be normal to receive some body data (whatever the server sends on its own), but the http package itself prefetches megabytes of data (about 50% of the body, especially in http/2), and to me this is not "efficient". It's implicit, almost uncontrollable, inefficient behavior.

The LimitedReader approach doesn't help here, because the roundtrippers have already cached a lot of data before I read.


I'm on OS X, and the nettop version depends on the OS X version, which is 10.15.1. I validated the same issue with various tools, such as OS X's Activity Monitor and Wireshark. For Ubuntu, check this out. This issue is not tied to a specific tool; it's about the http package. Also make sure you're fetching over http/2 to see the whole problem.

@networkimprov

You didn't comment on my question about the HTTP Range header above.


johnfitzz commented Dec 30, 2019

@networkimprov Say you're going to download the whole body, but if you encounter certain unwanted bytes partway through the stream, you want to interrupt the download. If it's not clear where those bytes might occur in the body, you can't use a Range header. With the current http package you can't do this efficiently, because it has already prefetched a lot of data.
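
To make the scenario concrete, a sketch (isUnwanted is a hypothetical placeholder; closing the body aborts the rest of the transfer, but the transport may already have prefetched well past that point):

package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

// isUnwanted is a hypothetical predicate for the bytes being scanned for.
func isUnwanted(chunk []byte) bool {
	return bytes.Contains(chunk, []byte("STOP"))
}

func main() {
	resp, err := http.Get("https://example.com/large-file") // placeholder URL
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	buf := make([]byte, 32*1024)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 && isUnwanted(buf[:n]) {
			return // the deferred Close aborts the remaining download
		}
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatal(err)
		}
	}
}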

As I said, this issue is not about the HTTP Range header and the like; it's about the behavior of the http package itself.

@networkimprov

Do a series of GETs with consecutive ranges; check for unwanted bytes after each. http.Client can't divine the optimal segment size for your download.
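
A sketch of that pattern (the segment size and the unwanted-byte check are placeholders):

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	const segment = 64 * 1024 // placeholder segment size
	url := "https://example.com/large-file"

	for off := 0; ; off += segment {
		req, err := http.NewRequest("GET", url, nil)
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+segment-1))

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		chunk, err := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			log.Fatal(err)
		}

		// ... check chunk for unwanted bytes here; stop if found ...

		// 206 means the server honored the range; a short or non-206
		// response means the file ended or ranges aren't supported.
		if resp.StatusCode != http.StatusPartialContent || len(chunk) < segment {
			break
		}
	}
}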

I wouldn't assume the behavior of this widely used API is wrong for most use cases, so if I'd noticed this, I'd have posted a Q on golang-nuts.

@johnfitzz

@networkimprov As I said before, I'm not looking for solutions to my specific case, thank you. Rather, my point is that the http package performs unnecessary data transfer, and that this behavior is uncontrollable and undocumented.
