
x/net/http2: client can't fully utilize network resource #37373

Open
ochanism opened this issue Feb 22, 2020 · 13 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ochanism

ochanism commented Feb 22, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jayden/Library/Caches/go-build"
GOENV="/Users/jayden/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/jayden/Workspace/GoProject"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/jayden/Workspace/GoProject/src/golang.org/x/net/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/j1/94761d9d49j6gz5p55375sbr0000gn/T/go-build875938892=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I ran an HTTP/2 WebDAV server and wrote a test client using golang.org/x/net/http2.
The client uploaded a 1 MB file repeatedly from a few hundred goroutines.
MaxConcurrentStreams on the server was 100 and StrictMaxConcurrentStreams on the client was false.
I measured the throughput between the server and the client on AWS.
The network bandwidth between them was 150 Gbps at the infrastructure level.
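The original client and server are not included in the report. A minimal stand-in for the described workload, using `net/http/httptest`'s HTTP/2 support instead of a real WebDAV server (`runUpload` and the small worker count are illustrative, not the reporter's code), might look like:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// runUpload starts an HTTP/2 test server that drains request bodies, then
// uploads a 1 MB payload from `workers` concurrent goroutines, mimicking the
// reported workload on a small scale (the report used a few hundred goroutines).
func runUpload(workers int) error {
	sv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body) // drain the upload
	}))
	sv.EnableHTTP2 = true
	sv.StartTLS()
	defer sv.Close()

	payload := bytes.Repeat([]byte("x"), 1<<20) // 1 MB per request
	errs := make(chan error, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := sv.Client().Post(sv.URL, "application/octet-stream", bytes.NewReader(payload))
			if err != nil {
				errs <- err
				return
			}
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			errs <- nil
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := runUpload(8); err != nil {
		panic(err)
	}
	fmt.Println("done")
}
```

All uploads here go through a single `httptest` client, so they share one HTTP/2 connection, which is exactly the behavior under discussion.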

What did you expect to see?

I expected that the throughput would be around 150 Gbps.

What did you see instead?

The measured throughput was 90 Gbps.

Proposal

I figured out what caused this. I modified the http2 connection pool slightly, as shown in the link below.
(ochanism/net@b2cc93d)
With that change, I could achieve 150 Gbps of throughput.
Achieving high throughput requires multiple TCP connections, but the current http2 connection pool only opens a new TCP connection when no existing connection has stream capacity left (CurrentMaxStreams == MaxConcurrentStreams for every connection).
I reduced the MaxConcurrentStreams value on the server side, but the number of TCP connections didn't grow as much as I expected.

So I added a MinConcurrentConns field to the http2 Transport.
MinConcurrentConns is the minimum number of TCP connections; ClientConnPool tries to maintain at least that many. If 0, ClientConnPool creates a TCP connection only when needed.

Please review my proposal.
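The gist of the proposal (the real change is in ochanism/net@b2cc93d and is not reproduced here) can be sketched with a toy pool. `clientConn`, `pool`, and `getConn` are simplified stand-ins, not the actual x/net/http2 types:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Hypothetical, simplified stand-ins for the x/net/http2 pool types.
type clientConn struct {
	streams    int // streams currently open on this connection
	maxStreams int // server-advertised MaxConcurrentStreams
}

type pool struct {
	conns              []*clientConn
	minConcurrentConns int
}

// getConn sketches the proposed behavior: dial until the pool holds
// MinConcurrentConns connections, then pick a usable connection starting
// from a random offset instead of always walking the slice from the front.
func (p *pool) getConn() *clientConn {
	if len(p.conns) < p.minConcurrentConns {
		c := &clientConn{maxStreams: 100}
		c.streams++
		p.conns = append(p.conns, c)
		return c
	}
	start := rand.Intn(len(p.conns)) // random offset spreads streams across conns
	for i := 0; i < len(p.conns); i++ {
		c := p.conns[(start+i)%len(p.conns)]
		if c.streams < c.maxStreams {
			c.streams++
			return c
		}
	}
	// every connection is at its stream limit: dial a new one
	c := &clientConn{maxStreams: 100}
	c.streams++
	p.conns = append(p.conns, c)
	return c
}

func main() {
	p := &pool{minConcurrentConns: 4}
	for i := 0; i < 1000; i++ {
		p.getConn() // streams are never released in this toy model
	}
	fmt.Println("connections:", len(p.conns)) // prints "connections: 10"
}
```

The upstream pool corresponds to `minConcurrentConns == 0` here: it reuses the first connection with spare capacity, so new connections appear only when every existing one is saturated.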

@fraenkel
Contributor

Curious if you just tried to walk backwards through the connections?

@ochanism
Author

Curious if you just tried to walk backwards through the connections?

@fraenkel
Could you elaborate on your question?

@fraenkel
Contributor

@ochanism Your fix was to start at a random point in the list. The current code walks forward, and since connections are added to the end, in most cases the next connection with availability will be the last one. Obviously it all depends on which streams in which connections are closed.

Ultimately the bigger issue is that there is no way to configure your new option from net/http. If this is only for use from net/http2, it's less of an issue.

It's also related to #34511

@ochanism
Author

@fraenkel
The net/http Client doesn't have the issue I raised, since the number of pooled TCP connections can be adjusted via MaxIdleConns. However, there is no way to control the number of TCP connections in net/http2, so I've added MinConcurrentConns to control it.
Moreover, walking the connections backward cannot resolve this issue: every request would select the last connection with high probability, and the remaining connections in the pool would expire due to IdleTimeout. So I've made the client randomly select a connection within the MinConcurrentConns window to prevent that expiration.

@fraenkel
Contributor

Your selection will have similar issues in practice. Random selection won't optimize network traffic across multiple connections. The underlying locking is a bit loose, and writes currently block reads, which means that depending on how loaded (in number of streams) a connection is, and how much data is being sent back, the network may appear utilized while you are still dealing with a lot of contention.

The fact that you create a new rand for each request provides no guarantees about how requests are spread across the connections.

@ochanism
Author

@fraenkel
Yes, you're right. All practical random algorithms share that problem. The goal of my proposal is not to utilize network resources optimally. The main issue with the current http2 client is that it uses a small number of TCP connections even under heavy traffic (large payloads, not a high request rate). I could make the selection optimal, but I don't want to make it complex; I think random selection performs well for generic use cases. Of course, as you mentioned, there could be issues in some edge cases.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 27, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 27, 2020
@cagedmantis
Contributor

/cc @bradfitz @tombergan

@ochanism
Author

@cagedmantis @bradfitz @tombergan
Please give me some feedback.
I've been waiting for a review for a month.

@bradfitz
Contributor

It sounds like a knob we wouldn't want to expose to users.

How would you document how to tune it?

I'd rather just see it be automatically fast instead.

@ochanism
Author

It sounds like a knob we wouldn't want to expose to users.

How would you document how to tune it?

I'd rather just see it be automatically fast instead.

@bradfitz
I agree. It would be very nice if the value were adjusted automatically according to the current traffic pattern, but as you know, estimating and understanding traffic patterns is not an easy problem; many variables are involved.

I think the MinConcurrentConns field plays a role similar to the MaxIdleConns field in the HTTP transport (not exactly the same, but similar).
Normally, most users won't need to care about the MinConcurrentConns value: when MinConcurrentConns = 0, everything works as before. But some users will hit the low-throughput problem like I did, and they will need to find an appropriate MinConcurrentConns value for their traffic pattern.

I've been running a server with this patched library in production, and there was a huge throughput gain. With the vanilla net/http library, an HTTP client uses only one or two TCP connections even under heavy traffic. When I set StrictMaxConcurrentStreams to false, I expected the number of TCP connections to grow under burst traffic, but it didn't work that way.

What do you think about this? Do you have any ideas?

@ncw
Contributor

ncw commented Sep 2, 2020

We've noticed this in rclone too rclone/rclone#3631 (comment)

When uploading 4 files at once to Google Drive

Without HTTP/2: 4 Connections (on netstat) & Total Speed: 320 MB/s
With HTTP/2: 1 Connection (on netstat) & Total Speed: 160 MB/s

So HTTP/2 gets 50% of the speed. My conjecture is that it would go faster if we could persuade the HTTP/2 transport to use more TCP connections, but I don't know how to do that.
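One possible workaround from the application side, sketched here as an assumption rather than anything rclone or net/http provides (`multiClient` and `newMultiClient` are hypothetical helpers), is to round-robin requests over several independent `http.Client`s. Each client has its own Transport and therefore its own HTTP/2 connection pool, so several TCP connections stay in use instead of one shared connection:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// multiClient round-robins requests over n independent http.Clients.
// Because each Transport dials and pools its own connections, HTTP/2
// traffic is spread over n TCP connections rather than multiplexed
// onto a single one.
type multiClient struct {
	next    uint64
	clients []*http.Client
}

func newMultiClient(n int) *multiClient {
	mc := &multiClient{}
	for i := 0; i < n; i++ {
		mc.clients = append(mc.clients, &http.Client{
			Transport: &http.Transport{ForceAttemptHTTP2: true},
		})
	}
	return mc
}

// Do dispatches the request to the next client in round-robin order.
func (mc *multiClient) Do(req *http.Request) (*http.Response, error) {
	i := atomic.AddUint64(&mc.next, 1) % uint64(len(mc.clients))
	return mc.clients[i].Do(req)
}

func main() {
	mc := newMultiClient(4)
	fmt.Println("clients:", len(mc.clients))
}
```

The cost of this approach is n separate TLS handshakes and idle-connection pools, and it offers no help when a single library call owns the Transport internally.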

@aojea
Contributor

aojea commented Nov 26, 2021

Without HTTP/2: 4 Connections (on netstat) & Total Speed: 320 MB/s
With HTTP/2: 1 Connection (on netstat) & Total Speed: 160 MB/s

I can reproduce similar values with the following snippet by changing the DisableKeepAlives option; setting it to true makes the client create a new connection for each request.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	n := 1 << 20
	chunk := strings.Repeat("x", 1024)
	for i := 0; i < n; i++ {
		// Fprint, not Fprintf: chunk is data, not a format string
		_, err := fmt.Fprint(w, chunk)
		if err != nil {
			log.Printf("Error writing %v", err)
		}
	}

	size := float64(n) / 1024
	duration := time.Since(start).Seconds()
	log.Printf("Server: %f MBps", size/duration)
}

func main() {
	sv := httptest.NewUnstartedServer(http.HandlerFunc(handler))
	sv.EnableHTTP2 = true
	sv.StartTLS()

	var wg sync.WaitGroup

	tr, ok := sv.Client().Transport.(*http.Transport)
	if !ok {
		panic("transport unsupported")
	}
	tr.DisableKeepAlives = false

	workers := 4
	for i := 1; i <= workers; i++ {
		wg.Add(1)
		i := i
		go func() {
			defer wg.Done()
			start := time.Now()

			resp, err := sv.Client().Get(sv.URL + "/")
			if err != nil {
				log.Fatal(err)
			}
			defer resp.Body.Close()

			data, err := ioutil.ReadAll(resp.Body)
			if err != nil {
				log.Printf("Error reading %v", err)
			}

			size := float64(len(data)) / (1024 * 1024)
			duration := time.Since(start).Seconds()

			log.Printf("Client%d: %f MBps ", i, size/duration)
		}()
	}
	wg.Wait()

}
  • tr.DisableKeepAlives = true
$ go run server.go 
2021/11/26 11:19:01 Server: 96.358534 MBps
2021/11/26 11:19:01 Client4: 96.338120 MBps 
2021/11/26 11:19:02 Server: 92.666090 MBps
2021/11/26 11:19:02 Client2: 92.651739 MBps 
2021/11/26 11:19:02 Server: 88.723179 MBps
2021/11/26 11:19:02 Client1: 88.693800 MBps 
2021/11/26 11:19:03 Server: 86.221247 MBps
2021/11/26 11:19:03 Client3: 86.193994 MBps 
  • tr.DisableKeepAlives = false
$ go run server.go 
2021/11/26 11:19:27 Server: 63.471122 MBps
2021/11/26 11:19:27 Client4: 63.462420 MBps 
2021/11/26 11:19:29 Server: 58.067915 MBps
2021/11/26 11:19:29 Client3: 58.061386 MBps 
2021/11/26 11:19:35 Server: 43.437425 MBps
2021/11/26 11:19:35 Client1: 43.433545 MBps 
2021/11/26 11:19:38 Server: 38.357418 MBps
2021/11/26 11:19:38 Client2: 38.355121 MBps 

Setting tr.DisableKeepAlives = true creates a different connection for each request; with it set to false, I get half the throughput.

@Rootax

Rootax commented Jul 3, 2022

I hope this bug is not forgotten; it's a pretty important one...

moadz added a commit to moadz/configuration that referenced this issue Sep 1, 2023
* Disabled http2 due to http2 streaming in net/http2 performance issues golang/go#37373
* RCLONE_S3_UPLOAD_CONCURRENCY, RCLONE_S3_CHUNK_SIZE to improve parallelism and download efficiency

Signed-off-by: mzardab <mzardab@redhat.com>
moadz added a commit to rhobs/configuration that referenced this issue Sep 1, 2023
* Disabled http2 due to http2 streaming in net/http2 performance issues golang/go#37373
* RCLONE_S3_UPLOAD_CONCURRENCY, RCLONE_S3_CHUNK_SIZE to improve parallelism and download efficiency

Signed-off-by: mzardab <mzardab@redhat.com>
7 participants