
net/http: potential client transport lock contention on high CPU core count machines #41238

Open
Vascko opened this issue Sep 5, 2020 · 1 comment
Labels: NeedsInvestigation, Performance
Milestone: Unplanned

Comments

Vascko commented Sep 5, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

OS: Fedora/YUM based OS (RHEL, CentOS, OEL, etc)
Architecture: amd64
Kernel: Reproduced on anything from 4.1 to 5.8

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build513552128=/tmp/go-build -gno-record-gcc-switches"

What did you do?

While writing an extremely basic HTTP load tool, I noticed some funky behavior when re-using the *http.Client (and its embedded *http.Transport).
The tool prepares an *http.Client and an *http.Request, then spins up multiple goroutines. Each goroutine clones the *http.Request (since an *http.Request isn't meant to be shared concurrently) and uses the shared *http.Client (which is safe for concurrent use) to send as many requests per second as possible to a server of the user's choice.

Here is a hacky, simplified, but functioning example that showcases my observation. It sends requests to a URL specified on the command line and runs two passes of 30 seconds each: the first pass shares a single client/transport across all workers, and the second pass gives each worker its own client/transport.

Code Example
package main

import (
	"context"
	"crypto/tls"
	"flag"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"sync"
	"time"
)

func fire(ctx context.Context, wg *sync.WaitGroup, r *http.Request, c *http.Client, chGo chan struct{}, chResult chan bool) {
	defer wg.Done()
	<-chGo

	for {
		select {
		case <-ctx.Done():
			return
		default:
			chResult <- true

			resp, err := c.Do(r)
			if err != nil {
				chResult <- false
				continue
			}

			_, err = io.Copy(ioutil.Discard, resp.Body)
			if err != nil {
				chResult <- false
			}
			resp.Body.Close()
		}
	}
}

func main() {
	flagURL := flag.String("url", "", "URL for the request")
	flag.Parse()

	if *flagURL == "" {
		panic(fmt.Errorf("URL can not be empty"))
	}

	req, err := http.NewRequest("GET", *flagURL, nil)
	if err != nil {
		panic(err)
	}

	tp := http.DefaultTransport.(*http.Transport).Clone()
	// In the case of HTTPS, disable verification for testing purposes
	tp.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}

	// Bump up connection limits
	tp.MaxConnsPerHost = 200
	tp.MaxIdleConns = 200
	tp.MaxIdleConnsPerHost = 200

	// Disable HTTP/2
	tp.TLSNextProto = make(map[string]func(string, *tls.Conn) http.RoundTripper)

	// First pass is with a shared transport
	hClt := &http.Client{Transport: tp}

	chGo := make(chan struct{})
	chResult := make(chan bool, 200)

	ctx, cancel := context.WithCancel(context.Background())
	d := 30 * time.Second

	var wgFire sync.WaitGroup
	for i := 1; i <= 200; i++ {
		wgFire.Add(1)

		// Need to clone request
		//
		// See https://github.com/golang/go/issues/21780
		go fire(ctx, &wgFire, req.Clone(context.Background()), hClt, chGo, chResult)
	}

	var wgStat sync.WaitGroup
	wgStat.Add(1)
	go func() {
		defer wgStat.Done()
		var reqs uint64
		var errs uint64

		for {
			v, more := <-chResult
			if !more {
				break
			}
			switch v {
			case true:
				reqs++
			case false:
				errs++
			}
		}
		fmt.Println("\nWorker sharing HTTP transport")
		fmt.Printf("Duration: %v\nRequests: %v\nRequests per second: %v\nErrors: %v\n\n", d, reqs, reqs/uint64(d.Seconds()), errs)
	}()

	close(chGo)
	time.Sleep(d)
	cancel()
	wgFire.Wait()
	close(chResult)
	wgStat.Wait()

	// Let's go again but with non-shared transports
	chGo = make(chan struct{})
	chResult = make(chan bool, 200)

	ctx, cancel = context.WithCancel(context.Background())

	for i := 1; i <= 200; i++ {
		wgFire.Add(1)

		// Here each worker gets their own transport
		hClt := &http.Client{Transport: tp.Clone()}

		// Need to clone request
		//
		// See https://github.com/golang/go/issues/21780
		go fire(ctx, &wgFire, req.Clone(context.Background()), hClt, chGo, chResult)
	}

	wgStat.Add(1)
	go func() {
		defer wgStat.Done()
		var reqs uint64
		var errs uint64

		for {
			v, more := <-chResult
			if !more {
				break
			}
			switch v {
			case true:
				reqs++
			case false:
				errs++
			}
		}
		fmt.Println("Each worker getting a HTTP transport copy")
		fmt.Printf("Duration: %v\nRequests: %v\nRequests per second: %v\nErrors: %v\n", d, reqs, reqs/uint64(d.Seconds()), errs)
	}()

	close(chGo)
	time.Sleep(d)
	cancel()
	wgFire.Wait()
	close(chResult)
	wgStat.Wait()
}

What did you expect to see?

The same RPS throughput for both scenarios in any environment

What did you see instead?

When running this on my local DEV machine (Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz) against a server (Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz), I get the expected results:

$ ./http_perf_compare -url http://perf.domain.local

Worker sharing HTTP transport
Duration: 30s
Requests: 1640557
Requests per second: 54685
Errors: 0

Each worker getting a HTTP transport copy
Duration: 30s
Requests: 1553561
Requests per second: 51785
Errors: 0

Both approaches result in a similar RPS.

However, when I run the same code on a client machine with the same CPU as the server (Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz), I see a ~90% increase in RPS throughput when each worker goroutine gets its own client/transport:

$ ./http_perf_compare -url http://perf.domain.local

Worker sharing HTTP transport
Duration: 30s
Requests: 2292353
Requests per second: 76411
Errors: 0

Each worker getting a HTTP transport copy
Duration: 30s
Requests: 4442219
Requests per second: 148073
Errors: 0

This happens regardless of whether the requests use plain HTTP or HTTP over TLS (HTTPS).
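
In case it helps with narrowing this down, here is a minimal sketch of how a mutex profile could be collected from the tool above (assuming lock contention is indeed the right thing to look for; the pprof listener and the localhost:6060 address below are illustrative additions, not part of the reproduction code):

package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
	"runtime"
)

func main() {
	// Record every mutex contention event (the default fraction is 0, i.e. off).
	runtime.SetMutexProfileFraction(1)

	// Expose the profiling endpoints on an arbitrary local port while the passes run.
	go func() {
		_ = http.ListenAndServe("localhost:6060", nil)
	}()

	// ... the two 30s load passes from the example above would run here ...
}

Running go tool pprof http://localhost:6060/debug/pprof/mutex while a pass is in flight should then show whether the shared *http.Transport is where the goroutines block.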

Is this expected?

Thank you in advance for the time anyone takes to look into this.

@mdlayher changed the title from "Potential HTTP client transport lock contention on high CPU core count machines" to "net/http: potential client transport lock contention on high CPU core count machines" Sep 5, 2020

andybons (Member) commented Sep 8, 2020

@fraenkel @bcmills @rsc @neild

@andybons added the NeedsInvestigation label Sep 8, 2020
@andybons added this to the Unplanned milestone Sep 8, 2020