Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: implementation issue #48111

Open
nekomeowww opened this issue Sep 1, 2021 · 3 comments
Open

x/net/http2: implementation issue #48111

nekomeowww opened this issue Sep 1, 2021 · 3 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@nekomeowww
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.16.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/neko/Library/Caches/go-build"
GOENV="/Users/neko/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/neko/golang/pkg/mod"
GONOPROXY="github.com/nekomoewww/harmony"
GONOSUMDB="github.com/nekomoewww/harmony"
GOOS="darwin"
GOPATH="/Users/neko/golang"
GOPRIVATE="github.com/nekomoewww/harmony"
GOPROXY="https://goproxy.cn,https://gocenter.io,https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.6"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/neko/golang/work/missevan-go/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/m0/k_38ftb53yg0mqbcrrjypr3m0000gn/T/go-build4128625603=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Used default http2.Transport{} and set to a http.Client.Transport, and established many connections to the http2 server.

What did you expect to see?

There are three different problems we are expecting to improve:

  1. We shouldn't establish more connections when it reaches a limitation. http2.Transport needs to export a MaxConcurrentStreams field to achieve this
  2. If one connection pipeline encountered an error, we should break all the connections in the same pool and try to start over again. http.Client or http2.Transport needs to export some error indication fields to achieve this

Base on all the above, can we expose the

What did you see instead?

  1. We cannot retrieve connection when timeout, it will result in an error which we cannot control at all. (Example error: context deadline exceeded (Client.Timeout exceeded while awaiting headers))
  2. There are no exported fields to adjust connection limitations
  3. If one connection pipeline encountered an error, it hangs over and we lost control of it
@nekomeowww
Copy link
Author

I have mentioned one of the problems in #39389 along with a review message inside its Pull Request golang/net#73.
Those all should reference to https://go-review.googlesource.com/c/net/+/236497/

@nekomeowww nekomeowww changed the title Problems in implementation of golang.org/x/net/http2 x/net/http2: implementation issue Sep 1, 2021
@cherrymui
Copy link
Member

@nekomeowww could you share the source code of an example for this issue? Also could you reword and re-title the issue to remove the duplicates from #39389 so it is clearer what the new issue is? Thanks.

cc @neild

@cherrymui cherrymui added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 1, 2021
@cherrymui cherrymui added this to the Unreleased milestone Sep 1, 2021
@nekomeowww
Copy link
Author

nekomeowww commented Sep 3, 2021

Source code to reproduce:

func ServeTLS(certPath, certKeyPath string, r *gin.Engine, port int, h ...GinHandler) string {
	for i := range h {
		h[i].Mount(r)
	}

	l, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", strconv.Itoa(port)))
	if err != nil {
		panic(err)
	}

	go func() {
		err := http.ServeTLS(l, r, certPath, certKeyPath)
		if err != nil {
			panic(err)
		}
	}()
	time.Sleep(100 * time.Millisecond)
	return l.Addr().String()
}

func TestTimeout(t *testing.T) {
	assert := assert.New(t)
	require := require.New(t)

	caCertPath := "../testdata/ca.crt"
	certPath := "../testdata/server.crt"
	certKeyPath := "../testdata/server.pem"

	err := generateCerts()
	require.NoError(err)

	defer func() {
		err = os.Remove(caCertPath)
		assert.NoError(err)
		err = os.Remove(certPath)
		assert.NoError(err)
		err = os.Remove(certKeyPath)
		assert.NoError(err)
	}()

	r := gin.New()
	h := timeoutHandler{}
	addr := ServeTLS(certPath, certKeyPath, r, 0, &h)

	c := &http.Client{
		Timeout: time.Second,
	}

	c.Transport = http2.Transport{}
	maxTryout := 100000
	var wg sync.WaitGroup
	errorCount := 0
	for i := 0; i < maxTryout; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			uri := fmt.Sprintf("https://%s/normal/%d?timeout=1", addr, idx)
			req, err := http.NewRequest(http.MethodGet, uri, nil)
			require.NoError(err)
			_, _ = c.Do(req)
		}(i)
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			uri := fmt.Sprintf("https://%s/normal/%d?timeout=0", addr, idx)
			req, err := http.NewRequest(http.MethodGet, uri, nil)
			require.NoError(err)
			_, err = c.Do(req)
			if err != nil {
				errorCount++
			}
		}(i)
	}
	Debugf("current error count: %d", errorCount)
	for i := 0; i < 100; i++ {
		uri := fmt.Sprintf("https://%s/normal/%d?timeout=0", addr, i)
		req, err := http.NewRequest(http.MethodGet, uri, nil)
		require.NoError(err)
		_, err = c.Do(req)
		if err != nil {
			errorCount++
		}
	}
	wg.Wait()
	Debugf("ended, report: %d/%d, errored percentage: %d%%", errorCount, maxTryout, (errorCount/maxTryout)*100)
}

There were some chang later we researched the source code, net/http2 provides a new method called http2.ConfigureTransports to auto configure the instance from http.Transport. We did replaced the code:

c.Transport = http2.Transport{} // old

c.Transport, err = http2.ConfigureTransports(&http.Transport{
	TLSClientConfig: &tls.Config{
		InsecureSkipVerify: true,
	},
}) // new

By using the test code above, we can find out many problems:

  1. The client cannot find out the socket hangs up and terminate it by itself.
  2. The client can establish infinite connections, this can be improved by http2: limit client initial MAX_CONCURRENT_STREAMS net#73 , this isn't enough though, I think we should export a field to configurate the value programmically
  3. The client should be able to be aware of the connection time by itself and terminate those connections established a long time ago.

Well, we could solve all these problems by overriding the implementation of http2.Transport{}.RoundTrip(), however, the implementation was missing.

We shouldn't establish more connections when it reaches a limitation. http2.Transport needs to export a MaxConcurrentStreams field to achieve this
If one connection pipeline encountered an error, we should break all the connections in the same pool and try to start over again. HTTP.Client or http2.Transport needs to export some error indication fields to achieve this

I just mentioned it above.

And also, most of the implementation of http2 seems to consider the usage environment in an End User to Service way, not the Service to Service way (Such as gRPC).
To summarize this, let's consider a case where the servers of microservice would participate in:

  1. Both the client and server have the ability to serve high concurrency requests.
  2. The client we mentioned above will provide other connections to end-user clients (maybe millions).

In this case, we need more control of the client (microservice), dynamically change the limitation of the client can be established at different time aspect by referencing the bandwidth which user and services consume. To make sure the calling stacks are stable, we will also need the ability to control the pipeline, terminate the long and dead connections for better performance.

@cherrymui cherrymui removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

2 participants