Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: clients do not use full quota of MaxIdleConnsPerHost when called from multiple goroutines #24822

Closed
ikenchina opened this issue Apr 12, 2018 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ikenchina
Copy link

ikenchina commented Apr 12, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.9.2

Does this issue reproduce with the latest release?

can't reproduce

What operating system and processor architecture are you using (go env)?

centos

scenario

create a new http client and default transport (client's timeout is 5s) in one goroutine to access http2 server every time.
but i have limit the amount of goroutine ( maximum is 100 goroutine ).
sometimes service didn't work, i found one goroutine was blocking others gorutines(99 goroutines).

blocked goroutines

in total, 98 goroutine have been blocked.

goroutine 220791540 [semacquire]: sync.runtime_SemacquireMutex(0xc429e00a14, 0xc427945500) /usr/local/go/src/runtime/sema.go:71 +0x3d sync.(*Mutex).Lock(0xc429e00a10) /usr/local/go/src/sync/mutex.go:134 +0xee net/http.(*http2ClientConn).streamByID(0xc429e009c0, 0x100000889, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7758 +0x43 net/http.(*http2ClientConn).forgetStreamID(0xc429e009c0, 0x889) /usr/local/go/src/net/http/h2_bundle.go:7754 +0x38 net/http.(*http2ClientConn).RoundTrip(0xc429e009c0, 0xc425179200, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7376 +0xb59 net/http.(*http2Transport).RoundTripOpt(0xc420837220, 0xc425179200, 0xc422d30600, 0xc425f546c0, 0x40000000040b130, 0xffffffffffffffff) /usr/local/go/src/net/http/h2_bundle.go:6858 +0x145 net/http.(*http2Transport).RoundTrip(0xc420837220, 0xc425179200, 0x654201, 0xc422d306f8, 0xc425f548f0) /usr/local/go/src/net/http/h2_bundle.go:6820 +0x3a net/http.http2noDialH2RoundTripper.RoundTrip(0xc420837220, 0xc425179200, 0xc426359f80, 0x5, 0xc420813048) /usr/local/go/src/net/http/h2_bundle.go:990 +0x39 net/http.(*Transport).RoundTrip(0x137f9a0, 0xc425179200, 0x137f9a0, 0xbea9f7ed1c00e6c5, 0x9a938b916ea9) /usr/local/go/src/net/http/transport.go:371 +0xd47 net/http.send(0xc425179100, 0x12f72e0, 0x137f9a0, 0xbea9f7ed1c00e6c5, 0x9a938b916ea9, 0x138d2a0, 0xc4201430d0, 0xbea9f7ed1c00e6c5, 0xc425f54b08, 0x1) /usr/local/go/src/net/http/client.go:249 +0x1a9 net/http.(*Client).send(0xc4208f15c0, 0xc425179100, 0xbea9f7ed1c00e6c5, 0x9a938b916ea9, 0x138d2a0, 0xc4201430d0, 0x0, 0x1, 0xbfb900) /usr/local/go/src/net/http/client.go:173 +0xfd net/http.(*Client).Do(0xc4208f15c0, 0xc425179100, 0x18, 0xc4200166c0, 0x18) /usr/local/go/src/net/http/client.go:602 +0x28d ......

blocking other goroutines

seems that this goroutine blocking other goroutines

goroutine 220791555 [IO wait]: internal/poll.runtime_pollWait(0x7f51f01de058, 0x77, 0x0) /usr/local/go/src/runtime/netpoll.go:173 +0x57 internal/poll.(*pollDesc).wait(0xc42545c098, 0x77, 0xffffffffffffff00, 0x12fa5e0, 0x12f46f0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0xae internal/poll.(*pollDesc).waitWrite(0xc42545c098, 0xc42264b800, 0x2f, 0x400) /usr/local/go/src/internal/poll/fd_poll_runtime.go:94 +0x3d internal/poll.(*FD).Write(0xc42545c080, 0xc42264b800, 0x2f, 0x400, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:227 +0x244 net.(*netFD).Write(0xc42545c080, 0xc42264b800, 0x2f, 0x400, 0xc427942f20, 0x3, 0xa022617980) /usr/local/go/src/net/fd_unix.go:220 +0x52 net.(*conn).Write(0xc426176e38, 0xc42264b800, 0x2f, 0x400, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:188 +0x6d crypto/tls.(*Conn).write(0xc422b99500, 0xc42264b800, 0x2f, 0x400, 0xc421d39ac0, 0xc427942f01, 0x613054) /usr/local/go/src/crypto/tls/conn.go:832 +0x1a7 crypto/tls.(*Conn).writeRecordLocked(0xc422b99500, 0xdad617, 0xc426fd0000, 0x12, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/crypto/tls/conn.go:909 +0x360 crypto/tls.(*Conn).Write(0xc422b99500, 0xc426fd0000, 0x12, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/crypto/tls/conn.go:1068 +0x1bb net/http.http2stickyErrWriter.Write(0x7f51f45266d0, 0xc422b99500, 0xc429e00b48, 0xc426fd0000, 0x12, 0x1000, 0xccc240, 0x7f51f455f8b0, 0x80000000000) /usr/local/go/src/net/http/h2_bundle.go:6803 +0x7a net/http.(*http2stickyErrWriter).Write(0xc42273b280, 0xc426fd0000, 0x12, 0x1000, 0x12, 0x412577, 0xc426150840) <autogenerated>:1 +0x74 bufio.(*Writer).Flush(0xc4241c3d80, 0x895, 0xc42cca8c00) /usr/local/go/src/bufio/bufio.go:567 +0x7e net/http.(*http2ClientConn).writeHeaders(0xc429e009c0, 0x895, 0xc42cca8c09, 0x0, 0xb6, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7436 +0x1ec net/http.(*http2ClientConn).RoundTrip(0xc429e009c0, 0xc4239b1200, 0x0, 0x0, 0x0) /usr/local/go/src/net/http/h2_bundle.go:7289 +0x46b net/http.(*http2Transport).RoundTripOpt(0xc420837220, 0xc4239b1200, 0xc424c5fb00, 0xc4241396c0, 0x60000000040b130, 0xffffffffffffffff) /usr/local/go/src/net/http/h2_bundle.go:6858 +0x145 net/http.(*http2Transport).RoundTrip(0xc420837220, 0xc4239b1200, 0x654201, 0xc424c5fb98, 0xc4241398f0) /usr/local/go/src/net/http/h2_bundle.go:6820 +0x3a net/http.http2noDialH2RoundTripper.RoundTrip(0xc420837220, 0xc4239b1200, 0xc422c7e030, 0x5, 0xc420813048) /usr/local/go/src/net/http/h2_bundle.go:990 +0x39 net/http.(*Transport).RoundTrip(0x137f9a0, 0xc4239b1200, 0x137f9a0, 0xbea9f7ed1c77e1fd, 0x9a938c0869e0) /usr/local/go/src/net/http/transport.go:371 +0xd47 net/http.send(0xc4239b1100, 0x12f72e0, 0x137f9a0, 0xbea9f7ed1c77e1fd, 0x9a938c0869e0, 0x138d2a0, 0xc42062af70, 0xbea9f7ed1c77e1fd, 0xc424139b08, 0x1) /usr/local/go/src/net/http/client.go:249 +0x1a9 net/http.(*Client).send(0xc4208f15c0, 0xc4239b1100, 0xbea9f7ed1c77e1fd, 0x9a938c0869e0, 0x138d2a0, 0xc42062af70, 0x0, 0x1, 0xbfb900) /usr/local/go/src/net/http/client.go:173 +0xfd net/http.(*Client).Do(0xc4208f15c0, 0xc4239b1100, 0x18, 0xc4200166c0, 0x18) /usr/local/go/src/net/http/client.go:602 +0x28d .......

@andybons
Copy link
Member

You say you can't reproduce this in the latest release. Perhaps it's been fixed since 1.9.2?

@andybons andybons added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 12, 2018
@andybons andybons added this to the Unplanned milestone Apr 12, 2018
@ikenchina
Copy link
Author

no, sorry, i haven't explained clearly.
i mean i can't reproduce this problem.

@andybons
Copy link
Member

Since you’re unable to reproduce this, closing for now. If it pops up again, please feel free to reopen.

@cespare
Copy link
Contributor

cespare commented Apr 13, 2018

(@andybons since non-members can't reopen issues they didn't close themselves, @ikenchina would have to either comment here asking someone to reopen it or else just open a new issue.)

@ikenchina
Copy link
Author

@andybons please reopen it.
this problem occurred sometimes, and i think it was caused by http2 client of golang

@agnivade
Copy link
Contributor

@ikenchina - I would request you to try and repro with the latest 1.10.1 release. And if possible provide a complete working version of the code that causes the issue. Without proper repro steps and a working code sample, it is very hard to even start debugging.

And IIRC, we don't support centos officially, but I might be mistaken.

@ikenchina
Copy link
Author

ikenchina commented Apr 15, 2018

@agnivade here is the code, i always new a client to send a request, but all clients are using same transport, and its MaxIdleConnsPerHost is 100, but i saw there were 2 or 3 connections in production server (QPS is 200 - 300).


var (
	defaultTransport *http.Transport
)


func init() {
	defaultTransport = &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 300 * time.Second,
			DualStack: true,
		}).DialContext,
		MaxIdleConns:          150,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
	}
	defaultTransport.MaxIdleConnsPerHost = 100
}

func NewClient(api, key, secret string, timeout time.Duration) *Client {

	var netClient = &http.Client{
		Timeout:   timeout,
		Transport: defaultTransport,
	}

	cc := &Client{
		urlBase:    api,
		appKey:     key,
		appSecret:  secret,
		httpClient: netClient,
	}

	return cc
}



func (self *Client) Post(bs []byte) error {
	body := &bytes.Buffer{}
	body.Write(bs)

	req, err := http.NewRequest("POST", self.urlBase, body)
	if err != nil {
		return err
	}

	req.SetBasicAuth(self.appKey, self.appSecret)
	resp, err := self.httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	bs, err = ioutil.ReadAll(resp.Body)
	if err != nil {
		return err
	}

    log.Println(string(bs))
	
	return nil
}



func main() {


    dataChan := make(chan []byte, 600)
    go func() {
        for {
            dataChan <- []byte("test")
        }
    }()


    concurrency := make(chan struct{}, 500)
    for {
        
        concurrency <- struct{}{}
        
        go func() {
            data := <- dataChan
            client := NewClient("", "", "", 5*time.Second)   
            client.Post(data)

            <- concurrency
        }()
    }
}


@agnivade
Copy link
Contributor

Your code is not a "complete working version". But anyways, why are you creating a new client for every request ? According to the docs, a single client is perfectly capable of being called from multiple goroutines and it uses underlying TCP connections.

The Client's Transport typically has internal state (cached TCP connections), so Clients should be reused instead of created as needed. Clients are safe for concurrent use by multiple goroutines.

So IIUC, your issue is that you are using multiple clients with the same transport having MaxIdleConnsPerHost set to 100. But you are seeing only 2 to 3 connections on the server.

Further questions -

  1. Is your server a Go http2 server ? Please post a working version of the server code. Ideally, attach 2 files, one client and one server. And mention instructions of how to repro the issue.
  2. How are you checking connections on your server ?

I am pretty sure this is not a Go bug, but I will ping @bradfitz to rule out any silly mistakes. Reopening.

@agnivade agnivade reopened this Apr 15, 2018
@agnivade agnivade added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 15, 2018
@agnivade agnivade changed the title http2 client : dead lock while flush data to peer net/http: clients do not use full quota of MaxIdleConnsPerHost when called from multiple goroutines Apr 15, 2018
@ikenchina
Copy link
Author

server is http2, it's a provider of push, i haven't their code.
i have improved the code as your said( using one client) , but problem still exist ( might be dead lock ).

@ikenchina
Copy link
Author

@agnivade
Please see net/http/h2_bundle.go:737 (go 1.9.2) , here will loop all connections and find which one is idle.
But, cc.CanTakeNewRequest require http2ClientConn's lock, and if one of them (http2ClientConn) has been blocked (h2_bundle.go:7289 and h2_bundle.go:7436), then will dead lock.
you can see the stacks

@bradfitz
Copy link
Contributor

By the design of the http2 package, no I/O should be happening while any of those mutexes are held, so I don't think the two goroutine stacks you cite are related. Or, why do you think they're related?

Generally, debugging mutex contention can't be done just by looking at stack traces, as the stacks don't say who had acquired which mutexes.

I strongly suspect this is a bug in your own code. If you have a self-contained repro showing the problem we could take a look, but otherwise there's not enough information in this bug to help out more. Sorry.

@ikenchina
Copy link
Author

i have posted the snippet,


var (
	defaultTransport *http.Transport
)


func init() {
	defaultTransport = &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 300 * time.Second,
			DualStack: true,
		}).DialContext,
		MaxIdleConns:          150,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
	}
	defaultTransport.MaxIdleConnsPerHost = 100
}

func NewClient(api, key, secret string, timeout time.Duration) *Client {

	var netClient = &http.Client{
		Timeout:   timeout,
		Transport: defaultTransport,
	}

	cc := &Client{
		urlBase:    api,
		appKey:     key,
		appSecret:  secret,
		httpClient: netClient,
	}

	return cc
}



func (self *Client) Post(bs []byte) error {
	body := &bytes.Buffer{}
	body.Write(bs)

	req, err := http.NewRequest("POST", self.urlBase, body)
	if err != nil {
		return err
	}

	req.SetBasicAuth(self.appKey, self.appSecret)
	resp, err := self.httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	bs, err = ioutil.ReadAll(resp.Body)
	if err != nil {
		return err
	}

    log.Println(string(bs))
	
	return nil
}



func main() {


    dataChan := make(chan []byte, 600)
    go func() {
        for {
            dataChan <- []byte("test")
        }
    }()

    client := NewClient("", "", "", 5*time.Second)   
    concurrency := make(chan struct{}, 500)
    for {
        
        concurrency <- struct{}{}
        
        go func() {
            data := <- dataChan
            
            client.Post(data)

            <- concurrency
        }()
    }
}


@bradfitz
Copy link
Contributor

@ikenchina, I don't want a snippet. I'd want to see a complete program (client & server) that actually compiles. Your snippet above doesn't even include, say, the type Client.

@ikenchina
Copy link
Author

ikenchina commented Apr 17, 2018

first of all, i haven't server's code.

why i thought it was dead lock, because there were :

  1. many goroutines have been blocked for a long time(seems flush goroutine acquired lock) and didn't resume
  2. i have tcpdump but there was no data over that tcp connection
  3. it worked fine after i restart service.

@ikenchina
Copy link
Author

and today, problem occurred again:
tcp connection still exists, but there was no data over it.

this goroutine acquired the lock

1 @ 0x42f36c 0x42a47a 0x429a77 0x4959de 0x495acd 0x4973c4 0x4f6002 0x5084ed 0x610a67 0x610f80 0x611e9b 0x6977ea 0x6e1714 0x5b594e 0x6a31c3 0x69ba46 0x697d15 0x6978da 0x67c1b9 0x6c5277 0x6748e9 0x67457d 0x675c1d 0xb3a0fb 0xb3b845 0xb3b3d2 0xb54007 0x45e701
#	0x429a76	internal/poll.runtime_pollWait+0x56						/usr/local/go/src/runtime/netpoll.go:173
#	0x4959dd	internal/poll.(*pollDesc).wait+0xad						/usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#	0x495acc	internal/poll.(*pollDesc).waitWrite+0x3c					/usr/local/go/src/internal/poll/fd_poll_runtime.go:94
#	0x4973c3	internal/poll.(*FD).Write+0x243							/usr/local/go/src/internal/poll/fd_unix.go:227
#	0x4f6001	net.(*netFD).Write+0x51								/usr/local/go/src/net/fd_unix.go:220
#	0x5084ec	net.(*conn).Write+0x6c								/usr/local/go/src/net/net.go:188
#	0x610a66	crypto/tls.(*Conn).write+0x1a6							/usr/local/go/src/crypto/tls/conn.go:832
#	0x610f7f	crypto/tls.(*Conn).writeRecordLocked+0x35f					/usr/local/go/src/crypto/tls/conn.go:909
#	0x611e9a	crypto/tls.(*Conn).Write+0x1ba							/usr/local/go/src/crypto/tls/conn.go:1068
#	0x6977e9	net/http.http2stickyErrWriter.Write+0x79					/usr/local/go/src/net/http/h2_bundle.go:6803
#	0x6e1713	net/http.(*http2stickyErrWriter).Write+0x73					<autogenerated>:1
#	0x5b594d	bufio.(*Writer).Flush+0x7d							/usr/local/go/src/bufio/bufio.go:567
#	0x6a31c2	net/http.(*http2ClientConn).writeStreamReset+0x72				/usr/local/go/src/net/http/h2_bundle.go:8498
#	0x69ba45	net/http.(*http2ClientConn).RoundTrip+0xba5					/usr/local/go/src/net/http/h2_bundle.go:7378
#	0x697d14	net/http.(*http2Transport).RoundTripOpt+0x144					/usr/local/go/src/net/http/h2_bundle.go:6858
#	0x6978d9	net/http.(*http2Transport).RoundTrip+0x39					/usr/local/go/src/net/http/h2_bundle.go:6820
#	0x67c1b8	net/http.http2noDialH2RoundTripper.RoundTrip+0x38				/usr/local/go/src/net/http/h2_bundle.go:990
#	0x6c5276	net/http.(*Transport).RoundTrip+0xd46						/usr/local/go/src/net/http/transport.go:371
#	0x6748e8	net/http.send+0x1a8								/usr/local/go/src/net/http/client.go:249
#	0x67457c	net/http.(*Client).send+0xfc							/usr/local/go/src/net/http/client.go:173
#	0x675c1c	net/http.(*Client).Do+0x28c							/usr/local/go/src/net/http/client.go:602

and these goroutines have been blocked


98 @ 0x42f36c 0x42f45e 0x440844 0x44055d 0x47581e 0x69e8f3 0x69e898 0x69b9f9 0x697d15 0x6978da 0x67c1b9 0x6c5277 0x6748e9 0x67457d 0x675c1d 0xb3a0fb 0xb3b845 0xb3b3d2 0xb54007 0x45e701
#	0x44055c	sync.runtime_SemacquireMutex+0x3c						/usr/local/go/src/runtime/sema.go:71
#	0x47581d	sync.(*Mutex).Lock+0xed								/usr/local/go/src/sync/mutex.go:134
#	0x69e8f2	net/http.(*http2ClientConn).streamByID+0x42					/usr/local/go/src/net/http/h2_bundle.go:7758
#	0x69e897	net/http.(*http2ClientConn).forgetStreamID+0x37					/usr/local/go/src/net/http/h2_bundle.go:7754
#	0x69b9f8	net/http.(*http2ClientConn).RoundTrip+0xb58					/usr/local/go/src/net/http/h2_bundle.go:7376
#	0x697d14	net/http.(*http2Transport).RoundTripOpt+0x144					/usr/local/go/src/net/http/h2_bundle.go:6858
#	0x6978d9	net/http.(*http2Transport).RoundTrip+0x39					/usr/local/go/src/net/http/h2_bundle.go:6820
#	0x67c1b8	net/http.http2noDialH2RoundTripper.RoundTrip+0x38				/usr/local/go/src/net/http/h2_bundle.go:990
#	0x6c5276	net/http.(*Transport).RoundTrip+0xd46						/usr/local/go/src/net/http/transport.go:371
#	0x6748e8	net/http.send+0x1a8								/usr/local/go/src/net/http/client.go:249
#	0x67457c	net/http.(*Client).send+0xfc							/usr/local/go/src/net/http/client.go:173
#	0x675c1c	net/http.(*Client).Do+0x28c							/usr/local/go/src/net/http/client.go:602

@golang golang locked and limited conversation to collaborators Apr 17, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants