Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: using Transport.IdleConnTimeout may lead to goroutine leak in an edge case #25621

Closed
aryszka opened this issue May 29, 2018 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@aryszka
Copy link

aryszka commented May 29, 2018

What version of Go are you using (go version)?

1.10.2

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOOS="linux"
GOARCH="amd64"

What did you do?

I ran an HTTP short-polling client and the polling timeout value was equal to the IdleConnTimeout of the used http.Transport.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"runtime"
	"time"
)

func main() {
	// IdleConnTimeout close to the polling timeout
	pollTO := 30 * time.Millisecond
	connTO := pollTO

	// avoid the leak by not setting IdleConnTimeout:
	// connTO = 0
	// or by using a value different from the poll timeout:
	// connTO = pollTO + time.Millisecond

	// start a trivial server, it can be in a separate process:
	go func() {
		if err := http.ListenAndServe(":8080", http.HandlerFunc(
			func(w http.ResponseWriter, _ *http.Request) {
				w.Write([]byte("Hello, world!"))
			},
		)); err != nil {
			log.Fatalln(err)
		}
	}()

	// create a client with the IdleConnTimeout configured:
	c := &http.Client{
		Transport: &http.Transport{
			IdleConnTimeout: connTO,
		},
	}

	// run periodic requests, count the requests, goroutines and the heap in use:
	var (
		m runtime.MemStats
		reqs int
	)

	for {
		func() {
			req, err := http.NewRequest("GET", "http://localhost:8080", nil)
			if err != nil {
				log.Fatalln(err)
			}

			rsp, err := c.Do(req)
			if err != nil {
				log.Fatalln(err)
			}

			defer rsp.Body.Close()
			if _, err := ioutil.ReadAll(rsp.Body); err != nil {
				log.Fatalln(err)
			}

			reqs++
			runtime.ReadMemStats(&m)
			fmt.Print("\r",
				time.Now().Format("15:04:05"),
				", requests: ", reqs,
				", goroutines: ", runtime.NumGoroutine(),
				", heap in use: ", m.HeapInuse,
			)

			time.Sleep(pollTO)
		}()
	}
}

What did you expect to see?

I expected the existing connection from the pool either reused or not, without other issues (even if having the same timeouts for the short-polling and the IdleConnTimeout is quite an edge case).

What did you see instead?

I saw a steadily growing number of goroutines and memory usage.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 29, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone May 29, 2018
@ianlancetaylor
Copy link
Contributor

CC @bradfitz

@bradfitz
Copy link
Contributor

A belated thanks for the fun bug report. Sorry for missing this earlier.

I'm able to reproduce it and have started debugging. It's not immediately obvious what's happening, but I'll keep looking.

@fraenkel
Copy link
Contributor

One problem is the rsp.Body.Close() shouldn't be deferred since we don't ever exit the block. That doesn't fix the underlying issue.

@bradfitz
Copy link
Contributor

@fraenkel, no that's not the issue. It is in a func scope above, so the defer does work. But the bug also repros with an immediate Body.Close before the sleep.

I've found that it's related to the Transport's idle connection management, but then I had to do something else. I'll look at it again today.

@bradfitz
Copy link
Contributor

Each increase of 3 goroutines (the Transport persistConn's readLoop+writeLoop + the server side handling it) happens right after each time this "not idle" case sometimes fires:

func (pc *persistConn) closeConnIfStillIdle() {
// ....
        if _, ok := t.idleLRU.m[pc]; !ok {
                // Not idle.                                                                                                                                                 
                return
        }

So whatever is removing it from that map isn't also closing the conn, which would break the readLoop's Peek which would make that goroutine return, closing the writeLoop, and the server goroutine would end when the conn closed also.

@fraenkel
Copy link
Contributor

The cases which fail always are connections pulled from getIdleConn.

@fraenkel
Copy link
Contributor

So from what I see, we call getIdleConn, and then see there is an idleTimer associated with it so its removed from the LRU but not returned. Then the closeConnIfStillIdle is invoked and decides its not idle, therefore we lose the pconn and its never closed.

@bradfitz
Copy link
Contributor

Yeah, that check just seems wrong. Sending a CL with explanation.

@gopherbot
Copy link

Change https://golang.org/cl/123315 mentions this issue: net/http: fix rare Transport leak, remove incorrect defensive logic

@golang golang locked and limited conversation to collaborators Jul 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants