Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: graceful shutdown race condition #36819

Closed
zerkms opened this issue Jan 28, 2020 · 8 comments
Closed

net/http: graceful shutdown race condition #36819

zerkms opened this issue Jan 28, 2020 · 8 comments
Assignees
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@zerkms
Copy link

zerkms commented Jan 28, 2020

Update: I have found a super trivial way to reproduce it, see the step at the very bottom

Preamble: I think it's relevant to #23829

What version of Go are you using (go version)?

$ go _version_
go version go1.13.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOHOSTARCH="amd64"
GOHOSTOS="linux"

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/ivan/.cache/go-build"
GOENV="/home/ivan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/srv/work/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/ivan/.local/share/umake/go/go-lang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/ivan/.local/share/umake/go/go-lang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build903857222=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	go func() {
		fmt.Println(srv.ListenAndServe())
	}()

	<-sig
	fmt.Println(srv.Shutdown(context.Background()))
}

With the following code it's possible to get a case when the new connection in accepted, then closed within milliseconds, which makes the shutdown not graceful

What did you expect to see?

A graceful shutdown

What did you see instead?

Screenshot from 2020-01-28 11-29-15

Screenshot from 2020-01-28 13-04-16

How to reproduce:

Just ab -t 20 -c 5 -A foo:foo http://localhost:8080/ is enough to capture the case. The former is much more rare, the latter is quite frequent.

What you can see on screenshots: the connection is accepted, then as per https://go-review.googlesource.com/c/go/+/121419/ it should be treated as "new" (but it's not due to race), and is closed.
In first case the client had chance to send a request.
In second - it was closed even before that.

In both cases it happens with the Connection reset by peer on the client side, which should never be the case with non-keepalived connections and graceful termination.

Both pcap files are available at: https://www.dropbox.com/s/2co7wkucfey90zz/bug.tgz?dl=0

My assumption on why it's the race and how it happens:

  1. the connection is accepted with rw, e := l.Accept() successfully
  2. A moment later a shutdown request is made
  3. Before the c.setState(c.rwc, StateNew) statement is reached the server completes shutdown
  4. Because the connection is not tracked - http.Server is not aware of it
  5. The process quits, network sockets are closed by kernel

UPDATE:

It can be easier reproducible by adding time.Sleep(3 * time.Second) right before c := srv.newConn(rw)

Steps:

  1. Patch the http/server.go with time.Sleep(3 * time.Second) before c := srv.newConn(rw)
  2. Start the app
  3. Make a request
  4. Send ctrl-c

Expected: the request should complete
Actual:

➜ curl -v --user foo:foo http://localhost:8080
* Rebuilt URL to: http://localhost:8080/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
* Server auth using Basic with user 'foo'
> GET / HTTP/1.1
> Host: localhost:8080
> Authorization: Basic Zm9vOmZvbw==
> User-Agent: curl/7.58.0
> Accept: */*
> 
* Recv failure: Connection reset by peer
* stopped the pause stream!
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

And a corresponding pcap: https://www.dropbox.com/s/5vsvwoow4ydcs3f/with-timeout.tgz?dl=0

And a corresponding wireshark screenshot:

Screenshot from 2020-01-28 14-02-22

@zerkms zerkms changed the title http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response net/http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response Jan 28, 2020
@zerkms zerkms changed the title net/http: graceful shutdown sometimes closes (RST) the new connection without waiting for a response net/http: graceful shutdown race condition Jan 28, 2020
@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 3, 2020
@odeke-em odeke-em self-assigned this Feb 5, 2020
@AlexanderYastrebov
Copy link
Contributor

#48642 seems to be a duplicate, I've proposed a fix https://golang.org/cl/353714 for it

@AlexanderYastrebov
Copy link
Contributor

#48642 was fixed

@zerkms
Copy link
Author

zerkms commented May 9, 2022

		c := srv.newConn(rw)
		if !srv.trackConn(c) {
			rw.Close()
			return ErrServerClosed
		}

ooooooh, this change makes the solution non-graceful either, right?

A graceful shutdown should either:

  1. Serve the accepted connection, or
  2. Don't accept it

But with the code above it accepts the connection then closes it.

@AlexanderYastrebov
Copy link
Contributor

I think you are right, moreover the problem seems to be a duplicate of #33313 which was fixed by https://golang.org/cl/213442 but now the "fix" for #48642 https://golang.org/cl/353714 reintroduced it back - client gets Connection reset by peer.

Patch the http/server.go with time.Sleep(3 * time.Second) before c := srv.newConn(rw)

To avoid patching the runtime a delay could be achieved via ConnContext:

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
		ConnContext: func(ctx context.Context, c net.Conn) context.Context {
			time.Sleep(3 * time.Second)
			return ctx
		},
	}

	go func() {
		fmt.Println(srv.ListenAndServe())
	}()

	<-sig
	fmt.Println(srv.Shutdown(context.Background()))
}

The proper fix for #48642 should close the connection but only when server is Closed and not gracefully Shutdown.

@gopherbot
Copy link

Change https://go.dev/cl/405454 mentions this issue: Revert "net/http: close accepted connection"

AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 11, 2022
The https://golang.org/cl/353714 wrongly closes accepted connection
if server is shutting down.

Updates golang#33313
Fixes golang#36819
Fixes golang#48642
gopherbot pushed a commit that referenced this issue May 11, 2022
This reverts CL 353714.

The change closes accepted connection also in graceful shutdown which
breaks the fix for #33313 (and apparent duplicate #36819).

The proper fix should close accepted connection only if server is closed
but not in graceful shutdown.

Updates #48642

Change-Id: I2f7005f3f3037e6563745731bb2693923b654004
GitHub-Last-Rev: f6d885a
GitHub-Pull-Request: #52823
Reviewed-on: https://go-review.googlesource.com/c/go/+/405454
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
AlexanderYastrebov added a commit to AlexanderYastrebov/go that referenced this issue May 27, 2022
Previous attempt https://golang.org/cl/353714 also incorrectly closed
accepted connection while in shutdown.

Fixes golang#48642
Updates golang#33313, golang#36819
@gopherbot
Copy link

Change https://go.dev/cl/409154 mentions this issue: net/http: close accepted connection when Closed

@gopherbot
Copy link

Change https://go.dev/cl/409537 mentions this issue: net/http: wait for listeners to exit in Server.Close and Shutdown

gopherbot pushed a commit that referenced this issue Jul 8, 2022
Avoid race conditions when a new connection is accepted just after
Server.Close or Server.Shutdown is called by waiting for the
listener goroutines to exit before proceeding to clean up active
connections.

No test because the mechanism required to trigger the race condition
reliably requires such tight coupling to the Server internals that
any test would be quite fragile in the face of reasonable refactorings.

Fixes #48642
Updates #33313, #36819

Change-Id: I109a93362680991bf298e0a95637595dcaa884af
Reviewed-on: https://go-review.googlesource.com/c/go/+/409537
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
Avoid race conditions when a new connection is accepted just after
Server.Close or Server.Shutdown is called by waiting for the
listener goroutines to exit before proceeding to clean up active
connections.

No test because the mechanism required to trigger the race condition
reliably requires such tight coupling to the Server internals that
any test would be quite fragile in the face of reasonable refactorings.

Fixes golang#48642
Updates golang#33313, golang#36819

Change-Id: I109a93362680991bf298e0a95637595dcaa884af
Reviewed-on: https://go-review.googlesource.com/c/go/+/409537
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Damien Neil <dneil@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
@AlexanderYastrebov
Copy link
Contributor

I think this was fixed by the https://go.dev/cl/409537

@zerkms zerkms closed this as completed Oct 4, 2022
@golang golang locked and limited conversation to collaborators Oct 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants