Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: net/http: context based graceful server shutdown #52805

Closed
marwan-at-work opened this issue May 9, 2022 · 20 comments
Closed

proposal: net/http: context based graceful server shutdown #52805

marwan-at-work opened this issue May 9, 2022 · 20 comments

Comments

@marwan-at-work
Copy link
Contributor

marwan-at-work commented May 9, 2022

Proposal

I propose adding two new methods to the net/http *Server struct: ListenAndServeContext and ListenAndServeTLSContext that will handle gracefully shutting down the server upon a context cancellation signal.

To accompany these two methods, we need a graceful shutdown timeout which can either live as a field in the Server struct, or an argument to the new methods.

The signature can either be:

// ListenAndServeContext is like ListenAndServer but will call s.Shutdown() when the given
// context sends a cancellation signal. If the ShutdownTimeout field is set, it will be used
// as timeout duration for the Shutdown method.
func (s *Server) ListenAndServeContext(ctx context.Context) error
func (s *Server) ListenAndServeTLSContext(ctx context.Context, certFile, keyFile string) error

type Server struct {
  ...
  ShutdownTimeout time.Duration
  ...
}

OR

func (s *Server) ListenAndServeContext(ctx context.Context, timeout time.Duration) error
func (s *Server) ListenAndServeTLSContext(ctx context.Context, timeout time.Duration, certFile, keyFile string) error

It might also be worth noting that there are two more methods (Serve, and ServeTLS) that can arguably take a context as well. I rarely ever encounter explicit net.Listeners being passed in when starting an http server so I'll leave that up for discussion.

The examples below will assume option 2 above as it makes it more explicit for the user to pass a Shutdown timeout instead of it being tucked away as a struct field.

Code Examples

The code from the caller's perspective will look something like this:

ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
defer cancel()

server := &http.Server{
  Addr: addr,
  Handler: handler,
}

if err := server.ListenAndServeContext(ctx, time.Second); err != nil {
  log.Fatal(err)
}

Implementation

The underlying implementation would abstract the subtle details to handle graceful shutdowns:

func (s *Server) ListenAndServe(ctx context.Context, shutdownTimeout time.Duration) error {
  return s.ListenAndServeTLS(ctx, shutdownTimeout, "", "")
}

func (s *Server) ListenAndServeTLS(ctx context.Context, shutdownTimeout time.Time, certFile, keyFile string) error {
	serverErr := make(chan error, 1)
	go func() {
		// Capture ListenAndServe errors such as "port already in use".
		// However, when a server is gracefully shutdown, it is safe to ignore errors
		// returned from this method (given the select logic below), because
		// Shutdown causes ListenAndServe to always return http.ErrServerClosed.
		if certFile != "" && keyFile != "" {
			serverErr <- s.ListenAndServeTLS(certFile, keyFile)
		} else {
			serverErr <- s.ListenAndServe()
		}
	}()
	var err error
	select {
	case <-ctx.Done():
		ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
		defer cancel()
		err = s.Shutdown(ctx)
	case err = <-serverErr:
	}
	return err
}

Notably, the implementation above would return a nil error when a server is gracefully and successfully shutdown without hitting the timeout. We could also explicitly return the ctx.Error() or a new error variable if we never want to return a nil error. But I think a nil error is a fine signal for saying a server was successfully terminated.

Finally, there's a question of what a 0 duration would mean:

  1. Either it means no graceful shutdown since the context being passed to s.Shutdown will immediately be cancelled.
  2. Or it means wait forever such as the equivalent of s.Shutdown(context.Background().

The first option seems more rational though the second might be nice for convenience but it would have to be explicitly documented.

Rationale

Many people today run Go servers while not implementing graceful shutdown in the first place (for example calling the global http.ListenAndServe function) or implementing it but missing edge cases such as the following:

  1. Misusing shutdown.
go func() {
  if err := srv.ListenAndServer(); err != nil {
    log.Fatal(err) // will immediately log http.ErrServerClosed and exit.
  }
}()
  1. Ignoring legitimate ListenAndServe errors:
go func() { srv.ListenAndServe() }()

And probably other subtleties. I have myself made those mistakes many items and ended up abstracting the logic into its own context: https://github.com/marwan-at-work/serverctx/blob/main/serverctx.go

Looking at private code within my company I also noticed each team does a flavor of the above which seems like a good case for code re-use.

Furthermore, looking at the standard library we already have similar patterns such as exec.Command and exec.CommandContext which could make this a natural fit for Go developers already familiar with these APIs.

@gopherbot gopherbot added this to the Proposal milestone May 9, 2022
@ianlancetaylor
Copy link
Contributor

CC @neild @bradfitz

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 9, 2022
@neild
Copy link
Contributor

neild commented May 9, 2022

This proposal passes a Context and a graceful shutdown timeout to Server.ListenAndServeContext. The Context bounds the time until the server starts graceful shutdown, and the timeout bounds how long the graceful shutdown period lasts.

The existing Server.Shutdown method takes a Context to bound the graceful shutdown period.

Under this proposal, therefore, we'd have one API that takes a Context to bound the server lifetime but not the shutdown period, and a different and incompatible API which takes a Context to bound the graceful shutdown period but not the server lifetime. That seems confusing. It also means the ListenAndServeContext function would only be usable in cases where there is no graceful shutdown period or the grace period is strictly time-bounded.

I think this proposal is addressing two issues at the same time:

  • You can't bound a http.Server's lifetime using a Context without spinning up a goroutine and manually watching for context cancellation. I don't know how much of an issue this is in practice; how often do people want to stop a server using a Context rather than by calling Server.Shutdown?
  • The interactions between Server.Shutdown and Server.ListenAndServe are somewhat sharp-edged: ListenAndServe returns immediately with an error, when returning nil after shutdown has completed might be friendlier in most cases.

Perhaps it might be simpler to address the second point without introducing a context to ListenAndServe.

@natefinch
Copy link
Contributor

how often do people want to stop a server using a Context rather than by calling Server.Shutdown?

Basically always? Context is the canonical way to control the lifetime of a call. This is why signal.NotifyContext was added to the standard library, because it was so common to want to hook these two things up.

I wonder if the solution is to add graceful shutdown semantics right into context.Context, and then you'd only need ListenAndServeCtx, and the context would fully contain graceful exit semantics as well as simple cancel semantics.

And this would also then give everyone who uses contexts a one-stop shop for graceful shutdowns.

I think we steal the idea from Shutdown(ctx) - implement a separate context that only controls the lifetime of Shutdown logic. It'll be a context within a context, but if we can word things well, I think it'll be ok.

// ContextForShutdown returns the Context that controls the lifetime of the logic that 
// runs when ctx's Done channel is closed.
func ContextForShutdown(ctx Context) Context

// WithShutdownTimeout returns a context that puts a timeout on the context returned by
// ContextForShutdown, and a CancelFunc that closes the done channel on the shutdown context. 
func WithShutdownTimeout(parent Context, timeout time.Duration) (ctx Context, cancelShutdown CancelFunc)

// WithShutdownCancel returns a context and a CancelFunc that will close the done
// channel on the context returned by ContextForShutdown for ctx. 
func WithShutdownCancel(parent Context) (ctx Context, cancelShutdown CancelFunc)

I use Shutdown here to try to avoid the confusion of reusing "Cancel". Shutdown being the logic that runs after the original context is marked done.

// in user code, this is all we'd need for graceful shutdown of an application
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
defer stop() 
ctx, cancelShutdown := context.WithShutdownTimeout(ctx, timeout)
defer cancelShutdown()
return server.ListenAndServeCtx(ctx)

And then once the original context gets closed by the Signal, http.Server would call context.ContextForShutdown on the context it was passed by ListenAndServeCtx, and use that to pass into Shutdown.

@rsc rsc moved this from Incoming to Active in Proposals (old) May 11, 2022
@rsc
Copy link
Contributor

rsc commented May 11, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@earthboundkid
Copy link
Contributor

I'm agnostic about the right API here, but I do think this is a common need and a bit of a PITA to code correctly.

Maybe it accepts a callback to run when ctx1 is cancelled and that callback returns the context for the shutdown?

@DeedleFake
Copy link

In the event of a nil Server.BaseContext, would the provided Context be used?

@rsc
Copy link
Contributor

rsc commented May 18, 2022

It sounds like having some kind of graceful shutdown is a common need that might be worth filling. People seem less enthusiastic about these specific API details. Does anyone want to propose a simpler way?

@ianlancetaylor ianlancetaylor changed the title proposal: net/http: Include Context Based Server Graceful Shutdown. proposal: net/http: context based graceful server shutdown May 18, 2022
@marwan-at-work
Copy link
Contributor Author

I'm all ears (or eyes I guess) for proposing a simpler way 👍🏼

One alternative we can do is have server.Server take a context as an optional field (in addition to context timeout) and this way *Server.ListenAndServe (and friends) can check if the context field is not nil and it would launch a goroutine under the hood that manages the graceful shutdown. This would mean no new additional methods but only two additional fields.

Something like this:

type Server struct {
  ...
  ShutdownTimeout time.Duration
  Context context.Context
  ...
}

and you would just call server.ListenAndServer() as you normally do. People managing ListenAndServe manually won't be broken because the Context will be nil. And if people decide to add a Context, then they are explicitly choosing the new API and would not need to manage graceful shutdown anymore.

Personally I still prefer the Go convention of explicitly passing context as a method/function argument but there's precedent here with net/http.Request storing the context as a field. So I'm happy either way

@seankhliao
Copy link
Member

I think having http.Server itself support context based shutdowns isn't enough, often there are other background tasks, connection upgrades, or even multiple http.Servers running which all need to be coordinated and accounted for in a graceful shutdown.

I would rather see something like a version of github.com/oklog/run that can support both context-based and call-based cancellation, allowing existing code to plug in to coordinated shutdowns.

@neild
Copy link
Contributor

neild commented May 23, 2022

I think that any proposal here needs to start with a clear goal.

The original post here includes a function which shows how to run an http.Server with a context-bounded lifetime. Lightly modified:

func ListenAndServeTLS(ctx context.Context, server *http.Server, shutdownTimeout time.Duration, certFile, keyFile string) error {
	serverErr := make(chan error, 1)
	go func() {
		serverErr <- server.ListenAndServeTLS(certFile, keyFile)
	}()
	select {
	case <-ctx.Done():
		ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
		defer cancel()
		err = s.Shutdown(ctx)
	case err := <-serverErr:
		return err
	}
}

That doesn't look too bad to me.

The least elegant part to my eyes is the handling of shutdownTimeout, which creates a new background context to apply the graceful shutdown timeout. However, this lack of elegance is, I think, implicit in using contexts to bound the lifetime of operations: We cannot both bound the server lifetime with a context and have a graceful shutdown, because graceful shutdown requires extending the server lifetime past the point the server context expires.

This argues to me that a context is not the right way to bound the lifetime of an http.Server, and that the API we have today with Server.Close and Server.Shutdown explicitly terminating a server's lifetime is the better approach.

@earthboundkid
Copy link
Contributor

There are already ListenAndServe, ListenAndServeTLS, Serve, ServeTLS. Adding more context variant methods seems bad, so adding struct fields seems like lesser evil. The approach of using callbacks is pretty flexible. Maybe BeforeFunc func(*http.Server) error (not sure if this is a good name).

server := &http.Server{
  Addr: addr,
  Handler: handler,
  BeforeFunc: func(s *http.Server) {
        log.Print("starting up")
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)
	<- c
        log.Print("shutting down")
	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()
	return s.Shutdown(ctx)
  },
}

if err := server.ListenAndServeContext(ctx, time.Second); err != nil {
  log.Fatal(err)
}

@bcmills
Copy link
Contributor

bcmills commented May 24, 2022

This argues to me that a context is not the right way to bound the lifetime of an http.Server, and that the API we have today with Server.Close and Server.Shutdown explicitly terminating a server's lifetime is the better approach.

I concur. My mental model of context.Context is that cancellation means “the caller is no longer interested in the results of this call”, and through that lens passing a Context to ListenAndServeContext would mean “the caller is no longer interested in the output of the server”. That sounds to me like a hard shutdown, not a graceful one.

@rsc
Copy link
Contributor

rsc commented Jun 1, 2022

Is there anything to learn from https://github.com/facebookarchive/grace?
I don't remember much about it, only that it was an early graceful shutdown contender.

It sounds like we are still waiting for the right approach.

@cristaloleg
Copy link

Nit: regarding graceful shutdown, tableflip looks more supported and maintained https://github.com/cloudflare/tableflip

@rsc
Copy link
Contributor

rsc commented Jun 8, 2022

I'm not sure how to move this discussion forward. Does anyone want to propose an alternative, simpler API?

Alternately, http.Server already has Shutdown and Close. It's a few lines of code to connect those to whatever contexts or timeouts that might be needed in a given situation. And servers don't get started and stopped at high frequency, so an extra goroutine or two wouldn't matter. Do we really need more than those? Is there something else they don't provide that we should be making available? That is, maybe Shutdown and Close are the alternative, simpler API.

@neild
Copy link
Contributor

neild commented Jun 8, 2022

I think the existing Shutdown and Close methods are fine.

I do not think we should provide a version of ListenAndServe that takes a Context that triggers server shutdown. As I mentioned above, the fact that the server needs to run for a grace period after shutdown has begun argues that a context isn't the right way to bound a server's lifetime. If you do want to shut down a server when a context expires, you can do so easily enough by calling Shutdown.

It would be nice if ListenAndServe didn't return until after the graceful shutdown period, and returned nil after a graceful shutdown. This would mean you could write something like this:

func main() {
  // ...
  if err := server.ListenAndServe(); err != nil {
    log.Fatal(err)
  }
}

Instead of this:

func main() {
  // ...
  if err := server.ListenAndServe(); err != http.ErrServerClosed {
    log.Fatal(err)
  }
  select{} // whatever called Shutdown needs to call os.Exit
}

Perhaps we could add a Server.Wait method, which waits for shutdown to complete and returns any non-ErrServerClosed error from ListenAndServe:

func main() {
  server.ListenAndServe() // ignore error, Wait will return it
  if err := server.Wait(); err != nil {
    log.Fatal(err)
  }
}

@marwan-at-work
Copy link
Contributor Author

marwan-at-work commented Jun 8, 2022

I'm not sure if I understand the suggestion above, so I'd love to see a full end to end example of how the code would look like from the caller's side while handling all scenarios such as pre-run errors like port-already-in-use, and post-run errors such as shutdown-timeout.

Also, telling people to ignore the error from ListenAndServe and to instead use Wait seems a little odd to me. I think I'd rather deprecate a method entirely in favor of a newer one as that will feel more clean cut than deprecating only the return part of an existing method signature.

Also I'd love to mention that my reasoning behind this issue (and not particularly the proposal itself) is the fact that today if you want to "properly" run a server in production, you need to handle graceful shutdowns and therefore you need to write this code (or a variation of it) every single time: https://github.com/marwan-at-work/serverctx/blob/main/serverctx.go

The code itself is not particularly long, but it is subtle and in retrospect it would have been great if the net/http library only exposed a single way of running/closing a server that was always graceful.

Therefore, I am mostly interested in ways can steer the community to always consider graceful shutdowns when running production servers.

Of course, adding new APIs can do just that but if there isn't a simple/clean way, maybe the solution is to just properly document/encourage people of this best practice?

For example

  1. We can add "Example Test" of running a server with graceful shutdowns. One already exists, but might be worth updating
  2. We can mention in the docs of http.ListenAndServer that it does not handle graceful shutdowns. Maybe we can even deprecate it in favor of doing whatever Example Test suggests.
  3. Maybe there's something go vet can do to help in that regard.

That said, I'm definitely happy to see new API suggestions like the one above and how they would work.

Thanks!

@rsc
Copy link
Contributor

rsc commented Jun 15, 2022

I'm not sure about changing ListenAndServe to ever return nil (lots of code does log.Fatal(http.ListenAndServe(...)) and it would be weird to get log.Fatal(nil) out of that). But we could certainly have a distinguished ErrGracefulShutdown (bad name) or something like that. In any event, that sounds like a different proposal.

Sounds like new API as described in this issue is a likely decline.

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Jun 15, 2022
@rsc
Copy link
Contributor

rsc commented Jun 15, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jun 22, 2022

No change in consensus, so declined.
— rsc for the proposal review group

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests