proposal: spec: allow fatal panic handler #32333

pearsontophergopher · 2019-05-30T17:10:12Z

What version of Go are you using (`go version`)?

$ go version go1.11.5 darwin/amd64

Does this issue reproduce with the latest release?

Yuppers

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/usterch/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/usterch/src/GO"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/jf/ggd1wkrj6wn4zlqgt10xh66c5sxxm7/T/go-build298064772=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Want to catch all panics in main() in order to perform a hook action before calling os.Exit(1). Unfortunately, by convention, go does not roll-up panics past certain boundaries.

What did you expect to see?

I was hoping the stack would continue to unwind until there was no stack left or that there would be a middleware I could register to intercept a panic globally.

What did you see instead?

The stack unwinds to the goroutine/package boundary. Since all of our code is running in go routines and separate packages, there is no seemingly D.R.Y. method of handling that graceful shutdown.

Detail:

I'm sure this is going to get slammed by the community (be gentle), but I figure it's worthwhile to ask.

When a panic() occurs, it's bad. The program should crash unless there's an explicit recover that's able to fix that specific issue. Just prior to the crash though, we would like the ability to call one function to handle a simple clean-up operation before the os.Exit(1). This is not a novel concept, Python/JAVA/Ruby all support it for example. This is easier to accomplish since they aren't pre-compiled languages.

The workarounds I see consistently used are to either:
a) copy+paste the same defer-recover statement into literally every function
b) Use panicwrapper which effectively re-launches your program into a subprocess and watches it for issues. If your program exits abnormally, panicwrapper is the parent process, thus it is able to handle that single clean-up operation. There are limitations to panicwrapper though - such as signal handling can get really garbled.

It seems like this should be something that's natively included in the language though as opposed to having to hack around it and fork() and redirect signals/stdout/stderr.

Some Usage Examples:

You are using a non-thread-safe 3rd party library, but you don't know it's not thread-safe. You call a function from that library in a gin router and everything blows up since gin is threaded automatically.
You have several packages that you develop, most of which don't have main() functions, and instead call individual functions. You forgot to protect a string map for concurrency and 💥, panic(). You want to iterate an some sort of error counter in prometheus and log a message about the panic before exiting (e.g. a stackdump).

These are both fatal events where the program should crash, but having a global panic() hook/middleware would enable a small amount of clean-up/extra logging to enable easier debug/code fixes.

It seems like there was a very intentional decision made here to prevent the unroll all the way to main(). If you disagree with this proposal, could you help me understand a little more about why it's advantageous to maintain this boundary?

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2019-05-30T18:32:21Z

I don't understand how to implement this. When you start a new goroutine, the old goroutine keeps running. If the new goroutine panics, we can't panic in the old goroutine. It may not even exist anymore.

Or perhaps you are looking for something like GODEBUG=tracebackancestors=100? Search for tracebackancestors in https://golang.org/pkg/runtime/.

neild · 2019-05-30T18:36:11Z

I think the real feature you're asking for is a way to set a handler that runs before the program exits, even when the program is aborting due to an unhandled panic.

FWIW, while support for atexit handlers has been intentionally omitted because they are so often misused, I found myself wanting one today for https://golang.org/cl/179599--it would be nice if the testing package had a chance to clean up and display output when a goroutine started by a test (as opposed to the test goroutine itself) triggers a panic.

pearsontophergopher · 2019-05-30T19:39:56Z

@neild I think that's a fairly good description. I would love the ability to clean up the DB after a test panics, instead of having to go clean it manually when someone Fs* something up so that the defer doesn't run.

@ianlancetaylor I'm looking into tracebackancestors now and what kind of performance hit that brings.

My thought on implementation would be to add a PanicHandler somewhere around here: https://github.com/golang/go/blob/master/src/runtime/panic.go#L887 - it could no-op by default. Then I could call runtime.RegisterPanicHandler(someFunc) up in my main() function.

ianlancetaylor · 2019-05-30T22:36:11Z

There is no reliable way to run a hook when the program exits. And such a hook would run in a very limited environment, so it would be difficult to write a correct hook.

If you must do some sort of cleanup after a program exits, the only reliable mechanism is to execute the program as a child, and do the cleanup after it returns.

pearsontophergopher · 2019-05-30T22:41:59Z

@ianlancetaylor Doesn't literally every unrecoverable panic call runtime.fatalpanic which is ultimately what writes the panic to stderr and exits 2? If so, it would be something like:

if runtime.HasRegisteredPanicHandler() {
  handle := runtime.GetRegisteredPanicHandler()
  handle()
}

networkimprov · 2019-05-30T22:51:31Z

If you are cleaning up a database, vs logging an error msg, you should do that on startup, as no exit code will run on OS crash or power-off.

ianlancetaylor · 2019-05-30T23:00:52Z

We can reliably hook into when a panic is going to cause the program to exit, but that is not the same as reliably hooking into every case where a program exits. There are many exit paths that do not go through runtime.fatalpanic.

And my other point still holds: by the time we reach runtime.fatalpanic the runtime has started shutting down. Any hook called by fatalpanic can not run ordinary Go code. It can only run a carefully written subset of Go code.

pearsontophergopher · 2019-05-30T23:15:33Z

@networkimprov Yup - and that's what we do - we run a clean-up both in setup() and in a defer() block, just in-case tests panicked. It's not DRY, but it's robust.

@ianlancetaylor That may not have been clear, sorry - I only want it to run in the case of a panic, that's the only time, not on other exit paths. Other exit paths can all be easily handled normally.
I do kind of see what you're talking about now, where the stack isn't necessarily in a predictable state here. I figured that anything generic you could do in a recover() function, you could do here (e.g. forward the panic message to an external logging service). I don't know how much the stack unwinds before we get here - would something like a logger still be available as long as:

the logger itself didn't cause the panic and
was instantiated prior to the function that panicked?

Would you happen to feel any differently about this if the proposed destination for this handler's home was say...in the unsafe package?

ianlancetaylor · 2019-05-31T05:08:47Z

You can write anything in a deferred function. The deferred functions are invoked as the stack is unwound.

A fatal panic shuts down the whole runtime, including the scheduler and everything.

I'm not sure I see the purpose of only handling a panic. You can write that yourself, with some care, using defer statements. But why distinguish between a panic and a fatal runtime error? A fatal runtime error is something like running out of memory; it doesn't panic, doesn't run defer statements, but just crashes the program with a stack trace. I don't think it's a common case to want to handle a uncaught panic but not want to handle a fatal runtime error.

I personally would not feel differently if this were in the unsafe package.

pearsontophergopher · 2019-05-31T15:51:45Z

The two things we personally do with a panic -
a) iterate a prometheus error counter
b) send the panic as a log message to Splunk
I'm able to accomplish b) by moving the logging responsibility out of Go and into another daemon but a) cannot be accomplished without copy+pasting the exact same code into literally thousands of functions, and even copy+pasting that code every where, I still can't run either a) or b) when there's a panic from a 3rd party library.
When there's a panic from a 3rd party library, we find an alternate library or submit a patch to address the panic, but it's very hard to get at that panic error message in general. That's why we (and other users) end up using https://github.com/mitchellh/panicwrap or https://github.com/bugsnag/panicwrap, so that we can iterate the counter and send the log message. I've seen others handle it with a post stop kubernetes hook and they go find/grep the log looking for a panic message.
I'm looking to imitate the behavior that panicwrapper provides. I'm assuming* that those are fatal panics (as opposed to a different kind of panic). I just think having a language native method of handling those unexpected failures from 3rd party libraries would be super helpful

rsc · 2019-06-04T20:20:12Z

In general Go has taken the position that error recovery should either be done locally (that is, by a parent frame expecting a certain kind of panic in the child) or else in a separate process entirely. This is why you can't recover from panic across goroutines and you also can't recover from throw (runtime-internal detection of very bad conditions like possible memory corruption).

We could have a kind of global defer to push something onto the top of every goroutine before it starts, but it's unclear that would help more than it complicates things. In the end you still need another process (one that can't be affected by memory corruption, stuck locks, or whatever other issues might have triggered the panic) watching the Go program if you want to be completely robust.

rsc · 2019-06-04T20:21:03Z

Marking as LanguageChange because this is about a fundamental language issue, even if the end result is mainly implemented in the runtime. The spec would need to explain that there is this "final step" during a panic.

ianlancetaylor · 2019-07-30T21:19:53Z

Per comments above, we aren't going to add this to the language/runtime. It would require defining a subset of Go that could be run in the panic handler to avoid effectively relaunching goroutines as the program was exiting. For Go, as with many other languages, reliable error handling has to be done in a separate process.

I don't understand your comment about prometheus error counters; in particular I don't understand why the code has to be copied into thousands of functions. Perhaps there is some way that can be addressed short of a fatal panic handler. For example, perhaps the separate analyzer process could examine the stack trace. But I don't really know.

-- iant for @golang/proposal-review

gopherbot added this to the Proposal milestone May 30, 2019

gopherbot added the Proposal label May 30, 2019

ianlancetaylor changed the title ~~proposal: Allow panics() to roll-up past goroutine/package boundaries~~ proposal: Go 2: allow panics to roll-up past goroutine/package boundaries May 30, 2019

ianlancetaylor added v2 A language change or incompatible library change LanguageChange labels May 30, 2019

ianlancetaylor changed the title ~~proposal: Go 2: allow panics to roll-up past goroutine/package boundaries~~ proposal: allow fatal panic handler May 31, 2019

ianlancetaylor removed v2 A language change or incompatible library change LanguageChange labels May 31, 2019

rsc changed the title ~~proposal: allow fatal panic handler~~ proposal: spec: allow fatal panic handler Jun 4, 2019

rsc added the LanguageChange label Jun 4, 2019

neild mentioned this issue Jun 4, 2019

proposal: testing: report failing test after panic #32121

Closed

rsc added the v2 A language change or incompatible library change label Jun 11, 2019

ianlancetaylor closed this as completed Jul 30, 2019

bradfitz added the error-handling Language & library change proposals that are about error handling. label Oct 29, 2019

golang locked and limited conversation to collaborators Oct 28, 2020

gopherbot added the FrozenDueToAge label Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: spec: allow fatal panic handler #32333

proposal: spec: allow fatal panic handler #32333

pearsontophergopher commented May 30, 2019

ianlancetaylor commented May 30, 2019

neild commented May 30, 2019

pearsontophergopher commented May 30, 2019

ianlancetaylor commented May 30, 2019

pearsontophergopher commented May 30, 2019 •

edited

networkimprov commented May 30, 2019

ianlancetaylor commented May 30, 2019

pearsontophergopher commented May 30, 2019

ianlancetaylor commented May 31, 2019

pearsontophergopher commented May 31, 2019

rsc commented Jun 4, 2019

rsc commented Jun 4, 2019

ianlancetaylor commented Jul 30, 2019

proposal: spec: allow fatal panic handler #32333

proposal: spec: allow fatal panic handler #32333

Comments

pearsontophergopher commented May 30, 2019

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Detail:

Some Usage Examples:

ianlancetaylor commented May 30, 2019

neild commented May 30, 2019

pearsontophergopher commented May 30, 2019

ianlancetaylor commented May 30, 2019

pearsontophergopher commented May 30, 2019 • edited

networkimprov commented May 30, 2019

ianlancetaylor commented May 30, 2019

pearsontophergopher commented May 30, 2019

ianlancetaylor commented May 31, 2019

pearsontophergopher commented May 31, 2019

rsc commented Jun 4, 2019

rsc commented Jun 4, 2019

ianlancetaylor commented Jul 30, 2019

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

pearsontophergopher commented May 30, 2019 •

edited