runtime: panic + recover can cancel a call to Goexit #29226

bcmills · 2018-12-13T20:22:01Z

The documentation for runtime.Goexit says (emphasis mine):

Goexit terminates the goroutine that calls it. No other goroutine is affected. Goexit runs all deferred calls before terminating the goroutine. Because Goexit is not a panic, any recover calls in those deferred functions will return nil.

However, this is empirically not the case if one of the deferred functions itself triggers a panic: that panic can be recovered, and that recover cancels the termination of the goroutine.

See https://play.golang.org/p/7VrxPDByUNT for a minimal illustration.

Found while investigating #29207.

CC @aclements @randall77

The text was updated successfully, but these errors were encountered:

randall77 · 2018-12-28T19:04:12Z

Panics within panics hurt my brain.

I don't think this would be too hard to fix. We just need another bit in the goroutine to record the "in Goexit" state.

There's a comment in the runtime (from 2014):

		// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
		// take defer off list. The earlier panic or Goexit will not continue running.

So it looks like the current behavior is at least somewhat intentional.
@rsc, who wrote that comment.

bcmills · 2019-09-24T12:58:53Z

#34481 proposes to substantially rework the panic implementation. We should see if we can incidentally address this issue as part of that effort.

danscales · 2019-09-24T21:48:28Z

This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function:

func maybeGoexit() {
	defer func() {
		fmt.Println(recover())
	}()
	defer panic("cancelled Goexit!")
	runtime.Goexit()
}

In this case, we are running somewhat into the "abort" rule (which is currently only described in the implementation, not in any spec). If the processing of defers by a recursive panic gets to the frame where an outer panic or Goexit happened, then that outer panic/Goexit is aborted, because there is no mechanism to resume it with a recover (since a recover always finishes the defers of the current frame and then resumes execution with the next outer frame). The abort rule is exactly related to the comment by @rsc above. If we wanted to allow the Goexit to continue, I think we would have add a bunch of mechanism (maybe some stack smashing) just to deal with this case. (The abort case is also mentioned in the Requirements section for the just-added open-coded defer proposal: https://github.com/golang/proposal/blob/master/design/34481-opencoded-defers.md#requirements )

But note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine. The recover happens, and the Goexit() continues.

https://play.golang.org/p/kcQBNOAUeSa

So, for instance, it would work fine if a panic-recover happened during a Printf in a defer function (which can happen with nil pointers, etc. of the printed args).

So, I'll think about this a bit more, but I would lean toward not fixing this (because it would require a bunch of added mechanism just for this one case).

go101 · 2019-09-25T11:35:16Z

I doubt the current implementation of Goexit is to create a special panic. In @bcmills's example, the panic is recovered, but in @danscales' example, the panic is not recovered.

If this is true, I lean toward the panic created in Goexit should be always unrecoverable.

go101 · 2019-09-25T12:03:23Z

In fact, the key here is, if Goexit creates a panic, in @bcmills's example, the panic created in Goexit is suppressed by the new panic created in defer panic("cancelled Goexit!"), which is consequently recovered. (If the suppression doesn't happen, the the recover will not happen too, for the panic created in Goexit is unrecoverable and will not cause crashing.) So, if the issue needs to be fixed, then we should avoid letting the new panic created in defer panic("cancelled Goexit!") suppress the special panic created in Goexit.

bcmills · 2019-09-25T13:35:37Z

note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine

I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.

bcmills · 2019-09-25T13:42:17Z

This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function

Note that the issue can be reproduced with the Goexit, recover, and panic all in separate functions (and presumably frames):
https://play.golang.org/p/M93Kxa4aBhX

danscales · 2019-09-25T16:35:59Z

I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.

Yes, you are demonstrating the exact panic abort semantics that I mentioned. Just to describe it again, an outer panic/Goexit is aborted if the inner panic reaches & processes any defers in the same frame or in a higher (outer) frame from the frame where the panic/Goexit happened. Once the outer panic/Goexit is aborted, it will no longer apply even if there is a recover() that recovers the inner panic.

I think you want to stop these unspecified abort semantics. Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

Note that if the recover happens in a frame which is higher in the stack from the output panic/Goexit, we still want the current behavior, which is that the defers in the current frame are finished, and then "normal" execution continues in the next outer frame, which is some function called by a defer caused by the outer panic. So, the outer panic will continue as expected, but it will get to complete the running of the defer function that caused the panic.

Since the behavior of recursive panics/Goexits (including the abort semantics) was never specified in the first place, I think we have leeway on whether/how we want to fix this.

I've tried to get some feedback on improving the description of panic/recover/recursive panics in the Go language spec, but don't think we have consensus yet about whether or what to improve:

https://go-review.googlesource.com/c/go/+/189377

Feel free to add comments....

danscales · 2019-09-25T17:14:50Z

Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

OK, and I think the hard part about doing this is that we presumably want to clean up the stack of the inner panic and somehow make it look like we are back to the outer Goexit/panic processing. That seems really hard to do without a lot of stack gymnastics (possibly stack smashing) and/or extra stubs to allow us to get control back to the runtime after cleaning up the stack. If we don't require that the stack is cleaned up, then getting rid of the abort behavior seems more doable.

bcmills · 2019-09-25T17:20:09Z

I think you want to stop these unspecified abort semantics. Instead, […] you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

Maybe? Consider https://play.golang.org/p/MyBCSSorYFl: the outer function defers a recover and a cleanup function and then panics, and the cleanup function also panics independently

Intuitively, what I would expect is for the first defer to catch the first panic (“panic!”), and for the second panic (“whoops!”) to continue propagating outward, since that panic occurred after the execution of the function that performed the initial defer.

I could also reasonably see dropping the second panic entirely, since the fact that it occurred is likely a side-effect of some invariant violated as a result of the first panic: if the first panic is fixed, perhaps the second one won't occur at all.

Today, the first recover instead receives the second panic, and the details of the first panic are completely lost.

(I would call the second panic the “inner” one, but maybe I'm using “inner” and “outer” a bit differently than you are.)

go101 · 2019-09-26T15:10:26Z

If panics are maintained in a stack and only the top one can be recovered (this example shows the possibility is high), then the panics created by Goexit calls shouldn't be pushed into the panic stack. Instead, each goroutine should maintain a separated field to record the last (or the first) panic created by Goexit calls.

BTW, the following program proves that if a panic created by Goexit is at the top of the panic stack, then all other panics will also not get recovered.

package main

import "fmt"
import "runtime"

func main() {
	go func() {
		defer func() {
			fmt.Println("bb", recover()) // <nil>
		}()
		defer func() {
			defer func() {
				fmt.Println("aa", recover()) // <nil>
			}()
			defer runtime.Goexit()
			panic(2)
		}()
		panic(1)
	}()
	
	select{}
}

go101 · 2019-09-26T15:19:17Z

@ianlancetaylor
It looks gccgo and gc disagree on the last example.

ianlancetaylor · 2019-09-27T19:13:34Z

@go101 Thanks for noticing that difference. I think we should work out this issue, and then we can make sure that the implementations do the same thing.

go101 · 2019-09-27T20:15:40Z

It looks the difference is, in gccgo, the panic created in Goexit will not shadow the newest panic, whereas gc will. In gccgo, the panic created in Goexit will shadow the panics before the newest panic.

ianlancetaylor · 2019-09-27T23:14:42Z

Yes, but let's first figure out what is correct, and then implement that.

go101 · 2019-09-28T17:58:43Z

Personally, I think panicking and exiting should be two different properties of gorotines.

gopherbot · 2019-10-09T22:16:56Z

Change https://golang.org/cl/200081 mentions this issue: runtime: ensure that Goexit cannot be aborted by a recursive panic/recover

danscales · 2019-10-09T22:35:48Z

I figured out how to make sure that Goexits are not aborted by recovers that happen in defers below them... It didn't require any stack smashing or extra stubs -- just saving the pc/sp when making calls to deferred functions from the Goexit defer processing loop. See the mentioned change for the fix.

We discussed the abort behavior with respect to panic and Goexit, and are leaning toward only changing the Goexit behavior (which is the mentioned changed). As summarized well by Robert at https://go-review.googlesource.com/c/go/+/189377/7/doc/go_spec.html#6169 [paraphrased here]

From a programmer's point of view, if a defer contains a recover() call, I expect it to catch any pending panic, no matter where they are coming from. Otherwise it's impossible to write code that protects against (expected and unexpected) panics leaving a function/API entry point. So that would argue against "only recover the panic initiated in its own function".

And the abort (replace) behavior for panics exactly coalesces panics down so that we make sure that a recover in a function always stops all panics within that function or its callees.

go101 · 2019-10-10T14:06:08Z

There is another related question: should a Goexit call save the a panicking goroutine from crashing? The current gc and gccgo both do this. Personally, I think it is not very reasonable.

package main

import "runtime"
import "time"


func main() {
	go func() {
		defer runtime.Goexit() // [edit]: add the "defer"
		panic(1) // will not crash the program
	}()
	
	for {time.Sleep(100000)}
}

bcmills · 2019-10-10T14:12:05Z

@go101, in that snippet the runtime.Goexit() exits the goroutine immediately, so the panic line is not even reached.

bcmills · 2019-10-10T14:13:26Z

But if you defer the call to runtime.Goexit(), that does suppress the panic (https://play.golang.org/p/JeXz2KhA5f_h), and I agree that it should not.

bcmills · 2019-10-10T14:15:25Z

A more realistic example illustrating that problem (https://play.golang.org/p/Gx4zGz5L79i):

func TestSkipAfterPanic(t *testing.T) {
	defer func() {
		if !t.Failed() {
			t.Skip("huh?")
		}
	}()

	panic("serious bug that should fail this test!")
}

bcmills · 2019-10-11T02:02:58Z

In practice the ~only real user of runtime.Goexit is the testing package, and for that case it seems decidedly less weird for an unrecovered panic to cause a test failure than for t.Skip to cancel an unrecovered panic.

But you're right that I would need to update the double defer sandwich to account for the possibility of a Goexit resuming after a successful recover. Perhaps it will need to become the “triple defer with cheese”. 😃

Another option would be to make runtime.Goexit itself panic if the goroutine is already panicking. That would at least make the t.Skip example do something more reasonable (specifically, fail the test if it tries to Skip during an unrecovered panic).

go101 · 2019-10-11T13:43:19Z

Maybe two panic stacks are needed. One for normal panics, the other for Goexit panics. Or each non-exited function call may associate with at most two unrecovered panics, one is normal panic, the other is Goexit panic. Goexit panics are not recoverable. Either of two panics existing will make the associated function call enter its terminating execution phase,

go101 · 2019-10-11T13:55:49Z

I just modified my explanation on the panic/recover mechanism to include Goexit signals.

One reason I think Goexit signals should not shadow normal panics is that, if we put a defer runtime.Goexit() call at the beginning of a goroutine entry function, then the defrerred call acts as an ultimate ~~recover call~~ recover operation.

go101 · 2019-10-11T21:05:33Z

By describing it in non-optimized code:

type _panic struct {
	// Both of the two can be true.
	// causedByPanic may be changed from true
	// to false, but causedByExit may not.
	causedByPanic  bool
	causedByGoExit  bool

	headReason *_panicReason
	tailReason *_panicReason
}

type _panicReason struct {
	value interface{}
	prev  *_panicReason
}

// When a function call exits, its associating `_panic`
// will be merged with the `_panic` associating with its caller.
...
	if exitedCall.panic.causedByPanic {
		caller.panic.causedByPanic = true
		if exitedCall.panic.tailReason != nil {
			exitedCall.panic.tailReason = caller.panic.headReason
		}
		caller.panic.headReason = exitedCall.panic.headReason
		if caller.panic.tailReason == nil {
			caller.panic.tailReason = caller.panic.headReason
		}
	}
	caller.panic.causedByGoExit = caller.panic.causedByGoExit || exitedCall.panic.causedByGoExit
	if !caller.isExiting && (caller.panic.causedByGoExit || caller.panic.causedByPanic) {
		caller.startExiting()
	}
...

[edit]: to optimize, the panic reason list can be move the the _goroutine struct as a stack (just as the current implementation), and the two bools can be replaced with bits.

go101 · 2019-10-26T04:28:49Z

@griesemer @ianlancetaylor Is it possible that gc and gccgo can make agreement on the panic and Goexist mechenism in Go 1.14? Specifically,

the result of which implementation of this example is correct?
should Goexit shadow normal panics?

In my opinion, the earlier the points are clarified, the better for the Go community.

ianlancetaylor · 2019-10-26T19:48:44Z

I don't understand precisely what your questions are, but I agree that these questions need to be resolved in the language spec.

That said, it does not seem urgent to me. These issues have been confused for many years and nobody has cared. If we sort them out for 1.14, great. If not, OK.

go101 · 2019-10-27T12:44:12Z

I wrote a book and one chapter in it explains the panic/recover mechanism. And I plan to make it cover runtime.Goexit calls. So I very care about this. I really don't want to spread wrong explainations into the Go community.

I think the implementation is not urgent to be corrected (if the current implementation is really not perfect), but the mechanism should be settled down as earlier as possible, if there are no special difficulties to prevent it being done.

BTW, @ianlancetaylor , thanks for your explanation in another thread. I will write down the confirmed part in my book. I asked this question is because I need to finish the chapter in my book. If this is impossible in Go 1.14, then I will mention the Goexit related part is still not formalized yet., otherwise, I will wait unitl it is formalized.

danscales · 2019-10-27T22:43:38Z

@go101 I think we have mostly settled that the behavior of panic and recover will stay as it is right now. My current proposed changes to the Go language spec are here (don't know if you've seen the latest changeset):

https://go-review.googlesource.com/c/go/+/189377

I/we are not trying to specify every possibility for panics/recovers in the spec, but just want to specify better when recovers actually apply (only in a defer directly called by a panicking sequence) and then describe the the typical case of recursive panics and also the behavior of when one panic can replace another panic. See also my comment on Oct 9th above on why the panic replacement behavior makes sense.

Goexit is a library routine, so it will not be specified in the language spec. However, as I mentioned above, we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit. We have to discuss more about further interaction between panic and Goexit (as your example in your Sept 26 comment and examples from Bryan Mills illustrate). The fix for the current bug may make it into Go 1.14, but not definitely. Since we still have to discuss things, I'm fairly sure that no further changes in Goexit/panic/recover interaction will make it into Go 1.14.

go101 · 2019-10-28T16:12:56Z

@danscales Thanks for the clarification.

we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit.

What about the inverse? Will Goexit calls cancel unrecovered panics?

ianlancetaylor · 2019-10-28T21:49:10Z

I am guessing that you are asking whether defer Goexit() should recover a panic that is currently being thrown. I don't think it should. Goexit is not recover.

go101 · 2019-10-29T14:13:24Z

Yes, that is what I asked. Thanks for the clarification.

go101 · 2019-11-05T14:08:19Z

Should this following program print should be unreachable or not.
I tried tip, it prints it.

edit: sorry, the following program is a wrong example. Please see this comment for reason. ;D

package main

import "fmt"
import "runtime"

func f() {
	defer func() {
		recover()
	}()
	panic("will cancel Goexit but should not")
	runtime.Goexit()
}

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		f()
		fmt.Println("should be unreachable")
	}()
	<-c
}

go101 · 2019-11-05T14:11:51Z

And should the following program crash?
I tried tip. It doesn't, though I feel it should crash by @ianlancetaylor's clarification above.

package main

import "runtime"

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		// The Goexit signal shadows the
		// "bye" panic, but it should not.
		defer runtime.Goexit()
		panic("bye")
	}()
	<-c
}

bcmills · 2019-11-05T15:09:52Z

Should this following program print should be unreachable or not.

I agree that it should not, and I can reproduce the same behavior you observe using gotip.

bcmills · 2019-11-05T15:10:56Z

And should the following program crash?

That one is less clear to me, although I agree that @ianlancetaylor's comment suggests that it should crash.

ianlancetaylor · 2019-11-05T15:23:50Z

I agree with @bcmills.

danscales · 2019-11-05T17:45:38Z

Should this following program print should be unreachable or not.
I tried tip, it prints it.

package main

import "fmt"
import "runtime"

func f() {
	defer func() {
		recover()
	}()
	panic("will cancel Goexit but should not")
	runtime.Goexit()
}

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		f()
		fmt.Println("should be unreachable")
	}()
	<-c
}

This is exactly performing to spec and is unrelated to the current bug. https://golang.org/ref/spec#Handling_panics and https://go-review.googlesource.com/c/go/+/189377

The panic() call starts a panic sequence. By spec, the function that called panic will never continue running normally (hence you will never reach the Goexit). If a recover occurs in a defer directly called by the panicking sequence, then the panic will be recovered, and (after finishing any remaining defers in that frame), execution of the goroutine will continue in the caller of that frame (which in this case is f).

danscales · 2019-11-05T18:04:34Z

package main

import "runtime"

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		// The Goexit signal shadows the
		// "bye" panic, but it should not.
		defer runtime.Goexit()
		panic("bye")
	}()
	<-c
}

This is the issue that we mentioned above, which is that a Goexit can cancel a panic. I just filed a separate bug for this (#35378), and we can discuss there. As I mentioned, any fix so that a Goexit does not cancel a panic will not make it into 1.14.

bcmills · 2019-11-05T18:05:28Z

Ah, yeah. I somehow read the panic("will cancel Goexit […]") as defer panic([…]) instead. 😞

go101 · 2019-11-06T01:29:04Z

Ah, sorry, I also acted as it is a deferred call. ;D

go101 · 2020-04-05T16:22:56Z

It looks this problem has been fixed in Go SDK 1.14. Should it be closed?

danscales · 2020-04-05T18:36:49Z

Yes, let's close. This was fixed by my change 7dcd343 . I already opened a separate issue #35378 about the converse situation that a Goexit call can cancel a panic.

bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 13, 2018

bcmills added this to the Go1.13 milestone Dec 13, 2018

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

danscales mentioned this issue Sep 26, 2019

spec: clarify when calling recover stops a panic #34530

Open

danscales self-assigned this Oct 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

gopherbot closed this as completed in 7dcd343 Nov 4, 2019

bcmills reopened this Nov 5, 2019

go101 mentioned this issue Feb 14, 2020

cmd/compile: the compiler should probably recognize runtime.Goexit #37193

Closed

go101 mentioned this issue Apr 5, 2020

Goexit with panic go101/go101#147

Closed

danscales closed this as completed Apr 5, 2020

golang locked and limited conversation to collaborators Apr 5, 2021

gopherbot added the FrozenDueToAge label Apr 5, 2021

rsc unassigned danscales Jun 23, 2022

runtime: panic + recover can cancel a call to Goexit #29226

runtime: panic + recover can cancel a call to Goexit #29226

Comments

bcmills commented Dec 13, 2018

randall77 commented Dec 28, 2018

bcmills commented Sep 24, 2019

danscales commented Sep 24, 2019

go101 commented Sep 25, 2019 • edited

go101 commented Sep 25, 2019 • edited

bcmills commented Sep 25, 2019

bcmills commented Sep 25, 2019

danscales commented Sep 25, 2019

danscales commented Sep 25, 2019

bcmills commented Sep 25, 2019 • edited

go101 commented Sep 26, 2019 • edited

go101 commented Sep 26, 2019

ianlancetaylor commented Sep 27, 2019

go101 commented Sep 27, 2019 • edited

ianlancetaylor commented Sep 27, 2019

go101 commented Sep 28, 2019

gopherbot commented Oct 9, 2019

danscales commented Oct 9, 2019

go101 commented Oct 10, 2019 • edited

bcmills commented Oct 10, 2019

bcmills commented Oct 10, 2019

bcmills commented Oct 10, 2019

bcmills commented Oct 11, 2019

go101 commented Oct 11, 2019 • edited

go101 commented Oct 11, 2019 • edited

go101 commented Oct 11, 2019 • edited

go101 commented Oct 26, 2019

ianlancetaylor commented Oct 26, 2019

go101 commented Oct 27, 2019 • edited

danscales commented Oct 27, 2019

go101 commented Oct 28, 2019

ianlancetaylor commented Oct 28, 2019

go101 commented Oct 29, 2019

go101 commented Nov 5, 2019 • edited

go101 commented Nov 5, 2019

bcmills commented Nov 5, 2019

bcmills commented Nov 5, 2019

ianlancetaylor commented Nov 5, 2019

danscales commented Nov 5, 2019

danscales commented Nov 5, 2019

bcmills commented Nov 5, 2019

go101 commented Nov 6, 2019

go101 commented Apr 5, 2020

danscales commented Apr 5, 2020

go101 commented Sep 25, 2019 •

edited

go101 commented Sep 25, 2019 •

edited

bcmills commented Sep 25, 2019 •

edited

go101 commented Sep 26, 2019 •

edited

go101 commented Sep 27, 2019 •

edited

go101 commented Oct 10, 2019 •

edited

go101 commented Oct 11, 2019 •

edited

go101 commented Oct 11, 2019 •

edited

go101 commented Oct 11, 2019 •

edited

go101 commented Oct 27, 2019 •

edited

go101 commented Nov 5, 2019 •

edited