Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: panic + recover can cancel a call to Goexit #29226

Closed
bcmills opened this issue Dec 13, 2018 · 49 comments
Closed

runtime: panic + recover can cancel a call to Goexit #29226

bcmills opened this issue Dec 13, 2018 · 49 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Dec 13, 2018

The documentation for runtime.Goexit says (emphasis mine):

Goexit terminates the goroutine that calls it. No other goroutine is affected. Goexit runs all deferred calls before terminating the goroutine. Because Goexit is not a panic, any recover calls in those deferred functions will return nil.

However, this is empirically not the case if one of the deferred functions itself triggers a panic: that panic can be recovered, and that recover cancels the termination of the goroutine.

See https://play.golang.org/p/7VrxPDByUNT for a minimal illustration.

Found while investigating #29207.

CC @aclements @randall77

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 13, 2018
@bcmills bcmills added this to the Go1.13 milestone Dec 13, 2018
@randall77
Copy link
Contributor

Panics within panics hurt my brain.

I don't think this would be too hard to fix. We just need another bit in the goroutine to record the "in Goexit" state.

There's a comment in the runtime (from 2014):

		// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
		// take defer off list. The earlier panic or Goexit will not continue running.

So it looks like the current behavior is at least somewhat intentional.
@rsc, who wrote that comment.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Sep 24, 2019

#34481 proposes to substantially rework the panic implementation. We should see if we can incidentally address this issue as part of that effort.

@danscales
Copy link
Contributor

This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function:

func maybeGoexit() {
	defer func() {
		fmt.Println(recover())
	}()
	defer panic("cancelled Goexit!")
	runtime.Goexit()
}

In this case, we are running somewhat into the "abort" rule (which is currently only described in the implementation, not in any spec). If the processing of defers by a recursive panic gets to the frame where an outer panic or Goexit happened, then that outer panic/Goexit is aborted, because there is no mechanism to resume it with a recover (since a recover always finishes the defers of the current frame and then resumes execution with the next outer frame). The abort rule is exactly related to the comment by @rsc above. If we wanted to allow the Goexit to continue, I think we would have add a bunch of mechanism (maybe some stack smashing) just to deal with this case. (The abort case is also mentioned in the Requirements section for the just-added open-coded defer proposal: https://github.com/golang/proposal/blob/master/design/34481-opencoded-defers.md#requirements )

But note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine. The recover happens, and the Goexit() continues.

https://play.golang.org/p/kcQBNOAUeSa

So, for instance, it would work fine if a panic-recover happened during a Printf in a defer function (which can happen with nil pointers, etc. of the printed args).

So, I'll think about this a bit more, but I would lean toward not fixing this (because it would require a bunch of added mechanism just for this one case).

@go101
Copy link

go101 commented Sep 25, 2019

I doubt the current implementation of Goexit is to create a special panic. In @bcmills's example, the panic is recovered, but in @danscales' example, the panic is not recovered.

If this is true, I lean toward the panic created in Goexit should be always unrecoverable.

@go101
Copy link

go101 commented Sep 25, 2019

In fact, the key here is, if Goexit creates a panic, in @bcmills's example, the panic created in Goexit is suppressed by the new panic created in defer panic("cancelled Goexit!"), which is consequently recovered. (If the suppression doesn't happen, the the recover will not happen too, for the panic created in Goexit is unrecoverable and will not cause crashing.) So, if the issue needs to be fixed, then we should avoid letting the new panic created in defer panic("cancelled Goexit!") suppress the special panic created in Goexit.

@bcmills
Copy link
Contributor Author

bcmills commented Sep 25, 2019

note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine

I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.

@bcmills
Copy link
Contributor Author

bcmills commented Sep 25, 2019

This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function

Note that the issue can be reproduced with the Goexit, recover, and panic all in separate functions (and presumably frames):
https://play.golang.org/p/M93Kxa4aBhX

@danscales
Copy link
Contributor

I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.

Yes, you are demonstrating the exact panic abort semantics that I mentioned. Just to describe it again, an outer panic/Goexit is aborted if the inner panic reaches & processes any defers in the same frame or in a higher (outer) frame from the frame where the panic/Goexit happened. Once the outer panic/Goexit is aborted, it will no longer apply even if there is a recover() that recovers the inner panic.

I think you want to stop these unspecified abort semantics. Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

Note that if the recover happens in a frame which is higher in the stack from the output panic/Goexit, we still want the current behavior, which is that the defers in the current frame are finished, and then "normal" execution continues in the next outer frame, which is some function called by a defer caused by the outer panic. So, the outer panic will continue as expected, but it will get to complete the running of the defer function that caused the panic.

Since the behavior of recursive panics/Goexits (including the abort semantics) was never specified in the first place, I think we have leeway on whether/how we want to fix this.

I've tried to get some feedback on improving the description of panic/recover/recursive panics in the Go language spec, but don't think we have consensus yet about whether or what to improve:

https://go-review.googlesource.com/c/go/+/189377

Feel free to add comments....

@danscales
Copy link
Contributor

Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

OK, and I think the hard part about doing this is that we presumably want to clean up the stack of the inner panic and somehow make it look like we are back to the outer Goexit/panic processing. That seems really hard to do without a lot of stack gymnastics (possibly stack smashing) and/or extra stubs to allow us to get control back to the runtime after cleaning up the stack. If we don't require that the stack is cleaned up, then getting rid of the abort behavior seems more doable.

@bcmills
Copy link
Contributor Author

bcmills commented Sep 25, 2019

I think you want to stop these unspecified abort semantics. Instead, […] you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.

Maybe? Consider https://play.golang.org/p/MyBCSSorYFl: the outer function defers a recover and a cleanup function and then panics, and the cleanup function also panics independently

Intuitively, what I would expect is for the first defer to catch the first panic (“panic!”), and for the second panic (“whoops!”) to continue propagating outward, since that panic occurred after the execution of the function that performed the initial defer.

I could also reasonably see dropping the second panic entirely, since the fact that it occurred is likely a side-effect of some invariant violated as a result of the first panic: if the first panic is fixed, perhaps the second one won't occur at all.

Today, the first recover instead receives the second panic, and the details of the first panic are completely lost.

(I would call the second panic the “inner” one, but maybe I'm using “inner” and “outer” a bit differently than you are.)

@go101
Copy link

go101 commented Sep 26, 2019

If panics are maintained in a stack and only the top one can be recovered (this example shows the possibility is high), then the panics created by Goexit calls shouldn't be pushed into the panic stack. Instead, each goroutine should maintain a separated field to record the last (or the first) panic created by Goexit calls.

BTW, the following program proves that if a panic created by Goexit is at the top of the panic stack, then all other panics will also not get recovered.

package main

import "fmt"
import "runtime"

func main() {
	go func() {
		defer func() {
			fmt.Println("bb", recover()) // <nil>
		}()
		defer func() {
			defer func() {
				fmt.Println("aa", recover()) // <nil>
			}()
			defer runtime.Goexit()
			panic(2)
		}()
		panic(1)
	}()
	
	select{}
}

@go101
Copy link

go101 commented Sep 26, 2019

@ianlancetaylor
It looks gccgo and gc disagree on the last example.

@ianlancetaylor
Copy link
Contributor

@go101 Thanks for noticing that difference. I think we should work out this issue, and then we can make sure that the implementations do the same thing.

@go101
Copy link

go101 commented Sep 27, 2019

It looks the difference is, in gccgo, the panic created in Goexit will not shadow the newest panic, whereas gc will. In gccgo, the panic created in Goexit will shadow the panics before the newest panic.

@ianlancetaylor
Copy link
Contributor

Yes, but let's first figure out what is correct, and then implement that.

@go101
Copy link

go101 commented Sep 28, 2019

Personally, I think panicking and exiting should be two different properties of gorotines.

@danscales danscales self-assigned this Oct 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@gopherbot
Copy link

Change https://golang.org/cl/200081 mentions this issue: runtime: ensure that Goexit cannot be aborted by a recursive panic/recover

@danscales
Copy link
Contributor

I figured out how to make sure that Goexits are not aborted by recovers that happen in defers below them... It didn't require any stack smashing or extra stubs -- just saving the pc/sp when making calls to deferred functions from the Goexit defer processing loop. See the mentioned change for the fix.

We discussed the abort behavior with respect to panic and Goexit, and are leaning toward only changing the Goexit behavior (which is the mentioned changed). As summarized well by Robert at https://go-review.googlesource.com/c/go/+/189377/7/doc/go_spec.html#6169 [paraphrased here]

From a programmer's point of view, if a defer contains a recover() call, I expect it to catch any pending panic, no matter where they are coming from. Otherwise it's impossible to write code that protects against (expected and unexpected) panics leaving a function/API entry point. So that would argue against "only recover the panic initiated in its own function".

And the abort (replace) behavior for panics exactly coalesces panics down so that we make sure that a recover in a function always stops all panics within that function or its callees.

@go101
Copy link

go101 commented Oct 10, 2019

There is another related question: should a Goexit call save the a panicking goroutine from crashing? The current gc and gccgo both do this. Personally, I think it is not very reasonable.

package main

import "runtime"
import "time"


func main() {
	go func() {
		defer runtime.Goexit() // [edit]: add the "defer"
		panic(1) // will not crash the program
	}()
	
	for {time.Sleep(100000)}
}

@bcmills
Copy link
Contributor Author

bcmills commented Oct 10, 2019

@go101, in that snippet the runtime.Goexit() exits the goroutine immediately, so the panic line is not even reached.

@bcmills
Copy link
Contributor Author

bcmills commented Oct 10, 2019

But if you defer the call to runtime.Goexit(), that does suppress the panic (https://play.golang.org/p/JeXz2KhA5f_h), and I agree that it should not.

@bcmills
Copy link
Contributor Author

bcmills commented Oct 10, 2019

A more realistic example illustrating that problem (https://play.golang.org/p/Gx4zGz5L79i):

func TestSkipAfterPanic(t *testing.T) {
	defer func() {
		if !t.Failed() {
			t.Skip("huh?")
		}
	}()

	panic("serious bug that should fail this test!")
}

@bcmills
Copy link
Contributor Author

bcmills commented Oct 11, 2019

In practice the ~only real user of runtime.Goexit is the testing package, and for that case it seems decidedly less weird for an unrecovered panic to cause a test failure than for t.Skip to cancel an unrecovered panic.

But you're right that I would need to update the double defer sandwich to account for the possibility of a Goexit resuming after a successful recover. Perhaps it will need to become the “triple defer with cheese”. 😃

Another option would be to make runtime.Goexit itself panic if the goroutine is already panicking. That would at least make the t.Skip example do something more reasonable (specifically, fail the test if it tries to Skip during an unrecovered panic).

@go101
Copy link

go101 commented Oct 11, 2019

Maybe two panic stacks are needed. One for normal panics, the other for Goexit panics. Or each non-exited function call may associate with at most two unrecovered panics, one is normal panic, the other is Goexit panic. Goexit panics are not recoverable. Either of two panics existing will make the associated function call enter its terminating execution phase,

@go101
Copy link

go101 commented Oct 11, 2019

I just modified my explanation on the panic/recover mechanism to include Goexit signals.

One reason I think Goexit signals should not shadow normal panics is that, if we put a defer runtime.Goexit() call at the beginning of a goroutine entry function, then the defrerred call acts as an ultimate recover call recover operation.

@go101
Copy link

go101 commented Oct 11, 2019

By describing it in non-optimized code:

type _panic struct {
	// Both of the two can be true.
	// causedByPanic may be changed from true
	// to false, but causedByExit may not.
	causedByPanic  bool
	causedByGoExit  bool

	headReason *_panicReason
	tailReason *_panicReason
}

type _panicReason struct {
	value interface{}
	prev  *_panicReason
}

// When a function call exits, its associating `_panic`
// will be merged with the `_panic` associating with its caller.
...
	if exitedCall.panic.causedByPanic {
		caller.panic.causedByPanic = true
		if exitedCall.panic.tailReason != nil {
			exitedCall.panic.tailReason = caller.panic.headReason
		}
		caller.panic.headReason = exitedCall.panic.headReason
		if caller.panic.tailReason == nil {
			caller.panic.tailReason = caller.panic.headReason
		}
	}
	caller.panic.causedByGoExit = caller.panic.causedByGoExit || exitedCall.panic.causedByGoExit
	if !caller.isExiting && (caller.panic.causedByGoExit || caller.panic.causedByPanic) {
		caller.startExiting()
	}
...	

[edit]: to optimize, the panic reason list can be move the the _goroutine struct as a stack (just as the current implementation), and the two bools can be replaced with bits.

@go101
Copy link

go101 commented Oct 26, 2019

@griesemer @ianlancetaylor Is it possible that gc and gccgo can make agreement on the panic and Goexist mechenism in Go 1.14? Specifically,

  1. the result of which implementation of this example is correct?
  2. should Goexit shadow normal panics?

In my opinion, the earlier the points are clarified, the better for the Go community.

@ianlancetaylor
Copy link
Contributor

I don't understand precisely what your questions are, but I agree that these questions need to be resolved in the language spec.

That said, it does not seem urgent to me. These issues have been confused for many years and nobody has cared. If we sort them out for 1.14, great. If not, OK.

@go101
Copy link

go101 commented Oct 27, 2019

I wrote a book and one chapter in it explains the panic/recover mechanism. And I plan to make it cover runtime.Goexit calls. So I very care about this. I really don't want to spread wrong explainations into the Go community.

I think the implementation is not urgent to be corrected (if the current implementation is really not perfect), but the mechanism should be settled down as earlier as possible, if there are no special difficulties to prevent it being done.

BTW, @ianlancetaylor , thanks for your explanation in another thread. I will write down the confirmed part in my book. I asked this question is because I need to finish the chapter in my book. If this is impossible in Go 1.14, then I will mention the Goexit related part is still not formalized yet., otherwise, I will wait unitl it is formalized.

@danscales
Copy link
Contributor

@go101 I think we have mostly settled that the behavior of panic and recover will stay as it is right now. My current proposed changes to the Go language spec are here (don't know if you've seen the latest changeset):

https://go-review.googlesource.com/c/go/+/189377

I/we are not trying to specify every possibility for panics/recovers in the spec, but just want to specify better when recovers actually apply (only in a defer directly called by a panicking sequence) and then describe the the typical case of recursive panics and also the behavior of when one panic can replace another panic. See also my comment on Oct 9th above on why the panic replacement behavior makes sense.

Goexit is a library routine, so it will not be specified in the language spec. However, as I mentioned above, we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit. We have to discuss more about further interaction between panic and Goexit (as your example in your Sept 26 comment and examples from Bryan Mills illustrate). The fix for the current bug may make it into Go 1.14, but not definitely. Since we still have to discuss things, I'm fairly sure that no further changes in Goexit/panic/recover interaction will make it into Go 1.14.

@go101
Copy link

go101 commented Oct 28, 2019

@danscales Thanks for the clarification.

we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit.

What about the inverse? Will Goexit calls cancel unrecovered panics?

@ianlancetaylor
Copy link
Contributor

I am guessing that you are asking whether defer Goexit() should recover a panic that is currently being thrown. I don't think it should. Goexit is not recover.

@go101
Copy link

go101 commented Oct 29, 2019

Yes, that is what I asked. Thanks for the clarification.

@go101
Copy link

go101 commented Nov 5, 2019

Should this following program print should be unreachable or not.
I tried tip, it prints it.

edit: sorry, the following program is a wrong example. Please see this comment for reason. ;D

package main

import "fmt"
import "runtime"

func f() {
	defer func() {
		recover()
	}()
	panic("will cancel Goexit but should not")
	runtime.Goexit()
}

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		f()
		fmt.Println("should be unreachable")
	}()
	<-c
}

@go101
Copy link

go101 commented Nov 5, 2019

And should the following program crash?
I tried tip. It doesn't, though I feel it should crash by @ianlancetaylor's clarification above.

package main

import "runtime"

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		// The Goexit signal shadows the
		// "bye" panic, but it should not.
		defer runtime.Goexit()
		panic("bye")
	}()
	<-c
}

@bcmills bcmills reopened this Nov 5, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Nov 5, 2019

Should this following program print should be unreachable or not.

I agree that it should not, and I can reproduce the same behavior you observe using gotip.

@bcmills
Copy link
Contributor Author

bcmills commented Nov 5, 2019

And should the following program crash?

That one is less clear to me, although I agree that @ianlancetaylor's comment suggests that it should crash.

@ianlancetaylor
Copy link
Contributor

I agree with @bcmills.

@danscales
Copy link
Contributor

Should this following program print should be unreachable or not.
I tried tip, it prints it.

package main

import "fmt"
import "runtime"

func f() {
	defer func() {
		recover()
	}()
	panic("will cancel Goexit but should not")
	runtime.Goexit()
}

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		f()
		fmt.Println("should be unreachable")
	}()
	<-c
}

This is exactly performing to spec and is unrelated to the current bug. https://golang.org/ref/spec#Handling_panics and https://go-review.googlesource.com/c/go/+/189377

The panic() call starts a panic sequence. By spec, the function that called panic will never continue running normally (hence you will never reach the Goexit). If a recover occurs in a defer directly called by the panicking sequence, then the panic will be recovered, and (after finishing any remaining defers in that frame), execution of the goroutine will continue in the caller of that frame (which in this case is f).

@danscales
Copy link
Contributor

package main

import "runtime"

func main() {
	c := make(chan struct{})
	go func() {
		defer close(c)
		// The Goexit signal shadows the
		// "bye" panic, but it should not.
		defer runtime.Goexit()
		panic("bye")
	}()
	<-c
}

This is the issue that we mentioned above, which is that a Goexit can cancel a panic. I just filed a separate bug for this (#35378), and we can discuss there. As I mentioned, any fix so that a Goexit does not cancel a panic will not make it into 1.14.

@bcmills
Copy link
Contributor Author

bcmills commented Nov 5, 2019

Ah, yeah. I somehow read the panic("will cancel Goexit […]") as defer panic([…]) instead. 😞

@go101
Copy link

go101 commented Nov 6, 2019

Ah, sorry, I also acted as it is a deferred call. ;D

@go101
Copy link

go101 commented Apr 5, 2020

It looks this problem has been fixed in Go SDK 1.14. Should it be closed?

@danscales
Copy link
Contributor

Yes, let's close. This was fixed by my change 7dcd343 . I already opened a separate issue #35378 about the converse situation that a Goexit call can cancel a panic.

@golang golang locked and limited conversation to collaborators Apr 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants