proposal: runtime: allow programs to handle deadlock events #44026

Closed
cwmos opened this issue Jan 31, 2021 · 10 comments

cwmos commented Jan 31, 2021

Summary

It would be great if a Go program could detect when all (other) goroutines are blocked and then take action to resolve the situation. Since Go already has a deadlock detector, the runtime apparently detects this situation already.

Use case

I want to use this to implement fake time that:

  • Advances as fast as it can, given the CPU resources actually available
  • Makes it look, to the running program, as if infinite CPU resources are available.

Such fake time can be implemented roughly as follows:

Loop:

  • Wait until all (other) goroutines are blocked.
  • Advance fake time to the point where the next event happens, for example, to the point where a channel returned by a fake version of time.After receives a value or a fake version of time.Sleep returns.

I have previously implemented this successfully in other programming languages, and it has been very useful. Use cases include:

  • Test timeouts quickly and accurately: Some programs have various (long) timeouts. To unit test those, it is an advantage to be able to advance time quickly and accurately.
  • Test distributed systems: You may want to run a big distributed system written in Go in a single Go process on a single computer, using a fake network. There may not be enough CPU to do this in real time. With fake time as described here, the simulation can use all available CPU, and time simply progresses as fast as it can, so at any instant fake time may run slower or faster than real time. I believe that this network simulator does something similar: https://www.nsnam.org/docs/dce/manual/html/getting-started.html
  • Debug code that uses timers: Allow breakpoints to be set while debugging code that uses timers. Time and timers automatically stop while the program is halted at a breakpoint and resume when the program continues.
  • Detect unintended busy loops that only last for a limited amount of time: With fake time, such a busy loop runs forever, which makes it easy to detect. Over the years I have found several bugs like this by using fake time.

Some ways to provide the feature

  • Make a version of runtime.Gosched that only returns when all other goroutines are blocked.
  • Make it possible to read the number of goroutines that are not blocked. You could then write a loop that calls runtime.Gosched until that value is 0.
  • Make it possible to start a special goroutine when the system is deadlocked.
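The second option could be used roughly as follows. The `runnable` parameter stands in for a hypothetical runtime query that reports how many other goroutines are not blocked; no such function exists in the Go runtime. The sketch only shows the polling loop around the real `runtime.Gosched`, with an iteration bound (`maxPolls`, also an assumption) added so it cannot spin forever.

```go
package main

import "runtime"

// waitUntilQuiescent polls a hypothetical counter of runnable goroutines,
// yielding the processor between polls. runnable stands in for a runtime
// query that does not exist today. maxPolls bounds the loop; the result
// reports whether a count of zero was observed.
func waitUntilQuiescent(runnable func() int, maxPolls int) bool {
	for i := 0; i < maxPolls; i++ {
		if runnable() == 0 {
			return true
		}
		runtime.Gosched() // let the other goroutines make progress
	}
	return false
}
```

A real runtime primitive would avoid the busy polling entirely, which is one reason the first option (a blocking variant of `runtime.Gosched`) may be preferable.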

A nuance

What should happen if no goroutine can run right now, but one or more goroutines are pseudo-blocked, for example:

  • Waiting on external events, for example a socket
  • Stuck in a cgo call
  • Waiting on one or more channels returned by the real non-fake time.After function.

UPDATED: In a simple approach that uses the existing deadlock detector, any pseudo-blocked goroutine would be considered non-blocked and would therefore prevent fake time from advancing. The feature would still be very useful. However, a probably even more useful approach would be to consider pseudo-blocked goroutines blocked. In that case, fake time would advance even while goroutines are pseudo-blocked. If a program wants to prevent fake time from advancing because of a pseudo-blocked goroutine it knows about, it can do so manually (some details would need to be worked out to make sure this works correctly).

An off topic note

Shouldn't fake time be an OS feature, or a feature of a virtual machine hypervisor? Yes, I absolutely think it should!

@seankhliao seankhliao changed the title Allow a Go program to detect when all Goroutines are blocked proposal: allow programs to handle deadlock events Jan 31, 2021
@gopherbot gopherbot added this to the Proposal milestone Jan 31, 2021
@seankhliao
Member

see also #8869

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 1, 2021
@ianlancetaylor
Contributor

The majority of real Go programs read from the network or from pipes. Many also use timers. Some use the os/signal package to look for signals. Any Go program that does any of those things can never deadlock.

The Go runtime already supports fake time, although it is not fully supported. See https://golang.org/src/runtime/time_fake.go. That may be a more profitable approach than focusing on detecting deadlocks.

@beoran

beoran commented Feb 1, 2021

I don't think the fake time idea is very interesting, but having a way to try to resolve a deadlock does seem useful, at least if we had a way to terminate the blocked goroutines, which, IIRC, we don't. Otherwise, something like runtime.SetDeadlockCallback(callback func(goroutineID ...int)) could be useful.

@cwmos
Author

cwmos commented Feb 2, 2021

> The majority of real Go programs read from the network or from pipes. Many also use timers. Some use the os/signal package to look for signals. Any Go program that does any of those things can never deadlock.

You make an important point about networking. I think the feature will still be useful when networking is used. First, networking may be mocked in code during testing. Second, I have added some more text to the proposal under "UPDATED" with ideas on how to address this.

Regarding timers, a central idea is that the Go program should use fake versions of timers. Real Go timers would then not be used, so they would not prevent deadlock detection.

@cwmos
Author

cwmos commented Mar 12, 2021

Since I opened this issue, I have used Go some more. I still think the requested feature would be extremely useful for mocking time in tests, as I described previously. However, I want to mention a simpler application.

Suppose I have written this program:

```go
type Result interface {
	Give(int)
}

func multiply(a, b int, result Result) {
	go func() {
		result.Give(a * b)
	}()
}
```

Suppose I want to test the multiply function using gomock. After having generated mocks for the Result interface, I could try to do it like this:

```go
func TestMultiply(t *testing.T) {
	ctrl := gomock.NewController(t)
	defer ctrl.Finish()
	m := mock.NewMockResult(ctrl)
	m.EXPECT().Give(4)
	multiply(2, 2, m)
}
```

Unfortunately, there is a race: the test may pass or fail depending on whether Give is called before the test ends.

If we had the feature requested in this issue, I could just add the line:

```go
runtime.WaitForDeadlock()
```

at the end of the TestMultiply function, and there would be no race.

Now, of course the problem can be solved by using channels or WaitGroups in the test (see for example https://medium.com/@poy/gomock-and-go-routines-6a7c01d989d5). However, that is cumbersome. Further, if multiply makes additional calls to Give after the test has finished, the test will still succeed even though it should not.

@rsc
Contributor

rsc commented Jul 14, 2021

This seems like a fairly invasive, subtle runtime change that would rarely be useful in practice (because, as others have noted, programs with network listeners or readers never deadlock). It does not seem worth the weight.

@rsc rsc changed the title proposal: allow programs to handle deadlock events proposal: runtime: allow programs to handle deadlock events Jul 14, 2021
@cwmos
Author

cwmos commented Jul 14, 2021

@rsc, yes, this is really for unit tests and simulations where all network activity is mocked or faked so that no real sockets are used.

@rsc
Contributor

rsc commented Jul 21, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Jul 21, 2021
@rsc
Contributor

rsc commented Jul 28, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Jul 28, 2021
@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Aug 4, 2021
@rsc
Contributor

rsc commented Aug 4, 2021

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Aug 4, 2021
@golang golang locked and limited conversation to collaborators Aug 4, 2022