runtime: unexpected goroutine starvation #7126

shanemhansen · 2014-01-14T23:30:50Z

When running the program linked below (where 1 goroutine does an infinite loop of
os.File.Write calls and the main goroutine sleeps for one second before exiting), I
found very unexpected behaviour, the main goroutine either never gets woken up or
doesn't get woken up for 30s to 1m. I discussed this on #go-nuts and both Tv` and
dominikh thought this was an actual bug.

What steps will reproduce the problem?
1. Download http://play.golang.org/p/r-CDnGvo9E as foo.go
2. go run foo.go 2>/dev/null
3. Wait anywhere from 1s to over 1 minute. It's hard to reproduce. The main goroutine
isn't always scheduled post-sleep.

What is the expected output? The program should run in about 1 second and output:
sleeping
done

What do you see instead? The program sometimes fails to finish and outputs just:
sleeping

Which compiler are you using? 6l

Which operating system are you using? uname -a
Linux bazinga 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64
x86_64 GNU/Linux


Which version are you using?  (run 'go version')
go version go1.2 linux/amd64

Example:

shane@bazinga:~$ time go run foo.go 2>/dev/null 
sleeping
done

real    0m3.531s
user    0m1.909s
sys 0m1.676s
shane@bazinga:~$ time go run foo.go 2>/dev/null 
sleeping
done

real    0m28.316s
user    0m14.006s
sys 0m14.332s

shanemhansen · 2014-01-14T23:31:48Z

Comment 1:

I'm aware of the preemption changes in go1.2. I'm sure that I'm calling a function.
Also, the problem completely disappears if I run with GOMAXPROCS=4.

davecheney · 2014-01-15T00:09:57Z

Comment 2:

As I understand it, the scheduler will bias towards the active goroutine (the one that
just returned to the scheduler) if it looks like there is more work to do. So, in tight
loops it is possible that this goroutine is observed to spin and starve others.
Is this a contrived example or does it come from a larger program ?
You could try stracing the program and seeing if those os.Stderr writes even make it to
syscalls.
Assigning to Dmitry for his comments, I have not assigned this to a release.

Labels changed: added release-none, repo-main.

Owner changed to @dvyukov.

Status changed to Accepted.

shanemhansen · 2014-01-15T00:17:58Z

Comment 3:

I ran into this while writing a benchmark for a real logging library, so I guess that
makes it contrived in that I don't expect a production scenario wherein we'd be using
this library in a tight loop.
I straced, and a least one goroutine is writing to stderr at a syscall level. The
problem is that time.Sleep never "returns".

davecheney · 2014-01-15T00:27:22Z

Comment 4:

The strict reading of time.Sleep(duration) says that the sleep will return *after* that
time, and uses a liberal definition of after.
Can you post the benchmark you were working on, perhaps that can be altered to avoid the
issue.

shanemhansen · 2014-01-15T00:37:12Z

Comment 5:

Here is the benchmark with the (unfortunately not mine to post) logging library omitted.
I'm logging to 5 different files at maximum speed, and trying to cleanly shut the system
down by Close()'ing the files out from under the goroutines. The time.Sleep() call isn't
essential, but being able to shutdown within a reasonable amount of time is.
http://play.golang.org/p/h8d_so7MLS

minux · 2014-01-15T00:48:29Z

Comment 6:

IMO, the program itself is buggy.
it has two goroutines loops endlessly, which prevents the 3rd (main) goroutine to execute
when GOMAXPROCS=1.
the reason is used to work is that the we originally treated all Write syscalls as
blocking,
however, the new scheduler will treat syscalls that finish in a very short amount of time
as nonblocking (so execute Write like this won't cause the scheduler to schedule another
goroutine, hence the starvation).
This scheduler behavior change is good for throughput, but not that good for latency.
Either fixing the multiple ready gorutines problem or increasing GOMAXPROCS is the
correct way to go.

davecheney · 2014-01-15T01:28:17Z

Comment 7:

Why don't you write this is a testing.B benchmark rather than reinventing the wheel. I'm
also concerned that this will end up benchmarking os.Write or your io subsystem.

shanemhansen · 2014-01-15T03:40:11Z

Comment 8:

long comment deleted. Thanks to those who helped fix my code to avoid the undesired
behavior.
Feel free to close this bug as invalid if go is working as desired. I'm concerned that
optimizing for throughput
to the extent that a timer is called 30x to 90x later than requested even when the tight
loop has both I/O and
function calls (2 things that I think of as opportunities to switch goroutines) will
make writing working code
much harder than it should be.
Maybe this is actually a feature request "Wakeup from sleep within an order of magnitude
of the time
requested even if other goroutines are doing I/O".

bradfitz · 2014-01-15T23:12:44Z

Comment 9:

Status changed to Retracted.

davecheney · 2014-01-15T23:44:59Z

Comment 10:

Hi Brad,
I talked to Shane a little offline about this issue and I would like to leave it open
for the moment to get Dmitry's point of view. It may be that this is not a case that the
scheduler is targeting and Shane understands this.
Cheers
Dave

Status changed to Accepted.

dvyukov · 2014-01-16T08:40:19Z

Comment 11:

This is not an issue with preemption, the control periodically passes through scheduler
due to periodic garbage collections. But scheduler fails to schedule all goroutines
fairly.
I've mail https://golang.org/cl/53080043/ with a fix.
Shane, the program causes extremely frequent garbage collections (heap size is almost
zero, and the loop generates garbage). You can increase portion of time spent in the
logging library (rather than GC) by adding some garbage ballast, e.g.
var ballast = make([]byte, 64<<20)
This will reduce frequency of GCs tremendously.

Labels changed: added release-go1.3, removed release-none.

dvyukov · 2014-01-16T08:40:27Z

Comment 12:

Status changed to Started.

shanemhansen · 2014-01-16T16:52:25Z

Comment 13:

Verified Works for me on go1.3 w patch applied.
Verified ballast suggestion works for 1.2.

dvyukov · 2014-01-21T06:25:07Z

Comment 14:

This issue was closed by revision 90eca36.

Status changed to Fixed.

shanemhansen added fixed labels Jan 21, 2014

rsc added this to the Go1.3 milestone Apr 14, 2015

rsc removed the release-go1.3 label Apr 14, 2015

dvyukov mentioned this issue May 17, 2016

runtime: timerproc does not get to run under load #15706

Closed

golang locked and limited conversation to collaborators Jun 25, 2016

gopherbot added the FrozenDueToAge label Jun 25, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: unexpected goroutine starvation #7126

runtime: unexpected goroutine starvation #7126

shanemhansen commented Jan 14, 2014

shanemhansen commented Jan 14, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

minux commented Jan 15, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

bradfitz commented Jan 15, 2014

davecheney commented Jan 15, 2014

dvyukov commented Jan 16, 2014

dvyukov commented Jan 16, 2014

shanemhansen commented Jan 16, 2014

dvyukov commented Jan 21, 2014

runtime: unexpected goroutine starvation #7126

runtime: unexpected goroutine starvation #7126

Comments

shanemhansen commented Jan 14, 2014

shanemhansen commented Jan 14, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

minux commented Jan 15, 2014

davecheney commented Jan 15, 2014

shanemhansen commented Jan 15, 2014

bradfitz commented Jan 15, 2014

davecheney commented Jan 15, 2014

dvyukov commented Jan 16, 2014

dvyukov commented Jan 16, 2014

shanemhansen commented Jan 16, 2014

dvyukov commented Jan 21, 2014