Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected goroutine starvation #7126

Closed
shanemhansen opened this issue Jan 14, 2014 · 14 comments
Closed

runtime: unexpected goroutine starvation #7126

shanemhansen opened this issue Jan 14, 2014 · 14 comments
Milestone

Comments

@shanemhansen
Copy link
Contributor

When running the program linked below (where 1 goroutine does an infinite loop of
os.File.Write calls and the main goroutine sleeps for one second before exiting), I
found very unexpected behaviour, the main goroutine either never gets woken up or
doesn't get woken up for 30s to 1m. I discussed this on #go-nuts and both Tv` and
dominikh thought this was an actual bug.

What steps will reproduce the problem?
1. Download http://play.golang.org/p/r-CDnGvo9E as foo.go
2. go run foo.go 2>/dev/null
3. Wait anywhere from 1s to over 1 minute. It's hard to reproduce. The main goroutine
isn't always scheduled post-sleep.

What is the expected output? The program should run in about 1 second and output:
sleeping
done

What do you see instead? The program sometimes fails to finish and outputs just:
sleeping

Which compiler are you using? 6l

Which operating system are you using? uname -a
Linux bazinga 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64
x86_64 GNU/Linux


Which version are you using?  (run 'go version')
go version go1.2 linux/amd64

Example:

shane@bazinga:~$ time go run foo.go 2>/dev/null 
sleeping
done

real    0m3.531s
user    0m1.909s
sys 0m1.676s
shane@bazinga:~$ time go run foo.go 2>/dev/null 
sleeping
done

real    0m28.316s
user    0m14.006s
sys 0m14.332s
@shanemhansen
Copy link
Contributor Author

Comment 1:

I'm aware of the preemption changes in go1.2. I'm sure that I'm calling a function.
Also, the problem completely disappears if I run with GOMAXPROCS=4.

@davecheney
Copy link
Contributor

Comment 2:

As I understand it, the scheduler will bias towards the active goroutine (the one that
just returned to the scheduler) if it looks like there is more work to do. So, in tight
loops it is possible that this goroutine is observed to spin and starve others.
Is this a contrived example or does it come from a larger program ?
You could try stracing the program and seeing if those os.Stderr writes even make it to
syscalls.
Assigning to Dmitry for his comments, I have not assigned this to a release.

Labels changed: added release-none, repo-main.

Owner changed to @dvyukov.

Status changed to Accepted.

@shanemhansen
Copy link
Contributor Author

Comment 3:

I ran into this while writing a benchmark for a real logging library, so I guess that
makes it contrived in that I don't expect a production scenario wherein we'd be using
this library in a tight loop.
I straced, and a least one goroutine is writing to stderr at a syscall level. The
problem is that time.Sleep never "returns".

@davecheney
Copy link
Contributor

Comment 4:

The strict reading of time.Sleep(duration) says that the sleep will return *after* that
time, and uses a liberal definition of after.
Can you post the benchmark you were working on, perhaps that can be altered to avoid the
issue.

@shanemhansen
Copy link
Contributor Author

Comment 5:

Here is the benchmark with the (unfortunately not mine to post) logging library omitted.
I'm logging to 5 different files at maximum speed, and trying to cleanly shut the system
down by Close()'ing the files out from under the goroutines. The time.Sleep() call isn't
essential, but being able to shutdown within a reasonable amount of time is.
http://play.golang.org/p/h8d_so7MLS

@minux
Copy link
Member

minux commented Jan 15, 2014

Comment 6:

IMO, the program itself is buggy.
it has two goroutines loops endlessly, which prevents the 3rd (main) goroutine to execute
when GOMAXPROCS=1.
the reason is used to work is that the we originally treated all Write syscalls as
blocking,
however, the new scheduler will treat syscalls that finish in a very short amount of time
as nonblocking (so execute Write like this won't cause the scheduler to schedule another
goroutine, hence the starvation).
This scheduler behavior change is good for throughput, but not that good for latency.
Either fixing the multiple ready gorutines problem or increasing GOMAXPROCS is the
correct way to go.

@davecheney
Copy link
Contributor

Comment 7:

Why don't you write this is a testing.B benchmark rather than reinventing the wheel. I'm
also concerned that this will end up benchmarking os.Write or your io subsystem.

@shanemhansen
Copy link
Contributor Author

Comment 8:

long comment deleted. Thanks to those who helped fix my code to avoid the undesired
behavior.
Feel free to close this bug as invalid if go is working as desired. I'm concerned that
optimizing for throughput
to the extent that a timer is called 30x to 90x later than requested even when the tight
loop has both I/O and
function calls (2 things that I think of as opportunities to switch goroutines) will
make writing working code
much harder than it should be.
Maybe this is actually a feature request "Wakeup from sleep within an order of magnitude
of the time
requested even if other goroutines are doing I/O".

@bradfitz
Copy link
Contributor

Comment 9:

Status changed to Retracted.

@davecheney
Copy link
Contributor

Comment 10:

Hi Brad,
I talked to Shane a little offline about this issue and I would like to leave it open
for the moment to get Dmitry's point of view. It may be that this is not a case that the
scheduler is targeting and Shane understands this.
Cheers
Dave

Status changed to Accepted.

@dvyukov
Copy link
Member

dvyukov commented Jan 16, 2014

Comment 11:

This is not an issue with preemption, the control periodically passes through scheduler
due to periodic garbage collections. But scheduler fails to schedule all goroutines
fairly.
I've mail https://golang.org/cl/53080043/ with a fix.
Shane, the program causes extremely frequent garbage collections (heap size is almost
zero, and the loop generates garbage). You can increase portion of time spent in the
logging library (rather than GC) by adding some garbage ballast, e.g.
var ballast = make([]byte, 64<<20)
This will reduce frequency of GCs tremendously.

Labels changed: added release-go1.3, removed release-none.

@dvyukov
Copy link
Member

dvyukov commented Jan 16, 2014

Comment 12:

Status changed to Started.

@shanemhansen
Copy link
Contributor Author

Comment 13:

Verified Works for me on go1.3 w patch applied.
Verified ballast suggestion works for 1.2.

@dvyukov
Copy link
Member

dvyukov commented Jan 21, 2014

Comment 14:

This issue was closed by revision 90eca36.

Status changed to Fixed.

@rsc rsc added this to the Go1.3 milestone Apr 14, 2015
@rsc rsc removed the release-go1.3 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants