Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: scheduler randomness lost #29961

Closed
ardan-bkennedy opened this issue Jan 28, 2019 · 11 comments
Closed

runtime: scheduler randomness lost #29961

ardan-bkennedy opened this issue Jan 28, 2019 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ardan-bkennedy
Copy link

What version of Go are you using (go version)?

1.12 beta 2

Does this issue reproduce with the latest release?

Only seeing it with 1.12 beta 2

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="darwin"

What did you do?

I've been using this program in training for years. It shows how random the scheduler is with context switches and it can't be predicted.

https://github.com/ardanlabs/gotraining/blob/master/topics/go/concurrency/goroutines/example2/example2.go

Just build the program and run it several times. On 1.11 or earlier, you will see the B goroutine context switches as an unpredictable prime number each time. Sometimes you get 4 context switches before it is complete. It is very rare not to get one before the work for any goroutine is complete. On 1.12, these context switches are gone. I am seeing this behavior with other programs I have. This one is the easiest to run and see the change.

What did you expect to see?

Create Goroutines
Waiting To Finish
B:2
B:3
B:5
B:7
. . .
B:4139
B:4153
B:4157
A:2 <- Unpredictable context switch
A:3
A:5
A:7
. . .
A:4231
A:4241
A:4243
B:4159 <- Unpredictable context switch
B:4177
B:4201
. . .
Completed B
. . .
A:4987
A:4993
A:4999
Completed A

What did you see instead?

Every time I run it there is no context switch. The B goroutine finishes completely and then the A goroutine runs. This happens every single time.

B:2
B:3
B:5
B:7
B:11
...
B:4987
B:4993
B:4999
Completed B
A:2
A:3
A:5
A:7
. . .
A:4987
A:4993
A:4999
Completed A

@agnivade agnivade changed the title 1.12 : Scheduler Randomness Lost runtime: scheduler randomness lost Jan 28, 2019
@agnivade
Copy link
Contributor

/cc @aclements @mknyszek

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 28, 2019
@randall77
Copy link
Contributor

I don't see this issue on linux/amd64. More often than not, the B goroutine finishes before A starts, but certainly not always.

@ardan-bkennedy
Copy link
Author

Historically on my Mac the reverse was true. There was at least one context switch of B to A before B finished. Sometimes more. Now on my Mac that context switch never materializes. I’ve seen the same lack of context switches in other concurrent code.
:(

@josharian
Copy link
Contributor

Since others can’t reproduce, I suggest doing a git bisect, in the hopes that identifying the offending commit provides some insight.

@jakebailey
Copy link

jakebailey commented Jan 29, 2019

I was curious after seeing this issue, and was able to reproduce this on my machine (comparing master and 1.11.5 with example2.go), so I did a bisect. Maybe unsurprisingly...

a3b01440fef3d2833909f6651455924a1c86d192 is the first bad commit
commit a3b01440fef3d2833909f6651455924a1c86d192
Author: Keith Randall <khr@golang.org>
Date:   Mon Sep 24 07:13:36 2018 -0700

    syscall: implement syscalls on Darwin using libSystem
    
    There are still some references to the bare Syscall functions
    in the stdlib. I will root those out in a following CL.
    (This CL is big enough as it is.)
    Most are in vendor directories:
    
    cmd/vendor/golang.org/x/sys/unix/
    vendor/golang_org/x/net/route/syscall.go
    syscall/bpf_bsd.go
    syscall/exec_unix.go
    syscall/flock.go
    
    Update #17490
    
    Change-Id: I69ab707811530c26b652b291cadee92f5bf5c1a4
    Reviewed-on: https://go-review.googlesource.com/c/141639
    Run-TryBot: Keith Randall <khr@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Elias Naur <elias.naur@gmail.com>
    Reviewed-by: Ian Lance Taylor <iant@golang.org>

:040000 040000 bb10031dcf6ee1f1bf801e542ae0d2cb62b95701 7cb86a18ad76ad148d2a67817413c30460c2d3a8 M      src
$ git bisect log
git bisect start
# bad: [c00595cec55526f30b84903d9472d7f63f9c447d] doc: mention init traceback change in Go 1.12 release notes
git bisect bad c00595cec55526f30b84903d9472d7f63f9c447d
# good: [4039be00a9e77bd4080ac657a940472341fa088f] image: add benchmarks for At and Set methods
git bisect good 4039be00a9e77bd4080ac657a940472341fa088f
# bad: [4ae49b592110055d2461c708bc2b85b049deb535] cmd/compile: use ANDCC, ORCC, XORCC to avoid CMP on ppc64x
git bisect bad 4ae49b592110055d2461c708bc2b85b049deb535
# good: [e0a97a59286f00b511ecacd1f4011bac348ab60f] cmd/doc: make comments inside functions appear with -src
git bisect good e0a97a59286f00b511ecacd1f4011bac348ab60f
# good: [7836457ec3cc128efd9dd54197e8e5c25408c8b0] runtime: add physical memory scavenging test
git bisect good 7836457ec3cc128efd9dd54197e8e5c25408c8b0
# good: [15265ec4217f8d9497d8d5079ac8996302dfd007] cmd/compile: avoid duplicate GC bitmap symbols
git bisect good 15265ec4217f8d9497d8d5079ac8996302dfd007
# good: [35c05542938416cde6a366505c24568ea5ccd98e] cmd/asm: rename R18 to R18_PLATFORM on ARM64
git bisect good 35c05542938416cde6a366505c24568ea5ccd98e
# bad: [7da1f7addf54e55c7b40c1a7bffe5a64989154d8] net: simplify nested if-blocks
git bisect bad 7da1f7addf54e55c7b40c1a7bffe5a64989154d8
# good: [0fcd40503b41b84beb8d355615d1ad5ebc4eae57] go/types: avoid certain problems with recursive alias type declarations
git bisect good 0fcd40503b41b84beb8d355615d1ad5ebc4eae57
# bad: [67018e9b684d1a9686a37053307476cd32358c17] cmd/go: document the go statement in a go.mod file
git bisect bad 67018e9b684d1a9686a37053307476cd32358c17
# bad: [0b071e40195c7f202933d3b3ec3da3b4f40cd472] cmd/go: add /v2 to go.mod require example
git bisect bad 0b071e40195c7f202933d3b3ec3da3b4f40cd472
# bad: [a3b01440fef3d2833909f6651455924a1c86d192] syscall: implement syscalls on Darwin using libSystem
git bisect bad a3b01440fef3d2833909f6651455924a1c86d192
# first bad commit: [a3b01440fef3d2833909f6651455924a1c86d192] syscall: implement syscalls on Darwin using libSystem

@jakebailey
Copy link

Quick link to the CL (since I forgot it): https://golang.org/cl/141639

@jakebailey
Copy link

Just to make it super easy to run (instead of scrolling through 10,000 prints), you can use uniq to track which goroutine is running:

$ ~/gosrc/bin/go version; ~/gosrc/bin/go run example2.go | cut -c1 | grep '[AB]' | uniq
go version devel +c00595cec5 Mon Jan 28 20:35:33 2019 +0000 darwin/amd64
B
A

$ ~/gosrc/bin/go version; ~/gosrc/bin/go run example2.go | cut -c1 | grep '[AB]' | uniq
go version devel +4039be00a9 Mon Sep 24 17:44:11 2018 +0000 darwin/amd64
B
A
B

The first shows a recent commit doing all of B, then all of A, while the second shows a much older commit doing some of B, then all of A, then the rest of B.

@ardan-bkennedy
Copy link
Author

Thank you Jake. I honestly didn't know how to do that.

@robaho
Copy link

robaho commented Feb 5, 2019

Wouldn't a trace file make this easier to understand? And they should be running concurrently, unless GOMAXPROCS=1 and then any switch would only happen due to the serialization around the Print - so maybe the Print lock is not fair in the latest release ?

@agnivade agnivade added this to the Unplanned milestone Mar 8, 2019
@ardan-bkennedy
Copy link
Author

This has been fixed on tip. My guess is because of the changes to the schedule.

@ardan-bkennedy
Copy link
Author

This is not an issue any longer.

@golang golang locked and limited conversation to collaborators Jun 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants