Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context: misuse of sync.Cond in ExampleAfterFunc_cond #62180

Closed
bcmills opened this issue Aug 21, 2023 · 6 comments
Closed

context: misuse of sync.Cond in ExampleAfterFunc_cond #62180

bcmills opened this issue Aug 21, 2023 · 6 comments
Assignees
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Aug 21, 2023

#!watchflakes
post <- `^goroutine \d+ \[sync\.Cond\.Wait, .*\]:` && `^context_test\.ExampleAfterFunc_cond`

(Pulled out from #62165 (comment).)

https://build.golang.org/log/6040fd74316c6fe15adcca97b7b68a471db61575:

goroutine 1 [sync.Cond.Wait, 14 minutes]:
runtime.gopark(0x40002800e0?, 0x40000021a0?, 0x88?, 0x59?, 0xc731c?)
	/home/gopher/build/go/src/runtime/proc.go:398 +0xc8 fp=0x4000235940 sp=0x4000235920 pc=0x4a178
runtime.goparkunlock(...)
	/home/gopher/build/go/src/runtime/proc.go:404
sync.runtime_notifyListWait(0x40000ce010, 0x0)
	/home/gopher/build/go/src/runtime/sema.go:527 +0x150 fp=0x4000235990 sp=0x4000235940 pc=0x78580
sync.(*Cond).Wait(0x40000ce000)
	/home/gopher/build/go/src/sync/cond.go:70 +0xcc fp=0x40002359d0 sp=0x4000235990 pc=0x8093c
context_test.ExampleAfterFunc_cond.func1({0x1d3bf8, 0x40000b00e0}, 0x40000ce000)
	/home/gopher/build/go/src/context/example_test.go:131 +0xa8 fp=0x4000235a30 sp=0x40002359d0 pc=0x1469d8
context_test.ExampleAfterFunc_cond()
	/home/gopher/build/go/src/context/example_test.go:142 +0x120 fp=0x4000235ab0 sp=0x4000235a30 pc=0x13f190
testing.runExample({{0x19d3c8, 0x15}, 0x1a78e0, {0x19f322, 0x1a}, 0x0})
	/home/gopher/build/go/src/testing/run_example.go:63 +0x244 fp=0x4000235bb0 sp=0x4000235ab0 pc=0xd58f4
testing.runExamples(0x1d4c78?, {0x2ca040, 0x7, 0x25?})
	/home/gopher/build/go/src/testing/example.go:40 +0x104 fp=0x4000235c50 sp=0x4000235bb0 pc=0xd1ad4
testing.(*M).Run(0x400011a280)
	/home/gopher/build/go/src/testing/testing.go:1927 +0x5e4 fp=0x4000235ea0 sp=0x4000235c50 pc=0xdb404
main.main()
	_testmain.go:149 +0x1a8 fp=0x4000235f30 sp=0x4000235ea0 pc=0x147888
runtime.main()
	/home/gopher/build/go/src/runtime/proc.go:267 +0x2b8 fp=0x4000235fd0 sp=0x4000235f30 pc=0x49d18
runtime.goexit()
	/home/gopher/build/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x4000235fd0 sp=0x4000235fd0 pc=0x7c4c4

This test / example was added in https://go.dev/cl/482695 (attn @dneil @Sajmani) for #57928; it's new as of Go 1.21.

Given that this example involves a sync.Cond and a context.WithTimeout, I'm about 98% confident that the bug is in the example somewhere. 😅

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 21, 2023
@bcmills bcmills added this to the Go1.22 milestone Aug 21, 2023
@gopherbot
Copy link

Found new dashboard test flakes for:

#!watchflakes
post <- `^goroutine \d+ \[sync\.Cond\.Wait, .*\]:` && `^context_test\.ExampleAfterFunc_cond`
2023-08-07 22:48 openbsd-arm64-jsing go@b9153f6e context (log)
panic: test timed out after 15m0s

goroutine 304 [running]:
panic({0x161860?, 0x40001c4010?})
	/home/gopher/build/go/src/runtime/panic.go:621 +0x140 fp=0x4000047f00 sp=0x4000047e50 pc=0x46800
testing.(*M).startAlarm.func1()
	/home/gopher/build/go/src/testing/testing.go:2259 +0x30c fp=0x4000047fd0 sp=0x4000047f00 pc=0xdd86c
runtime.goexit()
	/home/gopher/build/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0x4000047fd0 sp=0x4000047fd0 pc=0x7c4c4
created by time.goFunc
	/home/gopher/build/go/src/time/sleep.go:177 +0x38

watchflakes

@bcmills
Copy link
Contributor Author

bcmills commented Aug 21, 2023

@gopherbot, please backport to Go 1.21. This example in the context documentation for an API added in Go 1.21 demonstrates incorrect use of a sync.Cond.

@bcmills bcmills added Testing An issue that has been verified to require only test changes, not just a test failure. NeedsFix The path to resolution is known, but the work has not been done. labels Aug 21, 2023
@gopherbot
Copy link

Backport issue(s) opened: #62189 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 21, 2023
@bcmills bcmills changed the title context: apparent deadlock in ExampleAfterFunc_cond context: misuse of sync.Cond in ExampleAfterFunc_cond Aug 21, 2023
@bcmills bcmills changed the title context: misuse of sync.Cond in ExampleAfterFunc_cond context: misuse of sync.Cond in ExampleAfterFunc_cond Aug 21, 2023
@bcmills bcmills self-assigned this Aug 21, 2023
@gopherbot
Copy link

Change https://go.dev/cl/521596 mentions this issue: context: fix synchronization in ExampleAfterFunc_cond

@bcmills
Copy link
Contributor Author

bcmills commented Aug 21, 2023

I will send a fix. Further evidence that sync.Cond should be avoided.

@gopherbot
Copy link

Change https://go.dev/cl/521598 mentions this issue: [release-branch.go1.21] context: fix synchronization in ExampleAfterFunc_cond

cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
Condition variables are subtle and error-prone, and this example
demonstrates exactly the sorts of problems that they introduce.
Unfortunately, we're stuck with them for the foreseeable future.

As previously implemented, this example was racy: since the callback
passed to context.AfterFunc did not lock the mutex before calling
Broadcast, it was possible for the Broadcast to occur before the
goroutine was parked in the call to Wait, causing in a missed wakeup
resulting in deadlock.

The example also had a more insidious problem: it was not safe for
multiple goroutines to call waitOnCond concurrently, but the whole
point of using a sync.Cond is generally to synchronize concurrent
goroutines. waitOnCond must use Broadcast to ensure that it wakes up
the target goroutine, but the use of Broadcast in this way would
produce spurious wakeups for all of the other goroutines waiting on
the same condition variable. Since waitOnCond did not recheck the
condition in a loop, those spurious wakeups would cause waitOnCond
to spuriously return even if its own ctx was not yet done.

Fixing the aforementioned bugs exposes a final problem, inherent to
the use of condition variables in this way. This one is a performance
problem: for N concurrent calls to waitOnCond, the resulting CPU cost
is at least O(N²). This problem cannot be addressed without either
reintroducing one of the above bugs or abandoning sync.Cond in the
example entirely. Given that this example was already published in Go
1.21, I worry that Go users may think that it is appropriate to use a
sync.Cond in conjunction with context.AfterFunc, so I have chosen to
retain the Cond-based example and document its pitfalls instead of
removing or replacing it entirely.

I described this class of bugs and performance issues — and suggested
some channel-based alternatives — in my GopherCon 2018 talk,
“Rethinking Classical Concurrency Patterns”. The section on condition
variables starts on slide 37. (https://youtu.be/5zXAHh5tJqQ?t=679)

Fixes golang#62180.
For golang#20491.

Change-Id: If987cd9d112997c56171a7ef4fccadb360bb79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/521596
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Aug 24, 2023
…unc_cond

Condition variables are subtle and error-prone, and this example
demonstrates exactly the sorts of problems that they introduce.
Unfortunately, we're stuck with them for the foreseeable future.

As previously implemented, this example was racy: since the callback
passed to context.AfterFunc did not lock the mutex before calling
Broadcast, it was possible for the Broadcast to occur before the
goroutine was parked in the call to Wait, causing in a missed wakeup
resulting in deadlock.

The example also had a more insidious problem: it was not safe for
multiple goroutines to call waitOnCond concurrently, but the whole
point of using a sync.Cond is generally to synchronize concurrent
goroutines. waitOnCond must use Broadcast to ensure that it wakes up
the target goroutine, but the use of Broadcast in this way would
produce spurious wakeups for all of the other goroutines waiting on
the same condition variable. Since waitOnCond did not recheck the
condition in a loop, those spurious wakeups would cause waitOnCond
to spuriously return even if its own ctx was not yet done.

Fixing the aforementioned bugs exposes a final problem, inherent to
the use of condition variables in this way. This one is a performance
problem: for N concurrent calls to waitOnCond, the resulting CPU cost
is at least O(N²). This problem cannot be addressed without either
reintroducing one of the above bugs or abandoning sync.Cond in the
example entirely. Given that this example was already published in Go
1.21, I worry that Go users may think that it is appropriate to use a
sync.Cond in conjunction with context.AfterFunc, so I have chosen to
retain the Cond-based example and document its pitfalls instead of
removing or replacing it entirely.

I described this class of bugs and performance issues — and suggested
some channel-based alternatives — in my GopherCon 2018 talk,
“Rethinking Classical Concurrency Patterns”. The section on condition
variables starts on slide 37. (https://youtu.be/5zXAHh5tJqQ?t=679)

Fixes #62189.
Updates #62180.
For #20491.

Change-Id: If987cd9d112997c56171a7ef4fccadb360bb79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/521596
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
(cherry picked from commit 1081f8c)
Reviewed-on: https://go-review.googlesource.com/c/go/+/521598
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
Status: Done
Development

No branches or pull requests

2 participants