Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime,sync/atomic: deadlocks in sync/atomic_test.TestValueSwapConcurrent #45975

Closed
bcmills opened this issue May 5, 2021 · 11 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented May 5, 2021

2021-05-05T15:54:39-69368ce/linux-386-longtest
2021-05-04T22:12:31-6a6aa32/linux-386-longtest

Also seen in a TryBot:
https://storage.googleapis.com/go-build-log/9041f75f/linux-amd64-longtest_50b82ee1.log

I'm not gonna paste the logs here because they're huge.

The deadlocked test was added in CL 241678, but given the gcMarkDone stacks I'm guessing that this is triggering an existing bug in the runtime rather than a bug in sync/atomic or the test itself. Still, it may be worth considering whether the test can be written in a way that is less allocation-intensive. 😅

CC @arnottcr @randall77 @prattmic @mknyszek @aclements

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels May 5, 2021
@bcmills bcmills added this to the Go1.17 milestone May 5, 2021
@bcmills
Copy link
Contributor Author

bcmills commented May 5, 2021

(It is not clear to me whether this issue is related to #45884 and/or #45885 and/or #45916. They might all be symptoms of the same bug.)

@urandom2
Copy link
Contributor

urandom2 commented May 5, 2021

I have not have enough profiling and optimisation experience to just clock allocations. I assume this is the two TestXxxConcurrent methods? would introducing pointers, or storing all the numbers to the stack before iteration help?

@urandom2
Copy link
Contributor

urandom2 commented May 5, 2021

forgot about -gcflags=-m; can you confirm the efficacy of:

  • O(1): define the anonymous func outside the for loop, so that we have one closure that accepts i uint64 as a parameter
  • O(m): new must escape to the heap because another goroutine may pick it up
  • O(1): ignoring heap escape outside the for loops
  • also, I will cleanup the imports of both "sync/atomic" and . "sync/atomic"

@randall77
Copy link
Contributor

I think preallocating all the interface{} that hold numbers might reduce the allocation profile a lot.

I don't think defining the func outside the loop would help much - we need to allocate a goroutine per iteration in any case.

@prattmic
Copy link
Member

prattmic commented May 10, 2021

@dr2chase got a similar looking failure in a benchmark, for which we have a core file. So far, we've found:

  • gcMarkDone is calling forEachP, which is stuck waiting for all Ps to run the function.
  • One P hasn't run the function (i.e., runSafePointFn == 1).
  • This P is in _Pidle, so it should have been handled by forEachP (if already idle), or on transition to idle.
  • This P is in sched.pidle by the time we got the core.

I'm still investigating further.

@gopherbot
Copy link

Change https://golang.org/cl/318529 mentions this issue: DO NOT SUBMIT: trace forEachP, P transitions

@gopherbot
Copy link

Change https://golang.org/cl/318569 mentions this issue: runtime: hold sched.lock across atomic pidleget/pidleput

@urandom2
Copy link
Contributor

I think preallocating all the interface{} that hold numbers might reduce the allocation profile a lot.

You thinking of something like this?

diff --git a/src/sync/atomic/value_test.go b/src/sync/atomic/value_test.go
index a5e717d6e0..f474f3f8b0 100644
--- a/src/sync/atomic/value_test.go
+++ b/src/sync/atomic/value_test.go
@@ -188,9 +188,11 @@ func TestValueSwapConcurrent(t *testing.T) {
                i := i
                g.Add(1)
                go func() {
+                       var box interface{}
                        var c uint64
                        for new := i; new < i+n; new++ {
-                               if old := v.Swap(new); old != nil {
+                               box = new
+                               if old := v.Swap(box); old != nil {
                                        c += old.(uint64)
                                }
                        }

Does not really seem to trace any better:

// Package main attempts to recreate atomic_test.TestValueSwapConcurrent in go1.16.
package main

import (
	"log"
	"sync"
	. "sync/atomic"
)

func main() {
	var x sync.Mutex
	var v Value
	var count uint64
	var g sync.WaitGroup
	var m, n uint64 = 10000, 10000
	m = 1000
	n = 1000
	for i := uint64(0); i < m*n; i += n {
		i := i
		g.Add(1)
		var box interface{}
		go func() {
			var c uint64
			for new := i; new < i+n; new++ {
				box = new
				x.Lock()
				old := v.Load()
				v.Store(box)
				x.Unlock()
				if old != nil {
					c += old.(uint64)
				}
			}
			AddUint64(&count, c)
			g.Done()
		}()
	}
	g.Wait()
	want, got := (m*n-1)*(m*n)/2, count+v.Load().(uint64)
	log.Printf("sum from 0 to %d was %d, want %v\n", m*n-1, got, want)
}
./main.go:38:44: inlining call to atomic.(*Value).Load
./main.go:25:11: inlining call to sync.(*Mutex).Lock
./main.go:26:18: inlining call to atomic.(*Value).Load
./main.go:28:13: inlining call to sync.(*Mutex).Unlock
./main.go:34:10: inlining call to sync.(*WaitGroup).Done
./main.go:10:6: moved to heap: x
./main.go:11:6: moved to heap: v
./main.go:12:6: moved to heap: count
./main.go:13:6: moved to heap: g
./main.go:14:9: moved to heap: n
./main.go:20:7: moved to heap: box
./main.go:21:6: func literal escapes to heap
./main.go:39:12: ... argument does not escape
./main.go:39:54: m * n - 1 escapes to heap
./main.go:39:54: got escapes to heap
./main.go:39:54: want escapes to heap
./main.go:24:9: new escapes to heap

@randall77
Copy link
Contributor

I was thinking for the CompareAndSwapConcurrent test, we probably convert j and j+1 to interface{} lots of times before the CompareAndSwap succeeds. We can do those conversions only once.
(The performance might matter between making just one for every j at the start of the test (a [m*n]interface{}), or making j and j+1 only once per successful CompareAndSwap, as one will succeed with pointer equality and one needs a call into the runtime to do the comparison, but I think either way should be good enough.)

I agree for just Swap it probably doesn't matter.

@urandom2
Copy link
Contributor

so, every time through the inner for loop we make two more heap values, even if we are just spinning? thus, this should allocate only what we absolutely need:

diff --git a/src/sync/atomic/value_test.go b/src/sync/atomic/value_test.go
index a5e717d6e0..5f1eea7d1c 100644
--- a/src/sync/atomic/value_test.go
+++ b/src/sync/atomic/value_test.go
@@ -255,12 +257,16 @@ func TestValueCompareAndSwapConcurrent(t *testing.T) {
                m = 100
                n = 100
        }
+       vec := make([]interface{}, m*n)
+       for i := range vec {
+               vec[i] = i
+       }
        for i := 0; i < m; i++ {
                i := i
                w.Add(1)
                go func() {
                        for j := i; j < m*n; runtime.Gosched() {
-                               if v.CompareAndSwap(j, j+1) {
+                               if v.CompareAndSwap(vec[j], vec[j+1]) {
                                        j += m
                                }
                        }

otherwise I think we have to do some deeper loop in loop nesting to do the "making j and j+1 only once per successful CompareAndSwap"; but yeah, I see how this can absolutely suck resources if you are constructing new interface data structures on the heap for every iteration through.

@randall77
Copy link
Contributor

@arnottcr Yes, that's exactly the change I was contemplating.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 21, 2021
@golang golang locked and limited conversation to collaborators May 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants