Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: staticlockranking builders failing on release branches on LUCI #64722

Open
prattmic opened this issue Dec 14, 2023 · 8 comments
Open
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

Example failure:

https://ci.chromium.org/ui/p/golang/builders/try/go1.21-linux-amd64-staticlockranking/b8762252922810888305/test-results?sortby=&groupby=

        65878  ======
        0 : rwmutexW 18 0x111d488
        1 : fin 26 0x111cec0
        fatal error: lock ordering problem
        
        runtime stack:
        runtime.throw({0xbb398e?, 0xffffffffffffe000?})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1077 +0x5c fp=0x7ffcd92875b8 sp=0x7ffcd9287588 pc=0x43fd9c
        runtime.checkRanks(0xc0000081a0, 0x7ffcd9287638?, 0x111d460?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:162 +0x236 fp=0x7ffcd9287618 sp=0x7ffcd92875b8 pc=0x411eb6
        runtime.lockWithRankMayAcquire.func1()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:235 +0x85 fp=0x7ffcd9287648 sp=0x7ffcd9287618 pc=0x4125c5
        traceback: unexpected SPWRITE function runtime.systemstack
        runtime.systemstack()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:509 +0x4a fp=0x7ffcd9287658 sp=0x7ffcd9287648 pc=0x47452a
        
        goroutine 1 [running]:
        runtime.systemstack_switch()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:474 +0x8 fp=0xc00013f3c8 sp=0xc00013f3b8 pc=0x4744c8
        runtime.lockWithRankMayAcquire(0x100c00013f488?, 0x7fc6e535f658?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:224 +0x5a fp=0xc00013f400 sp=0xc00013f3c8 pc=0x4124fa
        runtime.lockRankMayQueueFinalizer(...)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mfinal.go:91
        runtime.mallocgc(0x10, 0xb244e0, 0x0)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:963 +0x4b fp=0xc00013f468 sp=0xc00013f400 pc=0x413eeb
        runtime.convTnoptr(0xb244e0, 0x2f?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/iface.go:348 +0x2b fp=0xc00013f4a0 sp=0xc00013f468 pc=0x41096b
        syscall.Setrlimit(0x7, 0x30?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/syscall/rlimit.go:47 +0x4f fp=0xc00013f4e8 sp=0xc00013f4a0 pc=0x48830f
        syscall.Exec({0xc00003c7c0?, 0xc00013f5e8?}, {0xc000036040, 0x2, 0x2}, {0xc000006c00, 0x30, 0x30})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/syscall/exec_unix.go:283 +0x16c fp=0xc00013f580 sp=0xc00013f4e8 pc=0x486d4c
        cmd/go/internal/toolchain.execGoToolchain({0xc00003a0cc, 0x7}, {0xc000040037, 0x28}, {0xc00003c7c0, 0x40})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/exec.go:53 +0x345 fp=0xc00013f618 sp=0xc00013f580 pc=0x9c0f45
        cmd/go/internal/toolchain.Exec({0xc00003a0cc, 0x7})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/select.go:280 +0x345 fp=0xc00013f8c8 sp=0xc00013f618 pc=0x9c2145
        cmd/go/internal/toolchain.Select()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/select.go:230 +0xb0f fp=0xc00013fa00 sp=0xc00013f8c8 pc=0x9c1d6f
        cmd/go.main()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/main.go:97 +0x34 fp=0xc00013fb10 sp=0xc00013fa00 pc=0xa19454
        cmd/go.Main(...)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/export_test.go:7
        cmd/go_test.TestMain(0x4760fa?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/go_test.go:160 +0x14dc fp=0xc00013fe88 sp=0xc00013fb10 pc=0xa5b85c
        main.main()
        	_testmain.go:193 +0x1c6 fp=0xc00013ff40 sp=0xc00013fe88 pc=0xa7f3a6
        runtime.main()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:267 +0x2bb fp=0xc00013ffe0 sp=0xc00013ff40 pc=0x44283b
        runtime.goexit()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013ffe8 sp=0xc00013ffe0 pc=0x4764a1

This specific ordering violation is not problematic, though it is unclear to me why this is only failing on LUCI, and even there only on the release branches. More importantly, digging into this reveals fundamental problems with the may we model rwmutex.

  1. We treat all rwmutex the same. That is, they all use rwmutexR and rwmutexW even though they are semantically different locks. This is technically OK, but it reduces precision in the lock ranking and makes it more difficult to understand.
  2. rwmutexR is not actually held across read locks, it is just an internal implementation detail held temporarily when there is contention. As a result the read lock rank is not consistently modeled since the lock is so rarely taken. We should have a rank that is always acquired on read lock.

cc @mknyszek

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 14, 2023
@prattmic prattmic added this to the Go1.22 milestone Dec 14, 2023
@prattmic prattmic self-assigned this Dec 14, 2023
@gopherbot
Copy link

Change https://go.dev/cl/549536 mentions this issue: runtime: properly model rwmutex in lock ranking

@bcmills bcmills added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 14, 2023
@prattmic
Copy link
Member Author

@gopherbot please backport to 1.20 and 1.21. This test-only problem is causing failures on the LUCI release branches.

@gopherbot
Copy link

Backport issue(s) opened: #64760 (for 1.20), #64761 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot pushed a commit that referenced this issue Dec 15, 2023
Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates
   write locks of all rwmutexes as rwmutexW, rather than allowing
   different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
   there is contention in rlock. As as result, the reader lock path is
   almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there
can be allocation between BeforeFork and AfterFork.

For #64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@mdempsky mdempsky changed the title staticlockranking builders failing on release branches on LUCI runtime: staticlockranking builders failing on release branches on LUCI Jan 4, 2024
@mdempsky
Copy link
Member

mdempsky commented Jan 4, 2024

Is this a regression, or has lock ranking never worked on the LUCI builders for the release branches?

@prattmic
Copy link
Member Author

prattmic commented Jan 4, 2024

It has never worked on the LUCI builders.

@gopherbot
Copy link

Change https://go.dev/cl/554976 mentions this issue: [release-branch.go1.21] runtime: properly model rwmutex in lock ranking

@gopherbot
Copy link

Change https://go.dev/cl/554995 mentions this issue: [release-branch.go1.20] runtime: properly model rwmutex in lock ranking

@gopherbot
Copy link

Change https://go.dev/cl/555055 mentions this issue: runtime: replace rwmutexR/W with per-rwmutex lock rank

gopherbot pushed a commit that referenced this issue Jan 9, 2024
CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.

Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.

This is visible in violations like:

        0 : execW 11 0x0
        1 : rwmutexW 51 0x111d9c8
        2 : fin 30 0x111d3a0
        fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking. i.e., writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/555055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Jan 25, 2024
(This cherry-pick combines CL 549536 and the follow-up fix CL 555055.)

Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates
   write locks of all rwmutexes as rwmutexW, rather than allowing
   different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
   there is contention in rlock. As as result, the reader lock path is
   almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there
can be allocation between BeforeFork and AfterFork.

For #64722.
Fixes #64760.

------

runtime: replace rwmutexR/W with per-rwmutex lock rank

CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.

Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.

This is visible in violations like:

        0 : execW 11 0x0
        1 : rwmutexW 51 0x111d9c8
        2 : fin 30 0x111d3a0
        fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking. i.e., writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:go1.20-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 9b4b3e5)
(cherry picked from commit dcbe772)
Reviewed-on: https://go-review.googlesource.com/c/go/+/554995
gopherbot pushed a commit that referenced this issue Jan 25, 2024
(This cherry-pick combines CL 549536 and the follow-up fix CL 555055.)

Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates
   write locks of all rwmutexes as rwmutexW, rather than allowing
   different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
   there is contention in rlock. As as result, the reader lock path is
   almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.

execW ordered before MALLOC captures the case from #64722. i.e., there
can be allocation between BeforeFork and AfterFork.

For #64722.
Fixes #64761.

------

runtime: replace rwmutexR/W with per-rwmutex lock rank

CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.

Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.

This is visible in violations like:

        0 : execW 11 0x0
        1 : rwmutexW 51 0x111d9c8
        2 : fin 30 0x111d3a0
        fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking. i.e., writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.

For #64722.

Cq-Include-Trybots: luci.golang.try:go1.21-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
(cherry picked from commit 9b4b3e5)
(cherry picked from commit dcbe772)
Reviewed-on: https://go-review.googlesource.com/c/go/+/554976
@gopherbot gopherbot modified the milestones: Go1.22, Go1.23 Feb 6, 2024
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Currently, lock ranking doesn't really try to model rwmutex. It records
the internal locks rLock and wLock, but in a subpar fashion:

1. wLock is held from lock to unlock, so it works OK, but it conflates
   write locks of all rwmutexes as rwmutexW, rather than allowing
   different rwmutexes to have different rankings.
2. rLock is an internal implementation detail that is only taken when
   there is contention in rlock. As as result, the reader lock path is
   almost never checked.

Add proper modeling. rwmutexR and rwmutexW remain as the ranks of the
internal locks, which have their own ordering. The new init method is
passed the ranks of the higher level lock that this represents, just
like lockInit for mutex.

execW ordered before MALLOC captures the case from golang#64722. i.e., there
can be allocation between BeforeFork and AfterFork.

For golang#64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: I23335b28faa42fb04f1bc9da02fdf54d1616cd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/549536
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
CL 549536 intended to decouple the internal implementation of rwmutex
from the semantic meaning of an rwmutex read/write lock in the static
lock ranking.

Unfortunately, it was not thought through well enough. The internals
were represented with the rwmutexR and rwmutexW lock ranks. The idea was
that the internal lock ranks need not model the higher-level ordering,
since those have separate rankings. That is incorrect; rwmutexW is held
for the duration of a write lock, so it must be ranked before any lock
taken while any write lock is held, which is precisely what we were
trying to avoid.

This is visible in violations like:

        0 : execW 11 0x0
        1 : rwmutexW 51 0x111d9c8
        2 : fin 30 0x111d3a0
        fatal error: lock ordering problem

execW < fin is modeled, but rwmutexW < fin is missing.

Fix this by eliminating the rwmutexR/W lock ranks shared across
different types of rwmutex. Instead require users to define an
additional "internal" lock rank to represent the implementation details
of rwmutex.rLock. We can avoid an additional "internal" lock rank for
rwmutex.wLock because the existing writeRank has the same semantics for
semantic and internal locking. i.e., writeRank is held for the duration
of a write lock, which is exactly how rwmutex.wLock is used, so we can
use writeRank directly on wLock.

For golang#64722.

Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-staticlockranking
Change-Id: Ia572de188a46ba8fe054ae28537648beaa16b12c
Reviewed-on: https://go-review.googlesource.com/c/go/+/555055
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Status: In Progress
Development

No branches or pull requests

4 participants