Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weak: TestIssue69210 failures #70455

Closed
gopherbot opened this issue Nov 20, 2024 · 18 comments
Closed

weak: TestIssue69210 failures #70455

gopherbot opened this issue Nov 20, 2024 · 18 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Critical A critical problem that affects the availability or correctness of production systems built using Go NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@gopherbot
Copy link
Contributor

#!watchflakes
default <- pkg == "weak" && test == "TestIssue69210"

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestIssue69210
runtime: marked free object in span 0x7b8257a312b0, elemsize=16 freeindex=0 (bad use of unsafe.Pointer or having race conditions? try -d=checkptr or -race)
0xc000014000 alloc marked  
0xc000014010 alloc marked  
0xc000014020 alloc marked  
0xc000014030 alloc marked  
0xc000014040 alloc marked  
0xc000014050 alloc marked  
0xc000014060 alloc marked  
0xc000014070 alloc marked  
...
fatal error: found pointer to free object

goroutine 35 gp=0xc0000f6c40 m=12 mp=0xc000120008 [running]:
runtime.throw({0x63fe18?, 0xc000014440?})
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1099 +0x48 fp=0xc000075d20 sp=0xc000075cf0 pc=0x4a5d28
runtime.(*mspan).reportZombies(0x7b8257a312b0)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgcsweep.go:877 +0x2ea fp=0xc000075da0 sp=0xc000075d20 pc=0x45a48a
runtime.(*sweepLocked).sweep(0x7b2280?, 0x0)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgcsweep.go:657 +0xcf4 fp=0xc000075ee0 sp=0xc000075da0 pc=0x459cb4
runtime.sweepone()
...
runtime.gopark(0x8c26894275?, 0x43ead4?, 0x5f?, 0x80?, 0xc000188050?)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:435 +0xce fp=0xc000128ef8 sp=0xc000128ed8 pc=0x4a5e6e
time.Sleep(0x3d0900)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/time.go:338 +0x165 fp=0xc000128f50 sp=0xc000128ef8 pc=0x4a9165
weak_test.TestIssue69210.func2()
	/home/swarming/.swarming/w/ir/x/w/goroot/src/weak/pointer_test.go:197 +0x13a fp=0xc000128fe0 sp=0xc000128f50 pc=0x5db0da
runtime.goexit({})
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000128fe8 sp=0xc000128fe0 pc=0x4acd21
created by weak_test.TestIssue69210 in goroutine 34
	/home/swarming/.swarming/w/ir/x/w/goroot/src/weak/pointer_test.go:187 +0x25c

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 20, 2024
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "weak" && test == "TestIssue69210"
2024-11-19 23:01 gotip-linux-amd64-longtest-race go@6f519476 weak.TestIssue69210 [ABORT] (log)
=== RUN   TestIssue69210
runtime: marked free object in span 0x7b8257a312b0, elemsize=16 freeindex=0 (bad use of unsafe.Pointer or having race conditions? try -d=checkptr or -race)
0xc000014000 alloc marked  
0xc000014010 alloc marked  
0xc000014020 alloc marked  
0xc000014030 alloc marked  
0xc000014040 alloc marked  
0xc000014050 alloc marked  
0xc000014060 alloc marked  
0xc000014070 alloc marked  
...
fatal error: found pointer to free object

goroutine 35 gp=0xc0000f6c40 m=12 mp=0xc000120008 [running]:
runtime.throw({0x63fe18?, 0xc000014440?})
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1099 +0x48 fp=0xc000075d20 sp=0xc000075cf0 pc=0x4a5d28
runtime.(*mspan).reportZombies(0x7b8257a312b0)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgcsweep.go:877 +0x2ea fp=0xc000075da0 sp=0xc000075d20 pc=0x45a48a
runtime.(*sweepLocked).sweep(0x7b2280?, 0x0)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgcsweep.go:657 +0xcf4 fp=0xc000075ee0 sp=0xc000075da0 pc=0x459cb4
runtime.sweepone()
...
runtime.gopark(0x8c26894275?, 0x43ead4?, 0x5f?, 0x80?, 0xc000188050?)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:435 +0xce fp=0xc000128ef8 sp=0xc000128ed8 pc=0x4a5e6e
time.Sleep(0x3d0900)
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/time.go:338 +0x165 fp=0xc000128f50 sp=0xc000128ef8 pc=0x4a9165
weak_test.TestIssue69210.func2()
	/home/swarming/.swarming/w/ir/x/w/goroot/src/weak/pointer_test.go:197 +0x13a fp=0xc000128fe0 sp=0xc000128f50 pc=0x5db0da
runtime.goexit({})
	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000128fe8 sp=0xc000128fe0 pc=0x4acd21
created by weak_test.TestIssue69210 in goroutine 34
	/home/swarming/.swarming/w/ir/x/w/goroot/src/weak/pointer_test.go:187 +0x25c

watchflakes

@thepudds
Copy link
Contributor

CC @mknyszek

@mknyszek
Copy link
Contributor

Oh that's fun. Thanks for the CC!

@mknyszek mknyszek self-assigned this Nov 20, 2024
@mknyszek mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 20, 2024
@mknyszek mknyszek added this to the Go1.24 milestone Nov 20, 2024
@thepudds
Copy link
Contributor

There's also #70048, which at first glance seems to be reporting a data race from when it was internal/weak, but maybe that is something else or maybe already resolved?

@mknyszek
Copy link
Contributor

mknyszek commented Nov 20, 2024

They're definitely related. The data race doesn't make any sense unless bt is accidentally pointing to the same object from multiple goroutines. That can only happen if it was erroneously freed.

That also lines up with the failure here: there's an erroneously freed (and then marked) value.

@mknyszek
Copy link
Contributor

I closed that issue so copying over the one failure that's in there:

2024-10-25 04:52 gotip-linux-amd64-longtest-race go@2ef8e41f internal/weak.TestIssue69210 (log)

@mknyszek
Copy link
Contributor

I do think it's interesting that this has only happened in race mode. I wonder if the race instrumentation is breaking any of the assumptions the weak->strong conversion code makes.

More likely though is that it's a more general bug but race mode is just good at reproducing the timing. We only have two data points after all.

@mknyszek
Copy link
Contributor

Having both of these failures is also very useful because it suggests that https://go.dev/cl/623615 is not involved. It landed on November 1st.

@mknyszek
Copy link
Contributor

mknyszek commented Nov 20, 2024

It took a couple minutes of execution under stress2, but I did successfully reproduce it locally with -race.

@mknyszek
Copy link
Contributor

Here's a theory: getOrAddWeakHandle is very careful about keeping p alive. It is not as careful about keeping handle alive, which is only stored in a special, and not visible to the GC until that special actually gets added to the specials list. What if it gets collected, and reused? Race instrumentation won't find that because all mutations of handle are in the runtime. But this theoretically could manifest as the failures seen in this issue, if handle is freed and reused by another handle, which stores a different value into it. ... Maybe?

@mknyszek
Copy link
Contributor

This would specifically be an issue because in the "successful addspecial" case we return s.handle instead of handle, but s is go:notinheap! So handle locally on the stack could die right before the call to addspecial, and in theory be collectible. This is not true in the other paths where handle is returned.

@mknyszek
Copy link
Contributor

I have a fix, assuming this is the issue. I don't have a good way to validate this, so I'm just going to let the test run under stress2 for a very long time with -race and see what happens.

@mknyszek
Copy link
Contributor

I have been running the test for far longer than it took to reproduce the issue, and it's not failing anymore. I'll keep it running, but I think this is it.

@mknyszek
Copy link
Contributor

@gopherbot Please open a backport issue for Go 1.23.

This bug causes rare and subtle crashes when using weak pointers (so, anything using unique and by extension, the net package). The fix is also simple and very safe.

@gopherbot
Copy link
Contributor Author

Backport issue(s) opened: #70469 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/630315 mentions this issue: runtime: explicitly keep handle alive during getOrAddWeakHandle

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/630277 mentions this issue: runtime: keep cleanup closure alive across adding the cleanup special

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/630316 mentions this issue: [release-branch.go1.23] runtime: explicitly keep handle alive during getOrAddWeakHandle

gopherbot pushed a commit that referenced this issue Nov 20, 2024
This is similar to the weak handle bug in #70455. In short, there's a
window where a heap-allocated value is only visible through a special
that has not been made visible to the GC yet.

For #70455.

Change-Id: Ic2bb2c60d422a5bc5dab8d971cfc26ff6d7622bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/630277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
gopherbot pushed a commit that referenced this issue Nov 20, 2024
…getOrAddWeakHandle

getOrAddWeakHandle is very careful about keeping its input alive across
the operation, but not very careful about keeping the heap-allocated
handle it creates alive. In fact, there's a window in this function
where it is *only* visible via the special. Specifically, the window of
time between when the handle is stored in the special and when the
special actually becomes visible to the GC.

(If we fail to add the special because it already exists, that case is
fine. We don't even use the same handle value, but the one we obtain
from the attached GC-visible special, *and* we return that value, so it
remains live.)

For #70455.
Fixes #70469.

Change-Id: Iadaff0cfb93bcaf61ba2b05be7fa0519c481de82
Reviewed-on: https://go-review.googlesource.com/c/go/+/630316
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. Critical A critical problem that affects the availability or correctness of production systems built using Go and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 20, 2024
mpminardi pushed a commit to tailscale/go that referenced this issue Jan 28, 2025
…getOrAddWeakHandle

getOrAddWeakHandle is very careful about keeping its input alive across
the operation, but not very careful about keeping the heap-allocated
handle it creates alive. In fact, there's a window in this function
where it is *only* visible via the special. Specifically, the window of
time between when the handle is stored in the special and when the
special actually becomes visible to the GC.

(If we fail to add the special because it already exists, that case is
fine. We don't even use the same handle value, but the one we obtain
from the attached GC-visible special, *and* we return that value, so it
remains live.)

For golang#70455.
Fixes golang#70469.

Change-Id: Iadaff0cfb93bcaf61ba2b05be7fa0519c481de82
Reviewed-on: https://go-review.googlesource.com/c/go/+/630316
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Critical A critical problem that affects the availability or correctness of production systems built using Go NeedsFix The path to resolution is known, but the work has not been done.
Projects
Archived in project
Development

No branches or pull requests

4 participants