x/tools/go/ssa: commit 75ff53bc introduced a race condition #67079
Hi, maintainer of go/ssa and author of https://go.dev/cl/538358 here. I don't have a good hypothesis yet: that CL looks sound w.r.t. concurrency, as does the same code at master. Interestingly, tinygo does not appear to use Program.RuntimeTypes or ssautil.AllFunctions (which is good: these functions should be deprecated!), which were the primary focus of that CL. The tinygo logic to construct go/ssa code looks generally sound too. Can you share the test case you used for bisecting to this CL? Even if it takes hundreds of runs to reliably fail, that would make it much easier to investigate. |
@aykevl It looks like
A lack of synchronization could be a source of races once generics are involved. (ssa.Program.Build solves this.) I don't know why https://go.dev/cl/538358 would be involved. A test case would really help. |
It's quite safe to go/ssa-Build all Packages in parallel--that's exactly what Program.Build does--so that doesn't explain it. tinygo seems to use concurrency for building and compiling each SSA package in parallel while respecting topological order, which seems sound. (There's more available parallelism of course, in reading, parsing, type-checking, etc.) The only synchronization within each compile job (package) is a cross-process lock on the output file, presumably to prevent concurrent writes from different tinygo processes. There's no fine-grained concurrency, so the lack of Mutex.Lock operations isn't obviously a problem. I wonder about the fact that some functions in low-level packages of intrinsics (runtime et al) are used from all other packages (e.g. when compiling ssa.Go); if the intrinsics package is not fully llvm-built before the other packages this could lead to a race.
Agreed. |
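To make the scheduling discipline described above concrete, here is a minimal, hypothetical sketch of the "build a package only after its imports are built" pattern. Pkg, buildAll, and the fmt.Println stand-in are invented names for illustration, not tinygo's or go/ssa's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Pkg is a hypothetical stand-in for a node in the import graph.
type Pkg struct {
	Name    string
	Imports []*Pkg
}

// buildAll builds every package concurrently while respecting import order:
// a package's build starts only after all of its imports have finished.
func buildAll(pkgs []*Pkg) {
	done := make(map[*Pkg]chan struct{}, len(pkgs))
	for _, p := range pkgs {
		done[p] = make(chan struct{})
	}
	var wg sync.WaitGroup
	for _, p := range pkgs {
		wg.Add(1)
		go func(p *Pkg) {
			defer wg.Done()
			for _, imp := range p.Imports {
				<-done[imp] // block until each dependency is fully built
			}
			fmt.Println("building", p.Name) // stand-in for SSA build + LLVM codegen
			close(done[p])                  // signal dependents that p is complete
		}(p)
	}
	wg.Wait()
}

func main() {
	runtime := &Pkg{Name: "runtime"}
	a := &Pkg{Name: "a", Imports: []*Pkg{runtime}}
	b := &Pkg{Name: "b", Imports: []*Pkg{runtime}}
	buildAll([]*Pkg{runtime, a, b})
}
```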
While building, ssa is careful to read only specific fixed fields of functions from other packages or of created functions. Reading from generic instantiations before they are done can race. Waiting for every package to finish building is good enough, which is what ssa.Program.Build does. Annoyingly, diamonds can cause the same instantiation to be created by two otherwise unrelated packages, so package topology is not enough. This definitely used to be the case, and I don't remember fixing this. (My memory might be off though.) |
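To illustrate the diamond mentioned above, here is a hypothetical layout (invented package names): two packages that do not depend on each other both need the same generic instantiation, so a schedule that only respects import order can still create or build that instantiation from two goroutines at once.

```go
// package lib defines a generic function; packages a and b (described
// below) both use it.
package lib

func Identity[T any](x T) T { return x }

// Hypothetical import graph forming a diamond:
//
//   main --> a --> lib
//     \----> b --> lib
//
// a and b each instantiate lib.Identity[int]. Since neither imports the
// other, a topological build schedule may run them in parallel, and both
// builds may try to create/build the shared Identity[int] instantiation.
```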
That sounds very much like the behavior I have observed anecdotally. However, I do not have a reproducible test case. |
The way I can reproduce this issue in TinyGo is as follows:
Normally it prints lines like this:
But sometimes it will result in a crash, like this (though there are many forms this crash can take, this is a common one):
I have tested this with the current dev branch, https://github.com/tinygo-org/tinygo/tree/1d9f26cee1f1501b09647186a25ba29aa6a0c58c. |
An easy and helpful experiment would be to bracket each ssa.Package.Build operation for package p with a pair of log statements marking its start and end (a rough sketch follows). That would allow us to see whether both SSA and LLVM build steps are respecting the topological order of the graph. |
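The bracketing snippet itself was not preserved above; as a rough sketch (assuming access to the *ssa.Package being built, with invented log messages and an invented wrapper name), it could look like this:

```go
package compile // hypothetical location in the build pipeline

import (
	"log"

	"golang.org/x/tools/go/ssa"
)

// buildWithLogging brackets one SSA build with start/end log lines; doing the
// same around each LLVM compile step makes any ordering violation visible in
// the interleaved output.
func buildWithLogging(p *ssa.Package) {
	log.Printf("start ssa build %s", p.Pkg.Path())
	p.Build()
	log.Printf("end ssa build %s", p.Pkg.Path())
}
```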
I am currently trimming down the list of packages that, when built together, trigger this race. That's rather time-consuming. Right now I'm down to this list:
So it's not related to the runtime package it seems. @adonovan thanks for the suggestion! I'll try that once I've trimmed down the package list some more. Though I think I could also trigger the crash when I built all of the SSA at the start using |
I have created a branch here that shows the bug: Then, build TinyGo (use
I found a few things that may point towards the bug:
So my guess would be that the "build methods of RuntimeTypes lazily" change does indeed build some methods lazily, but then when two different goroutines access one of these methods, the second doesn't wait until the first is finished. I also managed to capture a race condition with the race detector:
Race condition log
(There's a lot of duplication in there, I'm not sure why - maybe another bug?) |
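A minimal sketch of the failure mode being guessed at here (the names fn, cache, and get are invented; this is not the actual go/ssa code): the entry is published to the cache under the lock before its body is built, so a concurrent cache hit can return a half-built function.

```go
package sketch

import "sync"

// fn stands in for a function whose body is built lazily.
type fn struct {
	blocks []string // stand-in for basic blocks
}

// cache publishes entries before they are fully built, mirroring the
// suspected bug: a second caller can get a half-built *fn from the map.
type cache struct {
	mu sync.Mutex
	m  map[string]*fn
}

func (c *cache) get(name string) *fn {
	c.mu.Lock()
	if c.m == nil {
		c.m = make(map[string]*fn)
	}
	f, ok := c.m[name]
	if !ok {
		f = new(fn)
		c.m[name] = f // published to other goroutines before it is built!
	}
	c.mu.Unlock()

	if !ok {
		f.blocks = append(f.blocks, "entry") // built outside the lock
	}
	return f // a cache hit may observe f while another goroutine is still building it
}
```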
This looks suspicious: https://github.com/golang/tools/blob/75ff53bc6b14b0a09f5d4b838ea47ae679d748ba/go/ssa/methods.go#L52
I also see that the synthetic method |
Thanks @aykevl, this is really helpful information. You're right that there's a race building wrappers. I'll prepare a fix. |
There are a couple of strange things about the
Nonetheless, I can reproduce the race nearly 100% of the time using the test below, so I can work on a fix now.

```go
func TestIssue67079(t *testing.T) {
	// This test reproduced a race in the SSA builder nearly 100% of the time.

	// Load the package.
	const src = `package p; type T int; func (T) f() {}; var _ = (*T).f`
	conf := loader.Config{Fset: token.NewFileSet()}
	f, err := parser.ParseFile(conf.Fset, "p.go", src, 0)
	if err != nil {
		t.Fatal(err)
	}
	conf.CreateFromFiles("p", f)
	iprog, err := conf.Load()
	if err != nil {
		t.Fatal(err)
	}
	pkg := iprog.Created[0].Pkg

	// Create and build SSA program.
	ptrT := types.NewPointer(pkg.Scope().Lookup("T").Type())
	ptrTf := types.NewMethodSet(ptrT).At(0) // (*T).f symbol
	prog := ssautil.CreateProgram(iprog, ssa.BuilderMode(0))
	prog.Build()

	var g errgroup.Group

	// Access bodies of all functions.
	g.Go(func() error {
		for fn := range ssautil.AllFunctions(prog) {
			for _, b := range fn.Blocks {
				for _, instr := range b.Instrs {
					if call, ok := instr.(*ssa.Call); ok {
						call.Common().StaticCallee() // access call.Value
					}
				}
			}
		}
		return nil
	})

	// Force building of wrappers.
	g.Go(func() error {
		prog.MethodValue(ptrTf)
		return nil
	})

	g.Wait() // ignore error
}
```
Change https://go.dev/cl/590815 mentions this issue: |
Thank you for working on a fix!
Yeah I was wondering about that too. Perhaps it's a bug in the stack trace code? I'm on linux/arm64 (Asahi Linux, Fedora 40) if that matters.
I added a replace directive to go.mod to use golang/tools@75ff53b, see my previous comment.
Yeah I did have a few local edits but forgot to mention it. Sorry about that! |
Thanks for reporting the crucial information!
I would urge you to investigate and report it separately, as regardless of whether it's a bug in your code, our code, or the Go runtime, it seems like something we would all want to fix. Any reliable way to reproduce that impossible stack would be a great help.
Not to worry, the information you provided was sufficient to pinpoint the problem.
That's right; only panic and Goexit cause deferred functions to run. Internal errors in the runtime (called "fatal" or "throw"), or unhandled signals, simply terminate the program and print the stacks. |
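A tiny runnable illustration of the first half of that distinction: a deferred call does run while a panic unwinds the stack, whereas a runtime throw ("fatal error") or an unhandled signal terminates the process without running it.

```go
package main

import "fmt"

func main() {
	defer fmt.Println("deferred: runs while the panic unwinds the stack")
	panic("boom") // the deferred call runs first, then the panic trace is printed and the process exits
}
```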
Ok, here is a way to reproduce this:
```go
package main

import _ "context"

func main() {
}
```

You may need to run this a few times, but for me it reproduces in about half of the cases (with the same crash). This is on an aarch64 system (Apple M1 Pro CPU) running Fedora Asahi 40, with the distribution version of Go (but I've also seen the same bug on a Go version downloaded from go.dev/dl, so it isn't distribution specific). I can turn this into a separate bug? I guess it should be possible to reduce this testcase further. |
Even after checking out the repo and the correct branch, installing llvm via homebrew, and pulling the git submodules, I continue to get this build error:

```
tinygo2$ go install -race
# tinygo.org/x/go-llvm
../../go/pkg/mod/tinygo.org/x/go-llvm@v0.0.0-20240518103902-697964f2a9dc/analysis.go:16:10: fatal error: 'llvm-c/Analysis.h' file not found
#include "llvm-c/Analysis.h" // If you are getting an error here you need to build or install LLVM, see https://tinygo.org/docs/guides/build/
         ^~~~~~~~~~~~~~~~~~~
1 error generated.
```

so unfortunately I can't reproduce it yet. Could you disassemble the
Yes, please do. |
@adonovan weird, did you try
I'll try to create a bug report later, I don't really have the time right now. Maybe after this weekend. |
Change https://go.dev/cl/591775 mentions this issue: |
Program.MethodValue returns the wrapper function for a given function symbol, creating them on demand and caching them. However, methods are inserted into the cache before building is complete, so if MethodValue is called concurrently, thread A may construct a function, insert it in the cache, then build it, while another thread B retrieves it from the cache and returns without waiting, causing it to observe the function in the process of being built.

The fix is for threads making a cache hit to wait for an event that indicates completion of building. The event is represented as a field, 'built chan unit'. (The extra synchronization makes no apparent difference to TestStdlib's SSA create and build metrics.)

Also:
- Add a test that reliably reproduces the data race.
- Un-export the methods created.{add,at,len}.
- Rename Function.done to 'finish', and simplify.
- Define type unit = struct{} and clean up.
- Consolidate clean-up of transient builder state in finishBody.

Fixes golang/go#67079

Change-Id: I621150b962dc0430d78d2895d81aa7015a53e2cb
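A simplified sketch of the synchronization scheme that commit message describes, namely a completion channel on each cached entry that cache hits wait on. The names here are invented stand-ins, not the actual x/tools code.

```go
package sketch

import "sync"

type unit = struct{}

// fn stands in for a lazily built wrapper function.
type fn struct {
	built chan unit // closed once the body is fully built
	body  []string  // stand-in for the function body
}

type wrapperCache struct {
	mu sync.Mutex
	m  map[string]*fn
}

// get returns a fully built function, building it on first request.
// A cache hit blocks on built, so no caller can observe a half-built body.
func (c *wrapperCache) get(name string) *fn {
	c.mu.Lock()
	if c.m == nil {
		c.m = make(map[string]*fn)
	}
	f, hit := c.m[name]
	if !hit {
		f = &fn{built: make(chan unit)}
		c.m[name] = f
	}
	c.mu.Unlock()

	if hit {
		<-f.built // wait for the building goroutine to finish
	} else {
		f.body = append(f.body, "entry") // build outside the lock
		close(f.built)                   // wake up any waiting cache hits
	}
	return f
}
```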
@aykevl FYI: my MethodValue race fix CL, which you cherrypicked to tinygo, is not going to be merged to x/tools. @timothy-king extended it in CL 591775 to cover the races that can also arise through generic instantiation; we hope to land that in x/tools soon. |
@adonovan yeah that's what I expected. I just wanted a quick workaround because this bug has been nagging us for quite some time (and we have an upcoming release) and the patch looked smaller and more focused. Once the fix lands in x/tools, I'll update TinyGo to use that version instead. |
Go version
go version go1.22.2 linux/arm64
Output of go env in your module/workspace:
What did you do?
I upgraded the x/tools package to v0.17.0.
What did you see happen?
Lots of infrequent crashes starting from golang/tools@75ff53b (see tinygo-org/tinygo#4206).
The symptoms are all kinds of weird crashes, but in particular what I'm seeing often is functions that appear to be unfinished (no terminator in a basic block).
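As an aside on what "no terminator" means here: in go/ssa, every basic block of a fully built function should end in a control-flow instruction (Jump, If, Return, Panic, or Unreachable). A hypothetical sanity check (an invented helper, not part of the tinygo or go/ssa code) might look like:

```go
package check

import (
	"fmt"

	"golang.org/x/tools/go/ssa"
)

// checkTerminators reports any basic block of fn that does not end in a
// control-flow instruction, i.e. a function that looks "unfinished".
func checkTerminators(fn *ssa.Function) error {
	for _, b := range fn.Blocks {
		if len(b.Instrs) == 0 {
			return fmt.Errorf("%s: block %d is empty", fn, b.Index)
		}
		switch b.Instrs[len(b.Instrs)-1].(type) {
		case *ssa.Jump, *ssa.If, *ssa.Return, *ssa.Panic, *ssa.Unreachable:
			// ok: the block is properly terminated
		default:
			return fmt.Errorf("%s: block %d has no terminator", fn, b.Index)
		}
	}
	return nil
}
```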
Interestingly, once I build the program using -race, the race condition goes away. This makes it particularly difficult to pinpoint where exactly the bug is located.
I've also tried the master branch (golang/tools@0b45163) but sadly it still shows the same crashes.
I can try to work on a standalone reproducer, but with the nature of this crash it's difficult to find one. Still, if it's difficult to find the culprit just from the bisected commit I can try to work on a reproducer.
What did you expect to see?
No race condition.