New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
race: Thread Sanitizer breaks sequential consistency semantics #58020
Comments
CC @dvyukov |
Looks like we started promising sequential consistent behavior in https://go-review.googlesource.com/c/go/+/381316 (~ 1 year ago, released first in 1.19), but we didn't verify that we actually provided that, as far as I can tell. @rsc Probably the right fix is to fix the tsan library to use But an easier, quicker fix is just to use the |
Change https://go.dev/cl/463227 mentions this issue: |
I have a CL out to fix this immediate problem. I'm not sure we're really covered completely, though. For instance atomic loads are marked acquire, not acquire+release. Will that lead to problems? The compare-and-swap ops are also dual-marked and one of the marks is acquire-only. Maybe that's for when the comparison fails? What about other architectures? They may in fact be worse, as they tend to have looser default semantics. How do we test this? The good news is that this only matters to race mode, where performance doesn't matter much. We can drop big memory barrier hammers if we need to. |
There are tests for sequential consistency, they run under race detector and don't fail, right? |
Yes, those tests pass. That said, they also pass if I change the non-race code to use MOVQ instead of XCHGQ. So either we're currently using XCHGQ unnecessarily or our tests aren't very thorough. (Or maybe my processor provides more than minimal guarantees, hard to tell.) I think we're going to need a test case from the OP. Without that, it is hard to be sure there is something wrong, and certainly hard to tell whether we've actually fixed it. |
It should fail pretty reliably on a typical x86 CPU.
Then (which should practically lead to MOVQ):
|
Hmm, I did this:
Maybe that code doesn't get used somehow? Semantic inlining somewhere, I suspect. |
Yes, that'll be the reason. Need this patch:
Then the sequential consistency tests in sync/atomic fail. |
CC @golang/compiler @golang/runtime |
1 similar comment
CC @golang/compiler @golang/runtime |
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
This is still not fixed in Go 1.20.3 or gotip, and it appears that @randall77 and @dvyukov managed to verify the issue. Is there any further info you need me to provide? |
We're going to need a test case. I do think the code looks not quite right, but @dvyukov disagrees. We have tests for sequential consistency under I'll reopen for now. |
Yes, we need a test. What's failing? |
@dvyukov: Here's a reproduction recipe. It's based on "Left-Right - A Concurrency Control Technique with Wait-Free Population Oblivious Reads", an algorithm that requires sequential consistency. When run under Furthermore, to demonstrate what we believe to be the reported issue, you can change the package view
import (
"runtime"
"sync"
"sync/atomic"
"testing"
_ "unsafe"
)
//go:linkname sync_runtime_procPin sync.runtime_procPin
//go:nosplit
func sync_runtime_procPin() int
//go:linkname sync_runtime_procUnpin sync.runtime_procUnpin
//go:nosplit
func sync_runtime_procUnpin()
const Reading = 1
const NotReading = 0
const Left = -1
type atomicInt64 struct {
atomic.Int64
pad [120]byte
}
type leftright struct {
active atomic.Int64
writerVersion atomic.Uint64
readerVersion [2][]atomicInt64
left, right map[uintptr]uintptr
}
func (lr *leftright) init(maxprocs int) {
lr.left = make(map[uintptr]uintptr)
lr.right = make(map[uintptr]uintptr)
lr.readerVersion = [2][]atomicInt64{
make([]atomicInt64, maxprocs),
make([]atomicInt64, maxprocs),
}
lr.active.Store(Left)
}
func (lr *leftright) readerArrive(version uint64, tid int) {
v := lr.readerVersion[version&0x1]
// BUG: this breaks sequential consistency when run under -race
// Because Swap uses a LOCK XCHG in x64, you can fix the sequential consistency issue with:
// v[tid].Swap(Reading)
v[tid].Store(Reading)
}
func (lr *leftright) readerDepart(version uint64, tid int) {
v := lr.readerVersion[version&0x1]
// BUG: this breaks sequential consistency when run under -race
// Because Swap uses a LOCK XCHG in x64, you can fix the sequential consistency issue with:
// v[tid].Swap(NotReading)
v[tid].Store(NotReading)
}
func (lr *leftright) readerIsEmpty(version uint64) bool {
v := lr.readerVersion[version&0x1]
for t := range v {
if v[t].Load() == Reading {
return false
}
}
return true
}
func (lr *leftright) Read(callback func(tbl map[uintptr]uintptr, version uint64)) {
tid := sync_runtime_procPin()
vi := lr.writerVersion.Load()
lr.readerArrive(vi, tid)
if lr.active.Load() == Left {
callback(lr.left, vi)
} else {
callback(lr.right, vi)
}
lr.readerDepart(vi, tid)
sync_runtime_procUnpin()
}
func (lr *leftright) publish(active int64, callback func(tbl map[uintptr]uintptr)) {
lr.active.Store(-active)
prevVersion := lr.writerVersion.Load()
nextVersion := prevVersion + 1
for !lr.readerIsEmpty(nextVersion) {
runtime.Gosched()
}
lr.writerVersion.Store(nextVersion)
for !lr.readerIsEmpty(prevVersion) {
runtime.Gosched()
}
if -active == Left {
callback(lr.right)
} else {
callback(lr.left)
}
}
func (lr *leftright) Write(callback func(tbl map[uintptr]uintptr)) {
active := lr.active.Load()
if active == Left {
callback(lr.right)
} else {
callback(lr.left)
}
lr.publish(active, callback)
}
func TestLeftRightRace(t *testing.T) {
runtime.GOMAXPROCS(8)
const Procs = 8
const Iter = 1000000
var lr leftright
lr.init(Procs)
var wg sync.WaitGroup
wg.Add(Procs + 1)
for proc := 0; proc < Procs; proc++ {
go func() {
defer wg.Done()
for i := 0; i < Iter; i++ {
lr.Read(func(tbl map[uintptr]uintptr, _ uint64) {
_ = tbl[0]
})
}
}()
}
go func() {
defer wg.Done()
for i := 0; i < Iter; i++ {
lr.Write(func(tbl map[uintptr]uintptr) {
tbl[0] = uintptr(i)
})
}
}()
wg.Wait()
} |
It's quite hard to understand what it's doing and ensure the code itself is correct. A simpler example that just captures the pattern where race detector fails would help. The race detector also reports a race on this code. Is it a false race report? |
I know, but reducing the test case further is proving to be very complicated and I currently don't have more cycles to spend on this this week. Can we agree that the reproduction recipe is valid, for now?
Do you mean the data race on Have you attempted running the same code but replacing the two calls to Can we agree that replacing an atomic If you agree with the underlying issue and can reproduce the bug with the provided example, I'll gladly spend some more time (probably next week) trying to minimize this test case so it can be incorporated in the standard library's test suite. Thanks for taking the time to look into this. I know the issue is hairy! |
Just to clarify, this bug report is not about the race detector failing. It's about the race detector weakening the sequential consistency of Go programs, so that algorithms like Left/Right, which require sequential consistency, stop behaving properly when they're being run under the race detector. |
I don't know. It took humanity a decade to shake out bugs of a simple lock-free stack. And you are providing a much more involved algorithm.
I am confused. But I think I see what may be the issue. It has nothing to do with tsan providing sequential consistency for executions or not, but rather with race detection algorithm. C++ does have a difference between sequentially consistent store and an RMW operations:
In the absence of a formal Go memory model, I assumed that Go follows C++ in these aspects (e.g. because it's compiled by the same backends that generally respect C++ rules only). And the Changing the code to the following also prevents the race report by establishing transitive HB relation between goroutines mapped to the same proc:
And the root cause is If you go this route, you need to somehow teach race detector about the transitive synchronization provided by the runtime scheduler. |
Thanks for the review! Let me see if I understand the situation: you're saying that by pinning the reader goroutines to specific |
Yes. Looks accurate. |
Thanks again for the review, @dvyukov. I don't know if making TSAN aware of Go scheduler pinning is in scope, since this would only apply when race testing internal runtime code. I'll leave that up to you! |
The race detector intentionally ignores all synchronization in the runtime and does not test it. It's intended for end users for testing of their code. |
So let me just summarize where I think we are. The issue here is that the race detector is reporting on a race that you think it shouldn't be. The issue is not race mode breaking sequential consistency semantics. That is, if we modified the race detector to do its normal thing but not report any races, the program would still run correctly. We think the race detector is reporting races because it cannot see happens-before edges introduced by procPin/procUnpin that are required to infer a race-free execution. If all that sounds right, then I don't think there is anything to do here on the Go side. procPin/procUnpin is not a supported API and we don't make any promises about its use via linkname. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes, it reproduces on the tip of
master
too.What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I was working on some lock-free algorithms in Go and found something interesting. Here's a break down:
sync/atomic
package. This is explicitly declared in the documentation that @rsc carefully crafted!AMD64
, the Go compiler always emitsXCHG
instructions for bothatomic.StoreXX
andatomic.SwapXX
. In fact, these two atomic functions are interchangeable in the codegen if you don't care about the output.go/src/cmd/compile/internal/ssa/_gen/AMD64.rules
Lines 526 to 535 in 4df10fb
-race
), the Go codegen for x64 no longer applies, as the atomic operations are delegated to the TSAN runtime library so they can be checked for races:go/src/runtime/race_amd64.s
Lines 234 to 238 in 4df10fb
go/src/runtime/race_amd64.s
Lines 259 to 263 in 4df10fb
Store
andSwap
, unlike the Go codegen which would emit identical code for both.https://github.com/llvm/llvm-project/blob/79649eacbc113a287d380fd7a0dab60302078eb4/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
atomic64_exchange
is emitted with C++'sstd::memory_order_acq_rel
, which coincidentally in x64 is equivalent tostd::memory_order_seq_cst
(it emits aLOCK XCHG
like Go would). However,atomic64_store
is emitted withstd::memory_order_release
which is not sequentially consistent! In fact, it only emits aMOV
because writes in x64 imply release semantics.-race
, calls toatomic.Store64
in Go code are actually not sequentially consistent!It took me and @dbussink a good while to figure this one out. We were porting a lock-free algorithm that required sequentially consistent semantics and we were extremely puzzled when the algorithm worked correctly when ran normally and yet it broke down and started emitting wrong results without any race errors when ran under
-race
. It got even more confusing when replacing anatomic.StoreInt64
with anatomic.SwapInt64
fixed the algorithm.We think this explains the issue we were seeing quite well!
What did you expect to see?
I would expect the sequentially consistent semantics of Go atomics to be maintained when running under the race detector. I would expect the actual assembly instructions for the atomic operations to be identical when running under the race detector (modulo all the helper code that TSAN adds around them to actually track races).
What did you see instead?
The race detector actually introduced a race in my code that didn't exist before! I want my money back!
The text was updated successfully, but these errors were encountered: