Description:
sync: use atomic load operations in Mutex

Because of the lack of a memory model for atomic operations,
I cannot say whether or not the previous code was wrong.
But I believe we don't want to mix atomic stores (CAS)
and non-atomic loads of the same variable.
Patch Set 1
Patch Set 2: diff -r d797b2316601 https://code.google.com/p/go
Patch Set 3: diff -r 209bb94bd691 https://code.google.com/p/go

Messages — Total messages: 10
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
I found a bunch of other places in the sync package where we update variables with atomic operations and then load them without atomic load ops. Maybe I should abandon this change.

On Thu, May 8, 2014 at 7:31 PM, <ruiu@google.com> wrote:
> Reviewers: golang-codereviews,
>
> Message:
> Hello golang-codereviews@googlegroups.com,
>
> I'd like you to review this change to
> https://code.google.com/p/go
>
> Description:
> sync: use atomic load operations in Mutex
>
> Because of the lack of memory model for atomic operations,
> I cannot say whether or not the previous code was wrong.
> But I believe that we don't want to mix atomic stores (CAS)
> and non atomic loads to the same variable.
>
> Please review this at https://codereview.appspot.com/91230048/
>
> Affected files (+2, -2 lines):
>   M src/pkg/sync/mutex.go
>
> Index: src/pkg/sync/mutex.go
> ===================================================================
> --- a/src/pkg/sync/mutex.go
> +++ b/src/pkg/sync/mutex.go
> @@ -49,7 +49,7 @@
>
>  	awoke := false
>  	for {
> -		old := m.state
> +		old := atomic.LoadInt32(&m.state)
>  		new := old | mutexLocked
>  		if old&mutexLocked != 0 {
>  			new = old + 1<<mutexWaiterShift
> @@ -104,6 +104,6 @@
>  			runtime_Semrelease(&m.sema)
>  			return
>  		}
> -		old = m.state
> +		old = atomic.LoadInt32(&m.state)
>  	}
>  }
Formally you are right that these are data races. However, I am concerned about performance. Please run and attach the results of the sync.Mutex benchmarks.

I don't think this needs to go into 1.3. There is an outstanding patch to support atomic intrinsics in the compiler for C. But since we are slowly moving the runtime from C to Go, I think we will need a similar patch for Go atomics. And then we can easily fix this.
On Fri, May 9, 2014 at 3:30 AM, <dvyukov@google.com> wrote:
> Formally you right that these are data races.
> However, I am concerned about performance. Please run and attach results
> of sync.Mutex benchmarks.
> I don't think that it needs to go into 1.3. There is an outstanding
> patch to support atomic intrinsics in the compiler for C. But since we
> are slowly moving runtime from C to Go, I think we will need a similar
> patch for Go atomics. And then we can easily fix this.

I think I don't understand the point. Does this code need to be fixed? It is about correctness, so if it needs to be fixed, it needs to be fixed regardless of benchmark results, no?

Also, in the fast path in Mutex.Lock, we only use a CAS. No semaphore nor memory barrier. Does it really satisfy Mutex's happens-before semantics?
On Sun, May 11, 2014 at 4:34 AM, Rui Ueyama <ruiu@google.com> wrote:
> I think I don't understand the point. Does this code need to be fixed? It is
> about correctness, so if it needs to be fixed, it needs to be fixed
> regardless of benchmark results, no?

I have not seen any evidence that the current code can break with the current toolchain. Have you? However, *formally* it's incorrect, so it can break in the future.

> Also, in the fast path in Mutex.Lock, we only use a CAS. No semaphore nor
> memory barrier. Does it really satisfy Mutex's happens-before semantics?

CAS provides a full barrier.
On Sat, May 10, 2014 at 11:59 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> I have not seen any evidence that the current code can break with the
> current toolchain. Have you?

No, I haven't.

> However, *formally* it's incorrect, so it can break in future.
>
> CAS provides full barrier.

It looks to me that Go's CAS implementation for ARM (*1) does not provide a full barrier. I'm not an ARM expert, so correct me if I'm wrong, but I'd think we need an explicit memory fence operation besides LDREX/STREX on ARM, unlike x86's CMPXCHG with a LOCK prefix. On Linux/ARM, it looks like the kernel's implementation (*2) will be used for cmpxchg, and it has DMB memory barrier instructions.

*1 http://golang.org/src/pkg/sync/atomic/asm_arm.s
*2 https://github.com/torvalds/linux/blob/f8fe23ec4e89b58e63085ea92348aff3bcca3e...
On Mon, May 12, 2014 at 12:43 AM, Rui Ueyama <ruiu@google.com> wrote:
> It looks to me that Go's CAS implementation for ARM (*1) does not provide
> full barrier. I'm not an ARM expert, so correct me if I'm wrong, but I'd
> think we need an explicit memory fence operation besides LDREX/STREX on ARM,
> unlike x86's CMPXCHG with a LOCK prefix. On Linux/ARM, it looks like the
> kernel's implementation (*2) will be used to cmpxchg, and it has DMB memory
> barrier instructions.

Now the question is -- how was it working before?...

I've mailed https://codereview.appspot.com/93280043/ but there is not even a DMB instruction in the ARM assembler...
On Mon, May 12, 2014 at 10:10 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Now the question is -- how was it working before?...
> I've mailed https://codereview.appspot.com/93280043/ but there is even
> no DMB instruction in the ARM assembler...

Yeah, it's surprising to find that Mutex behaves (at least in theory) incorrectly on ARM.

But it's wrong only on FreeBSD and NetBSD. On Linux, Go uses a kernel-provided CAS function instead of its own, located at a high address (0xffff0fc0) in the user memory space. That kernel function provides a memory barrier.

Maybe no one is really running Go on FreeBSD or NetBSD on recent multi-core ARM chips? Or the bug happens only in extremely rare conditions? I don't know why it hasn't been found earlier.
On Mon, May 12, 2014 at 1:48 PM, 'Rui Ueyama' via golang-codereviews <golang-codereviews@googlegroups.com> wrote:
> But it's wrong only on FreeBSD and NetBSD. On Linux, Go uses a
> kernel-provided CAS function instead of its own, located at a high address
> (0xffff0fc0) in the user memory space. That kernel function provides memory
> barrier.

We also fall back to an LDREX/STREX-based implementation when running on very old Linux kernels, but probably people aren't using pre-3.1 kernels (3.1 introduced __kuser_cmpxchg64; __kuser_cmpxchg32 was introduced much earlier).

> Maybe no one is really running Go on FreeBSD or NetBSD on recent
> multi-core ARM chips? Or the bug happens in extremely rare conditions? I
> don't know why it haven't been found earlier.

NetBSD doesn't support ARM SMP, and FreeBSD just enabled support in -current a month ago, so that could explain why we haven't found this bug before. We don't have SMP builders for FreeBSD, and the issue of supporting FreeBSD 11 (-current) on ARM has been pushed back to Go 1.4, so we can't add an SMP builder for 1.3 either.
On Mon, May 12, 2014 at 11:00 AM, minux <minux.ma@gmail.com> wrote:
> NetBSD doesn't support ARM SMP, and FreeBSD just enabled support in
> -current 1 month ago, so that could explain why we haven't found this
> bug before.

That explains it.