cmd/compile: go1.8 regression: sync/atomic loop elided #19182

peterGo · 2017-02-19T03:13:06Z

sync/atomic: go1.8 regression: atomic.AddUint64 is unreliable

Sometimes atomic.AddUint64 has no effect.

Update: @cespare has discovered that the loop containing atomic.AddUint64 has been elided.

Update: @rjeczalik has reported that this behavior also occurs with atomic.StoreUint64 and atomic.CompareAndSwapUint64 functions.

atomic.go:


package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

var a uint64 = 0

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))

	go func() {
		for {
			atomic.AddUint64(&a, uint64(1))
		}
	}()

	for {
		val := atomic.LoadUint64(&a)
		fmt.Println(val)
		time.Sleep(time.Second)
	}
}

For go1.4, go1.5, go1.6, and go1.7:

The atomic.AddUint64(&a, uint64(1)) statement works as expected.

$ go version
go version go1.4-bootstrap-20161024 linux/amd64
go version go1.5.4 linux/amd64
go version go1.6.4 linux/amd64
go version go1.7.5 linux/amd64

$ go build atomic.go && ./atomic
4 4
0
96231831
192599210
289043510
385369439
481772231
578143106
674509741
770966820
867408361
963866833
1060299901
<SNIP>
^C

For go1.8 and go tip:

The atomic.AddUint64(&a, uint64(1)) statement appears to have no effect.

go version go1.8 linux/amd64
go version devel +1e69aef Sat Feb 18 19:01:08 2017 +0000 linux/amd64
go version devel +1e69aef Sat Feb 18 19:01:08 2017 +0000 windows/amd64

$ uname -a
Linux peter 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/peter/gopath"
GORACE=""
GOROOT="/home/peter/go"
GOTOOLDIR="/home/peter/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build842347949=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="0"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

$ go build atomic.go && ./atomic
4 4
0
0
0
0
0
0
0
0
0
<SNIP>
^C

Interestingly, we can make go1.8 and go tip work with a simple code modification that should not have any effect:

atomic_pass.go:

package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

var a uint64 = 0

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Println(runtime.NumCPU(), runtime.GOMAXPROCS(0))

	go func() {
		for {
			new := atomic.AddUint64(&a, uint64(1))
			if new == 0 {
				runtime.Gosched() // or print()
			}
		}
	}()

	for {
		val := atomic.LoadUint64(&a)
		fmt.Println(val)
		time.Sleep(time.Second)
	}
}

$ go version
go version go1.8 linux/amd64
$ go build atomic_pass.go && ./atomic_pass
4 4
0
126492073
253613883
378888824
506065798
633247293
760383560
887553077
1014723419
<SNIP>
^C

After AddUint64(&a, uint64(1), new should be greater than zero and runtime.Gosched() should not be executed. Substituting print() for runtime.Gosched() also works. If the line runtime.Gosched() // or print() is commented out the program reverts to failure.

The text was updated successfully, but these errors were encountered:

odeke-em · 2017-02-19T03:17:06Z

Hmm. Let me tag some runtime and sync experts @aclements @dvyukov @randall77.

cespare · 2017-02-19T03:29:17Z

Looking at the assembly, the increment (and in fact the whole for loop) has been (over-)optimized away.

(Edit: I missed the jmp initially -- as @dvyukov points out, there is an empty loop.)

I'll bisect.

cespare · 2017-02-19T04:55:06Z

A git bisect indicates d6098e4.

cmd/compile: intrinsify sync/atomic for amd64

Uses the same implementation as runtime/internal/atomic.

Reorganize the intrinsic detector to make it more table-driven.

Also works on amd64p32.

/cc @randall77

rjeczalik · 2017-02-20T12:50:43Z

This bug applies also to atomic.StoreUint64 and atomic.CompareAndSwapUint64 functions, they also reproduce the buggy behaviour shown in the atomic.go repro. Just a heads up, as it may be not obvious just by reading the description.

dvyukov · 2017-02-20T13:17:38Z

The atomic.AddUint64 loop is compiled to an empty loop.
Strictly saying this is confirming behavior as we don't give any guarantees about scheduler (just like any other language, e.g. C++). This can be explained as "the goroutine is just not scheduled".
However I think we should fix this and not compile this to an empty loop. Akin to the C++ guarantee of "atomic writes should be visible to other threads in a finite amount of time". It is OK to combine up to inifinte number of non atomic writes, because that's invisible without data races, and I am not a fan of making programs with data races any more useful. But side effects of atomic writes are observable without data races, so it's fine to combine any finite number of atomic writes (e.g. atomic.Store(&x, 1); atomic.Store(&x, 2) can be compiled as atomic.Store(&x, 2)), but not an infinite number of atomic writes. Language implementation become useless when they take formal properties to the extreme.

dgryski · 2017-02-20T13:20:01Z

Is the runtime miscompiled like this?

dr2chase · 2017-02-20T14:26:37Z

I don't think the runtime contains any loops like that, so yes (same optimizations) and no (no code matching this amusing optimization).

Is this breaking any real applications? I had to write a compiler test program (for loop rescheduling, hence could not contain any function calls) very carefully to ensure that the loop remained, but that was only a test.

rjeczalik · 2017-02-20T14:35:04Z

Is this breaking any real applications?

@dr2chase: I'm using a for-loop with atomic read+cas, so I guess in go1.8 it won't work anymore: index.go#L55-L62.

RaduBerinde · 2017-02-20T14:44:24Z

@rjeczalik from @dvyukov's description above, I don't think that type of code would be affected (the loop has an exit condition).

AlekSi · 2017-02-20T14:47:23Z

Is this breaking any real applications?

The code in the original post is real enough. This program has side-effect – it prints to stdout. Someone can use this output to produce different results.

ucirello · 2017-02-20T15:47:21Z

It also affects AddUintptr/LoadUintptr, AddInt32/LoadInt32 and AddUint32/LoadUint32.

canoon · 2017-02-21T01:24:44Z

Isn't 0 a valid value after the loop has executed 2^64 times and the atomic has overflowed?

josharian · 2017-02-21T02:25:57Z

Isn't 0 a valid value after the loop has executed 2^64 times and the atomic has overflowed?

Yeah...after 584 years (at 1 ns / loop).

ghost · 2017-02-21T03:15:02Z

This is perfectly valid given Go's memory model. Are there any plans to update the memory model to include happens-before semantics on the atomic operations?

cespare · 2017-02-21T03:27:20Z

@nikdeapen there is always #5045: "define how sync/atomic interacts with memory model".

However, I'm 99% certain that any memory model updates for #5045 won't add more happens-before edges to this program.

The issue here is more one of visibility -- as @dvyukov said, the expectation is something like "atomic writes should be visible to other threads in a finite amount of time". Perhaps something similar to that would be codified in a future MM change regarding sync/atomic.

canoon · 2017-02-21T03:57:02Z

@josharian where are you getting the 1ns from? Is there a minimum amount of time an operation needs to take?

ghost · 2017-02-21T04:12:41Z

@cespare thanks for the reply.

Java has volatile (and atomic) variables which provide happens-before semantics that allow them to be used as flags and other types of synchronizers. In Go you would also have provide some type of synchronization along with the atomic variable to ensure memory visibility. But if you are already using another type of synchronization it seems unlikely that you would even need the atomic update. Maybe i'm misssing something, but I just don't see too many realistic use cases for atomic variables if they don't have happens-before semantics.

The "atomic writes should be visible to other threads in a finite amount of time" statement seem too "maybe-maybe-not" for a programming language. It seems like this would lead to bugs that are nearly impossible to test for.

I'd love to see happens-before on atomics but mutexes and channels work great for now.

dvyukov · 2017-02-21T08:07:47Z

Happens-before is unrelated to this case. Even if we document happens-before relations for atomic variables (note that currently they are still implied), there is nothing that will force the loop to see the updates as there is yet no any happens-before relations. Happens-before is only about visibility of secondary/dependent data, not about primary visibility.

To guarantee that the loop prints increasing values over times we need 2 guarantees:

Atomic writes are visible to other goroutine in a finite amount of time (yes, it's vague, but you can't do any better in a formal language specification, you can't put any absolute values in it, that's the guarantee that all other languages provide).
All runnable goroutines are executed in finite amount of time. It's also vague and I am not sure we need this. For example, neither C++ nor Java provide this guarantee. However, since we have own scheduler it's more realistic for us to provide such guarantee (but gccgo and other impls that use thread can be in trouble then).

dvyukov · 2017-02-21T08:33:39Z

@dr2chase I can imagine some real uses of this (though, probably not very representative for Go). For example, during very fine-grained parallelization you may have a dedicated goroutine running on the HT sibling spin wait for input using atomic.Load, do some (inline) computations and store result back using atomic.Store, and then repeat.

However, now I realized that any such program will necessary hang dead during GC (as the loop is not preemptible even with 1.7). And since we have forced GCs every few minutes, not allocating will not help. So I bet the original program hangs dead with 1.7 after 5 minutes, which can't be considered as "working".

AlekSi · 2017-02-21T16:51:03Z

So I bet the original program hangs dead with 1.7 after 5 minutes

Can't verify that – running it on macOS for an hour now, it's working, numbers are increasing.

RLH · 2017-02-21T18:04:19Z

"Atomic writes are visible to other goroutine in a finite amount of time." Memory model rules are usually framed as "if a then b" as opposed to "time and single events happen". I did a search for the above rule and couldn't immediately find it in any C/C++ standards doc. A reference would be of interest. I am also skeptical that a HW architectural spec would discuss time and contain the guarantees we would need.

…

On Tue, Feb 21, 2017 at 11:51 AM, Alexey Palazhchenko < ***@***.***> wrote: So I bet the original program hangs dead with 1.7 after 5 minutes Can't verify that – running it on macOS for an hour now, it's working, numbers are increasing. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#19182 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA7Wn5r4ATQWoIowS1-LY3u8XIBDoqa5ks5rexYDgaJpZM4MFV4Y> .

dr2chase · 2017-02-21T19:06:03Z

I think the "hang in 5 minutes" estimate was based on a conservative guess about GC frequency in usual programs (as opposed to this tiny test). Throw a little gratuitous memory allocation in there, it will hang soon enough. The test program I carefully wrote to avoid this was for the rescheduling-check-on-loop-backedges experiment; default compilation with no rescheduling check, call-free infinite loops cause GC to block forever.

gopherbot · 2017-02-21T20:30:17Z

CL https://golang.org/cl/37333 mentions this issue.

ianlancetaylor · 2017-02-21T21:03:08Z

@RLH The phrase to search for in the C++ standard is "forward progress."

As far as I can tell the C++ standard does not require such a program to complete, but it does "encourage" it.

If a thread offers concurrent forward progress guarantee, it will make progress (as defined above) in finite amount of time, for as long as it has not terminated, regardless of whether other threads (if any) are making progress.
The standard encourages, but doesn't require that the main thread and the threads started by std::thread offer concurrent forward progress guarantee.

dr2chase · 2017-02-21T21:11:13Z

By-the-way, try that program on a uniprocessor, let me know what it prints.

RLH · 2017-02-21T22:45:02Z

Yes thanks, it looks like the C++ documentation would consider a comparable C++ program to be a valid program and the atomic operation is eventually observable.

…

-- break glass here -- From http://en.cppreference.com/w/cpp/language/memory_model In a valid C++ program, every thread eventually does one of the following: - terminate - makes a call to an I/O library function - reads or modifies a volatile <http://en.cppreference.com/w/cpp/language/cv> object - performs an atomic operation or a synchronization operation No thread of execution can execute forever without performing any of these observable behaviors. Note that it means that a program with endless recursion or endless loop (whether implemented as a for-statement <http://en.cppreference.com/w/cpp/language/for> or by looping goto <http://en.cppreference.com/w/cpp/language/goto> or otherwise) has undefined behavior <http://en.cppreference.com/w/cpp/language/ub>. This allows the compilers to remove all loops that have no observable behavior, without having to prove that they would eventually terminate. A thread is said to *make progress* if it performs one of the execution steps above (I/O, volatile, atomic, or synchronization), blocks in a standard library function, or calls an atomic lock-free function that does not complete because of a non-blocked concurrent thread.

On Tue, Feb 21, 2017 at 4:11 PM, dr2chase ***@***.***> wrote: By-the-way, try that program on a uniprocessor, let me know what it prints. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#19182 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA7Wn_8CX9lKFzcxV5YGGFxZBLVrNLZYks5re1L6gaJpZM4MFV4Y> .

dvyukov · 2017-02-22T06:26:02Z

@dr2chase 5 minutes is actually 2 minutes, and it is based on:

// proc.go
var forcegcperiod int64 = 2 * 60 * 1e9

dvyukov · 2017-02-22T06:29:57Z

@RLH

Memory model rules are usually framed as "if a then b" as opposed to "time
and single events happen". I did a search for the above rule and couldn't
immediately find it in any C/C++ standards doc. A reference would be of
interest.

1.10/25
An implementation should ensure that the last value (in modification order) assigned by an atomic or
synchronization operation will become visible to all other threads in a finite period of time.

Yes, it's atypical, but it's still better to at least document the intention rather than say nothing. This is essentially the limit of what you can say in a formal language spec.

aclements · 2017-03-21T15:52:33Z

Re-opening for cherry-pick to 1.8.1.

This fixes #19182 upstream [1]. [1] golang/go#19182 Package-Manager: Portage-2.3.3, Repoman-2.3.2

gopherbot · 2017-04-05T15:30:13Z

CL https://golang.org/cl/39595 mentions this issue.

aclements · 2017-04-05T16:59:21Z

Cherry-picked to release.

…r do-not-remove Added a flag to generic and various architectures' atomic operations that are judged to have observable side effects and thus cannot be dead-code-eliminated. Test requires GOMAXPROCS > 1 without preemption in loop. Fixes #19182. Change-Id: Id2230031abd2cca0bbb32fd68fc8a58fb912070f Reviewed-on: https://go-review.googlesource.com/39595 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>

peterGo changed the title ~~sync/atomic: go1.8 regression: atomic.AddUint64 is unreliable~~ cmd/compile: go1.8 regression: loop elided Feb 19, 2017

ianlancetaylor added this to the Go1.8.1 milestone Feb 19, 2017

peterGo changed the title ~~cmd/compile: go1.8 regression: loop elided~~ cmd/compile: go1.8 regression: sync/atomic loop elided Feb 19, 2017

mandarjog mentioned this issue Feb 21, 2017

Switch to Go 1.8 istio/old_mixer_repo#265

Closed

mwhooker mentioned this issue Feb 21, 2017

build with go 1.8 hashicorp/packer#4568

Closed

gopherbot closed this as completed in 11b2830 Feb 22, 2017

dmcgowan mentioned this issue Mar 3, 2017

go 1.8: update dockerfile and circle distribution/distribution#2207

Merged

aclements reopened this Mar 21, 2017

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Mar 22, 2017

dev-lang/go: 1.8-r1 revision bump

8f8a61e

This fixes #19182 upstream [1]. [1] golang/go#19182 Package-Manager: Portage-2.3.3, Repoman-2.3.2

This was referenced Mar 27, 2017

Go 1.8 vitessio/vitess#2681

Closed

Bump minimum Go version from 1.7 to 1.8. vitessio/vitess#2682

Merged

This was referenced Apr 4, 2017

Use Go 1.8 travis-ci/jupiter-brain#50

Closed

build: use Go 1.8 on Travis CI travis-ci/worker#286

Closed

aclements closed this as completed Apr 5, 2017

golang locked and limited conversation to collaborators Apr 5, 2018

gopherbot added the FrozenDueToAge label Apr 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: go1.8 regression: sync/atomic loop elided #19182

cmd/compile: go1.8 regression: sync/atomic loop elided #19182

peterGo commented Feb 19, 2017 •

edited

Loading

odeke-em commented Feb 19, 2017

cespare commented Feb 19, 2017 •

edited

Loading

cespare commented Feb 19, 2017 •

edited

Loading

rjeczalik commented Feb 20, 2017

dvyukov commented Feb 20, 2017

dgryski commented Feb 20, 2017

dr2chase commented Feb 20, 2017

rjeczalik commented Feb 20, 2017

RaduBerinde commented Feb 20, 2017 •

edited

Loading

AlekSi commented Feb 20, 2017

ucirello commented Feb 20, 2017

canoon commented Feb 21, 2017

josharian commented Feb 21, 2017

ghost commented Feb 21, 2017

cespare commented Feb 21, 2017

canoon commented Feb 21, 2017

ghost commented Feb 21, 2017

dvyukov commented Feb 21, 2017

dvyukov commented Feb 21, 2017

AlekSi commented Feb 21, 2017

RLH commented Feb 21, 2017 via email

dr2chase commented Feb 21, 2017

gopherbot commented Feb 21, 2017

ianlancetaylor commented Feb 21, 2017 •

edited

Loading

dr2chase commented Feb 21, 2017

RLH commented Feb 21, 2017 via email

dvyukov commented Feb 22, 2017

dvyukov commented Feb 22, 2017

aclements commented Mar 21, 2017

gopherbot commented Apr 5, 2017

aclements commented Apr 5, 2017

cmd/compile: go1.8 regression: sync/atomic loop elided #19182

cmd/compile: go1.8 regression: sync/atomic loop elided #19182

Comments

peterGo commented Feb 19, 2017 • edited Loading

odeke-em commented Feb 19, 2017

cespare commented Feb 19, 2017 • edited Loading

cespare commented Feb 19, 2017 • edited Loading

rjeczalik commented Feb 20, 2017

dvyukov commented Feb 20, 2017

dgryski commented Feb 20, 2017

dr2chase commented Feb 20, 2017

rjeczalik commented Feb 20, 2017

RaduBerinde commented Feb 20, 2017 • edited Loading

AlekSi commented Feb 20, 2017

ucirello commented Feb 20, 2017

canoon commented Feb 21, 2017

josharian commented Feb 21, 2017

ghost commented Feb 21, 2017

cespare commented Feb 21, 2017

canoon commented Feb 21, 2017

ghost commented Feb 21, 2017

dvyukov commented Feb 21, 2017

dvyukov commented Feb 21, 2017

AlekSi commented Feb 21, 2017

RLH commented Feb 21, 2017 via email

dr2chase commented Feb 21, 2017

gopherbot commented Feb 21, 2017

ianlancetaylor commented Feb 21, 2017 • edited Loading

dr2chase commented Feb 21, 2017

RLH commented Feb 21, 2017 via email

dvyukov commented Feb 22, 2017

dvyukov commented Feb 22, 2017

aclements commented Mar 21, 2017

gopherbot commented Apr 5, 2017

aclements commented Apr 5, 2017

peterGo commented Feb 19, 2017 •

edited

Loading

cespare commented Feb 19, 2017 •

edited

Loading

cespare commented Feb 19, 2017 •

edited

Loading

RaduBerinde commented Feb 20, 2017 •

edited

Loading

ianlancetaylor commented Feb 21, 2017 •

edited

Loading