cmd/compile: arm64 fatal error when compiling docker #13854
I'll try to reproduce this tomorrow. What hardware are you running on?
I am running on the Cavium ThunderX arm64 platform. You can try it on any arm64 platform.
Tentatively classifying as a compiler problem, although I suppose it could be linux-arm64-specific runtime code. The report is for Go 1.5.2 but we have no reason to believe the current tip is better.
Ah, can you provide reproduction instructions that don't require sudo? (I can get access to machines where I have sudo, but it takes a while.)
@vijaykilari Can we get a reproduction that does not require sudo? Can you show us a backtrace with current tip with GOTRACEBACK=2 in the environment? Thanks.
The fact that this is a 1 in 10 failure suggests that it's probably not a compiler bug. I looked at the arm64 assembly for Cas and Xchg for possible errors that might manifest only under contention. I don't see anything, but I also am not terribly familiar with the architecture. @4ad, maybe there is some subtle bug in those? Missing barriers? Incorrect behavior in Cas or Xchg if the store-release fails? Something else? None of that is particularly well tested, although the very basic non-concurrent case does get tested at startup.
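[Editor's note] A portable sketch of the kind of contended Cas stress test discussed above: many goroutines increment a shared counter through a compare-and-swap retry loop, which repeatedly exercises the path where the arm64 store-exclusive can fail and must be retried. This is an illustrative test, not the runtime's actual test code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casIncrement bumps *addr by 1 using a compare-and-swap retry loop.
// Under contention the CAS frequently fails and retries, which is the
// code path suspected above.
func casIncrement(addr *uint64) {
	for {
		old := atomic.LoadUint64(addr)
		if atomic.CompareAndSwapUint64(addr, old, old+1) {
			return
		}
	}
}

func main() {
	const goroutines, perG = 8, 10000
	var counter uint64
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perG; j++ {
				casIncrement(&counter)
			}
		}()
	}
	wg.Wait()
	// If Cas is correct, no increments are lost.
	fmt.Println(counter) // 80000
}
```

If the atomics were broken under contention, lost updates would show up as a final count below goroutines*perG.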
The code for all our arm64 atomics matches the code in the Linux kernel, and the code generated by GCC intrinsics. I have run the sync, sync/atomic, and runtime tests millions of times (quite literally, I've run them in a loop for over a month) on the 48 core (or was it 2*48?) Cavium ThunderX platform (which @vijaykilari thinks we don't have access to). However, even with all of that working, maybe I misunderstand something about the Go memory model and Go needs something stronger than C here. We can try strengthening the atomics here, and see if this problem persists.
I'd be surprised if Go needs more than the Linux kernel does. But since you have access to a relevant machine, @4ad, do you also have root access? If so, can you run the reproduction case that @vijaykilari suggested?
I might get root access on it again, but it might take a while. I'll put in a request.
@4ad, any luck?
@4ad, which GCC intrinsics do they match? I'm wondering because they specifically have to be sequentially consistent to match the x86 atomics, and it's really easy to be less than sequentially consistent on ARM. Unfortunately, I don't know the semantics of the ARM instructions well enough to be sure, but I don't see any DMBs in the arm64 code, and my understanding is that STLR is only store-release, not sequentially consistent.
It looks to me like GCC will emit a dmb instruction for an atomic load or store that uses __ATOMIC_SEQ_CST. |
@4ad, I modeled a sequential consistency test (the same standard test TestStoreLoadSeqCst64 uses) using our current ARM64 StoreUint64/LoadUint64 implementation in the ARMMEM tool, and it confirms my concern that our current implementation is not sequentially consistent. (I wasn't able to get TestStoreLoadSeqCst64 to fail on the one physical ARM64 machine I have access to, but that doesn't mean it's not broken.)
(Unfortunately, non-interactive mode fails with a stack overflow, but if you go into interactive mode, complete either "Commit" action, then hit "Auto", it will work and prove that this snippet is not sequentially consistent.) If I put DMBs between the two pairs of STLR and LDAR, then it passes the test.
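[Editor's note] The sequential-consistency test discussed here is a store-buffering (SB) litmus test. A minimal portable sketch of one iteration in Go, using sync/atomic: under sequential consistency the outcome r1 == 0 && r2 == 0 is forbidden, because at least one goroutine must observe the other's store.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sbIteration runs one iteration of the store-buffering litmus test.
// Each goroutine stores to one variable and loads the other; under
// sequential consistency the result (0, 0) can never occur.
func sbIteration() (r1, r2 uint64) {
	var x, y uint64
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		atomic.StoreUint64(&x, 1)
		r1 = atomic.LoadUint64(&y)
	}()
	go func() {
		defer wg.Done()
		atomic.StoreUint64(&y, 1)
		r2 = atomic.LoadUint64(&x)
	}()
	wg.Wait()
	return
}

func main() {
	for i := 0; i < 100000; i++ {
		if r1, r2 := sbIteration(); r1 == 0 && r2 == 0 {
			fmt.Println("sequential consistency violated at iteration", i)
			return
		}
	}
	fmt.Println("no violation observed (absence of failure proves nothing)")
}
```

As the thread notes, the test not failing on a particular machine doesn't show the implementation is correct; the ARMMEM model checker is what demonstrates the violating execution is architecturally allowed.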
@vijaykilari, are you still able to reproduce this? Would you be able to try it with a small patch to the Go runtime and sync/atomic packages?
Based on /proc/cpuinfo, it's an "APM X-Gene Mustang" |
I have access to the same: |
@aclements please send a CL with the barriers, in both runtime/internal/atomic and sync/atomic. Thanks.
Yes, I can test your patch. Please share the details.
Let's just check in the patch and then test it. More memory barriers cannot make arm64 worse, and if they've been model-checked that's good enough for me.

Russ
I spent quite a while trying to figure out what the minimal barriers are to make this work and eventually gave up. https://go-review.googlesource.com/19182 just adds full barriers all over the place. I'll keep looking into this, but that's certainly sufficient, if overkill. I'm also not convinced I'm using ARMMEM correctly. Or the tool has a bug. I tried a
CL https://golang.org/cl/19182 mentions this issue. |
@ianlancetaylor, when does GCC emit a DMB? I wasn't able to get it to (much to my surprise, it just emitted stlr and ldar, like we do, regardless of what loads or stores of which variables I did). |
I should add that I tried with gcc 4.8.2. |
@vijaykilari, can you try it with https://go-review.googlesource.com/19182?
Looking deeper, it looks like GCC uses a dmb for the older __sync builtins, but does not for the newer __atomic builtins. I'm not sure why. This does suggest that I was mistaken, and that you can get adequate synchronization without dmb. For example, GCC will generate dmb for this input file:
Hmm. That doesn't generate a dmb for me with gcc 4.8.2. Are you sure that's 64-bit ARMv8?

I've reread the ARM memory model several times now, and I think the way we implement atomics on arm64 means there's a (sequentially consistent) total order over atomic operations, but allows roach-motel reordering between atomic operations and non-atomic operations. This is certainly weaker than what we implement on x86 and, I believe, on 32-bit arm. It's possibly also weaker than our unwritten memory model or the assumptions of the runtime. @dvyukov?

However, I've also been reading over the lock/unlock implementation and I don't see how this could lead to this particular panic.
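[Editor's note] The "roach-motel reordering between atomic and non-atomic operations" concern is about the message-passing pattern the runtime relies on: a plain write to data is published by an atomic store to a flag, and a reader that observes the flag must also observe the data. A minimal sketch of that pattern; it requires only release/acquire ordering (which STLR/LDAR provide), whereas the thread's open question is whether Go's atomics must additionally be sequentially consistent.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// publishAndRead publishes a plain (non-atomic) write via an atomic
// store-release, and a consumer that acquire-loads the flag must then
// observe the published value.
func publishAndRead() int {
	var data int
	var ready uint32
	done := make(chan int)

	go func() {
		for atomic.LoadUint32(&ready) == 0 {
			// spin until the publisher sets the flag (acquire load)
		}
		done <- data // must observe data == 42
	}()

	data = 42                     // plain write
	atomic.StoreUint32(&ready, 1) // release: publishes the write above
	return <-done
}

func main() {
	fmt.Println(publishAndRead()) // 42
}
```

If the atomics allowed the plain write to sink below the flag store (or the data read to hoist above the flag load), the consumer could see stale data, which is the class of bug being probed here.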
@bradfitz Not yet, still working on it.
@aclements the __atomic ones, like you and @ianlancetaylor eventually found out.
That is definitely arm64, and explicitly specifying -march=armv8-a doesn't change it. I'm using GCC tip, though. Looks like it changed a few months ago after plenty of discussion at https://gcc.gnu.org/PR65697 .
This is a known hardware bug with the revision of the chip Vijay is using. I did not work around it in GCC yet because that revision was not going into production and newer versions of the chip are fixed. It is worked around in OpenJDK, though: http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/6e2422a230fd . I should say the only place where a DMB is needed for the errata is right before the LDAR.
Ah, thanks for the info.

I guess for 1.6, CL 19182 could go in, and in 1.7 we parse /proc/cpuinfo to switch between two atomics implementations?

Austin, any performance benchmark results for your CL?
@apinski-cavium (or anyone), any sense of how common this pre-production hardware is? @minux, no benchmarks. I'll try to collect some this morning.
@apinski-cavium, can you point me to the errata? I wasn't able to find it.
We don't have it documented in public (yet). Here is the relevant part from our internal errata page (which is exactly the comment above):

--- CUT --- This only happens for the following output of /proc/cpuinfo: That is, when the variant is 1, the ThunderX chip does not have the errata.
Also, golang is not a JIT, so it should not be checking /proc/cpuinfo at all unless someone explicitly asks for native compiling. There should be an option to enable/disable this workaround. Note, as I mentioned, this variant of the chip is not going to production, so the workaround should default to disabled.
@apinski-cavium, thanks for the errata. I'm inclined to simply not fix this, since it sounds like this hardware is quite rare and it's not even fixed in GCC. Also, since Go is not JITed, we would have to add a branch to all of the ARM64 atomics to implement the workaround, which would incur a (small) performance penalty on all ARM64 devices.
I'm not sure what you mean by this. There is nothing but native compilation, so if we wanted to work around this, we would have to check /proc/cpuinfo (or somehow get the CPU variant and part) at runtime and optionally enable this workaround. We can't make this decision at compile time or let the user make this decision at compile time because they may compile on a machine that doesn't have this errata and then run on a machine that does. (We do this sort of thing for hardware features on ARM and x86, though so far we get everything from auxv sections or direct detection.)
We add codegen options incredibly sparingly. I don't think we'll add something like GO386 or GOARM for an unreleased processor. |
I don't think it is worth fixing either. It might be worth detecting the bad processor and failing early, though. |
@aclements @apinski-cavium is a GCC developer, and the GCC approach for these issues is always another command line option. @apinski-cavium Go takes a different approach. By comparison to GCC, the compiler has approximately zero options. Instead, when essential decisions must be made based on the CPU, it is done using a runtime check and a runtime branch. The Go compiler doesn't do any sort of advanced processor-specific scheduling anyhow.
I am going to close this bug. The right default assumption seems to be that pre-production hardware is not commonly used. People using pre-production hardware should typically expect problems (generally, not just with Go). On top of that, the arm64 port is currently only experimental. Detecting this one buggy piece of hardware not in common use in our experimental port seems to me unnecessary complexity.

For @vijaykilari and others, I think Austin's CL does fix the problem. Since people using arm64 are building from source already, it seems fine to have to apply an extra software fix for the hardware bug. At the time we decide to make the arm64 port a first-class port, if evidence has arrived that the use of this buggy hardware is in fact common, then we can revisit detection and/or mitigation. But our default should be to assume it is not common.
While generating the docker binary for arm64, I hit the fatal error below from the Go compiler.
I have shared my docker source code at https://github.com/vijaykilari/docker
Clone it on an arm64 platform and run 'sudo make build'.
This issue occurs about 1 in 10 times.