runtime: SIGILL in runtime.mapiternext on amd64 with 1.6rc2 #14306
Comments
I see that your test case has no concurrency, but have you tried running it with the -race option anyhow?
Can you tell us which instruction is at the reported PC value when the program gets a SIGILL trap? You can use go tool objdump.
I ran the test several times. Here is the disassembly around the reported PC:

hashmap.go:717 0xa918 48895c2440 MOVQ BX, 0x40(SP)
hashmap.go:718 0xa91d 4883f800 CMPQ $0x0, AX
hashmap.go:718 0xa921 0f8428050000 JE 0xae4f
hashmap.go:718 0xa927 440fb6c2 MOVZX DL, R8
hashmap.go:718 0xa92b 4983f808 CMPQ $0x8, R8
hashmap.go:718 0xa92f 0f8313050000 JAE 0xae48
hashmap.go:718 0xa935 4a8d2c00 LEAQ 0(AX)(R8*1), BP
hashmap.go:718 0xa939 0fb65d00 MOVZX 0(BP), BX // <---- here
hashmap.go:718 0xa93d 80fb00 CMPL $0x0, BL
hashmap.go:718 0xa940 0f8400010000 JE 0xaa46
hashmap.go:718 0xa946 4889442478 MOVQ AX, 0x78(SP)
hashmap.go:718 0xa94b 8854242e MOVL DL, 0x2e(SP)
hashmap.go:718 0xa94f 440fb6c2 MOVZX DL, R8
hashmap.go:718 0xa953 4983f808 CMPQ $0x8, R8
hashmap.go:718 0xa957 0f83e4040000 JAE 0xae41
hashmap.go:718 0xa95d 4a8d2c00 LEAQ 0(AX)(R8*1), BP
hashmap.go:718 0xa961 0fb65d00 MOVZX 0(BP), BX
hashmap.go:718 0xa965 80fb01 CMPL $0x1, BL
hashmap.go:718 0xa968 0f84d8000000 JE 0xaa46
hashmap.go:719 0xa96e 488b5c2430 MOVQ 0x30(SP), BX
if b.tophash[offi] != empty && b.tophash[offi] != evacuatedEmpty {
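For context, the two immediates in the disassembly (CMPL $0x0 and CMPL $0x1) correspond to the tophash sentinel values in Go 1.6's runtime/hashmap.go. The sketch below reproduces them as plain constants for illustration (the function name cellInUse is made up here; it just mirrors the condition on hashmap.go:718):

```go
package main

import "fmt"

// Tophash sentinel values as in Go 1.6's runtime/hashmap.go, reproduced
// as plain constants for illustration. They match the immediates in the
// disassembly above: CMPL $0x0 tests for empty, CMPL $0x1 tests for
// evacuatedEmpty.
const (
	empty          = 0 // cell is empty
	evacuatedEmpty = 1 // cell is empty, bucket has been evacuated
)

// cellInUse mirrors the condition quoted from hashmap.go:718.
func cellInUse(tophash uint8) bool {
	return tophash != empty && tophash != evacuatedEmpty
}

func main() {
	fmt.Println(cellInUse(0), cellInUse(1), cellInUse(5)) // false false true
}
```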
The crash seems easily reproducible.
All of these crashes were at the same instruction.

Edit: can repro even faster by passing
Can you try reproducing the problem on a different physical machine?
Just tried and got the same result on OS X 10.11 + Xcode 7.2.1.

What happens if you run the test with GOGC=off?

On my laptop, I don't see any crash when GOGC=off.
Interesting. I have an Intel(R) Core(TM) i7-3520M in my laptop. It's running GNU/Linux, not Darwin, but the generated instructions around where the SIGILL occurs are exactly the same. I can't recreate the problem. Can you show us the stack trace you get if you set the environment variable GOTRACEBACK=system while running the test?
I reproduced this on Linux in one of our backend clusters (in an Ubuntu 15.10 Docker container running on CoreOS 835.12.0 with kernel 4.2.2-coreos-r2 on an Intel E5-2620):
Signature is different though:
That PC is the same instruction:
@mikioh I've run 123 iterations of the above command with -race.

@ianlancetaylor here's the output with GOTRACEBACK=system:
If this only happens when the GC is running, then I wonder if the stack is somehow shrinking at a bad time. Does it happen if you let the GC run as usual but set the environment variable GODEBUG=gcshrinkstackoff=1?
Yes, this happens with GODEBUG=gcshrinkstackoff=1 too.
I've confirmed this on linux/amd64.
Looking into this further, the crash happens here
That is, in the case that should not happen. The key value may be somehow corrupt, because if we remove the printing of the key from the b.Fatal, eventually we get:

BenchmarkGetFromMapWithMapKey-80
--- FAIL: BenchmarkGetFromMapWithMapKey-80

I thought this was a liveness issue, but moving the key declaration to the package scope does not solve the problem:

BenchmarkGetFromMapWithMapKey-119
--- FAIL: BenchmarkGetFromMapWithMapKey-119
Neither.
I have extracted a repro: http://play.golang.org/p/L4AQ2hPIB0. This fails quickly when run with a random GOMAXPROCS. The program should not exit with the WTF message; that means the key that was just constructed was not found in the map. Printing the key will most likely cause a more serious panic, because its values are garbage.
My guess at this point is a missing write barrier. It's in the New function of Dave's repo:
Since we take the address of t and that address is returned, t is heap allocated. When t is initialized with the type assertion t := intf.(map[string]interface{}), we allocate a chunk of memory using newobject (for the type map[string]interface{}) and store the data word of the interface into it. That store does not have a write barrier, and I think it should.
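A minimal sketch of the pattern being described (the actual code is in Dave's linked repro; the signature below is a reconstruction, not the original): the returned address makes t escape to the heap, and the type assertion stores the interface's data word (a map pointer) into that heap memory, which is the store that was missing a write barrier in 1.6rc2.

```go
package main

import "fmt"

// New stores the data word of an interface value into a heap-allocated
// local and returns its address. Because &t is returned, t escapes and
// is allocated via newobject; the assertion writes a map pointer into
// that heap object. Without a write barrier on that store, a concurrent
// GC could fail to see the map as reachable through the returned pointer.
func New(intf interface{}) *map[string]interface{} {
	t := intf.(map[string]interface{}) // pointer store into heap-allocated t
	return &t                          // taking the address makes t escape
}

func main() {
	m := map[string]interface{}{"answer": 42}
	p := New(m)
	fmt.Println((*p)["answer"]) // prints 42
}
```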
Fix out for review: https://go-review.googlesource.com/#/c/19481/
CL https://golang.org/cl/19481 mentions this issue. |
What version of Go are you using (go version)?
What operating system and processor architecture are you using?
Mac OS X 10.9.5 on Intel Core i7-3615QM
What did you do?
in the package
github.com/aristanetworks/goarista
checked out at revision tsuna/goarista@d53bff3
What did you expect to see?
Benchmarks should pass.
What did you see instead?
I've seen this once:
The benchmark is very simple and doesn't involve any concurrency.
I recently switched my dev environment on my laptop to 1.6rc2, and I've seen another crash with a similar signature in another, more complicated test that involves non-open-source code. I blamed that crash on a potential race condition and made a note to track it down later. But now that this crash has happened on a very simple test case in a very small, self-contained, open-source library, I feel compelled to open a bug to have it investigated.
The other crash I've seen was (there are 2 stack frames I "anonymized", the other ones involve open-source code too):
In 6 days of using Go 1.6rc2 these are the only two weird crashes that I've seen.