Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fault in runtime.mapiternext #14904

Closed
davecheney opened this issue Mar 22, 2016 · 11 comments
Closed

runtime: fault in runtime.mapiternext #14904

davecheney opened this issue Mar 22, 2016 · 11 comments
Milestone

Comments

@davecheney
Copy link
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?

go linux/amd64, rev d4663e1

  1. What operating system and processor architecture are you using (go env)?

linux/amd64

  1. What did you do?

    cd $GOPATH/src/github.com/juju/juju/cmd/juju/status
    go test -i -v . && go test -count=20

Unfortunately reproducing this bug requires our monstrous test suite. For those who want to reproduce the issue I can setup a AWS instance with the correct dependencies.

  1. What did you expect to see?

tests pass

  1. What did you see instead?
unexpected fault address 0x0
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x409722]

goroutine 610 [running]:
runtime.throw(0x12fdcfb, 0x5)
    /home/dfc/go/src/runtime/panic.go:566 +0x8c fp=0xc820aed2a8 sp=0xc820aed290
runtime.sigpanic()
    /home/dfc/go/src/runtime/sigpanic_unix.go:27 +0x27d fp=0xc820aed2f8 sp=0xc820aed2a8
runtime.mapiternext(0xc8206ae4e0)
    /home/dfc/go/src/runtime/hashmap.go:718 +0xc2 fp=0xc820aed390 sp=0xc820aed2f8
runtime.mapiterinit(0x1064940, 0xc820297980, 0xc8206ae4e0)
    /home/dfc/go/src/runtime/hashmap.go:666 +0x1bf fp=0xc820aed3c8 sp=0xc820aed390
reflect.mapiterinit(0x1064940, 0xc820297980, 0x95)
    /home/dfc/go/src/runtime/hashmap.go:1014 +0x4a fp=0xc820aed3f0 sp=0xc820aed3c8
reflect.Value.MapKeys(0x1064940, 0xc820ac8398, 0x95, 0x0, 0x0, 0x45c)
    /home/dfc/go/src/reflect/value.go:1074 +0xb1 fp=0xc820aed490 sp=0xc820aed3f0
encoding/json.(*mapEncoder).encode(0xc820252330, 0xc82020e0b0, 0x1064940, 0xc820ac8398, 0x95, 0x400)
    /home/dfc/go/src/encoding/json/encode.go:614 +0x9a fp=0xc820aed530 sp=0xc820aed490
encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x1064940, 0xc820ac8398, 0x95, 0x0)
    /home/dfc/go/src/encoding/json/encode.go:632 +0x51 fp=0xc820aed568 sp=0xc820aed530
encoding/json.(*structEncoder).encode(0xc8202f02a0, 0xc82020e0b0, 0x1203a80, 0xc820ac8360, 0x99, 0xc820ac8300)
    /home/dfc/go/src/encoding/json/encode.go:587 +0x230 fp=0xc820aed6d0 sp=0xc820aed568
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x1203a80, 0xc820ac8360, 0x99, 0xc82038cb00)
    /home/dfc/go/src/encoding/json/encode.go:601 +0x51 fp=0xc820aed708 sp=0xc820aed6d0
encoding/json.(*mapEncoder).encode(0xc8202523d8, 0xc82020e0b0, 0x1065240, 0xc82017b948, 0x95, 0x200)
    /home/dfc/go/src/encoding/json/encode.go:622 +0x245 fp=0xc820aed7a8 sp=0xc820aed708
encoding/json.(*mapEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x1065240, 0xc82017b948, 0x95, 0x0)
    /home/dfc/go/src/encoding/json/encode.go:632 +0x51 fp=0xc820aed7e0 sp=0xc820aed7a8
encoding/json.(*structEncoder).encode(0xc8205fdef0, 0xc82020e0b0, 0x1194940, 0xc82017b920, 0x99, 0x1194900)
    /home/dfc/go/src/encoding/json/encode.go:587 +0x230 fp=0xc820aed948 sp=0xc820aed7e0
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x1194940, 0xc82017b920, 0x99, 0xc82017b900)
    /home/dfc/go/src/encoding/json/encode.go:601 +0x51 fp=0xc820aed980 sp=0xc820aed948
encoding/json.(*encodeState).reflectValue(0xc82020e0b0, 0x1194940, 0xc82017b920, 0x99)
    /home/dfc/go/src/encoding/json/encode.go:301 +0x6e fp=0xc820aed9b0 sp=0xc820aed980
encoding/json.interfaceEncoder(0xc82020e0b0, 0x10358a0, 0xc8206e83f0, 0x194, 0x0)
    /home/dfc/go/src/encoding/json/encode.go:560 +0xba fp=0xc820aed9e8 sp=0xc820aed9b0
encoding/json.(*structEncoder).encode(0xc8203df0e0, 0xc82020e0b0, 0x11efda0, 0xc8206e8380, 0x199, 0x0)
    /home/dfc/go/src/encoding/json/encode.go:587 +0x230 fp=0xc820aedb50 sp=0xc820aed9e8
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x11efda0, 0xc8206e8380, 0x199, 0xc8206e8300)
    /home/dfc/go/src/encoding/json/encode.go:601 +0x51 fp=0xc820aedb88 sp=0xc820aedb50
encoding/json.(*ptrEncoder).encode(0xc8204a2100, 0xc82020e0b0, 0x1082b40, 0xc8206e8380, 0x16, 0x1082b00)
    /home/dfc/go/src/encoding/json/encode.go:709 +0xd0 fp=0xc820aedbc0 sp=0xc820aedb88
encoding/json.(*ptrEncoder).(encoding/json.encode)-fm(0xc82020e0b0, 0x1082b40, 0xc8206e8380, 0x16, 0xc8206e8300)
    /home/dfc/go/src/encoding/json/encode.go:714 +0x51 fp=0xc820aedbf8 sp=0xc820aedbc0
encoding/json.(*encodeState).reflectValue(0xc82020e0b0, 0x1082b40, 0xc8206e8380, 0x16)
    /home/dfc/go/src/encoding/json/encode.go:301 +0x6e fp=0xc820aedc28 sp=0xc820aedbf8
encoding/json.(*encodeState).marshal(0xc82020e0b0, 0x1082b40, 0xc8206e8380, 0x0, 0x0)
    /home/dfc/go/src/encoding/json/encode.go:274 +0x93 fp=0xc820aedc58 sp=0xc820aedc28
encoding/json.Marshal(0x1082b40, 0xc8206e8380, 0x11efda0, 0x2, 0x134e5d8, 0xc8206e8380, 0x80)
    /home/dfc/go/src/encoding/json/encode.go:139 +0x5f fp=0xc820aedc90 sp=0xc820aedc58
golang.org/x/net/websocket.jsonMarshal(0x1082b40, 0xc8206e8380, 0xc82013ae80, 0xc82002ec00, 0x7f6f11a9e000, 0x0, 0x7, 0x7)
    /home/dfc/src/golang.org/x/net/websocket/websocket.go:385 +0x2b fp=0xc820aedcd0 sp=0xc820aedc90
golang.org/x/net/websocket.Codec.Send(0x15a32d0, 0x15a32d8, 0xc8206ac240, 0x1082b40, 0xc8206e8380, 0x0, 0x0)
    /home/dfc/src/golang.org/x/net/websocket/websocket.go:288 +0x58 fp=0xc820aedd50 sp=0xc820aedcd0
github.com/juju/juju/rpc/jsoncodec.wsJSONConn.Send(0xc8206ac240, 0x1082b40, 0xc8206e8380, 0x1052ee0, 0xc82038ca40)
    /home/dfc/src/github.com/juju/juju/rpc/jsoncodec/conn.go:21 +0x4d fp=0xc820aedd90 sp=0xc820aedd50
github.com/juju/juju/rpc/jsoncodec.(*Codec).WriteMessage(0xc8202a8b00, 0xc8206ae180, 0x1194940, 0xc82017b920, 0x0, 0x0)
    /home/dfc/src/github.com/juju/juju/rpc/jsoncodec/codec.go:182 +0x19a fp=0xc820aede20 sp=0xc820aedd90
github.com/juju/juju/rpc.(*Conn).runRequest(0xc8204fe6e0, 0x7f6f11a32ea0, 0xc820303180, 0x15a24a0, 0x3, 0xc8205f9022, 0x6, 0x1, 0x0, 0x0, ...)
    /home/dfc/src/github.com/juju/juju/rpc/server.go:572 +0x387 fp=0xc820aeded0 sp=0xc820aede20
runtime.goexit()
    /home/dfc/go/src/runtime/asm_amd64.s:2006 +0x1 fp=0xc820aeded8 sp=0xc820aeded0
created by github.com/juju/juju/rpc.(*Conn).handleRequest
    /home/dfc/src/github.com/juju/juju/rpc/server.go:489 +0x66d

/cc @khr @aclements

@davecheney
Copy link
Contributor Author

Go 1.5.3 is fine, Go 1.6 is fine. I'm trying to bisect now.

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Mar 22, 2016
@davecheney
Copy link
Contributor Author

I've bisected this as far as 481fe59. However, for possibly unrelated reasons, the test suite I am using to verify this fault does not run (fails to connect to the database on startup). I believe that this is an unrelated failure, but it prevents me from making a more accurate bisection.

With that said, disabling ssa at this commit stops the crash.

@davecheney
Copy link
Contributor Author

Please disregard the previous comment, the problem is much earlier than 481fe59

@davecheney
Copy link
Contributor Author

I'm sorry to report that the offending revision is 9d854fd, the SSA merge.

If anyone wants to investigate I can configure a AWS VM with an environment to reproduce the problem.

@davecheney
Copy link
Contributor Author

Program received signal SIGSEGV, Segmentation fault.
runtime.mapiternext (it=0xc8204a26c0) at /home/dfc/go/src/runtime/hashmap.go:718
718                     if b.tophash[offi] != empty && b.tophash[offi] != evacuatedEmpty {
(gdb) disassemble 
Dump of assembler code for function runtime.mapiternext:
   0x0000000000409660 <+0>:     mov    %fs:0xfffffffffffffff8,%rcx
   0x0000000000409669 <+9>:     lea    -0x10(%rsp),%rax
   0x000000000040966e <+14>:    cmp    0x10(%rcx),%rax
   0x0000000000409672 <+18>:    jbe    0x409d31 <runtime.mapiternext+1745>
   0x0000000000409678 <+24>:    sub    $0x90,%rsp
   0x000000000040967f <+31>:    mov    0x98(%rsp),%rcx
   0x0000000000409687 <+39>:    mov    0x18(%rcx),%rdx
   0x000000000040968b <+43>:    mov    %rdx,0x70(%rsp)
   0x0000000000409690 <+48>:    mov    0x10(%rcx),%rbx
   0x0000000000409694 <+52>:    mov    %rbx,0x50(%rsp)
   0x0000000000409699 <+57>:    mov    0x50(%rcx),%rbp
   0x000000000040969d <+61>:    mov    0x28(%rcx),%rsi
   0x00000000004096a1 <+65>:    movzbl 0x4b(%rcx),%edi
   0x00000000004096a5 <+69>:    mov    0x58(%rcx),%rcx
   0x00000000004096a9 <+73>:    mov    0x38(%rbx),%r8
   0x00000000004096ad <+77>:    mov    0x18(%r8),%rax
   0x00000000004096b1 <+81>:    mov    %rax,0x88(%rsp)
   0x00000000004096b9 <+89>:    test   %rsi,%rsi
   0x00000000004096bc <+92>:    je     0x409768 <runtime.mapiternext+264>
   0x00000000004096c2 <+98>:    mov    %rbp,0x38(%rsp)
   0x00000000004096c7 <+103>:   mov    %rsi,0x80(%rsp)
   0x00000000004096cf <+111>:   mov    %rcx,0x30(%rsp)
   0x00000000004096d4 <+116>:   mov    %dil,0x2f(%rsp)
   0x00000000004096d9 <+121>:   cmp    $0x8,%dil
   0x00000000004096dd <+125>:   jae    0x409750 <runtime.mapiternext+240>
   0x00000000004096df <+127>:   mov    0x98(%rsp),%r8
   0x00000000004096e7 <+135>:   movzbl 0x48(%r8),%r8d
   0x00000000004096ec <+140>:   add    %edi,%r8d
   0x00000000004096ef <+143>:   movzbl 0x58(%rbx),%r9d
   0x00000000004096f4 <+148>:   mov    %r8,%r10
   0x00000000004096f7 <+151>:   and    $0x7,%r10
   0x00000000004096fb <+155>:   mov    %r9,%r11
   0x00000000004096fe <+158>:   imul   %r10,%r11
   0x0000000000409702 <+162>:   lea    0x8(%r11,%rsi,1),%r11
   0x0000000000409707 <+167>:   mov    %r11,0x60(%rsp)
   0x000000000040970c <+172>:   movzbl 0x5a(%rbx),%r12d
   0x0000000000409711 <+177>:   imul   %r10,%r12
---Type <return> to continue, or q <return> to quit---
   0x0000000000409715 <+181>:   lea    0x8(%r12,%r9,8),%r9
   0x000000000040971a <+186>:   add    %rsi,%r9
   0x000000000040971d <+189>:   mov    %r9,0x48(%rsp)
=> 0x0000000000409722 <+194>:   test   %al,(%rsi)
   0x0000000000409724 <+196>:   and    $0x7,%r8
   0x0000000000409728 <+200>:   mov    %r8,0x40(%rsp)
   0x000000000040972d <+205>:   movzbl (%rsi,%r8,1),%r10d
   0x0000000000409732 <+210>:   cmp    $0x0,%r10b
   0x0000000000409736 <+214>:   je     0x409742 <runtime.mapiternext+226>
   0x0000000000409738 <+216>:   cmp    $0x1,%r10b
   0x000000000040973c <+220>:   jne    0x409896 <runtime.mapiternext+566>
   0x0000000000409742 <+226>:   add    $0x1,%edi
   0x0000000000409745 <+229>:   mov    %dil,0x2f(%rsp)
   0x000000000040974a <+234>:   cmp    $0x8,%dil
   0x000000000040974e <+238>:   jb     0x4096df <runtime.mapiternext+127>
   0x0000000000409750 <+240>:   movzwl 0x5c(%rbx),%edi
   0x0000000000409754 <+244>:   lea    -0x8(%rdi,%rsi,1),%rsi
   0x0000000000409759 <+249>:   mov    (%rsi),%rsi
   0x000000000040975c <+252>:   mov    $0x0,%dil
   0x000000000040975f <+255>:   test   %rsi,%rsi
   0x0000000000409762 <+258>:   jne    0x4096c2 <runtime.mapiternext+98>
   0x0000000000409768 <+264>:   mov    0x98(%rsp),%rcx
   0x0000000000409770 <+272>:   mov    %rcx,0x78(%rsp)
   0x0000000000409775 <+277>:   mov    0x40(%rcx),%rsi
   0x0000000000409779 <+281>:   cmp    %rbp,%rsi
   0x000000000040977c <+284>:   jne    0x40978b <runtime.mapiternext+299>
   0x000000000040977e <+286>:   movzbl 0x49(%rcx),%esi
   0x0000000000409782 <+290>:   test   %sil,%sil
   0x0000000000409785 <+293>:   jne    0x409877 <runtime.mapiternext+535>
   0x000000000040978b <+299>:   mov    0x18(%rdx),%rsi
   0x000000000040978f <+303>:   test   %rsi,%rsi
   0x0000000000409792 <+306>:   je     0x409855 <runtime.mapiternext+501>
   0x0000000000409798 <+312>:   movzbl 0x4a(%rcx),%edi
   0x000000000040979c <+316>:   movzbl 0x9(%rdx),%r8d
   0x00000000004097a1 <+321>:   cmp    %r8b,%dil
   0x00000000004097a4 <+324>:   jne    0x409855 <runtime.mapiternext+501>
   0x00000000004097aa <+330>:   add    $0xffffffff,%edi
   0x00000000004097ad <+333>:   mov    %edi,%ecx
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) info registers 
rax            0x1bc7c70        29129840
rbx            0x1064940        17189184
rcx            0xffffffffffffffff       -1
rdx            0xc82029b980     859533064576
rsi            0xbcd63e80c362fcf0       -4839612026808238864
rdi            0xbcd63e80c362fc00       -4839612026808239104
rbp            0x2a777ea8ee3339e0       0x2a777ea8ee3339e0
rsp            0xc8206b32f8     0xc8206b32f8
r8             0xc362fc00       3278044160
r9             0xbcd63e80c362fd78       -4839612026808238728
r10            0x0      0
r11            0xbcd63e80c362fcf8       -4839612026808238856
r12            0x0      0
r13            0xc81ffdaec8     859530178248
r14            0x8      8
r15            0x6      6
rip            0x409722 0x409722 <runtime.mapiternext+194>
eflags         0x10286  [ PF SF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

rsi is garbage

@davecheney
Copy link
Contributor Author

Setting GOGC=20 triggers the problem faster, setting GOGC=off hides it.

@randall77
Copy link
Contributor

This looks like corruption of the hash map header. The buckets field is junk.
Dave, if you can set up a way for me to reproduce it, that would be great. I don't think I will be able to get to it today, however, so no hurry.

@randall77
Copy link
Contributor

Current suspect: github.com/juju/juju/apiserver/client/status.go:(*statusContext).processService
At the very end, it does "return processStatus". But processStatus is also a named return value. I think this is confusing SSA, as at the end of the function it generates:

v472    23075 (status.go:658)   VARDEF  "".processedStatus+16(FP)
v265    23076 (status.go:658)   MOVQ    "".processedStatus+16(FP), AX
v267    23077 (status.go:658)   MOVQ    AX, "".processedStatus+16(FP)
v161    23078 (status.go:658)   LEAQ    "".processedStatus+24(FP), DI
v157    23079 (status.go:658)   MOVQ    DI, SI
v473    23080 (status.go:658)   DUFFCOPY    $658
b57 23081 (status.go:658)   RET

It's basically doing processedStatus = processedStatus using DUFFCOPY, but the VARDEF at the start tells the live variable analysis that processedStatus is dead before this code. Thus the GC won't scan any pointers in this structure, and it contains (among other things) maps.

So named return values being copied to themselves in a return statement may be buggy. I'll try to make a small repro and a fix tomorrow.

Changing "return processedStatus" to "return" fixed everything. Someone might want to do that to the juju sources, as it is better style and matches the other returns in the function. But I'll definitely fix this on the SSA end, we should generate correct code regardless.

@randall77 randall77 self-assigned this Mar 24, 2016
@davecheney
Copy link
Contributor Author

Thanks Keith. I'll make that change now as a workaround.

On Thu, Mar 24, 2016 at 4:07 PM, Keith Randall notifications@github.com
wrote:

Current suspect:
github.com/juju/juju/apiserver/client/status.go:(*statusContext).processService
At the very end, it does "return processStatus". But processStatus is also
a named return value. I think this is confusing SSA, as at the end of the
function it generates:

v472 23075 (status.go:658) VARDEF "".processedStatus+16(FP)
v265 23076 (status.go:658) MOVQ "".processedStatus+16(FP), AX
v267 23077 (status.go:658) MOVQ AX, "".processedStatus+16(FP)
v161 23078 (status.go:658) LEAQ "".processedStatus+24(FP), DI
v157 23079 (status.go:658) MOVQ DI, SI
v473 23080 (status.go:658) DUFFCOPY $658
b57 23081 (status.go:658) RET

It's basically doing processedStatus = processedStatus using DUFFCOPY, but
the VARDEF at the start tells the live variable analysis that
processedStatus is dead before this code. Thus the GC won't scan any
pointers in this structure, and it contains (among other things) maps.

So named return values being copied to themselves in a return statement
may be buggy. I'll try to make a small repro and a fix tomorrow.

Changing "return processedStatus" to "return" fixed everything. Someone
might want to do that to the juju sources, as it is better style and
matches the other returns in the function. But I'll definitely fix this on
the SSA end, we should generate correct code regardless.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14904 (comment)

jujubot added a commit to juju/juju that referenced this issue Mar 24, 2016
apiserver/client: work around compiler bug with named return values

Upstream golang/go#14904

Returning a complex variable by value with a mix of named and implicit
returns was throwing off the compiler. This caused the memory occupied
by the return value to be considered dead before the end of the
function, leading to the gc freeing that memory prematurely.

The bug is not believed to not affect any shipping Go 1.6 compiler, but
it seems prudent to work around the compiler issue in our code.

(Review request: http://reviews.vapour.ws/r/4336/)
@gopherbot
Copy link

CL https://golang.org/cl/21233 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 31, 2016
Compound AUTO types weren't named previously.  That was because live
variable analysis (plive.go) doesn't handle spilling to compound types.
It can't handle them because there is no valid place to put VARDEFs when
regalloc is spilling compound types.

compound types = multiword builtin types: complex, string, slice, and
interface.

Instead, we split named AUTOs into individual one-word variables.  For
example, a string s gets split into a byte ptr s.ptr and an integer
s.len.  Those two variables can be spilled to / restored from
independently.  As a result, live variable analysis can handle them
because they are one-word objects.

This CL will change how AUTOs are described in DWARF information.
Consider the code:

func f(s string, i int) int {
    x := s[i:i+5]
    g()
    return lookup(x)
}

The old compiler would spill x to two consecutive slots on the stack,
both named x (at offsets 0 and 8).  The new compiler spills the pointer
of x to a slot named x.ptr.  It doesn't spill x.len at all, as it is a
constant (5) and can be rematerialized for the call to lookup.

So compound objects may not be spilled in their entirety, and even if
they are they won't necessarily be contiguous.  Such is the price of
optimization.

Re-enable live variable analysis tests.  One test remains disabled, it
fails because of #14904.

Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801
Reviewed-on: https://go-review.googlesource.com/21233
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
@gopherbot
Copy link

CL https://golang.org/cl/21470 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 6, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants