runtime: invalid GC instruction 0xd183f0 at 0xc208217f28 #7748
Currently the build fails with:

    $ go test launchpad.net/juju-core/cmd/jujud
    # launchpad.net/juju-core/utils/ssh
    /ssd/src/gopath/src/launchpad.net/juju-core/utils/ssh/ssh_gocrypto.go:84: undefined: ssh.ClientConn
    FAIL launchpad.net/juju-core/cmd/jujud [build failed]

I've updated go.crypto to r191, which roughly relates to the report date. Then it started failing with:

    $ go test launchpad.net/juju-core/cmd/jujud
    # launchpad.net/juju-core/worker/instancepoller
    /ssd/src/gopath/src/launchpad.net/juju-core/worker/instancepoller/aggregate.go:67: undefined: ratelimit.New
    FAIL launchpad.net/juju-core/cmd/jujud [build failed]

I've reverted launchpad.net/juju-core to r2597, which roughly relates to the report date. But it still fails to build. Please provide exact revisions that build. |
I'm sorry. I neglected to mention that we use a tool we developed to pin dependency versions. Please try

    $ go get launchpad.net/godeps
    $ cd $GOPATH/src/launchpad.net/juju-core
    $ godeps -u dependencies.tsv

There will be a bunch of output as godeps updates each dependency to the correct revision. |
CL https://golang.org/cl/88100048 mentions this issue. |
I narrowed down the problem by recompiling the runtime with Debug=3 in mgc0.c and then I ran

    ./upgrader.test 2>&1 | gzip > upgrader.log.gz

With Debug=3 there's a lot of repetitive output; the gzip is a trick to reduce I/O and disk requirements, often dramatically. This one didn't generate too much output in the end.

I wanted the full scan trace, but setting Debug=3 also enabled some sanity checks that found the mismatched type information even earlier and gave a mostly reasonable message. With some tweaks, the message is:

    scanblock 0xc2084d0120 32 type 0xc1a960 common.ToolsGetter pc=0xa2aca0
    gc_iface @0xc2084d0130: 0x7f8c9c97a7d8/0x0 0xc208213200
    gc_aptr @0xc2084d0130: 0xc2081cc140
    scanblock 0xc208090190 80 type 0xc208182e80 methodargs(upgrader.Upgrader)(func(params.Entities) (params.ToolsResults, error)) pc=0xc208094b00
    gc_ptr @0xc208090190: 0xc208748090 ti=0x9f1160
    invalid gc type info for 'upgrader.UpgraderAPI', type info 0x9f1160 [1]=0x9, block info 0xa90680 [1]=0x1
    fatal error: invalid gc type info

There's no direct way to map from the gc program to the name of the enclosing type, but the enclosing type structure holds the gc pointer next to the string name pointer, so if you can find the gc pointer in the type structure, the string pointer is next:

    (gdb) find &rodata, &erodata, 0x9f1160
    0xa0c5f8
    0xc313b8
    2 patterns found.
    (gdb) x/2xg 0xa0c5f8
    0xa0c5f8: 0x00000000009f1160 0x0000000000000000 # not this one
    (gdb) x/2xg 0xc313b8
    0xc313b8: 0x00000000009f1160 0x0000000000d9a6e0 # must be this one
    (gdb) x/2xg 0xd9a6e0
    0xd9a6e0: 0x0000000000d9a6f0 0x0000000000000011 # there's the string
    (gdb) x/17xc 0xd9a6f0
    0xd9a6f0: 117 'u' 112 'p' 103 'g' 114 'r' 97 'a' 100 'd' 101 'e' 114 'r'
    0xd9a6f8: 46 '.' 85 'U' 112 'p' 103 'g' 114 'r' 97 'a' 100 'd' 101 'e'
    0xd9a700: 114 'r'
    (gdb)
    (gdb) find &rodata, &erodata, 0xa90680
    0xa0c638
    0xcac138
    2 patterns found.
    (gdb) x/2xg 0xa0c638
    0xa0c638: 0x0000000000a90680 0x0000000000000000 # again, not this one
    (gdb) x/2xg 0xcac138
    0xcac138: 0x0000000000a90680 0x0000000000d9a720 # yes, this one (it's almost always the last one)
    (gdb) x/2xg 0xd9a720
    0xd9a720: 0x0000000000d9a730 0x0000000000000014
    (gdb) x/20xc 0xd9a730
    0xd9a730: 117 'u' 112 'p' 103 'g' 114 'r' 97 'a' 100 'd' 101 'e' 114 'r'
    0xd9a738: 46 '.' 85 'U' 112 'p' 103 'g' 114 'r' 97 'a' 100 'd' 101 'e'
    0xd9a740: 114 'r' 65 'A' 80 'P' 73 'I'
    (gdb)

As it happens, I trust the 'block info', which is registered during malloc, much more these days than I trust the 'type info' obtained directly from the GC program, which has had a variety of bugs in corner cases.

The block info says this pointer is an upgrader.UpgraderAPI, while the type info says it is an upgrader.Upgrader. Looking at the code, the former is a struct and the latter is an interface.

Knowing that upgrader.Upgrader is an interface makes the lines above the crash suspect: methodargs(upgrader.Upgrader)(func(params.Entities) is strange because most methods take a concrete value as the receiver, not an interface. (The exception is something like io.Reader.Read, which is a func(io.Reader, []byte) (int, error).)

And that's the bug: reflect is writing down a function frame with an interface-typed receiver instead of using the type of the thing in the interface.

On the Ubuntu VM you set up, if I run the test it fails >>50% of the time. With CL 88100048 applied, it passes 20x in a row. Pretty sure that's the fix.

Status changed to Started. |
CL https://golang.org/cl/88090045 mentions this issue. |
> I trust the 'block info', which is registered during malloc, much more these days than I trust the 'type info' obtained directly from the GC program

Seconded.

I added that type info check when I was debugging a series of similar divergences. Unfortunately the check has false positives: there are legal divergences (legal in the sense that the current implementation allows those divergences and then specifically works around them). |
This issue was closed by revision fcf8a77. Status changed to Fixed. |
This issue was closed.