reflect: reflect.unpackEface reflect/value.go:174 unexpected fault address 0x0 #7841
I found a slight variation in the crash caused by different input parameters.

```
unexpected fault address 0x656d697477
fatal error: fault
[signal 0xb code=0x1 addr=0x656d697477 pc=0x5b1029]

goroutine 509 [running]:
runtime.throw(0x12e4942)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208825128
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208825140
reflect.implements(0xb55860, 0xc208e431b0, 0xb55860)
        /go/src/pkg/reflect/type.go:1194 +0x369 fp=0xc2088251e8
reflect.(*rtype).Implements(0xc208e431b0, 0x7f612d8371c8, 0xb55860, 0xc208e431b0)
        /go/src/pkg/reflect/type.go:1133 +0x11c fp=0xc208825220
encoding/json.newTypeEncoder(0x7f612d8371c8, 0xc208e43170, 0xc208825201, 0xc208825290)
        /go/src/pkg/encoding/json/encode.go:360 +0x121 fp=0xc208825260
encoding/json.typeEncoder(0x7f612d8371c8, 0xc208e43170, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x2d1 fp=0xc2088252c8
encoding/json.valueEncoder(0xc208e43170, 0x0, 0x0, 0x2, 0xc208cf2390)
        /go/src/pkg/encoding/json/encode.go:311 +0x8c fp=0xc208825308
encoding/json.(*encodeState).reflectValue(0xc208528b00, 0xc208e43170, 0x0, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x53 fp=0xc208825340
encoding/json.interfaceEncoder(0xc208528b00, 0xa882c0, 0xc208e43160, 0x0, 0x146, 0xa88200)
```
I've spent some time trying to simulate parts of the server's behavior, but nothing has reproduced the crash so far. I suspect the server's interaction with the GC is just too complex to simulate with simplified code. In a moment of clarity I had the idea of turning off the GC. With GOGC=off I am unable to reproduce the crash, so I think we're on the right track blaming the GC. Is there any debugging I can turn on to help diagnose the issue? Can I rule out parts of the new GC by building Go at specific commits and testing before/after behavior?
You can try changing the top of src/pkg/runtime/mgc0.c to say 'Debug=2' instead of 'Debug=0' and see if that crashes earlier. If so, try 'Debug=3' to get a detailed log. You will want to redirect standard error to a file in that case. I like to pipe the output through gzip to reduce the size of the file, which can help I/O significantly: `your-prog 2>&1 | gzip > log.gz`
The string "downloading" appears in some of the JSON the server has fetched from a backend service and decoded into a map[string]interface{}. That map is then stored in an interface{} inside the struct that is being encoded when the crash occurs. The string is in a JSON array, so that part of the stack trace makes sense. I tried running under gdb to examine the memory more, but the stack in gdb is nonsense. The function names and source lines are off: reflect.implements (T=0xc2083333a0, V=0xc208333338, ~r2=16) at /go/src/pkg/net/http/chunked.go:1 |
With Debug=2 it crashes in a slightly different place:

```
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208a26cc8
encoding/json.(*encodeState).string(0xc2080400b0, 0x707474683c5f6e65, 0x64657062642f2f3a, 0x186, 0x0, 0x0)
```

0xc2080400b0 appears in the GC Debug output and in the stack trace a lot:

```
$ grep 0xc2080400b0 log
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
encoding/json.(*encodeState).string(0xc2080400b0, 0x707474683c5f6e65, 0x64657062642f2f3a, 0x186, 0x0, 0x0)
encoding/json.stringEncoder(0xc2080400b0, 0x89f420, 0xc208d3a640, 0x0, 0x186, 0x0)
encoding/json.(*structEncoder).encode(0xc208c88a20, 0xc2080400b0, 0x9e4200, 0xc208d3a640, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x9e4200, 0xc208d3a640, 0x0, 0x196, 0xc208d3a600)
encoding/json.(*ptrEncoder).encode(0xc2081da0b8, 0xc2080400b0, 0x873ce0, 0xc208d3a640, 0x0, 0x160, 0xc208d3a600)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x873ce0, 0xc208d3a640, 0x0, 0x160, 0x0)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x873ce0, 0xc208d3a640, 0x0, 0x160)
encoding/json.interfaceEncoder(0xc2080400b0, 0x896fc0, 0xc208362e10, 0x0, 0x146, 0x896f00)
encoding/json.(*arrayEncoder).encode(0xc20803e120, 0xc2080400b0, 0x881ea0, 0xc20868cda8, 0x0, 0x176, 0xb47d00)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0, 0xc20868cda8, 0x0, 0x176, 0x0)
encoding/json.(*sliceEncoder).encode(0xc20803e128, 0xc2080400b0, 0x881ea0, 0xc20868cda8, 0x0, 0x176, 0x200)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0, 0xc20868cda8, 0x0, 0x176, 0x0)
encoding/json.(*structEncoder).encode(0xc2089f41b0, 0xc2080400b0, 0xa0d4c0, 0xc20868cd20, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0xa0d4c0, 0xc20868cd20, 0x0, 0x196, 0xc20868cd00)
encoding/json.(*ptrEncoder).encode(0xc208568188, 0xc2080400b0, 0x873c20, 0xc20868cd20, 0x0, 0x160, 0xc20868cd00)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x873c20, 0xc20868cd20, 0x0, 0x160, 0x0)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x873c20, 0xc20868cd20, 0x0, 0x160)
encoding/json.interfaceEncoder(0xc2080400b0, 0x896fc0, 0xc208d056d0, 0x0, 0x146, 0x896f00)
encoding/json.(*arrayEncoder).encode(0xc20803e120, 0xc2080400b0, 0x881ea0, 0xc208208bd0, 0x0, 0x176, 0xb48a00)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0, 0xc208208bd0, 0x0, 0x176, 0x0)
encoding/json.(*sliceEncoder).encode(0xc20803e128, 0xc2080400b0, 0x881ea0, 0xc208208bd0, 0x0, 0x176, 0x0)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0, 0xc208208bd0, 0x0, 0x176, 0x0)
encoding/json.(*structEncoder).encode(0xc208582030, 0xc2080400b0, 0xa502a0, 0xc208208b40, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0xa502a0, 0xc208208b40, 0x0, 0x196, 0xc208208b00)
encoding/json.(*ptrEncoder).encode(0xc208218048, 0xc2080400b0, 0x872660, 0xc2085d40e0, 0x0, 0x166, 0x0)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x872660, 0xc2085d40e0, 0x0, 0x166, 0x872600)
encoding/json.(*arrayEncoder).encode(0xc208218060, 0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x172, 0x1062300)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x172, 0x7fdcf4130100)
encoding/json.(*sliceEncoder).encode(0xc208218068, 0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x172, 0x7fdcf4135200)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x172, 0x100)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x172)
encoding/json.(*encodeState).marshal(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x0)
```
With Debug=3:

```
runtime.panic(0x9f6120, 0x1045833)
        /go/src/pkg/runtime/panic.c:248 +0x18d
reflect.(*rtype).ptrTo(0xc2080c7500, 0x7f30d903b1c8)
        /go/src/pkg/reflect/type.go:1084 +0x40c
reflect.PtrTo(0x7f30d903b1c8, 0xc2080c7500, 0x0, 0x0)
        /go/src/pkg/reflect/type.go:1045 +0x5a
encoding/json.newTypeEncoder(0x7f30d903b1c8, 0xc2080c7500, 0xc208d41001, 0xc208d410d0)
        /go/src/pkg/encoding/json/encode.go:360 +0xaa
encoding/json.typeEncoder(0x7f30d903b1c8, 0xc2080c7500, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5
encoding/json.valueEncoder(0xc2080c7500, 0xdeaddeaddeaddead, 0x0, 0x2, 0x74000014)
        /go/src/pkg/encoding/json/encode.go:311 +0x79
encoding/json.(*encodeState).reflectValue(0xc20875f080, 0xc2080c7500, 0xdeaddeaddeaddead, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x45
encoding/json.interfaceEncoder(0xc20875f080, 0x897540, 0xc2080c7340, 0x0, 0x146, 0x0)
        /go/src/pkg/encoding/json/encode.go:555 +0xe9
encoding/json.(*structEncoder).encode(0xc208cee870, 0xc20875f080, 0xa0da40, 0xc2080c7260, 0x0, 0x196, 0xc200000000)
        /go/src/pkg/encoding/json/encode.go:582 +0x2ad
```

Anything I can grep out of the log to help?
toffaletti, please do the following:

1. Crash it several times and attach 5 different crash reports. I am mostly interested in the crash report header and the first stack (this can help to localize the issue).
2. Run with the GODEBUG=efence=1 environment variable and attach several crash reports (this can also help to localize the issue).

For now it looks like an interface value gets incorrectly freed (potentially when reached through a reflect.Value), but I am not sure. I am also interested in the exact data structure that you serialize. I've tried to recreate it based on the traces; I see that there is "slice -> ptr -> struct -> slice -> interface". Here is what I've got: http://play.golang.org/p/y45Zs0kd4D But it does not crash. toffaletti, please try to modify this program to be as close to your real code as possible (in terms of data types, the structure of the object that you serialize, and how you call json.Marshal), and see if you can reproduce the crash with that program. If we can reproduce it on such a small program, that would allow us to debug it quickly.
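For reference, here is a minimal sketch of the "slice -> ptr -> struct -> slice -> interface" nesting described above. All type and field names are invented stand-ins, not the actual server types or the contents of the playground link.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a hypothetical struct standing in for the real type being encoded.
type Item struct {
	Name   string
	Extras []interface{} // slice of interface values, e.g. decoded JSON
}

func main() {
	// slice -> ptr -> struct -> slice -> interface
	items := []*Item{
		{Name: "a", Extras: []interface{}{"downloading", 1.5, map[string]interface{}{"k": "v"}}},
	}
	out, err := json.Marshal(items)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```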
http://play.golang.org/p/O8eiPDUJzJ This is the best I can do with the types involved, but I can't make it crash either. My best guess, just because it is the most unusual thing I've noticed looking at the types, is that it might have something to do with cycles in the struct. Some of the []interface{} values are assigned to types that then reference the parent or other types that reference them; however, the cycles are always hidden behind interface{}. Sorry I can't be of more help. This code is quite complex, and a good deal of it probably isn't even being used anymore. I plan on removing a lot of the dead code, and once that's done perhaps I'll be able to distill it down to something that crashes that I'm able to share, but don't count on it.
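To illustrate the kind of cycle described above, here is a hypothetical sketch (Node and Child are invented names) where a child refers back to its parent, but the back-reference is only ever visible as an interface{}. Note that naively marshaling such a cycle would recurse without bound, so the sketch only builds the structure.

```go
package main

import "fmt"

// Node and Child are invented types; the real structures are far more complex.
type Node struct {
	Name     string
	Children []interface{} // dynamically holds *Child values
}

type Child struct {
	Parent interface{} // back-reference to the *Node, hidden behind interface{}
}

func main() {
	n := &Node{Name: "root"}
	c := &Child{Parent: n}
	n.Children = append(n.Children, c) // cycle: n -> c -> n, visible only through interface{}
	fmt.Printf("%T has %d child(ren)\n", n, len(n.Children))
}
```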
> toffaletti, please do the following:
> 1. crash it several times and attach 5 different crash reports, I am mostly interested in the crash report header and the first stack (this can help to localize the issue)
> 2. run with GODEBUG=efence=1 environment variable, and attach several crash reports (this can also help to localize the issue)
Attached 5 crash reports. I haven't been able to make it crash with GODEBUG=efence=1 yet. The process just takes more and more memory (a lot of virtual memory too) until I restart it and try again.
toffaletti, it seems like you can reproduce this crash pretty easily. I fixed a bug earlier today that could cause crashes of the general form you are seeing (partially corrupt multiword values on stack). Could you please update to tip (let's say 2fb7165d5fdc to be concrete) and see if you can still reproduce the problem? Thanks. |
I went with latest tip (1 commit past 2fb7165d5fdc):

```
go version devel +3ac2308bca2a Tue May 13 01:09:38 2014 -0400 linux/amd64
```

I was still able to reproduce the crash, though it took longer than it ever has before.

```
unexpected fault address 0x0
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x5468a4]

goroutine 15011 [running]:
runtime.throw(0x10d1a42)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc209f13300
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc209f13318
reflect.implements(0x99b7a0, 0xc208248e10, 0x99b7a0)
        /go/src/pkg/reflect/type.go:1194 +0x154 fp=0xc209f13370
reflect.(*rtype).Implements(0xc208248e10, 0x7f3b750e11d8, 0x99b7a0, 0x9f133f8)
        /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc209f133a8
encoding/json.newTypeEncoder(0x7f3b750e11d8, 0xc208248e10, 0xc209f13401, 0xc209f133f8)
        /go/src/pkg/encoding/json/encode.go:356 +0x47 fp=0xc209f133d8
encoding/json.typeEncoder(0x7f3b750e11d8, 0xc208248e10, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc209f13428
encoding/json.valueEncoder(0xc208248e10, 0x0, 0x0, 0x2, 0x40)
        /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc209f13468
encoding/json.(*encodeState).reflectValue(0xc2085724d0, 0xc208248e10, 0x0, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x45 fp=0xc209f134a0
encoding/json.interfaceEncoder(0xc2085724d0, 0x8c3d60, 0xc208248df0, 0x0, 0x146, 0x8c3d00)
        /go/src/pkg/encoding/json/encode.go:555 +0xe9 fp=0xc209f134f0
encoding/json.(*arrayEncoder).encode(0xc2083f00c0, 0xc2085724d0, 0x8ad020, 0xc20852bdd0, 0x0, 0x176, 0x4bef00)
        /go/src/pkg/encoding/json/encode.go:685 +0x132 fp=0xc209f13550
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8ad020, 0xc20852bdd0, 0x0, 0x176, 0x73000000)
        /go/src/pkg/encoding/json/encode.go:692 +0x62 fp=0xc209f13590
encoding/json.(*sliceEncoder).encode(0xc2083f00e0, 0xc2085724d0, 0x8ad020, 0xc20852bdd0, 0x0, 0x176, 0x100)
        /go/src/pkg/encoding/json/encode.go:662 +0xce fp=0xc209f135d0
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8ad020, 0xc20852bdd0, 0x0, 0x176, 0x0)
        /go/src/pkg/encoding/json/encode.go:671 +0x62 fp=0xc209f13610
encoding/json.(*structEncoder).encode(0xc2084a55f0, 0xc2085724d0, 0xa99cc0, 0xc20852bd40, 0x0, 0x196, 0x7f3b00000000)
        /go/src/pkg/encoding/json/encode.go:582 +0x2ad fp=0xc209f13788
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0xa99cc0, 0xc20852bd40, 0x0, 0x196, 0xc20852bd00)
        /go/src/pkg/encoding/json/encode.go:596 +0x62 fp=0xc209f137c8
encoding/json.(*ptrEncoder).encode(0xc2083be198, 0xc2085724d0, 0x89cee0, 0xc20836c228, 0x0, 0x166, 0x0)
        /go/src/pkg/encoding/json/encode.go:704 +0xfc fp=0xc209f13818
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x89cee0, 0xc20836c228, 0x0, 0x166, 0x89ce00)
        /go/src/pkg/encoding/json/encode.go:709 +0x62 fp=0xc209f13858
encoding/json.(*arrayEncoder).encode(0xc2083be1a0, 0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x172, 0x10f5400)
        /go/src/pkg/encoding/json/encode.go:685 +0x132 fp=0xc209f138b8
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x172, 0x7f3b750e1100)
        /go/src/pkg/encoding/json/encode.go:692 +0x62 fp=0xc209f138f8
encoding/json.(*sliceEncoder).encode(0xc2083be1b0, 0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x172, 0x1112b00)
        /go/src/pkg/encoding/json/encode.go:662 +0xce fp=0xc209f13938
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x172, 0x100)
        /go/src/pkg/encoding/json/encode.go:671 +0x62 fp=0xc209f13978
encoding/json.(*encodeState).reflectValue(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x172)
        /go/src/pkg/encoding/json/encode.go:297 +0x85 fp=0xc209f139b0
encoding/json.(*encodeState).marshal(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x0)
        /go/src/pkg/encoding/json/encode.go:268 +0xc2 fp=0xc209f139f8
encoding/json.(*Encoder).Encode(0xc2095841a0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x0)
        /go/src/pkg/encoding/json/stream.go:161 +0x94 fp=0xc209f13a70
```
It looks like there is no synchronization between threads logging with Debug=3, so the log becomes pretty garbled once the concurrent requests start. I've put as much information as I can into the log. If there are specific pointer values from the log you think would help, let me know. I know how frustrating this must be.
I asked about PtrTo etc because the type that reflect is being asked about is an allocated pointer (the 40-bit number 0xc208248e10), which is either a runtime-constructed type or not a type at all (a bug). It seems likely to be a bug (and the reason for the crash) but I'm not 100% sure. One thing that might be interesting would be to edit reflect/type.go to put a print in the Implements method before it calls implements:

```go
func (t *rtype) Implements(u Type) bool {
	if u == nil {
		panic("reflect: nil type passed to Type.Implements")
	}
	if u.Kind() != Interface {
		panic("reflect: non-interface type passed to Type.Implements")
	}
	println("reflect.implements", u.(*rtype), t)
	return implements(u.(*rtype), t)
}
```

and then run the program and see what kinds of values it prints normally and if they are obviously different for the crashing call. In addition to that, it would be useful to turn off Debug=3 but then run with GODEBUG=allocfreetrace=1. That should not cause mixed output like the Debug=3 trace did. There will be file names and function names in the trace. If you would prefer to mail me the trace directly, that's fine. If you want to go to the trouble of obfuscating the names you can do that too, but please do something deterministic so that the same name appearing multiple times gets the same rewrite each time.
The attached crash-debug-3.log contains:

```
[signal 0xb code=0x80 addr=0x0 pc=0x54736b]

goroutine 1824 [running]:
runtime.throw(0x10d2a42)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208d9b300
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208d9b318
reflect.implements(0x99c260, 0x7f6c901b5a60, 0x99c260)
        /go/src/pkg/reflect/type.go:1197 +0x1ab fp=0xc208d9b370
reflect.(*rtype).Implements(0x7f6c901b5a60, 0x7f6c901a01d8, 0x99c260, 0x208d9b3f8)
        /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc208d9b3a8
encoding/json.newTypeEncoder(0x7f6c901a01d8, 0x7f6c901b5a60, 0xc208d9b401, 0xc208d9b3f8)
        /go/src/pkg/encoding/json/encode.go:356 +0x47 fp=0xc208d9b3d8
encoding/json.typeEncoder(0x7f6c901a01d8, 0x7f6c901b5a60, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc208d9b428
encoding/json.valueEncoder(0x7f6c901b5a60, 0xc2080fe620, 0x0, 0x2, 0x1ec)
        /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc208d9b468
encoding/json.(*encodeState).reflectValue(0xc2080cbad0, 0x7f6c901b5a60, 0xc2080fe620, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x45 fp=0xc208d9b4a0
encoding/json.interfaceEncoder(0xc2080cbad0, 0x8c4820, 0xc209958250, 0x0, 0x146, 0x8c4800)
        /go/src/pkg/encoding/json/encode.go:555 +0xe9 fp=0xc208d9b4f0
```

Here the type is 0x7f6c901a01d8, which is definitely neither the static type nor the dynamic type, so it is clearly bogus. It looks like a C heap pointer. Does the program use cgo?
Russ, I'll try both of those suggestions. Dmitry, the program does use CGO, but I've ruled it out by removing that package and the program still crashes in the same way:

```
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x546144]

goroutine 100 [running]:
runtime.throw(0x10d0a02)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208bf1300
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208bf1318
reflect.implements(0x99aa80, 0xc208eb1fc0, 0x99aa80)
        /go/src/pkg/reflect/type.go:1194 +0x154 fp=0xc208bf1370
reflect.(*rtype).Implements(0xc208eb1fc0, 0x7f86e621d1d8, 0x99aa80, 0xc208eb1fc0)
        /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc208bf13a8
encoding/json.newTypeEncoder(0x7f86e621d1d8, 0xc208eb1fa0, 0xc208bf1401, 0xc208bf13f8)
        /go/src/pkg/encoding/json/encode.go:360 +0xd8 fp=0xc208bf13d8
encoding/json.typeEncoder(0x7f86e621d1d8, 0xc208eb1fa0, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc208bf1428
encoding/json.valueEncoder(0xc208eb1fa0, 0x0, 0x0, 0x2, 0x1ec)
        /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc208bf1468
etc...
```
Attached: the log from adding the println to reflect's Implements method.
I found another place where we were using CGO that I had forgotten about because we retired that code, but it was still being called once during setup. I removed that also, but it still crashes and it still has pointers that look like C heap pointers:

```
reflect.(*rtype).Implements(0xc2082c9540, 0x7fc332e9f1d8, 0x987400, 0xc2082c9540)
        /go/src/pkg/reflect/type.go:1134 +0x177 fp=0xc209cf33a8
encoding/json.newTypeEncoder(0x7fc332e9f1d8, 0xc208835d40, 0xc209cf3401, 0xc209cf33f8)
```

All of our CGO code was C++, and I'm not seeing libstdc++ linked anymore, so I'm pretty sure I've gotten rid of all our CGO calls now.

```
linux-vdso.so.1 => (0x00007fff405fe000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8dc8bac000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8dc87e4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8dc8ddd000)
```

Is CGO used in the std library at all? Those are the only packages I haven't audited for CGO usage.
If you run the program under gdb (gdb ./mybin), set a breakpoint on runtime.cgocall ("b runtime.cgocall"), and then run ("r"), then if the breakpoint is hit, somebody calls cgo. You can then modify runtime·cgocall() in pkg/runtime/cgocall.c to insert runtime·throw("HERE"); at the beginning of the function and rebuild the runtime. The crash will show who calls cgo.
Definitely still some CGO calls left. The gdb backtraces are pretty useless, though if they can be trusted at all, the calls during setup are:

- getpwuid_r from the os package
- cmalloc from the runtime package (can't tell where or why this is called)
- net.cgoLookupIPCNAME

And while making concurrent requests:

- getaddrinfo from the net package
- freeaddrinfo
The stack pointer doesn't really make sense. As I said, it looks like an 'm stack' pointer, which should not be found on an ordinary goroutine stack and certainly not in the middle of a JSON structure. I think we've spent as much time as we can on this and aren't going to get anywhere without being able to see and run the code ourselves. There may be a bug on our side, and if so I'd like to fix it, but so far there's no evidence it's not a problem in your code. Labels changed: added release-none, removed release-go1.3.
I believe I figured it out, and I'm fairly certain it is a bug in our code, not Go: a race that went undetected by the race detector. The fact that it's worked for so long under Go 1.2 is probably luck. I've been going field by field through the struct that was being serialized, and I ran across some code that looks something like this: http://play.golang.org/p/MosS_UQQLm The code was attempting lazy loading of the Images by returning the *Image and then setting the fields inside another goroutine, relying on the channel send/recv to synchronize. I don't think that is safe. As soon as I removed this lazy loading, the crash went away. Apologies for wasting so much of your time. Perhaps this can become a feature request and test case for the race detector.
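As an illustration only (not the contents of the playground link), here is a hypothetical sketch of the lazy-loading pattern described: the *Image is handed out immediately while its fields are filled in by another goroutine, with a channel intended to provide the synchronization. The Image type, field names, and values are invented. Consistent with the next comment, the reads are only safe if they happen after the receive from the channel; reading the fields before that receive would race with the writes.

```go
package main

import "fmt"

// Image is a hypothetical stand-in for the struct whose fields were lazily loaded.
type Image struct {
	URL    string
	Width  int
	Height int
}

// lazyImage returns a *Image right away and fills in its fields from another
// goroutine; the done channel is the only thing ordering the writes before reads.
func lazyImage(done chan<- struct{}) *Image {
	img := &Image{}
	go func() {
		// Field writes happen here, concurrently with whoever already holds img.
		img.URL = "http://example.com/a.png"
		img.Width = 640
		img.Height = 480
		done <- struct{}{} // reads of img's fields are only safe after receiving from done
	}()
	return img
}

func main() {
	done := make(chan struct{})
	img := lazyImage(done)
	<-done // without this receive, reading img's fields would race with the writes above
	fmt.Println(img.URL, img.Width, img.Height)
}
```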
Thanks for following up. I don't actually see a problem in your playground code, though I may not have looked closely enough. It's OK to set a field in a struct and then send a value on a channel indicating that the struct is initialized. When the channel value is received, the receiving goroutine can safely examine the struct fields. |
I'm still working on this and I'll open a new ticket if I can actually reproduce and prove a bug here. However, I wanted to update and say that I unravelled the twisty maze of assignments the *Image goes through, and at no point was the pointer dereferenced, so I was wrong about there being a racy copy of the Image struct fields. I was also able to reproduce the crash after removing all concurrent assignment to the Image struct fields, so there goes that theory.
I'd love to be able to figure out a reproducer, but I've spent days trying with no luck. The only thing I've discovered is how to make it more difficult to reproduce, because after doing any of these things I haven't been able to make it crash:

- Run with GOGC=off
- Run with GODEBUG=allocfreetrace=1
- Run with GODEBUG=efence=1
- Don't make concurrent requests

Here's where it gets weird. If I remove certain lines of code inside a goroutine, or if I don't spawn that goroutine but just call the code serially, the crash behavior changes. Depending on how many lines I remove, it goes from taking seconds to crash to minutes, or not crashing for hours. I've tried to figure out what, if anything, is buggy in that code, but it all looks fine to me. In fact, none of the functions being called access any non-local variables or allocate memory beyond function arguments on the stack or return values. None of that code has access to the structure being serialized to JSON in the goroutine that crashes. My best guess at this point is that there is no bug in the code I'm removing, but that having it active raises the memory usage just enough in the right area to trigger the bug. The goroutine seems special only because it is at the deepest level of the call stack.
This issue was closed.