
reflect: reflect.unpackEface reflect/value.go:174 unexpected fault address 0x0 #7841

Closed
gopherbot opened this issue Apr 23, 2014 · 46 comments

@gopherbot
Contributor

by toffaletti:

What does 'go version' print?

go version devel +f613443bb13a Tue Apr 22 21:12:15 2014 -0700 linux/amd64

What steps reproduce the problem?

I haven't been able to distill this to code I can share, but it involves concurrent
requests to an http server that spawns many goroutines and connects to other services to
assemble results. The crash happens in the code that is writing the response back to the
client. The crash is repeatable, but only when access to the server is concurrent. I've
compiled the code with the race detector and it crashes before any data races are
discovered.

The server has other resources which exercise most of the same code paths as the
crashing one but use static file input instead of talking to backend services and
spawning goroutines. I've yet to reproduce the crash using these resources.

The crash is not reproducible on Go 1.2.1, and the code is data-race clean for a large set of inputs.

What happened?

unexpected fault address 0x0
fatal error: fault
[signal 0x7 code=0x80 addr=0x0 pc=0x5b4315]

goroutine 115 [running]:
runtime.throw(0x12e48d6)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc209eef238
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:232 +0x25b fp=0xc209eef250
reflect.unpackEface(0x64616f6c6e776f64, 0x676e69, 0x0, 0x0, 0x0, 0x0)
        /go/src/pkg/reflect/value.go:174 +0xd5 fp=0xc209eef2b0
reflect.Value.Elem(0xa882c0, 0xc2098662d0, 0x0, 0x146, 0x0, 0x0, 0x0, 0x0)
        /go/src/pkg/reflect/value.go:843 +0xdd fp=0xc209eef340
encoding/json.interfaceEncoder(0xc209ab8160, 0xa882c0, 0xc2098662d0, 0x0, 0x146,
0xa88200)
        /go/src/pkg/encoding/json/encode.go:555 +0xc6 fp=0xc209eef390
encoding/json.(*arrayEncoder).encode(0xc20815e248, 0xc209ab8160, 0xa73140, 0xc20855ab10,
0x0, 0x176, 0x47e300)
        /go/src/pkg/encoding/json/encode.go:685 +0x191 fp=0xc209eef410
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xa73140,
0xc20855ab10, 0x0, 0x176, 0x47e300)
        /go/src/pkg/encoding/json/encode.go:692 +0x70 fp=0xc209eef450
encoding/json.(*sliceEncoder).encode(0xc20815e110, 0xc209ab8160, 0xa73140, 0xc20855ab10,
0x0, 0x176, 0x100)
        /go/src/pkg/encoding/json/encode.go:662 +0xf1 fp=0xc209eef490
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xa73140,
0xc20855ab10, 0x0, 0x176, 0x0)
        /go/src/pkg/encoding/json/encode.go:671 +0x70 fp=0xc209eef4d0
encoding/json.(*structEncoder).encode(0xc209432030, 0xc209ab8160, 0xc41880,
0xc20855aa80, 0x0, 0x196, 0x47e300)
        /go/src/pkg/encoding/json/encode.go:582 +0x337 fp=0xc209eef648
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xc41880,
0xc20855aa80, 0x0, 0x196, 0xc20855aa00)
        /go/src/pkg/encoding/json/encode.go:596 +0x70 fp=0xc209eef688
encoding/json.(*ptrEncoder).encode(0xc20815e180, 0xc209ab8160, 0xa615c0, 0xc209138b50,
0x0, 0x166, 0x0)
        /go/src/pkg/encoding/json/encode.go:704 +0x15d fp=0xc209eef6f8
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xa615c0,
0xc209138b50, 0x0, 0x166, 0xa61500)
        /go/src/pkg/encoding/json/encode.go:709 +0x70 fp=0xc209eef738
encoding/json.(*arrayEncoder).encode(0xc20815e188, 0xc209ab8160, 0xa70080, 0xc20a4eec60,
0x0, 0x172, 0x47e300)
        /go/src/pkg/encoding/json/encode.go:685 +0x191 fp=0xc209eef7b8
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xa70080,
0xc20a4eec60, 0x0, 0x172, 0x7fdb2af0c100)
        /go/src/pkg/encoding/json/encode.go:692 +0x70 fp=0xc209eef7f8
encoding/json.(*sliceEncoder).encode(0xc20815e190, 0xc209ab8160, 0xa70080, 0xc20a4eec60,
0x0, 0x172, 0x47e300)
        /go/src/pkg/encoding/json/encode.go:662 +0xf1 fp=0xc209eef838
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc209ab8160, 0xa70080,
0xc20a4eec60, 0x0, 0x172, 0xc20a4eec00)
        /go/src/pkg/encoding/json/encode.go:671 +0x70 fp=0xc209eef878
encoding/json.(*encodeState).reflectValue(0xc209ab8160, 0xa70080, 0xc20a4eec60, 0x0,
0x172)
        /go/src/pkg/encoding/json/encode.go:297 +0x93 fp=0xc209eef8b0
encoding/json.(*encodeState).marshal(0xc209ab8160, 0xa70080, 0xc20a4eec60, 0x0, 0x0)
        /go/src/pkg/encoding/json/encode.go:268 +0xf6 fp=0xc209eef900
encoding/json.(*Encoder).Encode(0xc20945cb60, 0xa70080, 0xc20a4eec60, 0x0, 0x0)
        /go/src/pkg/encoding/json/stream.go:161 +0xf4 fp=0xc209eef9b0

What should have happened instead?

Please provide any additional information below.

Perhaps related to #6788?
@gopherbot
Contributor Author

Comment 1 by toffaletti:

I found a slight variation in the crash caused by different input parameters.
unexpected fault address 0x656d697477
fatal error: fault
[signal 0xb code=0x1 addr=0x656d697477 pc=0x5b1029]
goroutine 509 [running]:
runtime.throw(0x12e4942)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208825128
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208825140
reflect.implements(0xb55860, 0xc208e431b0, 0xb55860)
        /go/src/pkg/reflect/type.go:1194 +0x369 fp=0xc2088251e8
reflect.(*rtype).Implements(0xc208e431b0, 0x7f612d8371c8, 0xb55860, 0xc208e431b0)
        /go/src/pkg/reflect/type.go:1133 +0x11c fp=0xc208825220
encoding/json.newTypeEncoder(0x7f612d8371c8, 0xc208e43170, 0xc208825201, 0xc208825290)
        /go/src/pkg/encoding/json/encode.go:360 +0x121 fp=0xc208825260
encoding/json.typeEncoder(0x7f612d8371c8, 0xc208e43170, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x2d1 fp=0xc2088252c8
encoding/json.valueEncoder(0xc208e43170, 0x0, 0x0, 0x2, 0xc208cf2390)
        /go/src/pkg/encoding/json/encode.go:311 +0x8c fp=0xc208825308
encoding/json.(*encodeState).reflectValue(0xc208528b00, 0xc208e43170, 0x0, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x53 fp=0xc208825340
encoding/json.interfaceEncoder(0xc208528b00, 0xa882c0, 0xc208e43160, 0x0, 0x146,
0xa88200)

@ianlancetaylor
Member

Comment 2:

This sounds like a GC issue.  It's going to be hard for us to debug by proxy.  Anything
you can do to get us a test case we can run ourselves would be greatly appreciated.

Labels changed: added repo-main, release-go1.3.

@rsc
Contributor

rsc commented Apr 23, 2014

Comment 3:

In the absence of a test case, could you tell us more about where the string
"downloading" might come from? Those are the bytes that are sitting in the interface{}
instead of a valid interface representation.
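(0x64616f6c6e776f64 read as little-endian ASCII bytes is "download", and the next
argument, 0x676e69, is "ing".) A quick way to decode such a word:

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	var b [8]byte
	// First argument to reflect.unpackEface in the trace above.
	binary.LittleEndian.PutUint64(b[:], 0x64616f6c6e776f64)
	fmt.Printf("%s\n", b[:]) // prints "download"
}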

Status changed to Accepted.

@gopherbot
Contributor Author

Comment 4 by toffaletti:

I've spent some time trying to simulate parts of the server's behavior, but nothing has
reproduced the crash so far. I suspect the server's interaction with the GC is just too
complex to simulate with simplified code.
In a moment of clarity I had the idea of turning off the GC. With GOGC=off I am unable
to reproduce the crash, so I think we're on the right track blaming the GC.
Is there any debugging I can turn on to help diagnose the issue?
Can I rule out parts of the new GC by building Go at specific commits and testing
before/after behavior?

@rsc
Contributor

rsc commented Apr 23, 2014

Comment 5:

You can try changing the top of src/pkg/runtime/mgc0.c to say 'Debug=2' instead of
'Debug=0' and see if that crashes earlier. If so, try 'Debug=3' to get a detailed log.
You will want to redirect standard error to a file in that case. I like to pipe the
output through gzip to reduce the size of the file, which can help I/O significantly:
your-prog 2>&1 | gzip > log.gz

@gopherbot
Contributor Author

Comment 6 by toffaletti:

The string "downloading" appears in some of the JSON the server has fetched from a
backend service and decoded into a map[string]interface{}. That map is then stored in an
interface{} inside the struct that is being encoded when the crash occurs. The string is
in a JSON array, so that part of the stack trace makes sense.
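Roughly, the shape is this (names are made up, but the nesting matches):

package main

import (
	"encoding/json"
	"fmt"
)

type Result struct {
	Meta interface{} // holds the map[string]interface{} decoded from the backend
}

func main() {
	var meta interface{}
	// Backend response; the string sits inside a JSON array, as in the trace.
	if err := json.Unmarshal([]byte(`{"states":["downloading"]}`), &meta); err != nil {
		panic(err)
	}
	out, err := json.Marshal(&Result{Meta: meta})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}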
I tried running under gdb to examine the memory more, but the stack in gdb is nonsense.
The function names and source lines are off:
reflect.implements (T=0xc2083333a0, V=0xc208333338, ~r2=16) at
/go/src/pkg/net/http/chunked.go:1

@gopherbot
Contributor Author

Comment 7 by toffaletti:

With Debug=2 it crashes in a slightly different place:
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208a26cc8
encoding/json.(*encodeState).string(0xc2080400b0, 0x707474683c5f6e65,
0x64657062642f2f3a, 0x186, 0x0, 0x0)

0xc2080400b0 appears in the GC debug output and in the stack trace a lot:
$ grep 0xc2080400b0 log 
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
scanblock 0xc2080400b0 176 type 0x9ff801 sync.poolLocal pc=0x859000
encoding/json.(*encodeState).string(0xc2080400b0, 0x707474683c5f6e65,
0x64657062642f2f3a, 0x186, 0x0, 0x0)
encoding/json.stringEncoder(0xc2080400b0, 0x89f420, 0xc208d3a640, 0x0, 0x186, 0x0)
encoding/json.(*structEncoder).encode(0xc208c88a20, 0xc2080400b0, 0x9e4200,
0xc208d3a640, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x9e4200,
0xc208d3a640, 0x0, 0x196, 0xc208d3a600)
encoding/json.(*ptrEncoder).encode(0xc2081da0b8, 0xc2080400b0, 0x873ce0, 0xc208d3a640,
0x0, 0x160, 0xc208d3a600)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x873ce0,
0xc208d3a640, 0x0, 0x160, 0x0)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x873ce0, 0xc208d3a640, 0x0,
0x160)
encoding/json.interfaceEncoder(0xc2080400b0, 0x896fc0, 0xc208362e10, 0x0, 0x146,
0x896f00)
encoding/json.(*arrayEncoder).encode(0xc20803e120, 0xc2080400b0, 0x881ea0, 0xc20868cda8,
0x0, 0x176, 0xb47d00)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0,
0xc20868cda8, 0x0, 0x176, 0x0)
encoding/json.(*sliceEncoder).encode(0xc20803e128, 0xc2080400b0, 0x881ea0, 0xc20868cda8,
0x0, 0x176, 0x200)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0,
0xc20868cda8, 0x0, 0x176, 0x0)
encoding/json.(*structEncoder).encode(0xc2089f41b0, 0xc2080400b0, 0xa0d4c0,
0xc20868cd20, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0xa0d4c0,
0xc20868cd20, 0x0, 0x196, 0xc20868cd00)
encoding/json.(*ptrEncoder).encode(0xc208568188, 0xc2080400b0, 0x873c20, 0xc20868cd20,
0x0, 0x160, 0xc20868cd00)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x873c20,
0xc20868cd20, 0x0, 0x160, 0x0)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x873c20, 0xc20868cd20, 0x0,
0x160)
encoding/json.interfaceEncoder(0xc2080400b0, 0x896fc0, 0xc208d056d0, 0x0, 0x146,
0x896f00)
encoding/json.(*arrayEncoder).encode(0xc20803e120, 0xc2080400b0, 0x881ea0, 0xc208208bd0,
0x0, 0x176, 0xb48a00)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0,
0xc208208bd0, 0x0, 0x176, 0x0)
encoding/json.(*sliceEncoder).encode(0xc20803e128, 0xc2080400b0, 0x881ea0, 0xc208208bd0,
0x0, 0x176, 0x0)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x881ea0,
0xc208208bd0, 0x0, 0x176, 0x0)
encoding/json.(*structEncoder).encode(0xc208582030, 0xc2080400b0, 0xa502a0,
0xc208208b40, 0x0, 0x196, 0x0)
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0xa502a0,
0xc208208b40, 0x0, 0x196, 0xc208208b00)
encoding/json.(*ptrEncoder).encode(0xc208218048, 0xc2080400b0, 0x872660, 0xc2085d40e0,
0x0, 0x166, 0x0)
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x872660,
0xc2085d40e0, 0x0, 0x166, 0x872600)
encoding/json.(*arrayEncoder).encode(0xc208218060, 0xc2080400b0, 0x87ede0, 0xc208516040,
0x0, 0x172, 0x1062300)
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x87ede0,
0xc208516040, 0x0, 0x172, 0x7fdcf4130100)
encoding/json.(*sliceEncoder).encode(0xc208218068, 0xc2080400b0, 0x87ede0, 0xc208516040,
0x0, 0x172, 0x7fdcf4135200)
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2080400b0, 0x87ede0,
0xc208516040, 0x0, 0x172, 0x100)
encoding/json.(*encodeState).reflectValue(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0,
0x172)
encoding/json.(*encodeState).marshal(0xc2080400b0, 0x87ede0, 0xc208516040, 0x0, 0x0)

@gopherbot
Contributor Author

Comment 8 by toffaletti:

With Debug=3:
runtime.panic(0x9f6120, 0x1045833)
        /go/src/pkg/runtime/panic.c:248 +0x18d
reflect.(*rtype).ptrTo(0xc2080c7500, 0x7f30d903b1c8)
        /go/src/pkg/reflect/type.go:1084 +0x40c
reflect.PtrTo(0x7f30d903b1c8, 0xc2080c7500, 0x0, 0x0)
        /go/src/pkg/reflect/type.go:1045 +0x5a
encoding/json.newTypeEncoder(0x7f30d903b1c8, 0xc2080c7500, 0xc208d41001, 0xc208d410d0)
        /go/src/pkg/encoding/json/encode.go:360 +0xaa
encoding/json.typeEncoder(0x7f30d903b1c8, 0xc2080c7500, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5
encoding/json.valueEncoder(0xc2080c7500, 0xdeaddeaddeaddead, 0x0, 0x2, 0x74000014)
        /go/src/pkg/encoding/json/encode.go:311 +0x79
encoding/json.(*encodeState).reflectValue(0xc20875f080, 0xc2080c7500,
0xdeaddeaddeaddead, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x45
encoding/json.interfaceEncoder(0xc20875f080, 0x897540, 0xc2080c7340, 0x0, 0x146, 0x0)
        /go/src/pkg/encoding/json/encode.go:555 +0xe9
encoding/json.(*structEncoder).encode(0xc208cee870, 0xc20875f080, 0xa0da40,
0xc2080c7260, 0x0, 0x196, 0xc200000000)
        /go/src/pkg/encoding/json/encode.go:582 +0x2ad

Anything I can grep out of the log to help?

@dvyukov
Member

dvyukov commented May 2, 2014

Comment 9:

toffaletti, please do the following:
1. Crash it several times and attach 5 different crash reports; I am mostly interested
in the crash report header and the first stack (this can help localize the issue).
2. Run with the GODEBUG=efence=1 environment variable and attach several crash reports
(this can also help localize the issue).
For now it looks like an interface value gets incorrectly freed (potentially when
reached through a reflect.Value), but I am not sure.
I am also interested in the exact data structure that you serialize. I've tried to
recreate it based on the traces; I see that there is "slice -> ptr -> struct -> slice ->
interface". Here is what I've got:
http://play.golang.org/p/y45Zs0kd4D
But it does not crash.
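In outline, the nesting looks like this (a simplified sketch; the playground link above
has the program I actually ran):

package main

import "encoding/json"

type Elem struct {
	Items []interface{} // struct -> slice -> interface
}

func main() {
	// slice -> ptr -> struct -> slice -> interface
	v := []*Elem{{Items: []interface{}{"downloading", 1.5, nil}}}
	if _, err := json.Marshal(v); err != nil {
		panic(err)
	}
}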
toffaletti, please try to modify the playground program to be as close to your real code
as possible (in terms of data types, the structure of the object that you serialize, and
how you call json.Marshal), and see if you can reproduce the crash with that program. If
we can reproduce it on such a small program, that would allow us to debug it quickly.

@gopherbot
Contributor Author

Comment 10 by toffaletti:

http://play.golang.org/p/O8eiPDUJzJ
This is the best I can do with the types involved, but I can't make it crash either.
My best guess, just because this is the most unusual thing I've noticed looking at the
types, is that it might have something to do with some cycles in the struct. Some of the
[]interface{} are assigned to types that then reference the parent or other types that
reference them; however, the cycles are always hidden behind interface{}.
Sorry I can't be of more help. This code is quite complex and there is a good deal of it
that probably isn't even being used anymore. I plan to remove a lot of the dead code,
and once that is done perhaps I'll be able to distill it down to something that crashes
which I am able to share, but don't count on it.

@dvyukov
Member

dvyukov commented May 12, 2014

Comment 11:

toffaletti, please do the following:
1. Crash it several times and attach 5 different crash reports; I am mostly interested
in the crash report header and the first stack (this can help localize the issue).
2. Run with the GODEBUG=efence=1 environment variable and attach several crash reports
(this can also help localize the issue).

@dvyukov
Member

dvyukov commented May 12, 2014

Comment 12:

I've tried to reproduce the crash using your program (as is, and with some
modifications), but it does not crash for me.

@gopherbot
Contributor Author

Comment 13 by toffaletti:

Attached 5 crash reports. I haven't been able to make it crash with GODEBUG=efence=1
yet. The process just takes more and more memory (a lot of virtual memory too) until I
restart it and try again.

Attachments:

  1. crashes.log (29323 bytes)

@rsc
Contributor

rsc commented May 13, 2014

Comment 14:

toffaletti, it seems like you can reproduce this crash pretty easily. I fixed a bug
earlier today that could cause crashes of the general form you are seeing (partially
corrupt multiword values on stack). Could you please update to tip (let's say
2fb7165d5fdc to be concrete) and see if you can still reproduce the problem?
Thanks.

@gopherbot
Contributor Author

Comment 15 by toffaletti:

I went with the latest tip (one commit past 2fb7165d5fdc):
go version devel +3ac2308bca2a Tue May 13 01:09:38 2014 -0400 linux/amd64
I was still able to reproduce the crash, though it took longer than it ever has before.
unexpected fault address 0x0
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x5468a4]
goroutine 15011 [running]:
runtime.throw(0x10d1a42)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc209f13300
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc209f13318
reflect.implements(0x99b7a0, 0xc208248e10, 0x99b7a0)
        /go/src/pkg/reflect/type.go:1194 +0x154 fp=0xc209f13370
reflect.(*rtype).Implements(0xc208248e10, 0x7f3b750e11d8, 0x99b7a0, 0x9f133f8)
        /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc209f133a8
encoding/json.newTypeEncoder(0x7f3b750e11d8, 0xc208248e10, 0xc209f13401, 0xc209f133f8)
        /go/src/pkg/encoding/json/encode.go:356 +0x47 fp=0xc209f133d8
encoding/json.typeEncoder(0x7f3b750e11d8, 0xc208248e10, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc209f13428
encoding/json.valueEncoder(0xc208248e10, 0x0, 0x0, 0x2, 0x40)
        /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc209f13468
encoding/json.(*encodeState).reflectValue(0xc2085724d0, 0xc208248e10, 0x0, 0x0, 0x2)
        /go/src/pkg/encoding/json/encode.go:297 +0x45 fp=0xc209f134a0
encoding/json.interfaceEncoder(0xc2085724d0, 0x8c3d60, 0xc208248df0, 0x0, 0x146,
0x8c3d00)
        /go/src/pkg/encoding/json/encode.go:555 +0xe9 fp=0xc209f134f0
encoding/json.(*arrayEncoder).encode(0xc2083f00c0, 0xc2085724d0, 0x8ad020, 0xc20852bdd0,
0x0, 0x176, 0x4bef00)
        /go/src/pkg/encoding/json/encode.go:685 +0x132 fp=0xc209f13550
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8ad020,
0xc20852bdd0, 0x0, 0x176, 0x73000000)
        /go/src/pkg/encoding/json/encode.go:692 +0x62 fp=0xc209f13590
encoding/json.(*sliceEncoder).encode(0xc2083f00e0, 0xc2085724d0, 0x8ad020, 0xc20852bdd0,
0x0, 0x176, 0x100)
        /go/src/pkg/encoding/json/encode.go:662 +0xce fp=0xc209f135d0
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8ad020,
0xc20852bdd0, 0x0, 0x176, 0x0)
        /go/src/pkg/encoding/json/encode.go:671 +0x62 fp=0xc209f13610
encoding/json.(*structEncoder).encode(0xc2084a55f0, 0xc2085724d0, 0xa99cc0,
0xc20852bd40, 0x0, 0x196, 0x7f3b00000000)
        /go/src/pkg/encoding/json/encode.go:582 +0x2ad fp=0xc209f13788
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0xa99cc0,
0xc20852bd40, 0x0, 0x196, 0xc20852bd00)
        /go/src/pkg/encoding/json/encode.go:596 +0x62 fp=0xc209f137c8
encoding/json.(*ptrEncoder).encode(0xc2083be198, 0xc2085724d0, 0x89cee0, 0xc20836c228,
0x0, 0x166, 0x0)
        /go/src/pkg/encoding/json/encode.go:704 +0xfc fp=0xc209f13818
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x89cee0,
0xc20836c228, 0x0, 0x166, 0x89ce00)
        /go/src/pkg/encoding/json/encode.go:709 +0x62 fp=0xc209f13858
encoding/json.(*arrayEncoder).encode(0xc2083be1a0, 0xc2085724d0, 0x8a9de0, 0xc209f33ba0,
0x0, 0x172, 0x10f5400)
        /go/src/pkg/encoding/json/encode.go:685 +0x132 fp=0xc209f138b8
encoding/json.*arrayEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8a9de0,
0xc209f33ba0, 0x0, 0x172, 0x7f3b750e1100)
        /go/src/pkg/encoding/json/encode.go:692 +0x62 fp=0xc209f138f8
encoding/json.(*sliceEncoder).encode(0xc2083be1b0, 0xc2085724d0, 0x8a9de0, 0xc209f33ba0,
0x0, 0x172, 0x1112b00)
        /go/src/pkg/encoding/json/encode.go:662 +0xce fp=0xc209f13938
encoding/json.*sliceEncoder.(encoding/json.encode)·fm(0xc2085724d0, 0x8a9de0,
0xc209f33ba0, 0x0, 0x172, 0x100)
        /go/src/pkg/encoding/json/encode.go:671 +0x62 fp=0xc209f13978
encoding/json.(*encodeState).reflectValue(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0,
0x172)
        /go/src/pkg/encoding/json/encode.go:297 +0x85 fp=0xc209f139b0
encoding/json.(*encodeState).marshal(0xc2085724d0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x0)
        /go/src/pkg/encoding/json/encode.go:268 +0xc2 fp=0xc209f139f8
encoding/json.(*Encoder).Encode(0xc2095841a0, 0x8a9de0, 0xc209f33ba0, 0x0, 0x0)
        /go/src/pkg/encoding/json/stream.go:161 +0x94 fp=0xc209f13a70

@dvyukov
Member

dvyukov commented May 13, 2014

Comment 16:

Hi toffaletti,
I've submitted another fix a few seconds ago:
parent: 19952:b9c40afbd66c tip
 reflect: fix map type generation
Please retest with it.

@gopherbot
Contributor Author

Comment 17 by toffaletti:

go version devel +b9c40afbd66c Tue May 13 09:53:47 2014 +0400 linux/amd64
crashes with the same stack as the previous build.

@rsc
Contributor

rsc commented May 13, 2014

Comment 18:

Is there any way we can get more information about this? What I'd really like is a full
log generated with Debug=3. That will show us a lot of information about how the pointer
values evolve (and hopefully about how they are corrupted) during the execution of the
program.

@gopherbot
Contributor Author

Comment 19 by toffaletti:

It looks like there is no synchronization between threads logging with Debug=3, so the
log becomes pretty garbled once the concurrent requests start. I've put as much
information as I can into the log. If there are specific pointer values from the log you
think would help, let me know. I know how frustrating this must be.

Attachments:

  1. crash-debug3-log.gz (8667 bytes)

@rsc
Contributor

rsc commented May 13, 2014

Comment 20:

Does your program have any calls to the type constructors reflect.PtrTo,
reflect.SliceOf, reflect.MapOf, reflect.ChanOf?

@gopherbot
Contributor Author

Comment 21 by toffaletti:

There are some calls to reflect.MapOf once during startup when reading the config file,
but I'm pretty sure this is new code, and it was crashing prior to that being added. I
will double-check that and rule it out.

@gopherbot
Contributor Author

Comment 22 by toffaletti:

I found an additional call in the goprotobuf package which we're using:
/src/code.google.com/p/goprotobuf/proto/properties.go:  prop.extendable =
reflect.PtrTo(t).Implements(extendableProtoType)
Though I'm not yet sure whether we're actually calling that part of the code.

@gopherbot
Contributor Author

Comment 23 by toffaletti:

Confirmed we are calling reflect.PtrTo via the goprotobuf package. The call to
reflect.MapOf was added Apr 26, after I filed this issue.

@rsc
Contributor

rsc commented May 14, 2014

Comment 24:

I asked about PtrTo etc. because the type that reflect is being asked about is an
allocated pointer (the 40-bit number 0xc208248e10), which is either a
runtime-constructed type or not a type at all (a bug). It seems likely to be a bug (and
the reason for the crash) but I'm not 100% sure. One thing that might be interesting
would be to edit reflect/type.go to put a print in the Implements method just before it
calls implements:
func (t *rtype) Implements(u Type) bool {
        if u == nil {
                panic("reflect: nil type passed to Type.Implements")
        }
        if u.Kind() != Interface {
                panic("reflect: non-interface type passed to Type.Implements")
        }
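// temporary debug print: logs both type pointers on every call so a bogus one stands out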
println("reflect.implements", u.(*rtype), t)
        return implements(u.(*rtype), t)
}
and then run the program and see what kinds of values it prints normally, and whether
they are obviously different for the crashing call.
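For comparison, a runtime-constructed type can come from the constructors I mentioned; a
minimal sketch:

package main

import (
	"fmt"
	"reflect"
)

type T struct{ A int }

func main() {
	static := reflect.TypeOf(T{})                     // rtype linked into the binary
	dynamic := reflect.SliceOf(reflect.PtrTo(static)) // []*T, may be constructed (allocated) at runtime
	fmt.Println(static, dynamic)                      // main.T []*main.T
}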
In addition to that, it would be useful to turn off Debug=3 but then run with
GODEBUG=allocfreetrace=1. That should not cause mixed output like the Debug=3 trace did.
There will be file names and function names in the trace. If you would prefer to mail me
the trace directly, that's fine. If you want to go to the trouble of obfuscating the
names, you can do that too, but please do something deterministic, so that the same name
appearing multiple times gets the same rewrite each time.

@dvyukov
Member

dvyukov commented May 14, 2014

Comment 25:

The attached crash-debug-3.log contains:
[signal 0xb code=0x80 addr=0x0 pc=0x54736b]
goroutine 1824 [running]:
runtime.throw(0x10d2a42)
    /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208d9b300
runtime.sigpanic()
    /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208d9b318
reflect.implements(0x99c260, 0x7f6c901b5a60, 0x99c260)
    /go/src/pkg/reflect/type.go:1197 +0x1ab fp=0xc208d9b370
reflect.(*rtype).Implements(0x7f6c901b5a60, 0x7f6c901a01d8, 0x99c260, 0x208d9b3f8)
    /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc208d9b3a8
encoding/json.newTypeEncoder(0x7f6c901a01d8, 0x7f6c901b5a60, 0xc208d9b401, 0xc208d9b3f8)
    /go/src/pkg/encoding/json/encode.go:356 +0x47 fp=0xc208d9b3d8
encoding/json.typeEncoder(0x7f6c901a01d8, 0x7f6c901b5a60, 0x0)
    /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc208d9b428
encoding/json.valueEncoder(0x7f6c901b5a60, 0xc2080fe620, 0x0, 0x2, 0x1ec)
    /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc208d9b468
encoding/json.(*encodeState).reflectValue(0xc2080cbad0, 0x7f6c901b5a60, 0xc2080fe620,
0x0, 0x2)
    /go/src/pkg/encoding/json/encode.go:297 +0x45 fp=0xc208d9b4a0
encoding/json.interfaceEncoder(0xc2080cbad0, 0x8c4820, 0xc209958250, 0x0, 0x146,
0x8c4800)
    /go/src/pkg/encoding/json/encode.go:555 +0xe9 fp=0xc208d9b4f0
Here the type is 0x7f6c901a01d8, which is definitely neither a static type nor a dynamic
type; it is clearly bogus. It looks like a C heap pointer.
Does the program use cgo?

@gopherbot
Contributor Author

Comment 26 by toffaletti:

Russ, I'll try both of those suggestions.
Dmitry, the program does use cgo, but I've ruled it out by removing that package, and
the program still crashes in the same way:
fatal error: fault
[signal 0xb code=0x80 addr=0x0 pc=0x546144]
goroutine 100 [running]:
runtime.throw(0x10d0a02)
        /go/src/pkg/runtime/panic.c:520 +0x69 fp=0xc208bf1300
runtime.sigpanic()
        /go/src/pkg/runtime/os_linux.c:240 +0x13f fp=0xc208bf1318
reflect.implements(0x99aa80, 0xc208eb1fc0, 0x99aa80)
        /go/src/pkg/reflect/type.go:1194 +0x154 fp=0xc208bf1370
reflect.(*rtype).Implements(0xc208eb1fc0, 0x7f86e621d1d8, 0x99aa80, 0xc208eb1fc0)
        /go/src/pkg/reflect/type.go:1133 +0x110 fp=0xc208bf13a8
encoding/json.newTypeEncoder(0x7f86e621d1d8, 0xc208eb1fa0, 0xc208bf1401, 0xc208bf13f8)
        /go/src/pkg/encoding/json/encode.go:360 +0xd8 fp=0xc208bf13d8
encoding/json.typeEncoder(0x7f86e621d1d8, 0xc208eb1fa0, 0x0)
        /go/src/pkg/encoding/json/encode.go:340 +0x1c5 fp=0xc208bf1428
encoding/json.valueEncoder(0xc208eb1fa0, 0x0, 0x0, 0x2, 0x1ec)
        /go/src/pkg/encoding/json/encode.go:311 +0x8d fp=0xc208bf1468
etc...

@dvyukov
Member

dvyukov commented May 14, 2014

Comment 27:

Are you sure that you've removed cgo?
There is still 0x7f86e621d1d8 in the crash report, and 0x7f86e621d1d8 is a C heap
pointer.

@gopherbot
Contributor Author

Comment 28 by toffaletti:

Log from adding println to reflect Implements.

Attachments:

  1. crash-implements.log.gz (609 bytes)

@gopherbot
Contributor Author

Comment 29 by toffaletti:

I found another place where we were using cgo that I had forgotten about, because we
retired that code; it was still being called once during setup. I removed that also,
but it still crashes, and the report still has pointers that look like C heap pointers:
reflect.(*rtype).Implements(0xc2082c9540, 0x7fc332e9f1d8, 0x987400, 0xc2082c9540)
        /go/src/pkg/reflect/type.go:1134 +0x177 fp=0xc209cf33a8
encoding/json.newTypeEncoder(0x7fc332e9f1d8, 0xc208835d40, 0xc209cf3401, 0xc209cf33f8)
All of our cgo code was C++, and I no longer see libstdc++ linked, so I'm pretty sure
I've gotten rid of all our cgo calls now:
        linux-vdso.so.1 =>  (0x00007fff405fe000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8dc8bac000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8dc87e4000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f8dc8ddd000)
Is cgo used in the standard library at all? Those are the only packages I haven't
audited for cgo usage.

@dvyukov
Member

dvyukov commented May 14, 2014

Comment 30:

Run the program under gdb (gdb ./mybin), set a breakpoint on runtime.cgocall
("b runtime.cgocall"), and then run ("r"). If the breakpoint is hit, somebody calls cgo.
Then you can modify runtime·cgocall() in pkg/runtime/cgocall.c to insert
    runtime·throw("HERE");
at the beginning of the function and rebuild the runtime. The crash will show who calls
cgo.

@gopherbot
Contributor Author

Comment 31 by toffaletti:

Definitely still some cgo calls left. The gdb backtraces are pretty useless, but if they
can be trusted at all, the calls during setup are:
- getpwuid_r from the os package
- cmalloc from the runtime package (can't tell where or why this is called)
- net.cgoLookupIPCNAME
And while making concurrent requests:
- getaddrinfo from the net package
- freeaddrinfo

@gopherbot
Contributor Author

Comment 32 by toffaletti:

To clarify: mygetpwuid_r from lookupUnix in os/user

@gopherbot
Contributor Author

Comment 33 by toffaletti:

I haven't been able to get a crash with GODEBUG=allocfreetrace=1. The log file is 4G so
far...

@rsc
Contributor

rsc commented May 15, 2014

Comment 34:

That's one of those modes that works well piped through gzip.

@rsc
Contributor

rsc commented May 15, 2014

Comment 35:

The other option for 0x7f6c901a01d8 is that it is an m stack pointer.

@gopherbot
Contributor Author

Comment 36 by toffaletti:

A stack pointer would make more sense. The structure being serialized to JSON during the
panic is strictly made up of pure Go types. Nothing cgo-related should be in there at
all.

@rsc
Contributor

rsc commented May 15, 2014

Comment 37:

The stack pointer doesn't really make sense. I said an 'm stack' pointer, which should
not be found on an ordinary goroutine stack and certainly not in the middle of a JSON
structure.
I think we've spent as much time as we can on this and aren't going to get anywhere
without being able to see and run the code ourselves. There may be a bug on our side,
and if so I'd like to fix it, but so far there's no evidence it's not a problem in your
code.

Labels changed: added release-none, removed release-go1.3.

@gopherbot
Contributor Author

Comment 38 by toffaletti:

I believe I figured it out, and I'm fairly certain it is a bug in our code, not Go: a
race that went undetected by the race detector. The fact that it worked for so long
under Go 1.2 is probably luck.
I've been going field by field through the struct that was being serialized and I ran
across some code that looks something like this:
http://play.golang.org/p/MosS_UQQLm
The code was attempting lazy loading of the Images by returning the *Image and then
setting the fields inside another goroutine, relying on the channel send/recv to
synchronize. I don't think that is safe. As soon as I removed this lazy loading, the
crash went away.
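In outline, the pattern looked like this (simplified, with made-up names):

package main

import "fmt"

type Image struct {
	URL           string
	Width, Height int
}

// loadImage hands the pointer back immediately and fills in the fields from
// another goroutine, relying on the channel send to publish the writes.
func loadImage(done chan<- struct{}) *Image {
	img := new(Image) // the pointer escapes to the caller before the fields are set
	go func() {
		img.URL = "http://example.com/a.png"
		img.Width, img.Height = 640, 480
		done <- struct{}{} // send happens after the writes
	}()
	return img
}

func main() {
	done := make(chan struct{})
	img := loadImage(done)
	<-done // receive before reading the fields
	fmt.Println(img.URL, img.Width, img.Height)
}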
Apologies for wasting so much of your time. Perhaps this can become a feature request
and test case for the race detector.

@ianlancetaylor
Member

Comment 39:

Thanks for following up.  I don't actually see a problem in your playground code, though
I may not have looked closely enough.  It's OK to set a field in a struct and then send
a value on a channel indicating that the struct is initialized.  When the channel value
is received, the receiving goroutine can safely examine the struct fields.

@gopherbot
Contributor Author

Comment 40 by toffaletti:

I might not have perfectly captured the problem code in the example. I need to spend
more time figuring out exactly what is happening, but I think there is a copy involved
before the field is actually assigned in the final struct that is serialized.

@ianlancetaylor
Member

Comment 41:

Status changed to Retracted.

@gopherbot
Contributor Author

Comment 42 by toffaletti:

I'm still working on this, and I'll open a new ticket if I can actually reproduce and
prove a bug here. However, I wanted to update and say that I unravelled the twisty maze
of assignments the *Image goes through, and at no point was the pointer dereferenced, so
I was wrong about there being a racy copy of the Image struct fields. I was also able to
reproduce the crash after removing all concurrent assignment to the Image struct fields,
so there goes that theory.

@dvyukov
Member

dvyukov commented May 16, 2014

Comment 43:

> A race that went undetected by the race detector
If you can provide a reproducer, I will be happy to look into why it was undetected.
We've spent a lot of time on this issue; if people in the future are able to catch this
kind of bug by just running with -race, that would be great.

@gopherbot
Contributor Author

Comment 44 by toffaletti:

I'd love to be able to figure out a reproducer, but I've spent days trying with no luck.
The only thing I've discovered is how to make it more difficult to reproduce: after
doing any of these things I haven't been able to make it crash.
- Run with GOGC=off
- Run with GODEBUG=allocfreetrace=1
- Run with GODEBUG=efence=1
- Don't make concurrent requests
Here's where it gets weird.
If I remove certain lines of code inside a goroutine, or if I don't spawn that goroutine
but just call the code serially, the crash behavior changes. Depending on how many lines
I remove, it goes from taking seconds to crash to minutes, or to not crashing for hours.
I've
tried to figure out what, if anything, is buggy in that code, but it all looks fine to
me. In fact, none of the functions being called access any non-local variables or
allocate memory beyond function arguments on the stack or return values. None of that
code has access to the structure being serialized to JSON in the goroutine that crashes.
My best guess at this point is that there is no bug in the code I'm removing, but that
having it active raises the memory usage just enough in the right area to trigger the
bug. The goroutine seems special only because it is at the deepest level of the call
stack.

@gopherbot
Contributor Author

Comment 45 by toffaletti:

As of e4e288424f3a the crash has stopped reproducing.

@rsc
Contributor

rsc commented May 19, 2014

Comment 46:

I don't see how e4e288424f3a could fix this crash.

golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.