-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: bad pointer in unexpected span #12138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
CC @RLH @aclements |
@zeebo, and chance you're running with -race? We fixed a memory corruption bug after rc1 was branched, but it only affects running in -race mode. If possible, it would be great if you can run your system with GODEBUG=gccheckmark=1. It slows things down quite a bit, but enables some very useful debug code. |
Also if you have any of the other go routine stacks that would help. On Thursday, August 13, 2015, Austin Clements notifications@github.com
|
@zeebo, is this also with GOOS=linux GOARCH=arm GOARM=5? If so, are you actually running on an ARMv5 or something newer? Just some quick diagnosis from the traceback: the bad pointer is the *funcval field of a "specialfinalizer" finalizer record. It looks like a pretty reasonable pointer, but points in to a span that's in state _MSpanDead. There's only one way we're supposed to be able to observe a dead span in the h_spans array: if we free a large span and coalesce it with neighboring spans, only the first and last page in the coalesced range are guaranteed to be in state "free". The pages in between may point to any span, including one in state "dead". Given how far "p" is from "s.start", this seems the most likely way we found the dead span. If this is ARM, there's also always the possibility of a missing memory barrier, since h_spans is read by the garbage collector without locking. If we in fact freed that span, that means we either somehow created a bogus finalizer record, or we created a fine finalizer record, but failed to mark the funcval in an earlier cycle and the span got collected. One weird thing is that I would expect that funcval to be in a small object span that was only one page and hence could not leave behind "dead" spans in h_spans. This is a long shot, @zeebo, but do you have any finalizers that are closures that access particularly large variables from their enclosing frame? Or, in general, any unusual finalizers? |
It is not in race mode. It is an ARMv5 in the sense that no binaries will run on it if I don't specify GOARM=5. Unfortunately, I'm not much of a hardware guy so I don't know what that all entails. I don't know if we have any unusual finalizers, but it's possible. We use them mostly for freeing resources in our openssl wrapper and some debug checks if we forget to close a file handle or something. I'll do a double check of the code base to see if anything sticks out as really strange. Here is more of the stack dump. I'm not sure if it's all of it, but this is what I have:
I'll see if I can start running it with GODEBUG=gccheckmark=1. Is there anything in particular I should be observing with that flag? |
I believe the default is GOARM=6, so if it doesn't run with the default, you must have ARMv5. (I ask because ARM made a mess of its concurrency primitives, so it's basically impossible to run a multithreaded binary compiled for ARM v5 correctly on ARM v7 hardware.)
Panics. :) This enables an assertion in the runtime, so if it fails, it will panic with a bunch of debugging info. |
I'm seeing the same thing on Go 1.5rc1 on Darwin x86_64 with no such error on Go 1.4.2
|
Full stack trace (Not sure it'll help without the source)
|
This also panics on HEAD, so the bug hasn't been fixed since rc1. |
@jtsylve, it looks like your problem is probably different, since it doesn't involve finalizers or dead spans, though it might be the same. The check that's failing is quite general. Can you run and reproduce the problem with GODEBUG=gccheckmark=1? Also, it sounds like you can reproduce it fairly quickly. If that's right, I'd also like to know whether it happens with GODEBUG=gcstoptheworld=1,gcshrinkstackoff=1 and, if that stops it from happening, what happens with GODEBUG=gcstackbarrieroff=1. Also, is there use of "unsafe" in your code? And is it possible to share the code or reduce it to a shareable test case? |
Unfortunately I can't share the majority of the code and it's now a rather complicated codebase, so I'm not sure if I can reduce it to a POC to create the panic, but I'll see what I can do. Bingo on the "unsafe" call. I'm using a custom // #cgo LDFLAGS: -lz
// #include <zlib.h>
//
// // inflateInit is a macro so we need to do this
// int InflateInit(z_streamp s) {
// return inflateInit(s);
// }
import "C"
import (
"bytes"
"errors"
"io"
"unsafe"
)
type reader struct {
r io.Reader
inbuf []byte
s C.z_stream
}
// NewReader creates a new ReadCloser. Reads from the returned ReadCloser
// read and decompress data from r. The implementation buffers input and
// may read more data than necessary from r. It is the caller's
// responsibility to call Close on the ReadCloser when done.
func NewReader(r io.Reader) (io.ReadCloser, error) {
rd := &reader{
r: r,
}
err := C.InflateInit(&rd.s)
if err != C.Z_OK {
return nil, errors.New("Could not init inflate.")
}
return rd, nil
}
func (r *reader) Read(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
// We can assume that the input will be smaller than
// output.
maxIn := int64(len(p))
if r.inbuf == nil {
buf := bytes.NewBuffer(make([]byte, 0, maxIn))
nIn, err := io.CopyN(buf, r.r, maxIn)
if err != nil && err != io.EOF {
return 0, err
}
if nIn == 0 && err == io.EOF {
return 0, io.ErrUnexpectedEOF
}
r.s.avail_in = C.uInt(nIn)
r.s.next_in = (*C.Bytef)(unsafe.Pointer(&buf.Bytes()[0]))
} else {
// We still have input from the last call
r.s.avail_in = C.uInt(len(r.inbuf))
r.s.next_in = (*C.Bytef)(unsafe.Pointer(&r.inbuf[0]))
}
var read int
var ret C.int
for ret != C.Z_STREAM_END {
out := p[read:]
r.s.avail_out = C.uInt(len(out))
r.s.next_out = (*C.Bytef)(unsafe.Pointer(&out[0]))
ret = C.inflate(&r.s, C.Z_NO_FLUSH)
if ret != C.Z_STREAM_END && ret != C.Z_OK {
return 0, errors.New("Could not deflate input.")
}
read = len(out) - int(r.s.avail_out)
if r.s.avail_out == 0 {
break
}
}
if ret == C.Z_STREAM_END {
return read, io.EOF
}
if r.s.avail_in == 0 {
r.inbuf = nil
} else {
r.inbuf = C.GoBytes(unsafe.Pointer(r.s.next_in), C.int(r.s.avail_in))
}
return read, nil
}
// Close implements the io.Closer interface
//
// Calling Close does not close the wrapped io.Reader originally passed to NewReader
func (r *reader) Close() error {
err := C.inflateEnd(&r.s)
if err != C.Z_OK {
return errors.New("Could not end inflate")
}
return nil
} I'm not seeing much different in the outputs of setting those variables, but here they are. Perhaps I'm doing it wrong?
|
Thanks for the quick reply, @jtsylve.
is definitely not safe. As soon as you leave that scope, the backing store of buf will be reachable only through the C heap (via the z_stream_s object malloc'd by inflateInit). Since the GC can't trace through the C heap, it's free to free buf. At that point all bets are off. I'm actually kind of surprised this works in 1.4.2. I think you're just getting lucky. Depending on where p comes from, the same may be true of
|
I bet we have similar code of dubious nature in our code base. I haven't had any crashes with the debug environment variable yet. I wonder if automated tooling can be written that just detects if the last usage of a variable in some function is in an unsafe.Pointer(&var) kind of form. I'd be ok with false positives if it helped me find all those issues. |
One of the things we're hoping to do for 1.6 is to write down the rules of pointers and Cgo. We've even tinkered with ways of enforcing these rules, but so far no concrete changes have come of that.
I know that go vet does some checking of unsafe.Pointer use, though I'm not sure what exactly. |
As far as I'm aware the only checking it does is to make sure you don't squirrel away pointer values as integer types so the GC is still aware of them. :( |
How can I achieve a similar result that is considered "safe" unsafe code? On Fri, Aug 14, 2015, 13:03 Jeff notifications@github.com wrote:
|
FYI: I've seen this a while back (Aug 5) on the dashboard: http://build.golang.org/log/11aafe2c6f15fae5d9ab375e677e8114d366e339 |
Thanks, but all of the instances of this on the dashboard were from a cluster when we first enabled the bad pointer check. Russ promptly disabled it, then fixed the problems we were able to find on arm64 and re-enabled it in 4addec3. |
So I fixed the scoping issue. That wasn't the problem that was causing panics; however, I was able to figure out what was. The I assume this is intended behavior for the GC, but it is a real "gotya" and should be added to that new cgo documentation that was mentioned. For reference here's the updated (working) code. diff: @@ -23,11 +23,16 @@ package zlib
// #cgo LDFLAGS: -lz
// #include <zlib.h>
+// #include <stdlib.h>
//
// // inflateInit is a macro so we need to do this
// int InflateInit(z_streamp s) {
// return inflateInit(s);
// }
+//
+// z_streamp new_zstream() {
+// return (z_streamp) calloc(sizeof(z_stream), 1);
+// }
import "C"
import (
@@ -40,7 +45,7 @@ import (
type reader struct {
r io.Reader
inbuf []byte
- s C.z_stream
+ s *C.z_stream
}
// NewReader creates a new ReadCloser. Reads from the returned ReadCloser
@@ -50,9 +55,10 @@ type reader struct {
func NewReader(r io.Reader) (io.ReadCloser, error) {
rd := &reader{
r: r,
+ s: C.new_zstream(),
}
- err := C.InflateInit(&rd.s)
+ err := C.InflateInit(rd.s)
if err != C.Z_OK {
return nil, errors.New("Could not init inflate.")
}
@@ -69,8 +75,9 @@ func (r *reader) Read(p []byte) (n int, err error) {
// output.
maxIn := int64(len(p))
+ var buf *bytes.Buffer
if r.inbuf == nil {
- buf := bytes.NewBuffer(make([]byte, 0, maxIn))
+ buf = bytes.NewBuffer(make([]byte, 0, maxIn))
nIn, err := io.CopyN(buf, r.r, maxIn)
if err != nil && err != io.EOF {
@@ -96,7 +103,7 @@ func (r *reader) Read(p []byte) (n int, err error) {
r.s.avail_out = C.uInt(len(out))
r.s.next_out = (*C.Bytef)(unsafe.Pointer(&out[0]))
- ret = C.inflate(&r.s, C.Z_NO_FLUSH)
+ ret = C.inflate(r.s, C.Z_NO_FLUSH)
if ret != C.Z_STREAM_END && ret != C.Z_OK {
return 0, errors.New("Could not deflate input.")
@@ -126,10 +133,12 @@ func (r *reader) Read(p []byte) (n int, err error) {
//
// Calling Close does not close the wrapped io.Reader originally passed to NewReader
func (r *reader) Close() error {
- err := C.inflateEnd(&r.s)
+ err := C.inflateEnd(r.s)
if err != C.Z_OK {
return errors.New("Could not end inflate")
}
+ C.free(unsafe.Pointer(r.s))
+
return nil
} Full code: // #cgo LDFLAGS: -lz
// #include <zlib.h>
// #include <stdlib.h>
//
// // inflateInit is a macro so we need to do this
// int InflateInit(z_streamp s) {
// return inflateInit(s);
// }
//
// z_streamp new_zstream() {
// return (z_streamp) calloc(sizeof(z_stream), 1);
// }
import "C"
import (
"bytes"
"errors"
"io"
"unsafe"
)
type reader struct {
r io.Reader
inbuf []byte
s *C.z_stream
}
// NewReader creates a new ReadCloser. Reads from the returned ReadCloser
// read and decompress data from r. The implementation buffers input and
// may read more data than necessary from r. It is the caller's
// responsibility to call Close on the ReadCloser when done.
func NewReader(r io.Reader) (io.ReadCloser, error) {
rd := &reader{
r: r,
s: C.new_zstream(),
}
err := C.InflateInit(rd.s)
if err != C.Z_OK {
return nil, errors.New("Could not init inflate.")
}
return rd, nil
}
func (r *reader) Read(p []byte) (n int, err error) {
if len(p) == 0 {
return 0, nil
}
// We can assume that the input will be smaller than
// output.
maxIn := int64(len(p))
var buf *bytes.Buffer
if r.inbuf == nil {
buf = bytes.NewBuffer(make([]byte, 0, maxIn))
nIn, err := io.CopyN(buf, r.r, maxIn)
if err != nil && err != io.EOF {
return 0, err
}
if nIn == 0 && err == io.EOF {
return 0, io.ErrUnexpectedEOF
}
r.s.avail_in = C.uInt(nIn)
r.s.next_in = (*C.Bytef)(unsafe.Pointer(&buf.Bytes()[0]))
} else {
// We still have input from the last call
r.s.avail_in = C.uInt(len(r.inbuf))
r.s.next_in = (*C.Bytef)(unsafe.Pointer(&r.inbuf[0]))
}
var read int
var ret C.int
for ret != C.Z_STREAM_END {
out := p[read:]
r.s.avail_out = C.uInt(len(out))
r.s.next_out = (*C.Bytef)(unsafe.Pointer(&out[0]))
ret = C.inflate(r.s, C.Z_NO_FLUSH)
if ret != C.Z_STREAM_END && ret != C.Z_OK {
return 0, errors.New("Could not deflate input.")
}
read = len(out) - int(r.s.avail_out)
if r.s.avail_out == 0 {
break
}
}
if ret == C.Z_STREAM_END {
return read, io.EOF
}
if r.s.avail_in == 0 {
r.inbuf = nil
} else {
r.inbuf = C.GoBytes(unsafe.Pointer(r.s.next_in), C.int(r.s.avail_in))
}
return read, nil
}
// Close implements the io.Closer interface
//
// Calling Close does not close the wrapped io.Reader originally passed to NewReader
func (r *reader) Close() error {
err := C.inflateEnd(r.s)
if err != C.Z_OK {
return errors.New("Could not end inflate")
}
C.free(unsafe.Pointer(r.s))
return nil
} |
Sorry, I'm in New Hampshire right now and my Internet barely works, but I'm
trying to keep up.
Joe, I'm afraid your code still has the problem I said. You're storing a
pointer to an object allocated on the Go heap in an object allocated on the
C heap. If you're going to do that, you must keep a reference to that
object from the Go heap for at least as long as it's accessible from C.
Otherwise there's simply no way the garbage collector can know it needs to
keep that object around. In general, it's much better to not do this at all
and to allocate any such pointers from the C heap.
Also, there's nothing wrong with storing a pointer to the C heap in a Go
object. In fact, you're still doing exactly that, you're just storing a
pointer to a pointer now.
|
BTW, @zeebo, if you have any other panics of this form (or other forms, for that matter), just having more samples would be very helpful for establishing or ruling out patterns. |
Initially I wasn't storing a pointer to a C heap, I was allocating the struct directly on the Go heap ( As far as the scoping issues, I may be missing something. The |
In my copy of zlib, z_stream is a pointer type and inflateInit mallocs
memory and returns the z_stream pointer to it. Perhaps your version is
different.
And, again, even if it were a struct and it contained pointers initialized
by zlib, that would be perfectly fine because the Go heap is allowed to
point to the C heap.
Sorry, I was sloppy when I said "scope." In fact, the garbage collector is
free to free buf the moment after you last mention it in your code, which
is before the inflate call.
|
Indeed it must be (from zlib.h shipped with OS X) typedef struct z_stream_s {
Bytef *next_in; /* next input byte */
uInt avail_in; /* number of bytes available at next_in */
uLong total_in; /* total nb of input bytes read so far */
Bytef *next_out; /* next output byte should be put there */
uInt avail_out; /* remaining free space at next_out */
uLong total_out; /* total nb of bytes output so far */
char *msg; /* last error message, NULL if no error */
struct internal_state FAR *state; /* not visible by applications */
alloc_func zalloc; /* used to allocate the internal state */
free_func zfree; /* used to free the internal state */
voidpf opaque; /* private data object passed to zalloc and zfree */
int data_type; /* best guess about the data type: binary or text */
uLong adler; /* adler32 value of the uncompressed data */
uLong reserved; /* reserved for future use */
} z_stream;
typedef z_stream FAR *z_streamp; ZEXTERN int ZEXPORT inflateInit OF((z_streamp strm)); All the more reason not to make assumptions about library changes.
What about pointers to the Go heap that C adjusts? In the above case the
That is something I didn't know about the GC! I wouldn't have assumed that was the case since the reference is still being stored in the variable, but it does make sense. |
Something else I just noticed is that some of those pointers in |
I upgraded my system to use go1.5rc1 and let it run for a couple of days. I observed this panic:
Just as last time (#11640) I'm willing to do whatever I can to help diagnose.
The text was updated successfully, but these errors were encountered: