Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: found bad pointer in Go heap (incorrect use of unsafe or cgo?) #19135

Closed
jtsylve opened this issue Feb 17, 2017 · 12 comments
Closed

runtime: found bad pointer in Go heap (incorrect use of unsafe or cgo?) #19135

jtsylve opened this issue Feb 17, 2017 · 12 comments

Comments

@jtsylve
Copy link
Contributor

jtsylve commented Feb 17, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8 darwin/386

What operating system and processor architecture are you using (go env)?

OS X 10.11.6, and compiling my code as darwin/386 because I need a 32-bit shared library.

What did you do?

I've got the following package that's a wrapper for libz, which I've been using for years without issue. Once upgrading to Go 1.8, I'm getting runtime panics, due to aggressive cgo checks. The panic happens on line 91 (marked in comments) where I try to make sure that I don't leave a Go pointer to p in C memory after the Read function ends. The funny thing is, if I comment this line out I don't have any problems, even though I'm technically violating the rules.

Another thing, I've noticed is that if I set the buffer length on line 98 (marked in comments) to one less than the actual size of the buffer I don't have a problem. What I suspect is happening is that zlib is incrementing next_out after reading. If it's read the entire buffer, that pointer is now pointing to memory right after the go slice. If I set the length of the buffer to be one byte short, then that pointer is still valid memory.

Either way, we are talking about pointers on the C heap, so I don't know why I'm getting a panic when setting that pointer to NULL. Commenting out line 91 also fixes the issue.

package zlib

// #cgo !windows LDFLAGS: -lz
// #cgo windows LDFLAGS: -l:libz.a
// #include <zlib.h>
// #include <stdlib.h>
//
// // inflateInit is a macro so we need to do this
// int InflateInit(z_streamp s) {
//     return inflateInit(s);
// }
//
// // We do the zstream allocation in C, so that the Go
// // GC doesn't inspect the pointers inside that are
// // allocated by zlib and panic.
// z_streamp new_zstream() {
//     return (z_streamp) calloc(sizeof(z_stream), 1);
// }
import "C"

import (
	"errors"
	"fmt"
	"io"
	"sync"
	"unsafe"
)

type reader struct {
	r  io.Reader
	in []byte
	s  C.z_streamp
}

const defaultInputBufferSize = 32 * 1024 // 32 KiB

var (
	bufPool = sync.Pool{
		New: func() interface{} { return make([]byte, defaultInputBufferSize) },
	}
)

// NewReader creates a new ReadCloser. Reads from the returned ReadCloser
// read and decompress data from r. The implementation buffers input and
// may read more data than necessary from r. It is the caller's
// responsibility to call Close on the ReadCloser when done.
func NewReader(r io.Reader) (io.ReadCloser, error) {
	rd := &reader{
		r:  r,
		s:  C.new_zstream(),
		in: bufPool.Get().([]byte),
	}

	err := C.InflateInit(rd.s)
	if err != C.Z_OK {
		return nil, errors.New("Could not init inflate.")
	}

	return rd, nil
}

func (r *reader) Read(p []byte) (n int, err error) {
	if len(p) == 0 {
		return 0, nil
	}

	defer func() {
		// We have to make sure we don't leave a stale go
		// pointer in C memory after we return
		r.s.next_out = nil  // <-- line 91 this is where the runtime panic happens
	}()

	var read int
	var ret C.int
	for ret != C.Z_STREAM_END {
		out := p[read:]
		r.s.avail_out = C.uInt(len(out))
		r.s.next_out = (*C.Bytef)(unsafe.Pointer(&out[0])) // <-- line 98

		if r.s.avail_in == 0 {
			// No leftover bytes from the last call.

			// Let's read some from the reader.
			nIn, err := r.r.Read(r.in)
			if err != nil && err != io.EOF {
				return read, err
			}

			if nIn == 0 && err == io.EOF {
				// It's too soon for us to have reached the
				// end of input
				return read, io.ErrShortBuffer
			}

			r.s.avail_in = C.uInt(nIn)
			r.s.next_in = (*C.Bytef)(unsafe.Pointer(&r.in[0]))
		}

		ret = C.inflate(r.s, C.Z_NO_FLUSH)

		switch ret {
		case C.Z_NEED_DICT:
			ret = C.Z_DATA_ERROR
			fallthrough
		case C.Z_DATA_ERROR, C.Z_MEM_ERROR:
			return read, fmt.Errorf("Could not inflate input (%s): %s", C.GoString(C.zError(ret)), C.GoString(r.s.msg))
		}

		read += len(out) - int(r.s.avail_out)

		if r.s.avail_out == 0 {
			break
		}
	}

	if ret == C.Z_STREAM_END {
		return read, io.EOF
	}

	return read, nil
}

// Close implements the io.Closer interface
//
// Calling Close does not close the wrapped io.Reader originally passed to NewReader
func (r *reader) Close() error {
	if r.s.avail_in != 0 {
		// We've read more than we needed.  Let's backup if we can.
		if rs, ok := r.r.(io.Seeker); ok {
			rs.Seek(-int64(r.s.avail_in), io.SeekCurrent)
		}
	}

	r.s.next_in = nil

	err := C.inflateEnd(r.s)
	if err != C.Z_OK {
		return errors.New("Could not end inflate")
	}

	C.free(unsafe.Pointer(r.s))

	bufPool.Put(r.in)
	r.s = nil
	r.in = nil

	return nil
}

What did you see instead?

runtime: pointer 0x14e72000 to unallocated span idx=0xa739 span.base()=0x14e72000 span.limit=0x14bc6000 span.state=3
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0x21253fe, 0x3e)
	/Users/joe/src/go/src/runtime/panic.go:596 +0x76
runtime.heapBitsForObject(0x14e72000, 0x0, 0x0, 0x13550dff, 0x0, 0x23a57b8, 0x14a2092c, 0x0)
	/Users/joe/src/go/src/runtime/mbitmap.go:433 +0x25d
runtime.shade(0x14e72000)
	/Users/joe/src/go/src/runtime/mgcmark.go:1340 +0x2f
runtime.gcmarkwb_m(0x32129bc, 0x0)
	/Users/joe/src/go/src/runtime/mbarrier.go:154 +0xb5
runtime.writebarrierptr_prewrite1.func1()
	/Users/joe/src/go/src/runtime/mbarrier.go:188 +0x4c
runtime.systemstack(0x22e6800)
	/Users/joe/src/go/src/runtime/asm_386.s:337 +0x52
runtime.mstart()
	/Users/joe/src/go/src/runtime/proc.go:1132

goroutine 41 [running]:
runtime.systemstack_switch()
	/Users/joe/src/go/src/runtime/asm_386.s:291 fp=0x14ca0470 sp=0x14ca046c
runtime.writebarrierptr_prewrite1(0x32129bc, 0x0)
	/Users/joe/src/go/src/runtime/mbarrier.go:189 +0x8c fp=0x14ca048c sp=0x14ca0470
runtime.writebarrierptr(0x32129bc, 0x0)
	/Users/joe/src/go/src/runtime/mbarrier.go:211 +0x38 fp=0x14ca04a4 sp=0x14ca048c
go4n6/compress/zlib.(*reader).Read.func1(0x14afec20)
	/Users/joe/src/repo004/GOPATH/src/go4n6/compress/zlib/zlib.go:91 +0x40 fp=0x14ca04b0 sp=0x14ca04a4
go4n6/compress/zlib.(*reader).Read(0x14afec20, 0x14e6a000, 0x8000, 0x8000, 0x8000, 0x22c95a0, 0x14a0e030)
...
@jtsylve jtsylve changed the title cgo: cgo: found bad pointer in Go heap (incorrect use of unsafe or cgo?) Feb 17, 2017
@bradfitz
Copy link
Contributor

Are you using Go 1.7 or Go 1.8? Your bug report is unclear.

/cc @ianlancetaylor

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

1.8, sorry. I'll edit the description. I've also tested against tip and it shows the same issue. It worked fine with 1.7.5.

@ianlancetaylor ianlancetaylor changed the title cgo: found bad pointer in Go heap (incorrect use of unsafe or cgo?) runtime: found bad pointer in Go heap (incorrect use of unsafe or cgo?) Feb 17, 2017
@ianlancetaylor
Copy link
Contributor

The error mentions cgo because that is a common cause of this kind of problem, but this is not a cgo check as such. This is an error from the garbage collector, telling you that it found a dangling pointer. Specifically the garbage collector came across a pointer with the value 0x14e72000, which points to an unallocated span.

I will look more later, have to go now.

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

I just don't understand how it's possible that r.s.next_out = nil causes garbage collection. It's overwriting a *C.Bytef that exists on the C heap. Hopefully you notice something I haven't when you get more time.

@ianlancetaylor
Copy link
Contributor

I think the problem is that your next_out field is pointing to a value on the Go heap, and the C code is advancing it to point past the Go heap value. In Go 1.8 the garbage collector has changed so that when you change the value of a pointer field, the old pointer is "shaded". In this case the C code has modified that old pointer so that it is no longer a valid Go pointer, since Go pointers, unlike C pointers, are not permitted to point past the end of an object. The garbage collector is now detecting that fact, and is complaining.

As far as I can tell your code is actually technically invalid according to the cgo rules, because you are storing a Go pointer in C memory. That is not permitted. It's not exactly what is causing this problem, though.

The best fix for this problem is probably going to be for you to write a little wrapper that receives a Go pointer, stores it in the next_out field (it's OK to store a Go pointer into memory during a call if the pointer was an argument to the C function), calls inflate, NULLs out the next_out field (since after you return from the call the Go pointer should no longer be in C memory), and then returns.

I'm going to close this because I don't think there is anything we can do on the Go side to help. Please comment if you disagree.

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

The best fix for this problem is probably going to be for you to write a little wrapper that receives a Go pointer, stores it in the next_out field (it's OK to store a Go pointer into memory during a call if the pointer was an argument to the C function), calls inflate, NULLs out the next_out field (since after you return from the call the Go pointer should no longer be in C memory), and then returns.

That's exactly what this function is doing. It's passing a Go pointer to the next_out field, calling inflate, and then before the function exits, it's NULLing out the pointer (in the defer). Ironically, if I remove the NULLing out part, then I don't get GC problems.

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

And yes, 0x14e72000 is the value of memory right after the p buffer, but that pointer only exists on the C heap and is overwritten after the call to inflate.

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

I'm sorry, I misunderstood. You want me to make the wrapper in C, not Go, so that there's no possibility of C holding on to the Go pointer after the function call to the wrapper. I've just given that a shot and it seems to work. Still no idea why this is only triggered if I nulled out that pointer, though.

@jtsylve jtsylve closed this as completed Feb 17, 2017
@ianlancetaylor
Copy link
Contributor

It's triggered because when you write nil to the pointer, the Go 1.8 write barrier is looking at the old value of the pointer and seeing that the old value is not a valid Go pointer. Details of the Go 1.8 write barrier are at https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md .

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

Very interesting read. Why does Go need to worry about GC at all for memory that's not allocated by Go? In my example the memory in question exists on the C heap due to the allocation from calloc.

@ianlancetaylor
Copy link
Contributor

Yes, the memory being changed is in the C heap. But the value being changed, the value stored in that memory, is a Go pointer. On line 98 you stored a Go pointer--a pointer to memory allocated by the Go memory allocator--into memory allocated by the C allocator. As I said earlier, that in itself is invalid: Go code is not permitted to store a Go pointer in C memory, even temporarily. But nothing catches that (I think it would be caught if you set GODEBUG=cgocheck=2 in the environment).

Then later, while in this invalid state, you store nil into the same C memory. Storing nil in C memory is fine. But this is where the Go 1.8 write barrier kicks in: the write barrier doesn't care whether you are storing a pointer in C memory or not. The write barrier assumes that the program is following the rules, and never stores a Go pointer in C memory. When the write barrier looks at the old value of the pointer, and sees that it is a Go pointer, it doesn't do any extra check that that Go pointer is living in C memory, because that would be invalid anyhow. Instead, in this case, it notices that it has a Go pointer that is dangling. And that is when the program crashes.

@jtsylve
Copy link
Contributor Author

jtsylve commented Feb 17, 2017

This has been very enlightening. Thanks for your time!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants