Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gccgo: cgo: false positive cgo argument has Go pointer to Go pointer #23391

Closed
tamird opened this issue Jan 9, 2018 · 44 comments
Closed

gccgo: cgo: false positive cgo argument has Go pointer to Go pointer #23391

tamird opened this issue Jan 9, 2018 · 44 comments
Milestone

Comments

@tamird
Copy link
Contributor

tamird commented Jan 9, 2018

What version of Go are you using (go version)?

go version go1.9 gccgo (GCC) 8.0.0 20180108 (experimental) linux/amd64

Description

The following code exists in CockroachDB:
https://github.com/cockroachdb/cockroach/blob/4e6ec545cc65dc0051a797996e9642b4b5e8bdc2/pkg/storage/engine/rocksdb.go#L447-L450

https://github.com/cockroachdb/cockroach/blob/4e6ec545cc65dc0051a797996e9642b4b5e8bdc2/pkg/storage/engine/rocksdb.go#L584

https://github.com/cockroachdb/cockroach/blob/4e6ec545cc65dc0051a797996e9642b4b5e8bdc2/c-deps/libroach/db.cc#L1616

Said plainly, initialization across cgo is done by passing a Go **C.DBEngine which points to nil to a C function that allocates and sets the pointed-to *C.DBEngine. This code works perfectly when compiled with gc, but produces this error when compiled with gccgo:

panic: runtime error: cgo argument has Go pointer to Go pointer

goroutine 24 [running]:
panic
        ../../../src/libgo/go/runtime/panic.go:543
github_com_cockroachdb_cockroach_pkg_storage_engine._cgoCheckPointer
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/cgo-c-prolog-gccgo:74
engine.$nested21
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
github_com_cockroachdb_cockroach_pkg_storage_engine.open.pN59_github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
engine.newMemRocksDB
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:518
github_com_cockroachdb_cockroach_pkg_storage_engine.NewInMem
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/in_mem.go:37
github_com_cockroachdb_cockroach_pkg_server.CreateEngines.pN50_github_com_cockroachdb_cockroach_pkg_server.Config
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/server/config.go:483
github_com_cockroachdb_cockroach_pkg_server.Start.pN50_github_com_cockroachdb_cockroach_pkg_server.Server
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:858
cli.$nested72
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:518
cli.$nested70
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:500
created by cli.runStart
        /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:474 +3255

According to https://golang.org/cmd/cgo/#hdr-Passing_pointers:

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers.

It's not exactly clear if nil Go pointers are permitted, but the gc implementation certainly permits it. Perhaps gccgo is missing a null check?

@ianlancetaylor

@gopherbot gopherbot added this to the Gccgo milestone Jan 9, 2018
@ianlancetaylor
Copy link
Contributor

Can you show us a way to reproduce the problem locally?

@tamird
Copy link
Contributor Author

tamird commented Jan 26, 2018

Oddly, I can't reproduce this after rebasing my gccgo work on CockroachDB's latest master branch.

Nominally, the repro instructions are roughly the same as #23393 (comment), except an on-disk store must be used (./cockroach start --insecure instead of ./cockroach start --insecure -s type=mem,size=1GiB).

@tamird
Copy link
Contributor Author

tamird commented Jan 29, 2018

Ah, this reproduces occasionally using

cd $(go env GOPATH)/src/github.com/cockroachdb/cockroach
make stress PKG=./pkg/cli TESTS=TestZip

when go is gccgo (note the use of stress above, it takes a few runs).

@tamird
Copy link
Contributor Author

tamird commented Jan 31, 2018

@thanm care to look into this one, since you already have the necessary local setup?

@thanm
Copy link
Contributor

thanm commented Jan 31, 2018

I'll take a look later this morning

@tamird
Copy link
Contributor Author

tamird commented Jan 31, 2018

Did a bit of digging under gdb:

(gdb) bt
#0  runtime.inHeapOrStack (b=842356777072, b@entry=140737205361048) at ../../../src/libgo/go/runtime/mheap.go:418
#1  runtime.cgoIsGoPointer (p=p@entry=0xc420606070) at ../../../src/libgo/go/runtime/cgocall.go:272
#2  0x00007ffff6d02a30 in runtime.cgoIsGoPointer (p=0xc420606070) at ../../../src/libgo/go/runtime/cgocall.go:268
#3  runtime.cgoCheckPointer (ptr=..., args=...) at ../../../src/libgo/go/runtime/cgocall.go:64
#4  0x0000000000f7ab07 in github_com_cockroachdb_cockroach_pkg_storage_engine._cgoCheckPointer (ptr=..., args=...) at cgo-c-prolog-gccgo:74
#5  0x0000000000f3ee19 in engine.func1 (_cgo0=0xc420606070, _cgo1=..., _cgo2=...)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
#6  0x0000000000f3ea4e in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc420606000)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
#7  0x0000000000f3df89 in engine.newMemRocksDB (attrs=..., param=..., MaxSizeBytes=536870912)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:518
#8  0x0000000000e31e9a in github_com_cockroachdb_cockroach_pkg_storage_engine.NewInMem (attrs=..., cacheSize=1048576)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/in_mem.go:37
#9  0x0000000000e501aa in engine.testBatchBasics (t=0xc420b1e0f0, writeOnly=false,
    commit=0x1e43840 <github_com_cockroachdb_cockroach_pkg_storage_engine.TestBatchBasics..func1..f>)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/batch_test.go:56
#10 0x0000000000e5296a in github_com_cockroachdb_cockroach_pkg_storage_engine.TestBatchBasics (t=0xc420b1e0f0)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/batch_test.go:132
#11 0x00007ffff6d914f0 in testing.tRunner (param=<optimized out>, fn=<optimized out>) at ../../../src/libgo/go/testing/testing.go:777
#12 0x00007ffff6e21d25 in __morestack () at ../../../src/libgcc/config/i386/morestack.S:546
#13 0x00007ffff6d915eb in testing.testing..thunk24 (__go_thunk_parameter=<optimized out>) at ../../../src/libgo/go/testing/testing.go:824
#14 0x00007ffff6d19ea9 in runtime.kickoff () at ../../../src/libgo/go/runtime/proc.go:1161
#15 0x00007ffff59f2d40 in ?? () from /lib64/libc.so.6
#16 0x0000000000000000 in ?? ()

The interesting stuff is happening in frame 6 (Go function passing a pointer) and frames 3 and below (Go functions checking that pointer).

(gdb) frame 6
#6  0x0000000000f3ea4e in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc420606000)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
584             status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
(gdb) print r.rdb
$6 = (.github.com/cockroachdb/cockroach/pkg/storage/engine._Ctype_struct_DBEngine *) 0x0

So we are passing a pointer to a null pointer.

(gdb) frame 3
#3  runtime.cgoCheckPointer (ptr=..., args=...) at ../../../src/libgo/go/runtime/cgocall.go:64
64                      if !cgoIsGoPointer(p) {
(gdb) list
59              if len(args) > 0 && (t.kind&kindMask == kindPtr || t.kind&kindMask == kindUnsafePointer) {
60                      p := ep.data
61                      if t.kind&kindDirectIface == 0 {
62                              p = *(*unsafe.Pointer)(p)
63                      }
64                      if !cgoIsGoPointer(p) {
65                              return
66                      }
67                      aep := (*eface)(unsafe.Pointer(&args[0]))
68                      switch aep._type.kind & kindMask {
(gdb) print ep.data
$10 = (Pointer) 0xc420606070
(gdb) print p
$11 = (void *) 0xc420606070
(gdb) print t.kind&(1<<5) == 0
$14 = 0
(gdb) x 0xc420606070
0xc420606070:   0x00000000

This is weird. We're skipping the dereference p = *(*unsafe.Pointer)(p), so we're going to check the **C.DBEngine instead of the *C.DBEngine, and that causes the failure. As expected, the memory pointed to by the **C.DBEngine is nil.

So, somehow the **C.DBEngine got compiled into a kindDirectIface type so the compiler thinks "the data pointer field in the interface value is exactly the value stored in the interface". That same comment goes on:

// In the current gccgo implementation, this is set
// only for pointer types (including unsafe.Pointer). In the future it
// could also be set for other types: channels, maps, functions,
// single-field structs and single-element arrays whose single field
// is simply a pointer.

In other words, gccgo is setting kindDirectIface for this type, which appears to be in violation of assumptions made when writing cgoCheckPointer (https://golang.org/cl/16003).

@ianlancetaylor
Copy link
Contributor

Where you print t.kind, can you print all of t, and also *t.string? Thanks.

@tamird
Copy link
Contributor Author

tamird commented Jan 31, 2018

(gdb) print t
$1 = (.runtime._type *) 0x1e56d60 <type...1.1github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct_DBEngine>
(gdb) print *t.string
$2 = 0x1e25f78 "**\tgithub_com_cockroachdb_cockroach_pkg_storage_engine\tengine._Ctype_struct_DBEngine"

@ianlancetaylor
Copy link
Contributor

Thanks. I'm not seeing what is wrong here. It's correct that kindDirectIface is set. It's correct that it is looking at p == &r.rdb. At this point it's determining, correctly, that p is a Go pointer. It's OK to pass a Go pointer to a cgo call. What is not OK is passing a Go pointer that points to a Go pointer. So the question is: what happens next? Somewhere after this it decides that p points to memory that contains a Go pointer. What are the steps that get there?

@tamird
Copy link
Contributor Author

tamird commented Jan 31, 2018

Somewhere after this it decides that p points to memory that contains a Go pointer. What are the steps that get there?

My reading of the code is that it decides that p points to Go memory. In other words, this dereference is intended to grab the pointed-to memory, but because it is skipped, the subsequent call to cgoIsGoPointer correctly returns true.

https://github.com/golang/gofrontend/blob/65eaa9003db4effc9c5ffe9c955e9534ba5d7d15/libgo/go/runtime/cgocall.go#L262-L289

@ianlancetaylor
Copy link
Contributor

In this case, p, which has been passed as &r.rdb, does point to Go memory. You are saying that that is wrong, but as far as I can see it is right. It's not a problem for p to point to Go memory. It's only a problem if the memory to which p points itself contains a pointer to Go memory.

@ianlancetaylor
Copy link
Contributor

To put it another way, where is the error reported? It's not reported by the code you discussed above.

@tamird
Copy link
Contributor Author

tamird commented Feb 1, 2018

OK, here's the actual panic backtrace:

#0  runtime.gopanic (e=...) at ../../../src/libgo/go/runtime/panic.go:498
#1  0x00007ffff6d0114b in runtime.cgoCheckUnknownPointer (msg=..., p=<optimized out>) at ../../../src/libgo/go/runtime/cgocall.go:238
#2  runtime.cgoCheckArg (t=<optimized out>, p=<optimized out>, indir=indir@entry=true, top=top@entry=true, msg=...)
    at ../../../src/libgo/go/runtime/cgocall.go:206
#3  0x00007ffff6d0124c in runtime.cgoCheckArg (t=<optimized out>, p=<optimized out>, indir=indir@entry=true, top=top@entry=true, msg=...)
    at ../../../src/libgo/go/runtime/cgocall.go:192
#4  0x00007ffff6d0124c in runtime.cgoCheckArg (
    t=t@entry=0x1e63960 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>, p=<optimized out>, indir=<optimized out>, top=<optimized out>, msg=...) at ../../../src/libgo/go/runtime/cgocall.go:192
#5  0x00007ffff6d015bb in runtime.cgoCheckPointer (ptr=..., args=...) at ../../../src/libgo/go/runtime/cgocall.go:93
#6  0x0000000000f782d7 in github_com_cockroachdb_cockroach_pkg_storage_engine._cgoCheckPointer (ptr=..., args=...) at cgo-c-prolog-gccgo:74
#7  0x0000000000f3c4d4 in engine.func1 (_cgo0=0xc420a3c5f0, _cgo1=..., _cgo2=...)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
#8  0x0000000000f3c040 in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc420a3c580)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
#9  0x0000000000f3b57b in engine.newMemRocksDB (attrs=..., param=..., MaxSizeBytes=536870912)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:518
#10 0x0000000000e2f30e in github_com_cockroachdb_cockroach_pkg_storage_engine.NewInMem (attrs=..., cacheSize=1048576)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/in_mem.go:37
#11 0x0000000000e4d84f in engine.testBatchBasics (t=0xc420b180f0, writeOnly=false, 
    commit=0x1e521a0 <github_com_cockroachdb_cockroach_pkg_storage_engine.TestBatchBasics..func1..f>)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/batch_test.go:56
#12 0x0000000000e5000f in github_com_cockroachdb_cockroach_pkg_storage_engine.TestBatchBasics (t=0xc420b180f0)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/batch_test.go:132
#13 0x00007ffff6d900d0 in testing.tRunner (param=<optimized out>, fn=<optimized out>) at ../../../src/libgo/go/testing/testing.go:777
#14 0x00007ffff6e20975 in __morestack () at ../../../src/libgcc/config/i386/morestack.S:546
#15 0x00007ffff6d901cb in testing.testing..thunk24 (__go_thunk_parameter=<optimized out>) at ../../../src/libgo/go/testing/testing.go:824
#16 0x00007ffff6d18a79 in runtime.kickoff () at ../../../src/libgo/go/runtime/proc.go:1161
#17 0x00007ffff59f0d40 in ?? () from /lib64/libc.so.6

confirming that the passed pointer points to a null pointer:

(gdb) frame 8
#8  0x0000000000f3c040 in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc420a3c580)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:584
584             status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
(gdb) print r.rdb
$4 = (.github.com/cockroachdb/cockroach/pkg/storage/engine._Ctype_struct_DBEngine *) 0x0

interesting stuff starts at frame 5:

(gdb) frame 5
#5  0x00007ffff6d015bb in runtime.cgoCheckPointer (ptr=..., args=...) at ../../../src/libgo/go/runtime/cgocall.go:93
93              cgoCheckArg(t, ep.data, t.kind&kindDirectIface == 0, top, cgoCheckPointerFail)
(gdb) list
88                      default:
89                              throw("can't happen")
90                      }
91              }
92
93              cgoCheckArg(t, ep.data, t.kind&kindDirectIface == 0, top, cgoCheckPointerFail)
94      }
95
96      const cgoCheckPointerFail = "cgo argument has Go pointer to Go pointer"
97      const cgoResultFail = "cgo result has Go pointer"

things we care about are arguments to cgoCheckArg:

(gdb) print t
$9 = (.runtime._type *) 0x1e645a0 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>
(gdb) print ep.data
value has been optimized out
(gdb) print t.kind&(1<<5)
$10 = 0
(gdb) print top
$11 = <optimized out>

so we don't know the value of top, but it's always true if len(args) == 0:

(gdb) print args
$12 = {__values = <optimized out>, __count = 0, __capacity = <optimized out>}

interestingly, t is different than what I mistakenly thought it was previously:

(gdb) print *t.string
$13 = 0x1e34470 "\tgithub_com_cockroachdb_cockroach_pkg_storage_engine\tengine._Ctype_struct___7"

next frame:

(gdb) frame 4
#4  0x00007ffff6d0124c in runtime.cgoCheckArg (
    t=t@entry=0x1e645a0 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>, p=<optimized out>, indir=<optimized out>, top=<optimized out>, msg=...) at ../../../src/libgo/go/runtime/cgocall.go:192
192                             cgoCheckArg(f.typ, add(p, f.offset), true, top, msg)
(gdb) list
187                             }
188                             cgoCheckArg(st.fields[0].typ, p, st.fields[0].typ.kind&kindDirectIface == 0, top, msg)
189                             return
190                     }
191                     for _, f := range st.fields {
192                             cgoCheckArg(f.typ, add(p, f.offset), true, top, msg)
193                     }
194             case kindPtr, kindUnsafePointer:
195                     if indir {
196                             p = *(*unsafe.Pointer)(p)

so this type is a struct (that's where we are in the type switch). we know indir is true from the previous frame, so we go straight into this range. unfortunately, f is optimized out and the next few frames have been inlined. frame 2:

(gdb) frame 2
#2  runtime.cgoCheckArg (t=<optimized out>, p=<optimized out>, indir=indir@entry=true, top=top@entry=true, msg=...)
    at ../../../src/libgo/go/runtime/cgocall.go:206
206                     cgoCheckUnknownPointer(p, msg)
(gdb) list
201                     }
202                     if !top {
203                             panic(errorString(msg))
204                     }
205
206                     cgoCheckUnknownPointer(p, msg)
207             }
208     }
209
210     // cgoCheckUnknownPointer is called for an arbitrary pointer into Go

this time we land in the kindPtr, kindUnsafePointer case, and:

(gdb) print indir
$27 = true
(gdb) print top
$28 = true

so we execute p = *(*unsafe.Pointer)(p) (it's just above the code pasted above) and proceed into cgoCheckUnknownPointer:

(gdb) frame 1
#1  0x00007ffff6d0114b in runtime.cgoCheckUnknownPointer (msg=..., p=<optimized out>) at ../../../src/libgo/go/runtime/cgocall.go:238
238                                     panic(errorString(msg))
(gdb) list
233                             if i != 1*sys.PtrSize && !hbits.morePointers() {
234                                     // No more possible pointers.
235                                     break
236                             }
237                             if hbits.isPointer() && cgoIsGoPointer(*(*unsafe.Pointer)(unsafe.Pointer(base + i))) {
238                                     panic(errorString(msg))
239                             }
240                             hbits = hbits.next()
241                     }
242

this is what's surpring. at this point, after the dereference, I would expect to end up with the null pointer we started with, but that's obviously not the case. the actual value of p is optimized out, but obviously cgoInRange returned true for it. there's more dereferencing going on here which I don't understand. Here's what information remains in this stack frame:

(gdb) info locals
span = 0x7ffff5658c88
b = 842351250160
hbits = {bitp = 0xc41fff9fa8 "Y\360\363\322t\360\374Y\360\363", <incomplete sequence \322>, shift = 3}
sink$0 = 25
n = 112
roots = <optimized out>
base = 842351250160
i = <optimized out>

let me know if there's more detail I can provide.

@ianlancetaylor
Copy link
Contributor

I didn't notice that you weren't showing the full function call. Looking at the source code, it's

	status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
		C.DBOptions{
			cache:             r.cache.cache,
			block_size:        C.uint64_t(blockSize),
			wal_ttl_seconds:   C.uint64_t(walTTL),
			logging_enabled:   C.bool(log.V(3)),
			num_cpu:           C.int(runtime.NumCPU()),
			max_open_files:    C.int(maxOpenFiles),
			use_switching_env: C.bool(newVersion == versionCurrent),
			must_exist:        C.bool(r.cfg.MustExist),
			extra_options:     goToCSlice(r.cfg.ExtraOptions),
		})

The type passed to cgoCheckArg is _Ctype_struct___7. Building the package and looking at the generated _cgo_gotypes.go file shows type _Ctype_DBOptions = _Ctype_struct___7 and

type _Ctype_struct___7 struct {
	cache			*_Ctype_struct_DBCache
	block_size		_Ctype_uint64_t
	wal_ttl_seconds		_Ctype_uint64_t
	logging_enabled		_Ctype__Bool
	_			[3]byte
	num_cpu			_Ctype_int
	max_open_files		_Ctype_int
	use_switching_env	_Ctype__Bool
	must_exist		_Ctype__Bool
	_			[2]byte
	extra_options		_Ctype_struct___0
}

So the problem is not the &r.rdb argument; it's the C.DBOptions argument.

The stack trace shows it looking at this struct in frame 4, and then looking at another struct in frame 3. That suggests that the second struct is the extra_options field. The type of that field is

type _Ctype_struct___0 struct {
	data	*_Ctype_char
	len	_Ctype_int
	_	[4]byte
}

In frame 2 we are looking at a pointer, which in this case must be the data field. We're passing a struct of type C.DBOptions which has a field of type C.DBSlice. The data field of the latter struct is a Go pointer, and the runtime is complaining because it seems to be pointing to an allocated block that itself contains a Go pointer.

Looking back at the Go code, the field is initialized by a call to goToCSlice. That just passes a pointer to the start of a []byte. Normally that should be OK, but in this case we don't know what else might be in the same memory block as that []byte. If that []byte is allocated in the same memory region as something else that contains a Go pointer, then the runtime will report an error (this is #14210). I don't know if that is what is happening here, but it's something to check.

Can you find out what value is in r.cfg.ExtraOptions at the point of failure?

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

It's an empty a nil slice.

(gdb) frame 8
#8  0x0000000000eff807 in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc4205bc000)
    at /home/tduberstein/local/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:585
585             status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
(gdb) print r.cfg.ExtraOptions
$3 = {__values = 0x0, __count = 0, __capacity = 0}

@ianlancetaylor
Copy link
Contributor

Hmmm, OK, clearly there is something wrong with my analysis. There is another pointer in the C.DBOptions value: the cache field is set to r.cache.cache, which is a *C.DBCache value. What does that point to?

@thanm Any luck reproducing this locally?

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Heh, figured you'd ask for that.

(gdb) print r.cache.cache
$4 = (.github.com/cockroachdb/cockroach/pkg/storage/engine._Ctype_struct_DBCache *) 0x7fffe7c13040
(gdb) print *r.cache.cache
$5 = {<No data fields>}

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Oh, and the memory at that address is null.

(gdb) x 0x7fffe7c13040
0x7fffe7c13040: 0x00000000

@ianlancetaylor
Copy link
Contributor

Thanks. I think we need to know whether that value (0x7fffe7c13040) is a Go pointer (whether cgoIsGoPointer would return true for it) and, if so, what allocated block it is pointing to. If I'm reading the code correctly, it is set by a call to C.DBNewCache.

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018 via email

@ianlancetaylor
Copy link
Contributor

Thanks, but does cgoIsGoPointer return true for it? And is that in fact the value that the runtime package is finding the error for?

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018 via email

@ianlancetaylor
Copy link
Contributor

Do you feel like rebuilding libgo without optimization? You should be able to do that by running make GOCFLAGS_FOR_TARGET=-g.

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Done:

nm /home/tduberstein/local/gcc/gcc/lib64/libgo.so.13 | grep cgoIsGo
00000000012df8a9 t runtime.cgoIsGoPointer

but how do I call it?

(gdb) info functions cgoIsGo
All functions matching regular expression "cgoIsGo":

File ../../../src/libgo/go/runtime/cgocall.go:
static bool runtime.cgoIsGoPointer(Pointer);
(gdb) print cgoIsGoPointer(0x7fffe4a13040)
No symbol "cgoIsGoPointer" in current context.
(gdb) print runtime.cgoIsGoPointer(0x7fffe4a13040)
No symbol "runtime" in current context.

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Ah, just need to stringify it!

(gdb) print 'runtime.cgoIsGoPointer'(0x7fffe4a13040)
$2 = false

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Now that nothing is inlined, this is much easier to reason about.
In frame 5 we can see the pointer in question is 0xc420083a00:

(gdb) info args
ptr = {
  __type_descriptor = 0x1e0e040 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>, __object = 0xc420083a00}
args = {__values = 0x0, __count = 0, __capacity = 0}

so that's not r.cache.cache at all. In fact, I can't find this pointer anywhere. Note that C.DBOptions is being passed by value and here we are inspecting a pointer to it. This looks like miscompilation?

@tamird
Copy link
Contributor Author

tamird commented Feb 2, 2018

Here's the cgo stub (reformatted by hand):

status := func(_cgo0 **_Ctype_struct_DBEngine, _cgo1 _Ctype_struct___0, _cgo2 _Ctype_struct___7) _Ctype_struct___1 {
        _cgoCheckPointer(_cgo0, true)
        _cgoCheckPointer(_cgo2)
        return (_Cfunc_DBOpen)(_cgo0, _cgo1, _cgo2)
}(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
_Ctype_struct___7{
        cache:             r.cache.cache,
        block_size:        _Ctype_uint64_t(blockSize),
        wal_ttl_seconds:   _Ctype_uint64_t(walTTL),
        logging_enabled:   _Ctype__Bool(log.V(3)),
        num_cpu:           _Ctype_int(runtime.NumCPU()),
        max_open_files:    _Ctype_int(maxOpenFiles),
        use_switching_env: _Ctype__Bool(newVersion == versionCurrent),
        must_exist:        _Ctype__Bool(r.cfg.MustExist),
        extra_options:     goToCSlice(r.cfg.ExtraOptions),
})

so the slice isn't being checked, but the DBOptions is. Since cgoCheckPointer takes a interface{}, looks like we're allocating a Go pointer to this structure! Smells like a smoking gun to me.

@ianlancetaylor
Copy link
Contributor

Assigning DBOptions to an interface value, including allocating it, is normal behavior. We aren't going to check that pointer itself; we're going to check the value to which it points. Now that you have a version without optimization, can you find the pointer value that actually causes the problem? The one passed to cgoCheckUnknownPointer when it panics?

@tamird
Copy link
Contributor Author

tamird commented Feb 3, 2018 via email

@ianlancetaylor
Copy link
Contributor

Sorry, no, I am continually interrupted.

Most effective would be some way that I can reproduce the problem myself.

@tamird
Copy link
Contributor Author

tamird commented Feb 5, 2018

(gdb) frame 5
#5  0x00007ffff6cbb3e3 in runtime.cgoCheckPointer (ptr=..., args=...)
    at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:93
93              cgoCheckArg(t, ep.data, t.kind&kindDirectIface == 0, top, cgoCheckPointerFail)
(gdb) print *ep
$17 = {
  _type = 0x1d8b5c0 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>, data = 0xc420085a00}
(gdb) print args
$18 = {__values = 0x0, __count = 0, __capacity = 0}

the C.DBOptions struct is at 0x1d8b5c0.

(gdb) frame 4
#4  0x00007ffff6cbbe75 in runtime.cgoCheckArg (
    t=0x1d8b5c0 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___7..d>, 
    p=0xc420085a00, indir=true, top=true, msg=...) at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:192
192                             cgoCheckArg(f.typ, add(p, f.offset), true, top, msg)
(gdb) print f
$25 = {name = 0x1d8b5a0 <go..C3252>, pkgPath = 0x1d8b5b0 <go..C3253>, 
  typ = 0x1d6f660 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___0..d>, 
  tag = 0x0, offset = 80}
(gdb) print *f.typ.string
$28 = 0x1d50400 "\tgithub_com_cockroachdb_cockroach_pkg_storage_engine\tengine._Ctype_struct___0"

we're inspecting one of the struct's fields, and it has type _Ctype_struct___0:

type _Ctype_struct___0 struct {
        data    *_Ctype_char
        len     _Ctype_int
        _       [4]byte
}

which is also the type of the second argument to C.DBOpen - it's the return value of goToCSlice: C.DBSlice. Must be extra_options, given:

typedef struct {
  DBCache* cache;
  uint64_t block_size;
  uint64_t wal_ttl_seconds;
  bool logging_enabled;
  int num_cpu;
  int max_open_files;
  bool use_switching_env;
  bool must_exist;
  bool read_only;
  DBSlice extra_options;
} DBOptions;
(gdb) frame 3
#3  0x00007ffff6cbbe75 in runtime.cgoCheckArg (
    t=0x1d6f660 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___0..d>, 
    p=0xc420085a50, indir=true, top=true, msg=...) at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:192
192                             cgoCheckArg(f.typ, add(p, f.offset), true, top, msg)
(gdb) print p
$40 = (Pointer) 0xc420085a50
(gdb) print f
$47 = {name = 0x1d6f310 <go..C450>, pkgPath = 0x1d6f320 <go..C451>, 
  typ = 0x1d6f420 <type...1github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_char>, tag = 0x0, offset = 0}
(gdb) print *f.typ.string
$49 = 0x1d504b8 "*\tgithub_com_cockroachdb_cockroach_pkg_storage_engine\tengine._Ctype_char"

We're checking the first field, which is a *_Ctype_char, and we have a pointer to it.

(gdb) frame 2
#2  0x00007ffff6cbbf74 in runtime.cgoCheckArg (t=0x1d6f420 <type...1github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_char>, 
    p=0xc4200c0af0, indir=true, top=true, msg=...) at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:206
206                     cgoCheckUnknownPointer(p, msg)
(gdb) info args
t = 0x1d6f420 <type...1github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_char>
p = 0xc4200c0af0
indir = true
top = true
msg = 0x7ffff6fbb370 "cgo argument has Go pointer to Go pointer"
(gdb) print p
$51 = (Pointer) 0xc4200c0af0

Now we've deferenced the pointer - so now our p points to the *_Ctype_char itself.

(gdb) frame 1
#1  0x00007ffff6cbc248 in runtime.cgoCheckUnknownPointer (p=0xc4200c0af0, msg=...)
    at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:238
238                                     panic(errorString(msg))
(gdb) print p
$52 = (Pointer) 0xc4200c0af0

and the next frame panics.

However, checking this pointer is only valid if the slice is not empty, right? Let's check if it is:

(gdb) frame 8
#8  0x0000000000edfc5c in github_com_cockroachdb_cockroach_pkg_storage_engine.RocksDB.open (r=0xc4204a8000)
    at /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:587
587             status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
(gdb) print r.cfg.ExtraOptions
$53 = {__values = 0x0, __count = 0, __capacity = 0}

Seems like we're passing a nil slice into goToCSlice. What does that function do?

func goToCSlice(b []byte) C.DBSlice {
        if len(b) == 0 {
                return C.DBSlice{data: nil, len: 0}
        }
        return C.DBSlice{
                data: (*C.char)(unsafe.Pointer(&b[0])),
                len:  C.int(len(b)),
        }
}
(gdb) print 'engine.goToCSlice'(r.cfg.ExtraOptions)
$67 = {data = 0x0, len = 0, _ = "\000\000\000"}

and DBSlice is:

typedef struct {
  char* data;
  int len;
} DBSlice;

So something is going wrong here - we're passing a C.DBSlice with nil data pointer into C.DBOpen, but it's emerging on the other side with some garbage value. What did we get for the length of the slice?

(gdb) frame 3
#3  0x00007ffff6cbbe75 in runtime.cgoCheckArg (
    t=0x1d6f660 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_struct___0..d>, 
    p=0xc420085a50, indir=true, top=true, msg=...) at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:192
192                             cgoCheckArg(f.typ, add(p, f.offset), true, top, msg)
(gdb) print *(st.fields.__values+1)
$64 = {name = 0x1d6f470 <go..C458>, pkgPath = 0x1d6f480 <go..C459>, 
  typ = 0x1d6f560 <github_com_cockroachdb_cockroach_pkg_storage_engine..github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_int..d>, 
  tag = 0x0, offset = 16}
(gdb) frame 2
#2  0x00007ffff6cbbf74 in runtime.cgoCheckArg (t=0x1d6f420 <type...1github_com_cockroachdb_cockroach_pkg_storage_engine._Ctype_char>, 
    p=0xc4200c0af0, indir=true, top=true, msg=...) at /usr/local/home/tduberstein/gcc/objdir/../src/libgo/go/runtime/cgocall.go:206
206                     cgoCheckUnknownPointer(p, msg)
(gdb) print p
$65 = (Pointer) 0xc4200c0af0
(gdb) x 0xc4200c0af0+16
0xc4200c0b00:   00000000000000000000000000000001

This is looking really suspicious. We passed in nil data and a zero length but it came out the other side with length 1. @ianlancetaylor is this enough to go on?

@ianlancetaylor
Copy link
Contributor

I haven't really looked at the above yet, but I committed a patch on Saturday that could and did lead to corruption of the slice structure in very rare cases.

@tamird
Copy link
Contributor Author

tamird commented Feb 5, 2018

Could you link to the patch?

@ianlancetaylor
Copy link
Contributor

@tamird
Copy link
Contributor Author

tamird commented Feb 5, 2018

Rebuilt and retested - the issue still reproduces.

@ianlancetaylor
Copy link
Contributor

x 0xc4200c0af0+16

Why add 16? The length is at offset 8, and it's only four bytes long.

Not that that explains anything.

Can you go to your frame 7 (engine.func1) and print the value of cgo2? It should be the DBOptions struct. If the pointer is nil there we need to find out where it changes.

@tamird
Copy link
Contributor Author

tamird commented Feb 5, 2018 via email

@tamird
Copy link
Contributor Author

tamird commented Feb 5, 2018 via email

@tamird
Copy link
Contributor Author

tamird commented Feb 6, 2018

@ianlancetaylor cgo2 is not nil in frame 7:

(gdb) frame 7
#7  0x0000000000ee00f0 in engine.func1 (_cgo0=0xc4204a8070, _cgo1=..., _cgo2=...)
    at /usr/local/home/tduberstein/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:587
587             status := C.DBOpen(&r.rdb, goToCSlice([]byte(r.cfg.Dir)),
(gdb) info args
_cgo0 = 0xc4204a8070
_cgo1 = {data = 0x0, len = 0, _ = "\000\000\000"}
_cgo2 = {cache = 0x7fffe4c13040, block_size = 32768, wal_ttl_seconds = 0, logging_enabled = false, _ = "\000\000", num_cpu = 16, 
  max_open_files = 10000, use_switching_env = false, must_exist = false, read_only = false, _ = "", extra_options = {data = 0x0, len = 0, 
    _ = "\000\000\000"}}

@ianlancetaylor
Copy link
Contributor

Thanks. I didn't expect _cgo2 to be nil, since it is a struct, but it does show that the pointer in the extra_options field is nil.

And now I see the problem. The value stored in the offset field has changed and the runtime hasn't kept up. Argh.

@gopherbot
Copy link

Change https://golang.org/cl/92275 mentions this issue: runtime: correct runtime structfield type to match reflect

@ianlancetaylor
Copy link
Contributor

Sent https://golang.org/cl/92275. Thanks for your patience.

@tamird
Copy link
Contributor Author

tamird commented Feb 6, 2018

No problem. Thanks for your help and persistence.

EDIT: for posterity, the mentioned reflect change appears to be 9bbb07d.

hubot pushed a commit to gcc-mirror/gcc that referenced this issue Feb 6, 2018
    
    The offset field in structfield has changed to offsetAnon, and now
    requires a shift to get the actual offset value.
    
    Fixes golang/go#23391
    
    Reviewed-on: https://go-review.googlesource.com/92275


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@257413 138bc75d-0d04-0410-961f-82ee72b054a4
@golang golang locked and limited conversation to collaborators Feb 6, 2019
asiekierka pushed a commit to WonderfulToolchain/gcc-ia16 that referenced this issue May 16, 2022
    
    The offset field in structfield has changed to offsetAnon, and now
    requires a shift to get the actual offset value.
    
    Fixes golang/go#23391
    
    Reviewed-on: https://go-review.googlesource.com/92275

From-SVN: r257413
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants