-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: crash in hybrid barrier initializing C memory #19928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Also, @ianlancetaylor, do you think this code should work as far as the Go pointer rules are concerned? There are no Go-managed data structures anywhere in sight, so it seems to me like they should. |
/cc @RLH |
One downside of this is that it will increase the cost of the write barrier, since we'll have to add a check for if the slot is in the heap or a global.
I'm not sure off the top of my head how many of these there are. It's possible we could manually annotate the writes to have write barriers (with a write barrier that bypasses the slot check), though that seems error-prone, especially going forward. Or we could compile all write barriers in the runtime to the slot-check-bypassing write barrier, using the knowledge that the runtime never writes a pointer to an uninitialized structure. |
I agree that according to the current Go pointer rules this code is OK. Programs aren't permitted to store a Go pointer in C memory, and indeed that is not happening. Apparently the memory happens to contain a bit pattern that looks like a Go pointer, but it is not actually a Go pointer. Changing We could extend the Go pointer rules to say that C memory that has a pointer type must be explicitly initialized if it is visible from Go. To make that simpler we could modify |
To summarize in-person discussion: we're all concerned about increasing the write barrier cost for something that appears to almost never be a problem in practice. So, we're leaning toward tweaking the rules to say something like: if Go writes to pointers in the memory, it has to be initialized (this is already the case for pointer reads; this would extend it to pointer writes) and possibly extending the cgo interface to make the rules easy to stay in (perhaps providing |
We could probably get at least some mileage out of whether the pointer has a C type or a Go type. It seems reasonable to require that all memory accessed through Go types must be initialized (since Go values are normally zero-inititalized), but much less so for C types (since C out-parameters are often left uninitialized in idiomatic C code). With #16623 it might be more plausible for the compiler to be aware of whether the type of a given slot is C or Go. We could vet the C memory for Go pointers at the time of conversion to Go types (e.g. in a Then we could apply the more expensive write barrier only to pointers of C types, for which assignments are relatively rare (and often somewhat expensive anyway because they tend to correlate with transitions between Go and C stacks). |
@bcmills Perhaps, but I'm somewhat reluctant to further complicate the cgo pointer rules by adding rules that depend on the type used to access the memory, especially since when using cgo that type often varies between C code and Go code. For example, C code will often use |
I think we would have to declare that So the rules would be, roughly:
Note that the code in Russ's example would be invalid under this rule: I will leave it up to you to decide whether that rule is too complicated. |
The current type system does not distinguish between C memory and Go memory and cannot be reinterpreted to do so. |
@rsc I don't think anyone is proposing that the type system should distinguish between C memory and Go memory. The rule I described above constrains initialization, not object location. It's quite possible that there is a better choice of phrases than "Go type", but it wasn't obvious to me. The general idea is that a value of a "Go type" is always initialized by a Go statement or expression, whereas a value of a "non-Go type" may or may not be. The current type system does distinguish between pointers and non-pointers, particularly with the use of |
Bumping to Go 1.8.4 (or maybe Go 1.9) |
Are we just not going to fix this for Go 1.8, @aclements? |
Since we haven't received any external reports of this being a problem and there are ways to work around it, I would lean toward not fixing this for 1.8. It's also still unclear to me what the fix would be. We decided in April that we didn't want to make the write barrier more expensive (and complicate the runtime) to detect whether the slot is outside Go-managed memory, especially without evidence that this is actually a problem. So that leaves changing the rules (and maybe making Either way, doesn't seem like a 1.8 issue, so I'm re-milestoning it to 1.10. |
I'm more concerned about the implications for passed-in buffers than for the Go |
Yes.
Not sure. |
To be a little more concrete, here's pseudo-code for what I'm proposing:
Notably, this never inspects old or tries to guess whether it looks like a garbage value. Pointer writes that could lead to a garbage old are discarded on the basis of writing a non-Go pointer (the new value) into non-Go memory (the slot address). We don't even try to shade old. Unfortunately, while this avoids the slow-path slot check for most non-nil non-C pointer writes, it still triggers that path for writes of nil to Go memory, which are obviously very common, as well as writes of pointers to Go globals into Go heap memory. |
Change https://golang.org/cl/123658 mentions this issue: |
Please note that a C function struct S _;
f(&_); Users don't have control over it, so this means that returning C structs from "exported" Go functions can also cause troubles. |
@chfast I'm likely missing something but I don't see how that can cause a problem. The transformation that you describe is done implicitly by the C compiler. The problems described here only arise in Go code. The Go code is going to return a struct in the usual way for Go code, with the usual zero initialization of the result as neeed. The fact that the C code will then copy that into uninitialized C memory doesn't matter. |
Sorry, I incorrectly assumed that "exported" Go functions have C ABI. |
@chfast Behind the scenes, C code that calls an exported Go function actually calls a C wrapper, generated by cgo, that calls the Go function using the Go ABI and translates the results into the C ABI. |
Describe the problem as a bug, since it is not implied by the rest of the pointer passing rules, and it may be possible to fix it. Updates #19928 Change-Id: I2d336e7336b2a215c0b8cf909a203201ef1b054e Reviewed-on: https://go-review.googlesource.com/123658 Reviewed-by: Austin Clements <austin@google.com>
Nothing more here for Go 1.11. |
We have an implementation that panics, Go 1.18, frequently (panics within minutes of any restart when loaded). The process is C++, RHEL/8, intel compiler, that loads a Golang shared object. The interface between C++ and Go is one that passes a byte*/[]byte back and forth using a Go channel for each direction. Messages are marshalled into buffers which are all allocated from C.malloc and un-marshaled into message objects in the other language. There is no other interaction between C++ and Go. Messages queued from C++ to Go run on a C++ thread that marshals the message, then un-marshals it into a Go message and puts it into a Go channel for goroutines to take. Messages from Go to C++ are queued on a Go channel by a goroutine, and taken off that channel using a C++ thread that dequeues the Go message from the channel, marshals it into a byte[], then un-marshals it into a C message. Marshaling is used as a means to avoid type converting - unsafe.Pointer is used along with a byte size for the un-marshaling functions - the unsafe.Pointer always being C.malloc memory. Many variations on this theme have been tried, as well as variations on which heap to use for the byte pointers. The Go channels have even been removed and replace with C.malloc shared memory. Always the same problem occurs. (gdb) bt The SIGABRT comes from a GC thread. The GC code is generating a traceback in what appears to be a stack scan and the traceback code panics when it fails to identify a function pointer. I don’t know exactly why the panic generating gentraceback is in the “middle” of the GC code but it appears related to resolving defer operations (it’s not generating a traceback for logging, but, to identify what’s going on in the call stack it’s searching). It appears the GC code may be misusing frame.lr in stack frame 7 along the path executing (which ends in the fatal panic).
When the panic occurs, I find that frame.lr = 2, or 3, or 0 -- generally a small integer value. It could be there is code not setting frame.lr = 0 before the code gets here to avoid the valid() check. I've found that if I modify the above code, to detect what appears to be an invalid pointer in frame.lr, and not check it, that our code runs fine w/o any memory growth for long periods of time. But still, it eventually panics, possibly because the invalid pointer check is only an approximation as opposed to being a proper fix. |
@pdefrain Please open a new issue for your problem rather than adding on to this rather old issue. Your problem does not sound the same as the problem described in this issue, and I'm not even sure that this issue is still broken. In your new issue please include reproduction instructions if possible. Thanks. |
* appender: fix typo * appender: remove unused error * make duckdb.Conn exported to be used with conn.Raw method in std SQL package * appender: remove not needed parenthesis * Implement Apache Arrow interface and Arrow type structs Added and defined structures for Arrow schema and Arrow array in a new file "arrow.h". This allows Arrow-compatible communication and data sharing, which is memory-efficient. Implemented Arrow interface in the connection.go file. Also updated the 'duckdb_test.go' file to include tests for the Arrow interface. * call queryArrowArray only when there are more rows to get * Rename QueryArrowContext to QueryArrow Renames the function QueryArrowContext to QueryArrow. This change is to simplify the function name as 'Context' in the name isn't required or adds any meaning. * prepareConfig: refactor to free option string pointers * use calloc instead of malloc to avoid potential bug golang/go#19928 * arrow: separate DuckDB Arrow interface - conn structure was made unexported - NewConnector was renamed to OpenConnector to be consistent with sql package and now returns ConnectorCloser interface to expose Close method - appender: NewAppenderFromConn receives connection of type any to be compatible with sql.Conn.Raw method. Tests and docs were adopted as well. * Refactor to use driver.Conn instead of any. * styleguide fixes * arrow: remove Query method leaving only QueryContext * Simplifications on pointer handling * Fix tests * Fix indentation * Fix formatting --------- Co-authored-by: Marc Boeker <marc@at6.net>
Just wanted to confirm that this one is still an issue, at least in Go 1.22.1. We were hitting this with some code similar to @rsc 's original example, and worked around by switching |
Received report via email of code crashing in write barrier finding a bad pointer (pointing to a free span). Code looks like:
The problem appears to be that C.malloc returns uninitialized memory. Then the initialization of that memory during append overwrites uninitialized pointer slots, and the new hybrid write barrier tries to follow the uninitialized pointers as if they were real pointers.
Prior to the hybrid barrier, the old data being overwritten for off-heap memory was never considered. Now it is. This will invalidate code like the above. It took a long time to notice, so apparently this is rare.
One workaround is to make C.malloc return zeroed memory, which would have a performance cost people might not like (especially since it doesn't seem to matter to most people?). Another workaround is to tell people to use C.calloc for allocating C data structures with pointers, or maybe rewriting just the C.malloc calls that mention pointer sizes.
Really the best fix would be for the GC not to follow these pointers. @aclements says there are some cases where do want the write barrier to follow pointers in non-managed-memory, specifically non-global off-heap runtime structures, but maybe we can distinguish these from other kinds of memory (like C memory) somehow.
If there is a simple fix and we receiver any reports of external problems along these lines, we can consider including the fix in Go 1.8.2.
The text was updated successfully, but these errors were encountered: