Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Invalid garbage collection of structure pointers #36569

Closed
abarisani opened this issue Jan 15, 2020 · 27 comments
Closed

runtime: Invalid garbage collection of structure pointers #36569

abarisani opened this issue Jan 15, 2020 · 27 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@abarisani
Copy link

What version of Go are you using (go version)?

$ go version 1.13.6

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/lcars/.cache/go-build"
GOENV="/home/lcars/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/lcars/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build760597893=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the example at https://play.golang.org/p/RIMIZDWEcZT

What did you expect to see?

I expect the code not to panic.

What did you see instead?

The code panic because GC scrapes struct instances which should be preserved.

Description

The code at https://play.golang.org/p/RIMIZDWEcZT uses an AlignmentBuffer structure to allow selection of an aligned offset over a []byte array for casting a structure. (This is used for bare metal code operation within TamaGo, however the identified issue is not specific to bare metal and/or TamaGo code as it is reproducible with standard Go).

The dTD struct holds two pointers to AlignmentBuffer instances, what is witnessed in the main function is that allocating an array of []*dTD results, without losing a reference to it, for the *AlignmentBuffer pointer contents to be scraped out by GC.

This is unexpected behaviour.

In order to workaround this we had to keep the two *AlignmentBuffer pointers outside the dTD struct, as this doesn't trigger the issue.

@reusee

This comment has been minimized.

@cuonglm

This comment has been minimized.

@ALTree

This comment has been minimized.

@ALTree ALTree closed this as completed Jan 15, 2020
@abarisani
Copy link
Author

One thing has nothing to do with the other.

The panic which I trigger in the test code is to show that the GC is collecting something that it shouldn't (imho), the "unsafe pointer arithmetic" fatal error in go1.14, triggered by "checkptr" merely prevents the test case for an entirely different reason.

I don't think the unsafe pointer use is, or should be related, with the GC behaviour. Or in other words why does the unsafe.Pointer use flags the entire NewAlignmentBuffer for clearing by the GC, while it is kept around? Is this expected at all?

Thanks

@empijei
Copy link
Contributor

empijei commented Jan 15, 2020

If a uintptr cannot be converted to an unsafe.Pointer, why does the documentation specifically states that it is allowed? Is this a doc error?

image

Also I don't think this is a stack growing problem here.

@abarisani

This comment has been minimized.

@ALTree
Copy link
Member

ALTree commented Jan 15, 2020

@abarisani In general, a program that violates the unsafe.Pointer rules described here is not guaranteed to behave correctly, or work, at all; there is also no guarantee that the program will crash at a specific point.

If you can provide a crasher that does not violates the unsafe.Pointer rules, then it'll be investigated.

Re-opening in waiting for info in the meantime.

@ALTree ALTree reopened this Jan 15, 2020
@ALTree ALTree added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 15, 2020
@abarisani

This comment has been minimized.

@cuonglm

This comment has been minimized.

@ALTree
Copy link
Member

ALTree commented Jan 15, 2020

@empijei

The doc says:

There are four special operations available for type Pointer

that means: "these 4 operation are theoretically allowed", in the sense that the program will compile. Then it goes on with an important caveat:

The following patterns involving Pointer are valid. Code not using these patterns is likely to be invalid today or to become invalid in the future.

This means that even if in general a uintptr -> Pointer conversion will compile, if it doesn't follow one of the allowed patterns, then it's not valid.

@abarisani

At line 89:

dtd = (*dTD)(unsafe.Pointer(dtdBuf.Addr))

The rule listing the allowed patterns says (pattern n. 2):

(2) Conversion of a Pointer to a uintptr (but not back to Pointer).
[...]
Conversion of a uintptr back to Pointer is not valid in general.

You are doing this on line 89, since dtdBuf.Addr is an uintptr created from an unsafe.Pointer in NewAlignmentBuffer.

It goes on:

Conversion of a uintptr back to Pointer is not valid in general.
[...]
The remaining patterns enumerate the only valid conversions from uintptr to Pointer.

I don't think your usage is aligned to any of the listed patterns:

  • (3) Conversion of a Pointer to a uintptr and back, with arithmetic. (Note that both conversions must appear in the same expression)
  • (4) Conversion of a Pointer to a uintptr when calling syscall.Syscall.
  • (5) Conversion of the result of reflect.Value.Pointer or reflect.Value.UnsafeAddr from uintptr to Pointer.
  • (6) Conversion of a reflect.SliceHeader or reflect.StringHeader Data field to or from Pointer.

@abarisani
Copy link
Author

By the way, if I change the code so that the two *AlignmentBuffer pointer are returned by buildDTD, rather than being embedded in the dTD structure, and I keep them around in main then the GC does not clean them up.

@abarisani
Copy link
Author

This example address the warning but exhibits the same behaviour: https://play.golang.org/p/QrNrPHkk6Un

@ALTree ALTree changed the title Invalid garbage collection of structure pointers. runtime: Invalid garbage collection of structure pointers Jan 15, 2020
@ALTree
Copy link
Member

ALTree commented Jan 15, 2020

Thanks for fixing the reproducer.

I can reproduce the crash both on 1.13 and tip. I also note that when ran with GOGC=off, the panic at the end of the program does not fire.

cc @aclements @mknyszek

@ALTree ALTree added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jan 15, 2020
@abarisani
Copy link
Author

Thanks! Just for reference you can see here how we worked around the issue for now:

https://github.com/f-secure-foundry/tamago/blob/master/imx6/usb/endpoint.go#L183

If we remove the indirection and keep those pointers around without embedding them in the structure (i.e. return structure + 2 points rather than structure with those 2 pointers embedded within) then everything works just fine and the GC doesn't give us troubles.

Let me know if there is anything more that I can do to help.

@ALTree
Copy link
Member

ALTree commented Jan 15, 2020

Rule (1) says

(1) Conversion of a *T1 to Pointer to *T2.

Provided that T2 is no larger than T1 and that the two share an equivalent memory layout, this conversion allows reinterpreting data of one type as data of another type.

This means that on the

dtd = (*dTD)(unsafe.Pointer(&dtdBuf.Buf[dtdBuf.Offset]))

line, dTD and &dtdBuf.Buf[dtdBuf.Offset] need to "share an equivalent memory layout" for the conversion to be valid. Is this the case? The two types are:

type dTD struct {
	next   *dTD
	token  uint32
	buffer [5]uintptr

	buf   *AlignmentBuffer
	pages *AlignmentBuffer
}
type AlignmentBuffer.Buf []byte

In particular, dTD has pointer fields, while Buf is just a []byte.

Maybe cc @ianlancetaylor too.

@abarisani
Copy link
Author

It depends on what "equivalent memory layout" means. the size of Buf is initialized so that it can always hold dTD, is that sufficient to satisfy this condition?

@ianlancetaylor
Copy link
Contributor

I agree with @ALTree. In the example from #36569 (comment), this line is invalid:

	dtd = (*dTD)(unsafe.Pointer(&dtdBuf.Buf[dtdBuf.Offset]))

dTD and [...]byte do not share an equivalent memory layout.

The code mentioned in #36569 (comment) also appears to be invalid at first glance.

Go, unlike C, is a memory managed language. You can't in general treat memory allocated as one type as being memory of a different type. In particular, if you allocate memory with a pointer type, then you must never store a non-pointer there, and if you allocate memory with a non-pointer type, then you must never store a pointer there.

I'm going to close this again, but please comment if you disagree.

@ALTree
Copy link
Member

ALTree commented Jan 15, 2020

For the record, mdempsky also proposed to better clarify what "equivalent memory layout" means, in issue #16807, but the proposal was rejected. Discussion there and in the linked issue/golang-dev thread may also be of interest.

@ianlancetaylor
Copy link
Contributor

Sorry, I missed the last couple of comments. "Equivalent memory layout" essentially means that the shared prefix of memory is exactly the same type. If you are converting from one struct type to another, the shared fields need to have the same type.

@abarisani
Copy link
Author

So I guess this line:

dst := (*Dirent)(unsafe.Pointer(&b[0]))

from https://golang.org/src/syscall/fs_nacl.go which casts a Dirent (https://golang.org/pkg/syscall/#Dirent) over a byte array, is also incorrect?

Or is this an issue only when we place pointers within the byte array?

In other word is casting a structure over a byte array fine as long as there are no Go pointers/references inside of it? Because I see casting structs over byte arrays in several occasions (I can find other examples if you like).

I'd also like to point out that our code works just fine with the workaround I mentioned which doesn't trigger the GC, this has been tested with real hardware and we are actually implementing drivers in Go by using similar techniques and it all works fine.

I understand that Go does not guarantee this to continue to work in the future, but I just want to ensure that this GC issue we are witnessing is not a symptom of another issue which is being hidden by instead pointing the finger on how we cast structs over byte arrays.

It is important for our project to understand if GC has issues in general and, if not, what are the rules that we must follow when casting structures over byte arrays (which we must do), as from this issue we stumbled upon it's not quite clear (at least to me).

Thanks for your patience and help!

@abarisani
Copy link
Author

Another example for reference:

rsa := (*RawSockaddr)(unsafe.Pointer(&b[0]))

@beoran
Copy link

beoran commented Jan 15, 2020

@ianlancetaylor

Currently, unsafe pointers can also be used to cast different structs which simply have the same size, this is used in the go standard library as well (e.g., math). See an example here:

https://play.golang.org/p/Vn6PlzmMgHj
EDIT:
https://play.golang.org/p/1FWFs0cx08S is better.

I think the problem only arises when you have pointers inside the struct, the GC doesn't know about those and may not be able to see that you are keeping references by using them.

@abarisani
Copy link
Author

abarisani commented Jan 15, 2020

The GC should know about them as we are keeping a reference to the cast structure around. Or at least that would be my expectation. The cast structure is kept in scope if you check the playground example.

The only difference between the non working and working example is that in the 1st case only the dTD structure is kept in scope, and that's the one that holds the two references to the other 2 structures. In the 2nd case dTD and the other two pointers are kept at the same level.

If the code would be invalid I'd expect none of the examples to work, and not just the second one.

I'd also expect GC to disregard anything which is dereferenced and "hidden" in a byte array that was once upon a time cast to a (now lost) structure. However this is not the case here, all byte arrays which we are casting on as well as all the casted structures are kept referenced.

Hence my confusion and willingness to understand whether this is a true GC bug, or my incorrect interpretation on what should and shouldn't work. In any case it would be nice to have a clear rule to follow so that I can adjust my API accordingly.

@randall77
Copy link
Contributor

So I guess this line:

dst := (*Dirent)(unsafe.Pointer(&b[0]))
from https://golang.org/src/syscall/fs_nacl.go which casts a Dirent (https://golang.org/pkg/syscall/#Dirent) over a byte array, is also incorrect?

Yes, that code violates the rule. I wouldn't call it a major violation, as both Dirent and the byte slice backing store have no pointers in them. But doing casts like this can have Go version and platform dependent behavior (especially in this case, layout and alignment issues). Fortunately in package syscall in the stdlib, we can control for both platform and Go version.

In other word is casting a structure over a byte array fine as long as there are no Go pointers/references inside of it? Because I see casting structs over byte arrays in several occasions (I can find other examples if you like).

It's not fine according to the rules. You might run into issues doing this. But you're right that the damage should be limited. I do not expect you would run into GC issues because no pointers are involved. If you're prepared to accept possible Go version and platform-specific behavior on operations involving just that structure, go ahead.

Really though, you should write encoders and decoders for your data structure to and from []byte. That will ensure you avoid this issue altogether. For instance, in fs_nacl.go we should really do:

import "encoding/binary"
binary.LittleEndian.PutUint64(b[0:8], uint64(src.inode.Ino))
binary.LittleEndian.PutUint64(b[8:16], offset)

and so on. It will even generate the exact same code on architectures that support unaligned writes.

I understand that Go does not guarantee this to continue to work in the future, but I just want to ensure that this GC issue we are witnessing is not a symptom of another issue which is being hidden by instead pointing the finger on how we cast structs over byte arrays.

If you are casting pointer-containing structures onto a byte array, then I can definitely see that causing GC issues. If you have a program that just casts between pointer-free types and still has garbage collection problems, I'd like to see it.

@abarisani
Copy link
Author

Understood, thanks for now!

@ianlancetaylor
Copy link
Contributor

You must not allocate memory as a non-pointer type and then store a pointer value there.

You must not allocate memory as a pointer type and then store a non-pointer value there.

Other fiddling with memory is likely to work with the current implementation but is not guaranteed for the future.

@abarisani
Copy link
Author

I'd just like to thanks again for the nice discussion here, which triggered restructuring our code to avoid all casting that might break in the future, we dramatically reduced the use of unsafe by performing all allocations for DMA purposes through a specific API and using encoding/binary as suggested:

usbarmory/tamago@ee270cc...c0e4396

@golang golang locked and limited conversation to collaborators Jan 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

9 participants