Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsafe: clarify what "equivalent memory layout" means #16807

Closed
mdempsky opened this issue Aug 19, 2016 · 10 comments
Closed

unsafe: clarify what "equivalent memory layout" means #16807

mdempsky opened this issue Aug 19, 2016 · 10 comments
Labels
Documentation FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@mdempsky
Copy link
Member

One of package unsafe's valid patterns is:

(1) Conversion of a *T1 to Pointer to *T2.

Provided that T2 is no larger than T1 and that the two share an equivalent memory layout, this conversion allows reinterpreting data of one type as data of another type.

There have been arguments about what "equivalent memory layout" means (e.g., #16769, and the "Guarantees for package unsafe" thread on golang-dev).

E.g., in C/C++, it's valid to cast from a pointer to a struct to a pointer to another struct containing a prefix of the fields to support things like casting from struct sockaddr_in * to struct sockaddr *. But package unsafe gives an example of converting from *float64 to *uint64, which wouldn't be valid in C/C++.

In crypto/md5, there's code that converts &p[0] (of type *byte) to *[16]uint32, after verifying len(p) >= 48. Assuming we think this should remain valid, that seems to suggest it's okay to convert *[4]byte to *uint32; i.e., that primitive data types need not exactly match in size either.

  • Aside: Technically this conversion isn't protected by rule 1 anyway, because T2 ([16]uint32) is larger than T1 (byte), but I think it's arguably within the spirit of the rule because of the length check.

My understanding of the phrase's intent was to disallow accessing pointer slots as non-pointer types. E.g., you can't convert from **int to *uintptr, even though sizeof(*int) == sizeof(uintptr).

@dr2chase suggested that it could be interpreted to imply sensitivity to the machine's endianness, which may disallow converting *uint64 to *uint16.

In https://go-review.googlesource.com/#/c/18640/ this wording was discussed:

@alandonovan: Is "equivalent memory layout" defined anywhere?
@rsc: No, but I'd rather not. I'm trying to write useful text without writing a legal document. I hope people can at least understand memory layouts and whether the conversion they want makes sense. If not they probably shouldn't be using unsafe.

Unfortunately it seems even within the Go compiler team there's disagreement/uncertainty on what "equivalent memory layout" means, even for concrete examples like converting *uint64 to *uint16.

/cc @ianlancetaylor @randall77

@RLH
Copy link
Contributor

RLH commented Aug 21, 2016

"Packages that import unsafe may be non-portable and are not protected by
the Go 1 compatibility guidelines."
Given the above we should stick with the current wording. "equivalent
memory layout" may not be perfect but it provides room for future
optimizations in the compiler and runtime. If code needs more precision
than "equivalent memory layout" or provokes discussions about what it means
then it should be taken as a warning that the code being written is likely
to require continual maintenance with each future releases of Go.

On Fri, Aug 19, 2016 at 5:26 PM, Matthew Dempsky notifications@github.com
wrote:

One of package unsafe's valid patterns is:

(1) Conversion of a *T1 to Pointer to *T2.

Provided that T2 is no larger than T1 and that the two share an equivalent
memory layout, this conversion allows reinterpreting data of one type as
data of another type.

There have been arguments about what "equivalent memory layout" means
(e.g., #16769 #16769, and the "Guarantees
for package unsafe"
https://groups.google.com/forum/#!topic/golang-dev/uCP4P12xS9Y thread
on golang-dev).

E.g., in C/C++, it's valid to cast from a pointer to a struct to a pointer
to another struct containing a prefix of the fields to support things like
casting from struct sockaddr_in * to struct sockaddr *. But package
unsafe gives an example of converting from *float64 to *uint64, which
wouldn't be valid in C/C++.

In crypto/md5
https://github.com/golang/go/blob/master/src/crypto/md5/md5block.go#L41,
there's code that converts &p[0](of type *byte) to *[16]uint32, after
verifying len(p) >= 48. Assuming we think this should remain valid, that
seems to suggest it's okay to convert *[4]byte to *uint32; i.e., that
primitive data types need not exactly match in size either.

  • Aside: Technically this conversion isn't protected by rule 1 anyway,
    because T2 ([16]uint32) is larger than T1 (byte), but I think it's
    arguably within the spirit of the rule because of the length check.

My understanding of the phrase's intent was to disallow accessing pointer
slots as non-pointer types. E.g., you can't convert from *_int to *uintptr,
even though sizeof(_int) == sizeof(uintptr).

@dr2chase https://github.com/dr2chase suggested that it could be
interpreted to imply sensitivity to the machine's endianness, which may
disallow converting *uint64 to *uint16.

In https://go-review.googlesource.com/#/c/18640/ this wording was
discussed:

@alandonovan https://github.com/alandonovan: Is "equivalent memory
layout" defined anywhere?
@rsc https://github.com/rsc: No, but I'd rather not. I'm trying to
write useful text without writing a legal document. I hope people can at
least understand memory layouts and whether the conversion they want makes
sense. If not they probably shouldn't be using unsafe.

Unfortunately it seems even within the Go compiler team there's
disagreement/uncertainty on what "equivalent memory layout" means, even for
concrete examples like converting *uint64 to *uint16.

/cc @ianlancetaylor https://github.com/ianlancetaylor @randall77
https://github.com/randall77


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16807, or mute the thread
https://github.com/notifications/unsubscribe-auth/AA7Wn8bBYEeqFV9UizlHIliyWkJrX2JMks5qhh-jgaJpZM4Jo3nL
.

@dr2chase
Copy link
Contributor

I went to check that crypto code, and note that the cast is guarded by either "if x86" or "if littleEndian AND byte pointer is suitably aligned" (because otherwise, it stands a nice chance of being wrong). I think the guards put this well into the gray area.

@RLH
Copy link
Contributor

RLH commented Aug 22, 2016

Yea, explaining alignment issue in a way that is consistent across
architectures, unspecified cache line lengths, and so forth would be
intractable. Let's not go there, "equivalent memory layout" and is
probably the best Go can do.

On Sun, Aug 21, 2016 at 7:36 PM, dr2chase notifications@github.com wrote:

I went to check that crypto code, and note that the cast is guarded by
either "if x86" or "if littleEndian AND byte pointer is suitably aligned"
(because otherwise, it stands a nice chance of being wrong). I think the
guards put this well into the gray area.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#16807 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA7Wn5xmKjh57pXnSSm_2OFHw1apIwI9ks5qiOECgaJpZM4Jo3nL
.

@ianlancetaylor
Copy link
Contributor

My vote would be to simply drop the words "and that the two share an equivalent memory layout." I don't see that they tell us anything meaningful. As Russ's quote suggests, people who want to convert from one pointer type to another need to know what they are doing. We should not provide any guarantees about how values are stored in memory.

It's true that there is an issue about converting between a pointer to a pointer type to a pointer to a non-pointer type. We may want to state clearly that that is invalid, or, at least, that it's invalid if you dereference the resulting pointer. But I don't think that is implied by the language about "equivalent memory layout."

How about something like

// (1) Conversion of a *T1 to Pointer to *T2.
//
// Provided that T2 is no larger than T1, that T2 does not have
// stricter alignment requirements than T1, that any pointer fields of
// T1 are also pointer fields of T2, and vice-versa, this conversion
// allows reinterpreting data of one type as data of another type. An
// example is the implementation of math.Float64bits:
//
//  func Float64bits(f float64) uint64 {
//      return *(*uint64)(unsafe.Pointer(&f))
//  }
//
// The alignment requirements of types and the position of pointer
// fields of types may vary between different implementations and
// between different versions of the same implementation.

@RLH
Copy link
Contributor

RLH commented Aug 22, 2016

I worry about this definition defeating alias analysis that is based on
writes to disjoint types not being aliases. If we do end up changing the
wording perhaps we should indicate that use of unsafe may suppress compiler
optimizations now and in the future.

I still think we should just keep the current wording.

On Mon, Aug 22, 2016 at 12:58 PM, Ian Lance Taylor <notifications@github.com

wrote:

My vote would be to simply drop the words "and that the two share an
equivalent memory layout." I don't see that they tell us anything
meaningful. As Russ's quote suggests, people who want to convert from one
pointer type to another need to know what they are doing. We should not
provide any guarantees about how values are stored in memory.

It's true that there is an issue about converting between a pointer to a
pointer type to a pointer to a non-pointer type. We may want to state
clearly that that is invalid, or, at least, that it's invalid if you
dereference the resulting pointer. But I don't think that is implied by the
language about "equivalent memory layout."

How about something like

// (1) Conversion of a _T1 to Pointer to *T2.
//
// Provided that T2 is no larger than T1, that T2 does not have
// stricter alignment requirements than T1, that any pointer fields of
// T1 are also pointer fields of T2, and vice-versa, this conversion
// allows reinterpreting data of one type as data of another type. An
// example is the implementation of math.Float64bits:
//
// func Float64bits(f float64) uint64 {
// return *(_uint64)(unsafe.Pointer(&f))
// }
//
// The alignment requirements of types and the position of pointer
// fields of types may vary between different implementations and
// between different versions of the same implementation.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#16807 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AA7Wn_gh-3jm3Ph-KzJfIGMhU5_zZMjtks5qidVTgaJpZM4Jo3nL
.

@ianlancetaylor
Copy link
Contributor

Alias analysis is a problem using any wording at all. Alias analysis is likely to break any code that does anything other than an immediate read using the converted pointer.

In gccgo I disable alias analysis in any file that imports unsafe. Russ and I discussed that several years ago--that importing unsafe should disable some optimizations within the file but should not affect other files, thus giving people who import unsafe yet another way to break things--but I don't think we ever wrote it down.

@thanm
Copy link
Contributor

thanm commented Aug 22, 2016

Is there a potential issue with inlining? E.g. function mypackage.foo in file foo.go contains a call to mypackage.bar in bar.go (where bar.go imports unsafe). If during optimization the body of bar is inlined into foo then you would want the "don't use alias analysis" flag to go along with it.

@ianlancetaylor
Copy link
Contributor

Yes, there is a potential issue. With gccgo, cross-package inlining is currently only done via LTO, which must already deal with the problem (inlining a function in a file compiled with -fno-strict-aliasing into a file compiled with the default -fstrict-aliasing).

@dr2chase
Copy link
Contributor

If this causes problems, two options are (1) don't inline code that mentions "unsafe" or (2) keep track of all pointers ever involved in unsafe.Pointer casts, and treat them as aliasing everything else unsafe. That is, the "alias pointer type" becomes a pair (ReferentType, EverUnsafe) and p and q may-alias if MayOverlap(RT(p),RT(q)) OR EU(p) AND EU(q).

@quentinmit quentinmit added this to the Go1.8 milestone Sep 6, 2016
@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 10, 2016
@rsc
Copy link
Contributor

rsc commented Oct 10, 2016

I am inclined to leave the current wording. I understand the unease but like I said before, I don't want this to turn into a legal document. It is true that compilers must disable certain optimizations to keep things like (possibly inlined) Float64frombits safe. In any event this doesn't seem like it is critical for Go 1.8.

@rsc rsc modified the milestones: Go1.8Maybe, Go1.8 Oct 10, 2016
@rsc rsc closed this as completed Oct 21, 2016
@golang golang locked and limited conversation to collaborators Oct 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

8 participants