Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: define/imply memory layout of tagged struct fields #10014

Closed
griesemer opened this issue Feb 26, 2015 · 28 comments
Closed

proposal: spec: define/imply memory layout of tagged struct fields #10014

griesemer opened this issue Feb 26, 2015 · 28 comments
Labels
FrozenDueToAge LanguageChange NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Proposal v2 A language change or incompatible library change
Milestone

Comments

@griesemer
Copy link
Contributor

This is half-baked idea that needs more thought, written down so it doesn't get forgotten.

Currently, the spec does not specify how struct fields are laid out in memory (e.g. source order, or optimally packed, or any other order). That is, theoretically, a compiler is free to optimize the layout. However, currently all (?) compilers lay out struct fields in source order, possibly wasting significant amounts of space due to padding (struct internal fragmentation).

It is not unlikely that code depends on this property.

Instead of requiring all structs to follow source order layout of fields, we could say that structs with tagged fields are the ones that are laid out in source order. Here are some variants of the same idea:

  • struct fields are laid out in source order if there's at least one field that's tagged
  • struct fields are laid out in source order if all fields are tagged
  • tagged struct fields are laid out in source order (relative to other tagged fields), but untagged fields are free to be where the compiler decides
  • tagged fields are laid out contiguously, in source order, at the beginning (or the end) of the struct; untagged fields are in compiler-determined order at the opposite end (and the end or beginning) of the struct
@griesemer griesemer self-assigned this Feb 26, 2015
@bradfitz
Copy link
Contributor

I would love to have the source order (read: documentation order) and layout be totally separate, and both be under programmer's control. (the programmer might know optimal better than the compiler, if optimal means more than packing, like getting on different cache lines)

You could imagine a struct tag that declares the index position, or a magic zero-width field that declares the order for the rest:

    type S struct {
          _ struct{} `order:"A,C,B"`
          A uint8
          B int16
          C uint8
    }

@randall77
Copy link
Contributor

I'd be happy to have the compiler pack the structs however it likes, unless
there is a "nopack" label somewhere (first field? any field?), in which
case it must lay out the fields in source order.

On Thu, Feb 26, 2015 at 9:42 AM, Brad Fitzpatrick notifications@github.com
wrote:

I would love to have the source order (read: documentation order) and
layout be totally separate, and both be under programmer's control. (the
programmer might know optimal better than the compiler, if optimal means
more than packing, like getting on different cache lines)

You could imagine a struct tag that declares the index position, or a
magic zero-width field that declares the order for the rest:

type S struct {
      _ struct{} `order:"A,C,B"`
      A uint8
      B int16
      C uint8
}


Reply to this email directly or view it on GitHub
#10014 (comment).

@dvyukov
Copy link
Member

dvyukov commented Feb 26, 2015

I'd be happy to have the compiler pack the structs however it likes, unless
there is a "nopack" label somewhere (first field? any field?), in which
case it must lay out the fields in source order.

I second this.
Are there any other options than tags? We have not yet documented any of the //go: comments, right? Otherwise //go:nopack looks nice.

However, this can break some existing code I guess (e.g. if you pass a struct to a syscall).

@nightlyone
Copy link
Contributor

That would make it difficult to hunt cacheline pingpong issues in shared structs. Those can also happen in imported packages. Also handover to C, kernel API, wire-formats, etc. might become hairy and even lead to surprising bugs.

I would switch the default or make it even a new keyword, since it technically is no struct as we know it from C family languages anymore, but something else. That will be very surprising.

A struct has the following three important properties for me:

  • I control memory layout down to the bit level when composing them using {u,}int{,8,16,32,64} float32, float64 and fixed sized arrays of those
  • padding is zeroed on allocation
  • alignment is documented in spec

But a tool like http://linux.die.net/man/1/pahole might prove to be very useful for Go.

@randall77
Copy link
Contributor

I agree we probably can't make the default reorder fields for Go 1.x. But
it is something to think about for 2.0.

On Thu, Feb 26, 2015 at 10:55 AM, Ingo Oeser notifications@github.com
wrote:

That would make it difficult to hunt cacheline pingpong issues in shared
structs. Those can also happen in imported packages. Also handover to C,
kernel API, wire-formats, etc. might become hairy and even lead to
surprising bugs.

I would switch the default or make it even a new keyword, since it
technically is no struct as we know it from C family languages anymore, but
something else. That will be very surprising.

A struct has the following three important properties for me:

  • I control memory layout down to the bit level when composing them
    using {u,}int{,8,16,32,64} float32, float64 and fixed sized arrays of those
  • padding is zeroed on allocation
  • alignment is documented in spec

But a tool like http://linux.die.net/man/1/pahole might prove to be very
useful for Go.


Reply to this email directly or view it on GitHub
#10014 (comment).

@nightlyone
Copy link
Contributor

Even in Go 2 it would be very surprising to have random breakage of wire formats, binary formats, in process ABI, etc. when using the struct keyword. pahole and friends exist today and are used heavily by e.g. the linux kernel folks to optimize that as needed.

I also doubt the compiler knows enough about data access patterns to optimize it well enough for the general case.

I really have a strong preference for advise instead of premature compiler optimizations here. But that's just me...

@cespare
Copy link
Contributor

cespare commented Feb 26, 2015

Not offering an opinion about this proposal, but

random breakage of wire formats, binary formats, in process ABI, etc. when using the struct keyword

all imply some kind of unsafe code and depending on ordering that isn't guaranteed by the Go specification today.

@extemporalgenome
Copy link
Contributor

The ordering of struct fields in memory only has semantic importance when doing something unsafe or when using reflect, the latter of which can be of guaranteed source-order independent of actual memory layout. Use of unsafe specifically has non-defined, non-guaranteed behavior, and proper use of cgo is increasingly moving toward the idea that memory cannot be shared across the language boundary.

There may have been a time when it would have been more important to define, but, as discussed above, those use cases are evaporating. Still, if we really must define it, the major implementations' current approach of source-order and non-packing should become official; it retains control we already have in practice without adding syntax or more magic comments. Tooling (rather than the compiler) can rewrite type declarations to meet user needs, and even generate packed structs (such as for use in large slice datasets) with converters to and from a well-aligned form when accessing individual elements.

@extemporalgenome
Copy link
Contributor

@Jragonmiris a few things would need to happen for pointers across cgo to be safe (some of these may already be enforced):

  • No mechanism can allow C to directly reference Go package globals (the only safe way to pass data, including pointers, is via the C stack).
  • Goroutines blocked on C calls cannot have any of its stack-resident references moved, if the implementation has a compacting GC.
  • C may not retain references into Go memory after the initial call from Go returns. Also, C may not pass references into Go memory between concurrent calls into C originating from Go (reference passing within C may behave like a tree, but may not behave like a graph).
  • Pertinent to this issue, Go struct types escaping into C in the form of pointers must be detected and implicitly compiled with predictable layouts, such as source-order aligned. Alternatively, Go struct pointers may not cross the boundary at all.

@griesemer
Copy link
Contributor Author

@Jragonmiris Please don't hijack this issue for the cgo problem of passing objects into C land. While related, this current issue is strictly about whether the language should have a way to tell the implementation that struct fields should remain in source order (independent of whether that's the de facto case now or not). Thanks.

@dvyukov
Copy link
Member

dvyukov commented Feb 28, 2015

There was an utility that detects when structs waste space on padding. Can we move that check into vet?

@dominikh
Copy link
Member

There was an utility that detects when structs waste space on padding. Can we move that check into vet?

If you do, please don't enable it by default.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc
Copy link
Contributor

rsc commented Apr 10, 2015

It seems fine to me for the spec not to guarantee anything about struct field order in memory. The spec doesn't operate at that level.

That said, no Go compiler should probably ever reorder struct fields. That seems like it is trying to solve a 1970s problem, namely packing structs to use as little space as possible. The 2010s problem is to put related fields near each other to reduce cache misses, and (unlike the 1970s problem) there is no obvious way for the compiler to pick an optimal solution. A compiler that takes that control away from the programmer is going to be that much less useful, and people will find better compilers.

@dr2chase
Copy link
Contributor

Easy exchange of Go heap data into C programs (which I am not at all sure
is a feature) pretty much requires that we leave struct fields exactly as
we found them and also mimic C layout rules (I have some experience with
this). I would not be so quick to write off the goodness of reordering
fields; it can be a GC optimization, for example, if all referenced
pointers are clustered into a single cache line, instead of being scattered
into two. GC also boils down to bulk data movement; to the extent that
this can be reduced by slightly compressing the in-memory representation,
GC is assisted. Alternately, for the same maximum heap size, a 10%
reduction in footprint is equivalent to allowing 120% overhead instead of
100%.

This would however require a second knob for all those places where the
expert programmer (how many of those are there, really?) does reorder
fields for cache line purposes. Having run across code in years past where
"expert" programmers did twiddly things without leaving the slightest hint
about intent or importance, I don't necessarily see this as a bad thing,
though I understand it is counter to the Go Way.

On Fri, Apr 10, 2015 at 1:38 AM, Russ Cox notifications@github.com wrote:

It seems fine to me for the spec not to guarantee anything about struct
field order in memory. The spec doesn't operate at that level.

That said, no Go compiler should probably ever reorder struct fields. That
seems like it is trying to solve a 1970s problem, namely packing structs to
use as little space as possible. The 2010s problem is to put related fields
near each other to reduce cache misses, and (unlike the 1970s problem)
there is no obvious way for the compiler to pick an optimal solution. A
compiler that takes that control away from the programmer is going to be
that much less useful, and people will find better compilers.


Reply to this email directly or view it on GitHub
#10014 (comment).

@tv42
Copy link

tv42 commented Oct 9, 2015

There was an utility that detects when structs waste space on padding. Can we move that check into vet?

https://github.com/mdempsky/maligned

@finomxf
Copy link

finomxf commented Nov 29, 2016

I think Golang should at least add one feature:

pack(1) for struct

when u deal with USB or other hardware related data structures, pack(1) is very natural for a struct's definition. for example:

#pragma pack(push,1)
typedef struct _USB_DEVICE_DESCRIPTOR {
UCHAR bLength;
UCHAR bDescriptorType;
USHORT bcdUSB;
UCHAR bDeviceClass;
UCHAR bDeviceSubClass;
UCHAR bDeviceProtocol;
UCHAR bMaxPacketSize0;
USHORT idVendor;
USHORT idProduct;
USHORT bcdDevice;
UCHAR iManufacturer;
UCHAR iProduct;
UCHAR iSerialNumber;
UCHAR bNumConfigurations;
} USB_DEVICE_DESCRIPTOR,*PUSB_DEVICE_DESCRIPTOR;
#pragma pack(pop)

@ianlancetaylor
Copy link
Contributor

@finomxf That is a complex feature that requires significant compiler support. For now and the foreseeable future I suggest defining the type using [n]byte and modifying the fields using the encoding/binary package. On many processors the performance will be approximately the same.

@finomxf
Copy link

finomxf commented Dec 8, 2016

thanks @ianlancetaylor
I think [n]byte is good enough for now, even more precise than pack(1)
I am happy with all kinds of workaround.

@rsc rsc changed the title spec: define/imply memory layout of tagged struct fields proposal: spec: define/imply memory layout of tagged struct fields Jun 20, 2017
@rsc rsc added the v2 A language change or incompatible library change label Jun 20, 2017
@ianlancetaylor
Copy link
Contributor

The alternative to permitting the compiler to rearrange the struct fields is to change the language spec to specify the struct field layout. We should do one or the other.

My experience with GCC's struct reorg pass is that it rarely applies to real programs. There are too many constraints in practice. That may be less true with Go, but, for example, programs that use encoding/binary to save struct data require consistent field layout but do not use tags to ensure it.

@ianlancetaylor ianlancetaylor added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 9, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Unplanned, Proposal Jan 9, 2018
@dvyukov
Copy link
Member

dvyukov commented Jan 10, 2018

FWIW Linux kernel can now randomize struct layout to some degree to improve security:
https://lwn.net/Articles/722293/

@tv42
Copy link

tv42 commented Jan 10, 2018

How would randomization interact with deterministic builds? If a package build picks a new random seed every time, it's no longer deterministic; if it doesn't, it's no longer random.

@griesemer
Copy link
Contributor Author

@tv42 There might be a flag to enable/disable. But the field layout shouldn't be observable from a Go program (*), so in that sense the behavior of the program would remain deterministic.

(*) The compiler may automatically disable field layout randomization for a struct if an address is taken of a field (or unsafe.Offsetof is used). There may be other cases.

@ianlancetaylor
Copy link
Contributor

I think it would be very hard to automatically randomize struct field layout in Go, given that people can use reflect to do almost anything.

It would be possible to randomize struct field layout on an opt-in basis, of course. The Linux kernel plugin is opt-in, except for structs that are only function pointers. To achieve similar security results in Go, a likely more profitable approach would be an unpredictable, but stable, ordering of interface method tables.

@bradleypeabody
Copy link

bradleypeabody commented May 31, 2018

I think it makes a lot more sense to formalize the struct packing algorithm that is used (either in the language spec or if that's too constraining then at least by the compiler - gc and gccgo probably pack the same for most if not all targets) as opposed to having the compiler reorder fields for some not-entirely-clear benefit.

Packing can prefer size (fitting fields into otherwise unused space) or possibly performance (alignment), or if you need the layout to match some other spec exactly the concern is precision and compatibility. Sometimes these are conflicting objectives - you might want to keep things as small as possible but also need precise control. These concerns seem much too nuanced for the compiler to be guessing about.

The thing is, usually packing is not a significant issue. When it is, it's probably better to just reorder the fields in your source code - it's explicit, obvious, simple and compatible with what is being done now.

Seems to me it's more of an issue that what the compiler does is not standardized than the fact that the compiler doesn't magically guess what order you want the fields in.

@seebs
Copy link
Contributor

seebs commented Nov 28, 2018

I just ran into this while thinking about another issue.

A thing that I find surprising, but in retrospect it makes sense: 0-byte fields may or may not have the same address as following non-zero-byte fields. I don't know whether this is intentional, or whether code is already depending on it, but it might be surprising.

I do note, there is exactly one thing in the existing spec that hints at an answer to the struct packing question. The word "padding" occurs exactly once in the spec, on an example of a struct with a blank-identifier field with the comment // padding.

There's also the reflect documentation mentioning both Align and FieldAlign for types; that implies that field alignment requirements might differ from non-field alignment requirements, I guess?

My default preference would be to specify that padding is inserted only as needed for alignment, but there's a subtle problem here: We can check alignment requirements with FieldAlign() at runtime using reflect. There's no other way to check them, and we can't necessarily predict them for architectures we don't have handy to test on. So, for instance, does a structure with an int followed immediately by an int64 imply padding? Maybe! It probably does on at least one target, and doesn't on at least one target. If you wanted to, say, insert padding before the int object, but only if padding was going to be necessary, it's not clear how you could do that. (Maybe build tags controlling which of two definitions of a type get used, followed by _ MaybePadIfIntIs32Bits, but that seems ugly.)

@ianlancetaylor
Copy link
Contributor

FYI, the distinction between Align and FieldAlign arises because there are real ABIs today that make that distinction, such as the 32-bit x86 ELF ABI. When using the gc toolchain the Align and FieldAlign fields are always the same, but this is not true when using gccgo.

@griesemer
Copy link
Contributor Author

We cannot change existing structs to reorder fields this at this point. What we could do is introduce notation that arranges fields more efficiently (not necessarily "packed" but with reduced internal fragmentation). We don't have a concrete proposal at this point and this is something one can manually address for now, so closing.

@seebs
Copy link
Contributor

seebs commented Feb 19, 2019

I think it might make sense to formally state that the offset of each struct field is definitely not less than the offset of any previous struct field, and that struct fields do not contain overlapping bytes. That is obviously the case currently, and I don't think it could be changed without breaking things, it's just never been directly specified.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests