proposal: structs: add HostLayout "directive" type #66408

Open
dr2chase opened this issue Mar 19, 2024 · 27 comments
@dr2chase
Contributor

dr2chase commented Mar 19, 2024

Proposal Details

Abstract

This proposes a new package for zero-sized types whose presence in a structure’s list of fields would control how the compiler lays out those fields, for the purpose of allowing programmers to indicate which structures are interchanged with the host platform and to request a host-compatible layout for those structures.

Background

While the Go language specifies very little about struct layout, in practice the Go implementation is tightly constrained to follow platform layout and alignment rules because of the few cases where a struct is interchanged with a platform API. Where Go does not follow those rules, incompatibility becomes possible; on ppc64le, for example, the platform alignment for float64 fields differs from Go's default. This forces tradeoffs or potential problems on platforms whose constraints differ from the common case that the Go compiler has adopted as its default, and it prevents field-reordering optimizations that can save memory and improve garbage collection performance.

Proposal

To address this, we propose a family of zero-sized types for struct fields to signal differences in layout or alignment where that matters. The change in the compiler’s behavior should be invisible to pure Go programs that do not use unsafe or interact with the host platform. The goal of the proposal is that programmers be able to ensure that data exchanged with the host platform have a host-compatible layout, both now, and in the face of future layout optimizations.

Subject to discussion, the proposal is this package and (for now) this one type:

package structlayout 

// Platform, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the GOOS-GOARCH
// target platform and may not match the Go compiler’s defaults.
type Platform struct {}

After reflecting on the discussion below, I would modify this to:

package structs 

// HostLayout, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the host
// platform and may not match the Go compiler’s defaults.
type HostLayout struct {}

The rationale for the name change is that structs is one word, parallels strings, bytes, and slices, and is generic enough to include other (future) tags specifying "nocopy" or alignment. Furthermore, such type-modifying tags only work within structures; the package name strongly hints at this.

Rationale

One platform, WASM, has system interfaces that align 64-bit types to more than register size and another, ppc64le, has the possibility of non-Go interfaces that align some 64-bit types to less than register size, and both of these are contrary to the rules that Go normally follows (on ppc64le, we have handled this problem using luck). Signaling these constraints explicitly will help compatibility with these two platforms, preserve/allow implementation flexibility, perhaps make it easier to write checking tools, and perhaps (once types passed to all non-Go calls are properly tagged) allow the Go compiler to reorder structures to use less memory and save GC time by shuffling pointers as early as possible in untagged structures. This optimization is desirable because it automates something humans currently spend time on and don't always get right, and sometimes forces programmers to make compromises between most-readable code and best performance.

The most important part of this proposal is that unless someone is writing code that interacts with the platform, they do not need to know about this. If they are writing cgo, these signal types will be inserted for them.

The compiler will know the meaning of these types and modify struct alignment and layouts accordingly. It’s not clear to me whether Platform is adequate to capture all the cases of non-Go code, but for the current use cases (platform interfaces across all the various platforms, and cgo -- as far as I know “platform” describes their needs) it appears that it is.

Why signal types versus //go:platform ?

It is a better match for the Go type system if changes in types are expressed in the type system itself. Use of field signal types meets this requirement, since the Go type of a struct depends on the fields of the struct, even if they have zero width.

Why just one platform tag instead of finer control?

In practice, the use case is platform compatibility, and platform is a concept that the compiler can translate to the appropriate ARCH-OS combination without demanding that the user know the details, and those details also might not be portable across platforms even when the C type declarations are the same.

In the future, we could consider adding signal types for CacheAlign, AtomicAlign, or Packed but I would not include those at first because I'm not that sure we need them, we might argue about definitions, and their implementation (for Packed, at least) would be somewhat more costly. A non-layout signal type that might work well is “NoCopy” to indicate to vet that a type should not be copied once it has a non-zero value (this is currently implemented by vet knowing that certain types are “special”).

Related: “proposal: spec: define/imply memory layout of tagged struct fields #10014”. This was a very similar proposal, approaching the problem from a slightly different direction, but did not address the issue of "the platform does not match Go's defaults". The new proposal here is more concrete in “how”, includes tweaking alignments to conform to platform constraints, but does not expect someone using the platform tag to know precisely what rules a particular platform uses.

Related: “proposal: runtime: add AlignedN types that can be used to increase alignment #19057”. This was a proposal for a family of types for specifying specific alignments, perhaps of specific fields. That proposal had additional use cases -- specifying higher alignment for various fields -- but also did not address the problem of reduced platform alignment (e.g., ppc64le float64) and its application to specific platform interfaces would require that programmers know the details of that platform’s layout rules (instead of the Go compiler/runtime knowing those details once).

Related: “proposal: cmd/compile: make 64-bit fields be 64-bit aligned on 32-bit systems, add //go:packed directive on structs #36606”. This proposal took the opposite approach -- 64-bit atomics require 64-bit alignment on 32-bit processors, therefore Go should change its default layout, rather than signaling specific types that needed this alignment. It also included a secondary proposal for “packed” types that had a far more annoying implementation burden (how is the pointer addressed in a “packed” struct {uint8; *int}? How does the GC find this pointer?)

Compatibility

Working old code will continue to work properly.

Implementation

Besides the proposed package and type, cmd/compile/internal/types/size.go will need adjusting to follow the signal types. It already contains special case code for sync/atomic/align64, so this is not outlandish.

Open issues

The names. For example, “structlayout”, versus “typelayout”? If we decide that this is a good place for 0-width signal types, some of them (NoCopy) aren’t about type layout which means whatever-layout isn’t quite right.

@gopherbot gopherbot added this to the Proposal milestone Mar 19, 2024
@mvdan
Member

mvdan commented Mar 19, 2024

The names. For example, “structlayout”, versus “typelayout”? If we decide that this is a good place for 0-width signal types, some of them (NoCopy) aren’t about type layout which means whatever-layout isn’t quite right.

How about typetag, e.g. typetag.Platform and typetag.NoCopy?

@dominikh
Member

Working old code will continue to work properly.

Will layout changes be guarded behind the module's Go version? I would find that important for the unsafe case.

@randall77
Contributor

Will layout changes be guarded behind the module's Go version? I would find that important for the unsafe case.

This proposal is just part 1, adding the structlayout.Platform type and implementing its semantics. That is completely backwards-compatible, if you never see that type everything works as before.

Part 2, actually changing the layout of unadorned structs, is not part of this proposal. Of course, this proposal has less motivation if part 2 never happens.

@nemith
Contributor

nemith commented Mar 20, 2024

Being part of the type system is a bit weird. Can I throw it in an interface{}? What happens when I take a pointer to it and it's set to nil? Can I make it generic with type parameters?

@ianlancetaylor
Contributor

@nemith All of those will work fine. This proposal just changes the layout of the fields within a struct. Field layout is already fully described by the reflect.Type value associated with the type. Nothing else cares.

@dr2chase
Contributor Author

@nemith Part of the reason for putting it into the type system is so that all the other Go tools understand it, from the point-of-view of type comparison, identity, etc.

@ChrisHines
Contributor

If/when additional signal types get added, what are the semantics of including more than one in the same struct? Will it be required that all signal types have orthogonal semantics, will it be a compile time error if they are incompatible? What if they are only incompatible on one platform/OS pair, would there be a vet check to alert someone that doesn't explicitly try that combination to the potential issue?

@dr2chase
Contributor Author

@ChrisHines

I see no reason to require orthogonal semantics ("PlatformLayout" overlaps with "DeclaredLayout", which is not yet proposed but I can imagine it) but I do think that incompatible combinations should be diagnosed at compile time. And looking at a plausible interaction with alignment specification (the other/next layout tag I expect to someday see) I can construct plausible examples that would use both.

I think for this particular tag it would be reasonable to require that it precede any other field that has non-zero size.

My goal is to have as few of these tags as possible, motivated by real problems, so hopefully there will not be many combinations that apply. I think it is fine for the compiler to reject any combinations that are problematic. Even one of these tags should be a niche case; two should be niche-squared.

HOWEVER:

This might get messy for special C-hardware-specific types, for example, those used to talk about xmm and ymm registers, that currently have no Go equivalent. There are several approaches to that problem and I am not sure which is best; the existence/names of the very-wide data types is platform-dependent for C compilers, but Go could decide to just generally support 128 and 256-bit integers. Or, we could add per-field type tags for alignment that would precede 128-bit or 256-bit fields. So, something like:

type Ymms struct {
    _        typetag.PlatformLayout
    Inactive bool
    Reg      [32]uint256 // These are aligned to a platform-appropriate boundary
}

or

type YmmPart uint64
type Ymms struct {
    _        typetag.PlatformLayout
    Inactive bool
    _        typetag.Align256 // I hope the right alignment is 32-bytes / 256 bits.
    Reg      [32][4]YmmPart
}

(I added the boolean field just to make it clear that the hypothetical Align256 tag precedes a particular field.)

I prefer the choice where the programmer doesn't need to go read documentation to figure out what the C compiler is doing. On the other hand, after quickly checking the internet to see what the YMM alignment rules are (and discovering a mess with annoying special cases), I can understand needing to be able to specify a platform order yet also be very picky about the alignment. Because of that, I think that specifying both platform layout and specific (increased) alignment for certain fields should be allowed. Specifying reduced alignment is probably a compile-time error; certainly taking the address of a field with reduced alignment is a compile-time error.

(Why is taking the address of a reduced-alignment field a likely error? The compiler would prefer to use the fast, assume-that-integers-are-aligned, instructions for dereferencing a *int64, and most people want the compiler to do that because it is faster. Taking the address of an unaligned field breaks that assumption.)

I am not sure if this is right for a vet check or not; vet would need to know a lot about different architecture and OS combinations and their C compilers. A different and slightly interesting question is what should happen if a struct tagged with "platform" layout contains a type that is inherently Go-oriented, like slice or map, or one for which Go has its own alignment assumptions. My inclination is to say that right now vet isn't checking any of this; platform interchange types are already a niche, and weird combinations of type tags (that don't exist yet) and/or Go types is a hypothetical niche of that niche.

@dr2chase
Contributor Author

@mvdan proposed improved naming, included at the top for anyone coming upon this later and not wanting to slog through comments:

After reflecting on the discussion below, I would modify this to:

package structs 

// PlatformLayout, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the GOOS-GOARCH
// target platform and may not match the Go compiler’s defaults.
type PlatformLayout struct {}

The rationale for the name change is that structs is one word, parallels strings, bytes, and slices, and is generic enough to include other (future) tags specifying "nocopy" or alignment. Furthermore, such type-modifying tags only work within structures; the package name strongly hints at this.

@dr2chase dr2chase changed the title proposal: Signal types for controlling struct layout proposal: add signal types for controlling struct layout Mar 28, 2024
@rsc
Contributor

rsc commented Apr 3, 2024

Platform is a bit odd since Go is a platform too, and "the GOOS-GOARCH target platform" sounds like Go too.
What about structs.HostLayout, and don't mention GOOS-GOARCH in the docs?
(For the record, the main need for this is to write structs that match Windows and WASM, not Cgo.
Cgo can always do something magical and unexposed.)

@rsc rsc changed the title proposal: add signal types for controlling struct layout proposal: add structs.HostLayout "directive" Apr 3, 2024
@ydnar

ydnar commented Apr 4, 2024

Would this also affect the alignment of the struct on the stack?

@dr2chase
Contributor Author

dr2chase commented Apr 4, 2024

@ydnar - it depends. On some architectures the stack is not very aligned, and so extra-aligned data is heap-allocated instead. Otherwise, yes, probably.

And I am fine with HostLayout, will edit the top proposal to reflect this.

@rsc rsc changed the title proposal: add structs.HostLayout "directive" proposal: structs: add HostLayout "directive" type Apr 4, 2024
@rsc
Contributor

rsc commented Apr 4, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Apr 10, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add

package structs
type HostLayout struct{}

that can be added as a field named _ in a struct. This would have no significant effect in most tools, but a compiler could use it as a hint about laying out the struct. Of course there is an effect for type equality, since a struct with one of these fields is different from a struct without, but that’s exactly what we want if the compiler is using it to hint a different layout.

@dr2chase
Contributor Author

Would like to additionally require that this directive-typed field must appear before any field with greater-than-zero size, if that's okay? (This should not be a requirement for some imagined other directives, e.g., explicit next-field alignment.)

@ianlancetaylor
Contributor

What advantage do we get from imposing such a requirement?

@dr2chase
Contributor Author

Simplifies the implementation ever-so-slightly (avoids a pre-pass over structure fields), also makes its use more uniform, and I don't see much harm in the restriction (which could be relaxed later if I turn out to be wrong in "not much harm").

@ianlancetaylor
Contributor

My take on it is that the advantages we get from the restriction aren't worth the cost of complicating the spec by adding the restriction.

@ydnar

ydnar commented Apr 14, 2024

Would the field be zero sized if it's the last _ field in a struct?

@dr2chase
Contributor Author

@iant I am okay with that also.

@ydnar _ fields can have width. A zero-width (field) type is either an array of zero elements, a struct of zero fields, or a struct/array built entirely of zero width types.

@gopherbot

Change https://go.dev/cl/578355 mentions this issue: cmd/compile: layout changes for wasm32, structs.HostLayout

@ydnar

ydnar commented Apr 18, 2024

@ydnar _ fields can have width. A zero-width (field) type is either an array of zero elements, a struct of zero fields, or a struct/array built entirely of zero width types.

I’m referring specifically to 6f07ac2:

cmd/gc: pad structs which end in zero-sized fields

For a non-zero-sized struct with a final zero-sized field,
add a byte to the size (before rounding to alignment).  This
change ensures that taking the address of the zero-sized field
will not incorrectly leak the following object in memory.

reflect.funcLayout also needs this treatment.

Fixes https://github.com/golang/go/issues/9401

@ianlancetaylor
Contributor

@ydnar Yes, that would kick in if you choose to put this field last in a struct. I don't see any reason to treat it differently. For all purposes other than its special purpose, it's just a field.

@gopherbot

Change https://go.dev/cl/581316 mentions this issue: cmd/compile: wasm32-specific structs.HostLayout changes

@aclements
Member

Would the field be zero sized if it's the last _ field in a struct?

We should probably document that "By convention, this field should be placed first in a struct." That's a good convention to have regardless.

@cherrymui pointed out that, while it's important that an addressable zero-sized field at the end of a struct must have non-zero size, we could carve out zero-sized _ fields from this rule. Algorithmically, we would first strip zero-sized _ fields at the end of the struct, and then append a byte if the remaining final field is zero-sized. This is an implementation detail, and is something we could decide to implement in the future. It would remove a minor foot-gun.
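The suggested rule can be sketched over a simplified field list. The field type and needsTrailingByte function are hypothetical and model only names and sizes; this is not the compiler's actual code.

```go
package main

import "fmt"

// field is a simplified model of a struct field: a name and a size.
type field struct {
	Name string
	Size uintptr
}

// needsTrailingByte applies the suggested rule: ignore zero-sized blank
// fields at the end, then pad only if the struct is non-zero-sized and
// the remaining final field is zero-sized.
func needsTrailingByte(fields []field) bool {
	end := len(fields)
	for end > 0 && fields[end-1].Name == "_" && fields[end-1].Size == 0 {
		end--
	}
	if end == 0 {
		return false
	}
	var total uintptr
	for _, f := range fields[:end] {
		total += f.Size
	}
	return total > 0 && fields[end-1].Size == 0
}

func main() {
	blankLast := []field{{"N", 8}, {"_", 0}}
	namedLast := []field{{"N", 8}, {"Z", 0}}
	fmt.Println("blank zero-sized last:", needsTrailingByte(blankLast))
	fmt.Println("named zero-sized last:", needsTrailingByte(namedLast))
}
```

Under this rule a trailing blank marker would no longer grow the struct, while an addressable zero-sized final field would still be padded.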

@zephyrtronium
Contributor

zephyrtronium commented Apr 24, 2024

@cherrymui pointed out that, while it's important that an addressable zero-sized field at the end of a struct must have non-zero size, we could carve out zero-sized _ fields from this rule.

This seems to imply a change to reflection: https://go.dev/play/p/TfYcdEwAACm

@rsc
Contributor

rsc commented Apr 24, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to add

package structs
type HostLayout struct{}

that can be added as a field named _ in a struct. This would have no significant effect in most tools, but a compiler could use it as a hint about laying out the struct. Of course there is an effect for type equality, since a struct with one of these fields is different from a struct without, but that’s exactly what we want if the compiler is using it to hint a different layout.

Projects
Status: Likely Accept