proposal: spec: support for easy packing/unpacking of struct types #64613

Open
griesemer opened this issue Dec 8, 2023 · 25 comments
Labels: LanguageChange, Proposal, v2 (A language change or incompatible library change)

Comments

@griesemer
Contributor

griesemer commented Dec 8, 2023

Introduction and acknowledgements

This proposal is a restatement of ideas originally proposed in #33080 in 2019 that somehow didn't get more traction despite receiving largely positive feedback.

Specifically, I propose that we take @urandom's idea of permitting expressions representing multiple values (such as function calls returning multiple results) to create struct values. But instead of using conversions for that purpose, I suggest that we allow such multi-values in struct literals, as proposed by @ianlancetaylor in the same issue. Furthermore, I suggest that we expand the idea to arrays (but not slices), also mentioned by @ianlancetaylor in that issue.

And I propose that we use @urandom's and @bradfitz's suggestion (here and here), and write s... to unpack a struct value s.

In short, I propose we give the combined ideas in #33080 serious consideration. They cover a lot of ground which tuple types would cover, without the need for tuple types. The ideas are clean and simple.

Proposal

  1. The ... may be used to unpack a struct or array value v: v... produces the list of element values of the struct or array value v. This list of values can be used exactly like a multi-valued function result (i.e., the same rules apply).

Example:

type Pair struct{a, b int}
var p Pair
x, y := p... // unpack the elements of the pair value p into a multi-value assigned to x, y
  2. Given a struct or array type T with n elements and an expression e that stands for n values (a multi-value) of matching types, the composite literal T{e} creates the struct or array value with the multiple values as the elements.

Example:

func position() (x, y int)
p := Pair{position()} // the multi-value returned by position() can be used directly in the composite literal
  3. ... is added to the list of tokens that cause an automatic semicolon insertion during lexical analysis. This is needed so that it is possible to have ... at the end of a line without the need to manually write a ;.

This is the entire proposal.

Examples

The ... applied to a struct (and an array, which also has a compile-time fixed size) produces the list of elements and can be used exactly like a function returning multiple values.

s := struct{x int}{2}
x := s...  // same as x := s.x

type S2 struct{x, y int}
s2 := S2{1, 2}
a, b := s2...  // unpacking of s2; a = 1, b = 2; shortcut for a, b := s2.x, s2.y

Given s2 above, and

func f2(a, b int)

we can call f2 like so:

f2(s2...) // same as f2(s2.x, s2.y)

Instead of:

func f2() (x, y int)

a, b := f2()  // temporaries for use with composite literal below
s2 := S2{a, b}

we can write

s2 := S2{f2()}

leading to the tautology

s2 == S2{s2...}

In other words, s2... is essentially syntactic sugar for s2.x, s2.y except that we cannot mix and match with other elements. For instance

type Triple struct{x, y, z int}
_ = Triple{s2..., 3}  // cannot mix a multi-valued expression with other expressions

because we don't allow similar mixing with multi-valued function results passed to other functions. (Lifting this restriction is an independent discussion and should only be considered after having gained experience with this proposal.)
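For comparison, a minimal sketch of the existing restriction that this mirrors (g and h are hypothetical helpers, not part of the proposal):

func g() (int, int)
func h(a, b, c int)

h(g(), 3) // already a compile error today: a multi-valued call cannot be mixed with other arguments
h(1, 2, 3) // the values must be spelled out individually instead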

The compiler always knows when there's a multi-value (a multi-valued function result, or an unpacked struct or array) and it will simply allow such multi-values in places where that exact number of values is permitted: as arguments to a function, or as elements for a struct or array composite literal that expects that exact number of elements of matching types.

This allows us to write something like

type data struct{ a, b int; msg string }

func produce() (a, b int, msg string)
func consume(a, b int, msg string)

// send produced data over a channel
var ch chan data
ch <- data{produce()}

// consume data from a channel
consume(<-ch...)

If one needs the comma-ok form, one would write:

d, ok := <-ch
if ok {
   consume(d...)
}

It also makes it easy to convert from arrays to structs that have the same number of elements and types:

p := Pair{1, 2}
a := [2]int{p...}

or even

a := [...]int{p...}  // this version will work even if the number of elements in p changes

and back

p := Pair{a...}

Discussion

Allowing a multi-value in a struct/array literal seems more natural than in a conversion (as proposed originally): for one, composite literals already accept a list of values, and conversions always work on a single value. Providing a multi-value to a composite literal is similar to passing a multi-value as an argument to a function call.

Using ... to unpack a struct or array value is similar in spirit to the use of ... to unpack a slice for passing in a variadic function call.

The unpack operation ... requires a syntax change. Proposal #64457 explored the idea of ... as unpack operator with a concrete prototype implementation to identify potential lexical and grammatical problems (CL 546079). It turns out that to make unpack operations work nicely (without the need for an explicit semicolon), if the ... token appears at the end of a line, a semicolon must be automatically inserted into the token stream immediately after the ... token. This will break array literals using the [...]E{...} notation if the closing bracket ] appears on a different line than the ...; a situation that is extremely unlikely to occur in actual Go code:

// nobody writes code like this, and gofmt will have fixed it
var array = [...  // this will cause a syntax error because the lexer will introduce a semicolon after the ...
]int{1, 2, 3}

func f(args... int)
var s []int
f(s...,  // here the problem doesn't occur because we need a comma anyway
)

@jba has pointed out a perceived issue with backward compatibility: if we allow e.g. S{f()} and later add a field to S without changing the signature of f, the code will break. For exported structs the recommendation is to use tagged literals (explicit field names). That said, if multiple function results are used in combination with structs to pack them up, then if one of them changes, the other will need to change, too. Tagged struct literals allow more flexibility, but they also invite possible bugs because one doesn't get an error if one forgets to set a field. In short, the perceived backward-compatibility issue is a double-edged sword. It may be that the proposed mechanism works best for non-exported code where one can make all the necessary changes without causing API problems. Or perhaps S{f()} could be permitted if f() produces a prefix of all the values needed by S. But that is a different rule from what is proposed here and should be considered separately.
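To make the concern concrete, a minimal sketch using the S and f from the paragraph above:

type S struct{ a, b int }
func f() (int, int)

s := S{f()} // fine: f returns exactly the values S needs

// If a field c is later added to S while f stays unchanged,
// S{f()} no longer compiles because too few values are supplied:
//
// type S struct{ a, b, c int }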

Implementation-wise, there is some work needed to allow ... in expressions, and it may require minimal AST adjustments. Type-checking is straightforward, and so is code generation (packing and unpacking can be seen as a form of syntactic sugar; there's no new machinery required).

@randall77
Contributor

This setup seems pretty inflexible. Suppose you had

type P struct{x, y int}
var p = P{}
func f(a, b int) {}
f(p...)

That all works fine. But then suppose I want to add a context argument to f?

func f(c context.Context, a, b int) {}

I'd like to call it like

f(c, p...)

but I understand from this proposal that this wouldn't be allowed. You'd have to do

f(c, p.x, p.y)

I guess that's the same problem when starting with f(g()), but we're adding another instance of the same wart.

@robpike
Contributor

robpike commented Dec 8, 2023

More than just having new functionality, I like this proposal because it corrects one of two asymmetries in the language that bother me (the other is #45624): there is special support for assembling a (pointer to a) struct, but none to disassemble one. That is, one can write s := S{a, b} but the existing language provides no shorthand for the reverse operation. This proposal provides one.

@jimmyfrasche
Member

#63221 proposes an unpack() builtin that does much the same as ... here and there is a great deal of discussion about that.

This proposal would provide a mechanism similar to tuples, fulfilling many of the same goals. However, it would still require creating little types whose only purpose is to temporarily package a group of values together and to name them and their fields. Still, it would be easy to add tuple-structs on top of this if need be.

How does this work with unexported fields from structs defined in another package: does it skip them or can you just not use ... in that situation?

@DeedleFake

This feels like a significant downgrade from #63221. It's essentially the same as that one but with fewer features, a slightly different syntax, and the added (and nice) ability to pack multiple return values into a non-tuple struct. Of particular note is that this still doesn't solve what I think is most people's main frustration: needing to create a named type to be able to send multiple things through a channel.

c := make(chan struct{ int; string }, 1)
a, b := (<-c)... // Works under this proposal.
c <- struct{ int; string }{a, b} // Still just as awkward as it has been.

@dfinkel
Contributor

dfinkel commented Dec 8, 2023

How does this work with unexported fields from structs defined in another package: does it skip them or can you just not use ... in that situation?

@jimmyfrasche I'd vote for the second option. (that being a compile error)
Same with packing function return value assignments

@griesemer
Contributor Author

@randall77 If we can come up with clean and unambiguous rules for adding a parameter to an f in a call f(g()) where g() returns multiple values, it seems like we can apply the same "fix" to this proposal. This seems an orthogonal problem to the proposal that can be addressed independently.

@griesemer
Contributor Author

griesemer commented Dec 8, 2023

@jimmyfrasche As you say yourself, this proposal doesn't preclude a future mechanism that auto-infers a struct type or perhaps even a tuple from a set of values, if that is important. But that's an independent consideration, and adding a tuple type is a more significant change that seems less warranted. This proposal generalizes mechanisms we already have (... to unpack things, and passing multiple values to functions or, in this case, a composite literal). So this is a much simpler proposition. It can be seen as completing/rounding out existing features rather than introducing new concepts.

With respect to unpacking a struct with unexported fields: it's a very good question and I don't know what the right answer is. If we strictly follow existing rules, unpacking would be allowed: exporting simply controls access to names, nothing more. Since we don't mention the names, we're ok. It reminds me of #56669 which is unrelated but exposes a similar problem. Alternatively, one could disallow ... in that specific case. Finally, just ignoring unexported fields seems the worst of the possible choices.
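For illustration, a sketch of the question using a hypothetical package p:

// in package p
type T struct {
    Exported   int
    unexported string
}

// in another package q
var t p.T
a, b := t... // no field name is written, yet the unexported value escapes p; allow or reject?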

@griesemer
Contributor Author

@DeedleFake Agreed that #63221 does more, but this proposal doesn't preclude a future mechanism to auto-infer struct types. It seems orthogonal.

@fzipp
Contributor

fzipp commented Dec 8, 2023

if we add a field to S without changing the signature of f, the code will break.

The same applies if we change the order of fields, which is already true for unkeyed struct literals as well. If this proposal is adopted, the section in the Go 1 Compatibility Promise should be extended with a note that usage of this feature is not covered.

@earthboundkid
Contributor

FWIW, my vote is that unpacking across packages with an unexported field is disallowed. I also think unpacking across packages should be flagged by go vet unless the struct has a magic comment like //go:unpackablefields or something. Maybe a magic struct tag, but then you’d have to repeat it for each element.

I don’t suppose this needs any reflect support, but it might be convenient to have.

Overall, I like this because it aligns Go with other languages like Python, JavaScript, etc that have a destructuring operator.

What about unpacking a slice? Maybe e1, e2 := s… and it will panic if s isn’t exactly len 2?

@earthboundkid
Contributor

Re: slice unpacking, since you can already convert a slice to an array, it seems like not having slice unpacking would just make people add an extra hoop to jump through.
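For illustration, a sketch of that extra hoop: the slice-to-array conversion is valid Go as of 1.20, and the trailing ... is the syntax proposed here:

s := []int{1, 2}
a, b := [2]int(s)... // convert to an array first (panics if len(s) < 2), then unpack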

@josharian
Contributor

josharian commented Dec 8, 2023

it aligns Go with other languages like Python, JavaScript, etc that have a destructuring operator

Note that some destructuring operators in other languages are sensitive to the field names. Consider this (typed on my phone, may be typos):

type S struct {
  n int
  off int
}

// …
s := S{n: 9, off: 5}
n, off := s...

If you swap the order of the fields in the type definition of S, you get a confusing bug.

Name-based destructuring isn’t a great fit for Go, because local variables aren’t typically capitalized. And it doesn’t solve the tuple use case.

But position-based destructuring is enough of a footgun that I expect/hope it’d be used sparingly.

@gregwebs

gregwebs commented Dec 8, 2023

Position-based is not robust. It's a fundamental problem with position-based tuple access.
For the de-structuring case, this isn't robust to a change in the struct field ordering as @josharian points out. There may be no compiler error.
For the function call case, it isn't robust to adding or removing a struct field or a function parameter. This would produce a compiler error but require backing out the use of this feature.

I think TypeScript and JavaScript have this right by using name-based de-structuring and structuring (they also support position-based de-structuring and tuples via arrays with multiple types, but I stay away from those features, and in code in the wild they are used much less than name-based).

let o = {
  a: "foo",
  b: 12,
  c: "bar",
};
let { a, b } = o;
let { a, ...passthrough } = o;
let total = passthrough.b + passthrough.c.length;

It also supports name-based object structuring

let defaults = { food: "spicy", price: "$$", ambiance: "noisy" };
let search = { food: "rich", ...defaults };
let a = 1;
let b = 2;
let obj = { a, b };
obj.a // 1

If you thought the language needs to be loosey-goosey for this, that's not the case: Rust supports this as well, both destructuring and structuring.

A lot of the desire for tuples in Go comes down to its choice of non-tupled multi-valued returns and its lack of tagged unions (and, initially, generics), which required these returns almost everywhere.

There's room to perhaps allow something along the lines of this proposal, but there are two other related features, used heavily in other languages, that I think should be designed first, because their use cases overlap considerably with where one would currently imagine this proposal being used:

  • name-based de-structuring and structuring
  • tagged unions

@gazerro
Contributor

gazerro commented Dec 8, 2023

I am generally in favor of this proposal. However, it is not clear to me how this proposal would integrate with the special form of assignment for maps and the special form of receive. For example:

var m map[int]struct{ x int; y string }
// ...
a, b, ok := m[2]...
var ch chan struct{ x int; y string }
// ...
a, b, ok := <-ch...

If these forms were allowed, changing the struct from struct{ x int; y string } to struct{ x int; y string; z bool } would still compile, but the behavior would be different.

It might be possible to disallow the use of ... in these special forms, but I think it would be unfortunate. For example, you would have to write:

if d, ok := m[2]; ok {
        a, b = d...
        // do something with a and b
}

instead of

if a, b, ok := m[2]...; ok {
        // do something with a and b
}

@timothy-king
Contributor

#64613 (comment)

I would draw the opposite conclusion that unexported fields can be accessed from another package.

a, b := s2... // unpacking of s2; a = 1, b = 2; shortcut for a, b := s2.x, s2.y

The relatively nice view is that unpacking is shorthand for a sequence of selector expressions of the listed fields of a struct. One cannot write s2.x in another package; it would be a compile error. So, to be consistent, unpacking a struct p.S in package q would be a compile error if p.S has any unexported fields.

Sticking with unpacking being shorthand for a sequence of selectors/array accesses, something that pops out is that a *p.S could also be unpacked using this principle.

(I am not sure there is a way to deal with embedded structs or interfaces with unpack other than to select the embedded struct/interface as a whole. Otherwise there would be ambiguities in the cardinality of the left-hand side. So it would not be all valid selectors on p.S, just the fields listed on the struct.)
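A sketch of that ambiguity, using hypothetical types with an embedded field:

type Inner struct{ a, b int }
type Outer struct {
    Inner
    c int
}

var o Outer
// Reading ... as "the listed fields", o... expands to o.Inner, o.c (two values),
// not to the promoted selectors o.a, o.b, o.c (three values).
x, y := o... // x is an Inner, y is an int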

@jimmyfrasche
Member

Destructuring may be worth exploring, but it wouldn't help with f(s...) or S{g()...}.

It would allow the comma-ok forms to work on one line instead of two without any ambiguity over where the ok came from:

var c chan [2]T
// ...
[x, y], ok := <-c

@jimmyfrasche
Member

With respect to unpacking a struct with unexported fields: it's a very good question and I don't know what the right answer is. If we strictly follow existing rules, unpacking would be allowed: exporting simply controls access to names, nothing more. Since we don't mention the names, we're ok. It reminds me of #56669 which is unrelated but exposes a similar problem. Alternatively, one could disallow ... in that specific case. Finally, just ignoring unexported fields seems the worst of the possible choices.

Unpacking unexported fields doesn't seem ideal. That would make it possible, but unsafe, to use with any struct you don't have control over. You don't want a patch update in a third-party dependency breaking the build because it made an internal change.

One way to go would be to make it the opposite of an unkeyed literal. If so, it would make sense that s... would be illegal if its type cannot be written as an unkeyed literal in the current package. For the most part that seems like it would work out well. The annoyance would be that for structs defined in the same package it would unpack _ fields, which are always zero. (I don't suppose it's possible to make it illegal to use an unkeyed literal with _ fields at this point?)

Another option would be to treat it as a macro that is equivalent to writing out all the fields by hand, thus only including the ones the user could otherwise write: in the same package, all non-_ fields are expanded; in external packages, all exported fields are expanded.

A hybrid option would be to never unpack _ fields and to disallow expanding structs from another package that contain unexported fields. That seems like the best thing from a usability point of view, even if it's not really consistent.
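A sketch of the _ field annoyance under the unkeyed-literal reading, with a hypothetical type:

type padded struct {
    a int
    _ [4]byte // blank field, e.g. explicit padding
    b int
}

var p padded
// Mirroring unkeyed literals, p... would expand to three values,
// the middle one being the unreadable, always-zero blank field:
x, pad, y := p...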

@myaaaaaaaaa

GLSL has a feature called swizzling, which essentially gives the programmer control over which fields to unpack. Perhaps this can be adapted for Go?

 type Pair struct{a, b int}
 var p Pair
-x, y := p...
+x, y := p.{a,b}
+v, u := p.{b,a} // reverse order
 
+i, j, k, l := p.{a,a,b,b}
+i == j
+k == l
 
 type S2 struct{x, y int}
 s2 := S2{1, 2}
-s2 == S2{s2...}
+s2 == S2{s2.{x,y}}
 func f2(a, b int)
-f2(s2...)
+f2(s2.{x,y})

@DeedleFake

@myaaaaaaaaa

I don't see how that's particularly better than just doing

x, y := p.a, p.b
v, u := p.b, p.a // reverse order

i, j, k, l := p.a, p.a, p.b, p.b
i == j
k == l

s2 == S2{s2.x, s2.y}

f2(s2.x, s2.y)

It saves a very small number of keystrokes in return for very little in these examples, in my opinion.

@griesemer
Contributor Author

@gazerro If you need the comma-ok form, you need to do the unpacking in two steps:

var m map[int]struct{ x int; y string }
v, ok := m[2]
a, b := v...

This is explained in the example section of the proposal.

@griesemer
Contributor Author

The arguments against unpacking (not robust, what to do about unexported fields, embedded fields) seem pretty convincing: these are valid technical arguments.

Since s... (as nice as it may read) would be just syntactic sugar for s.a, s.b, s.c (or whatever the field names of s are), we probably should avoid that footgun in favor of being explicit. We can already write what s... achieves; it's just a bit longer. Notably, there's no need to introduce temporaries, etc. (at least as long as s is not some more complex expression).

This leaves the original idea of #33080: a mechanism to pack a multi-value into a struct (i.e., only item 2 of this proposal):

Given a struct or array type T with n elements and an expression e that stands for n values (a multi-value) of matching types, the composite literal T{e} creates the struct or array value with the multiple values as the elements.

This would still allow us to easily pack a multi-value returned by a function into a struct (or an array) without the need to go through temporary variables. Do people see any problems with that?

@atdiar

atdiar commented Dec 12, 2023

If unpacking simply corresponds to what is being done manually, it doesn't seem to me that it should be too problematic?

  • unexported fields wouldn't be unpacked, since they are inaccessible
  • embedded fields would be unpacked into variables as normal; they are just a kind of unnamed field, in a sense

That means that packing and unpacking won't necessarily be symmetric for every struct type, but I think that would be ok.
It is symmetric for a certain kind of struct (those exhibiting tuple-like qualities, so to speak).

The advantage is that, although perhaps ugly in user-land, one could address the issue mentioned by @randall77 at the top:

f(pack(c, p...)...)

Just by composition, where pack represents the operator that packs into a struct.

Then again, if that's something that can be implemented without that operator, it will not only be simpler but also more legible. Perhaps it will still help the underlying implementation. So just a note, just in case.

Edit:
These tuple-like structs would only be unpackable within the package the unpacking appears in (sic!), so it's not too worrying. No asymmetry between packing and unpacking within a package, even in the presence of unexported fields.

@ianlancetaylor added the LanguageChange and v2 (A language change or incompatible library change) labels on Dec 14, 2023
@dolmen
Contributor

dolmen commented Dec 20, 2023

If unpacking is desired, an unpacking function (or method) can still be written to return the corresponding multi-value. This is more verbose, but it is more future-proof and clearly documented.

type S struct {
    a, b int
}

func (s *S) AsList() (a, b int) {
    return s.a, s.b
}

The explicit method also makes it explicit how the unpacking works with a pointer to struct vs. a struct directly.
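For example, the multi-value returned by such a method can already be consumed directly today (f2 as in the earlier examples):

var s S
a, b := s.AsList() // explicit "unpacking" without new syntax
f2(s.AsList())     // the results match f2's parameters exactly, so this works today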

Notes:

  • instead of unpacking, I would call this destructuring, or transforming a named container into a positional container.
  • the use cases for destructuring are quite rare. Types named Pair or XY are obvious but trivial, and not that common. If you have more than 3 values in the struct, explicit destructuring is safer (it protects against future changes in struct member order).

The major issue with destructuring is that Go lacks support for using a multi-value in various places. Multi-values basically only work on the right side of an assignment. They also work as arguments to another function, but I never found that useful as I usually have to handle an error.

I have written such destructuring methods a few times when using database/sql, because the sql.Stmt.QueryContext and sql.Stmt.ExecContext methods take a positional list of values. But usually I could not pass the output of the destructuring method directly, because I had to build a bigger list of values with values from other sources. I had to write an SQLArgs []any type with a func (*SQLArgs) Append(...any), and my destructuring methods were returning []any instead of having the raw values in the signature (because a multi-value can't be combined with other arguments in a variadic call).
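A minimal sketch of that pattern; SQLArgs and Append are as described, while the User type and its SQLValues method are made up for illustration:

type SQLArgs []any

func (a *SQLArgs) Append(vs ...any) { *a = append(*a, vs...) }

type User struct {
    ID   int
    Name string
}

// returns []any rather than raw values so it can be combined with arguments from other sources
func (u User) SQLValues() []any { return []any{u.ID, u.Name} }

// usage, e.g. with sql.Stmt.ExecContext:
//   var args SQLArgs
//   args.Append(u.SQLValues()...)
//   args.Append(extraValue)
//   res, err := stmt.ExecContext(ctx, args...)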

@atdiar

atdiar commented Dec 21, 2023

@dolmen Not bad. I think the two can/should coexist.
That changing a struct definition by adding a field ends up in a compile failure is not necessarily a bad thing. It can also point out the locations where the implementation should be adjusted, instead of silently compiling without accounting for the change.

In fact, and alternatively, perhaps only packed structs should be depackable (perhaps... need to think about it).
That might make those kinds of anonymous structs closer to tuples.
It could also avoid having to find a general methodology for exported vs. unexported fields, since depacking would only concern these special structs. And avoid proliferation.

Perhaps it gets closer to what @jimmyfrasche was proposing?

I seem to remember that one previous proposal wanted these values to be immutable. That could kind of be the case if the fields are unexported.

Many ways to skin the cat probably.

The point is that when dealing with generics (*), if some way is found to describe such constructed types in terms of type parameters at some point, we could represent generic functions by having them take those packed type parameters in argument or return positions, and also have the ability to depack values of such generic types.

In other words, creating a true equivalence between multiple return values and these depackable structs.

(*) This will probably require composability, so that for a type parameter T defined such that [U (...any), T (U, error)], T is not the representation of a struct that itself embeds a struct type and an error type. It should be considered as destructured, equivalent to (U..., error).
Sometimes, we just want to make sure that a function returns an error value.
We are not there at all yet, just thinking out loud. Maybe that can inspire something.

@jimmyfrasche
Member

Reviewing this thread I don't see a mention of when you're allowed to do this. I would assume that you would only be allowed to write pkg.S{f()} if you could write an unkeyed composite literal for pkg.S.
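A sketch of that rule with a hypothetical package pkg:

// in package pkg
type S struct {
    A int
    b int // unexported
}

// elsewhere, pkg.S{1, 2} is already illegal because of the unexported field b,
// so under this rule pkg.S{f()} would be rejected there as well.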
