
proposal: Go 2: syntax to express destructuring and structuring assignments #48499

Closed
sammy-hughes opened this issue Sep 20, 2021 · 32 comments

@sammy-hughes

sammy-hughes commented Sep 20, 2021

Go is very intent on not implicitly converting types. This makes the expression of a destructuring assignment difficult to imagine in Go. The following represents a proposal to express both destructuring assignment (x, y := ZtoXandY(Z)) and aggregating assignment (x := XfromYandZ(y, z)), across arbitrary but assignable constituent types, with a syntax that has no compatibility risk and which feels idiomatic to Go.

Of note, the following focuses on the key point of interest for myself, the chief opportunity for distinction of such functionality. Convenience and ergonomics implications abound for the API discussed here, but I largely consider them secondary to a new facility for performance-sensitive computing in data science, simulation, modeling, cryptography, and graphics libraries: compiler-checked, type-safe expression of optimizations that previously required liberal use of unsafe to vectorize inputs, exposing such optimizations to implementors unwilling to compromise on the guarantees provided by Go's static typing and commitment to perpetual compatibility.

Go2 Language Change Template

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?
    • Intermediate, with 2 years of experience with Go as a professional, but many an all-nighter as a hobbyist.
  • What other languages do you have experience with?
    • C/C++, Rust, Node/JavaScript, various dialects of SQL, Python, Basic, and PHP.
  • Would this change make Go easier or harder to learn, and why?
    • It would likely be frustrating for new learners unfamiliar with the memory model.
  • Has this idea, or one like it, been proposed before?
    • I'm aware of one proposal on golang-nuts that deals with multiple assignment, but it's from 2012.
  • If so, how does this proposal differ?
    • This proposal prioritizes compile-time safety guarantees, suggesting a syntax that can describe otherwise unsafe operations.
  • Who does this proposal help, and why?
  • What is the proposed change?:
    • Syntax change to define type conversions that normally require unsafe
  • Please describe as precisely as possible the change to the language.
    • I go into depth below, but the chief goal is the capacity to assert that a given slice of memory, referenced by an instance or collection of known types, can be used to back a field-compatible known type, much like a slice on a struct.
  • What would change in the language spec?
    • It would not meaningfully change the language spec. There are no overlapping co-valid constructs.
  • Please also describe the change informally, as in a class teaching Go.
    • Just like a slice is a very specific section of an array, this is a way to take a slice out of the middle of a struct. Because the struct is already laid out in memory, this can be pretty close to a free operation. It would be weird for the garbage collector to handle, since neither of these items is a pointer, but it would probably have to be treated almost like a kind of pointer.

You might think this is a weird thing to want, but there are lots of situations where you need to take a very specific action on a bunch of items, you only need two or three fields out of the struct, and the rest is kind of just a waste. If you already have the data in memory, you can make your calculations much faster by creating a much smaller struct, with exactly the fields you want, right in the middle of the big struct. If you're processing a batch of 50,000 or 100,000 pieces of data, all you need is a way to tell the Go compiler what part of the big struct you want and where you want it from. Then Go can decide whether you should be allowed to do that: if you're telling the truth about what type of data the fields are, Go can know ahead of time whether you're doing something wrong, or whether one of your coworkers changed part of the program, and warn you before you try to run it.

Already, you can kind of do this in a few different ways, but you usually have to choose whether you want your program to be fast or safe. With this syntax, or really any syntax that lets you talk to the compiler like that, you can work with the compiler, and choosing to be safe doesn't have to mean choosing to be slow.

  • Is this change backward compatible?
    Breaking the Go 1 compatibility guarantee is a large cost and requires a large benefit.
    • This feature appears similar to inline constraint type unions, but is in no way co-valid with that construct. Additionally, it resembles an interface type assertion. Again, there are no conditions under which there is co-validity.
  • Show example code before and after the change.
///////////// Before
super := SuperType{}
subber := (*SubType)(unsafe.Add(unsafe.Pointer(&super), unsafe.Offsetof(super.X)))
//////////// After
super := SuperType{}
_, _, _, subber := super.{int, string, *int, SubType}
  • What is the cost of this proposal? (Every language change has a cost).
    • There is currently no need for the garbage collector to interact with a concept of co-ownership; the closest thing to it, slices, are just pointers. While all prior code would see no impact, I'd expect this to require an amount of GC retooling that I'm unclear on.
  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?
    • It messes with the memory model, so Go vet is chiefly affected. Minor changes involved to gopls and gofmt, and obviously it would need to be implemented in the tokenization phase of the compiler.
  • compile time cost?
    • I hope it has a compile-time cost. It may necessitate more robust memory tracing to determine whether a structural assertion is valid.
  • What is the run time cost?
    • None, aside from any impact to the GC. This proposal is intended such that any runtime cost or uncertainty is strictly avoided.
  • Can you describe a possible implementation?
    • Yes, emphatically. See below. I also include a very naive analogue using raw pointers, linked in a Go Playground snippet (still a bit broken).
  • Do you have a prototype? (This is not required.)
    • See above...and way below. Yes.
  • How would the language spec change?
    • Again, this does not change the language. This proposal was very much conceived around Go as it is today.
  • Orthogonality: how does this change interact or overlap with existing features?
    • Where interfaces and generics provide polymorphism, this is a proposal for reductive monomorphism, aimed at accelerating high-performance computational workloads.
  • Is the goal of this change a performance improvement?
    • It most definitely is.
  • If so, what quantifiable improvement should we expect?
    • ETL operations on a data model I work with frequently use 3-5 out of 44 fields in any given stage. Several associated types provide alternate forms for several sections of fields; with such a feature, the base allocation could be used as a pseudo-stack, hypothetically reducing allocations. In several stages, one or more of the associated types is processed by machinery similar to that of the primary types. Significant chunks of time are spent just spinning through the several slices, converting to a common type, only to later write the results back onto the list before streaming back to the database. Much of this dead time could be avoided by using pre-existing allocations and taking addresses, as pointers-to-struct for the common types, into slices of those base types. I expect that the loop time will still need to be spent, and a slice of pointers-to-struct will still have to be allocated, but everything after that, including the copy back onto the parent, can be avoided.
  • How would we measure it?
    • I expect it to be readily apparent in profiling, and while I haven't been able to get approval for use of unsafe in live code, I can basically eliminate allocations by using an offset iterator to traverse the base lists. The described syntax could accomplish that iteration with no peripheral assignments, for sure, assuming something close to it was eventually accepted.
  • Does this affect error handling?
    • It will definitely create both errors and frustrated folks, I expect, but no impact on error handling or error subsystem.
  • If so, how does this differ from previous error handling proposals?
    • N/A
  • Is this about generics?
    • Kind of? Not really. Generics being parametric polymorphism, this would be reductive monomorphism, with a big ol' asterisk.
  • If so, how does this relate to the accepted design and other generics proposals?
    • One could easily make the case that it's a form of generics, and I was hoping that this functionality could fit into generics, as that's how it's grouped in a few other languages, but this constitutes essentially the opposite effect of generics. Generics and interfaces provide a means by which functions can be generalized across diverse entities, where this is a functionality whereby the portion of a variety of objects can be reduced to only that bit they all have in common, or whatever fields any particular number have in common.

That "big ol' asterisk" is that while there is the above orthogonality, that's a happy side-effect of the capability to describe a new way that the compiler should understand a given area of memory, preserving all primitive types as distinct and axiomatic, ensuring that I don't accidentally slice into and mutate the GC flags.

API of Interest

When implementing network or library APIs to fit a contract delivered by an external party, such as a REST API or a linked library, or when otherwise constrained by surfaces that require some level of flexibility, a codebase can accumulate a multitude of nearly-identical or close-parallel structured types that are not convertible, but which each have one or more subsets of fields that can be treated as a shared primitive for whatever computation they are communally destined for. Performance-optimized applications will often leverage the unsafe package to iterate over such shared subsets of fields, as a way to avoid excess copies.

Currently, Go supports several means of expressing generic logic, most recently including the in-development generics feature. Most of these means are unsuitable for performance-sensitive applications. It was expected that generics would satisfy this need, and it very well may with significant development of the constraints functionality, but at present it does not meet the needs particular to high-performance computing: working across package boundaries, on specific subsets of fields, preferably with direct interactions with private fields from statically dispatched methods on monomorphized sub-types, without overhead from transform functions or intermediate assignments. (Ahem, yeah, I uhm, yeah, I'm talking about "traits" in Go. Sorry if my claw was visible. I think it's a decent proposal, so please keep reading?)

Part of the proposal will be to outline why the existing means of establishing Super-type/Sub-type relationships are not sufficient for the domain, as well as showing how the proposed syntax would succeed where they failed. It is important that it be understood that this proposal is suggested as a means that involves zero difference in post-compilation assembly, and is a means of arriving at compiled programs without giving up the compile-time guarantees, memory-safety, and simplicity that are Go's hallmark.

A description of the details of the proposed API will follow that discussion, accompanied by a discussion of some of the difficulties I recognize, as well as a defense of the place such an API would have in the existing Go ecosystem.

Status Quo

Currently, the closest functionality to that of interest consists of the following solution, which requires sacrificing compatibility and safety guarantees. Descriptions of comparable alternatives, and how they fall short, follow:

  • Define a static, monomorphic function for each SuperType/SubType relationship, receiving a pointer to a SuperType whose interior, at some offset, can be described by the SubType. The function body casts to unsafe.Pointer, adds the offset of the first field shared between SuperType and SubType, and casts the result to a pointer to SubType, returning that address. Example:

type SuperType struct {
    Name string
    W, H, X, Y float64
}
type SubType struct {
    X, Y float64
}
func SuperInteriorAsSubType(super *SuperType) *SubType {
    return (*SubType)(unsafe.Add(unsafe.Pointer(super), unsafe.Offsetof(super.X)))
}

With no further allocation than is needed for the pointer, assuming it is valid at the given address with offset, all mutations to the point are necessarily mutations of the parent instance. Function calls are monomorphic and statically dispatched on a concrete type, against a receiver allocated at the same locality as the parent instance, under conditions that otherwise would offer no possible mechanism of allocation on the stack.

Even should the SubType be established at a late phase in development, or as maintenance/improvements to a deployed application, such functionality could apply to existing types without requiring refactoring of any kind. Ideally, cross-package SubType/SuperType relationships would be permissible, though constraint on such would make such an API moderately less of a novel capability.

There is no exactly equivalent operation that can be performed using existing constructs and syntax, but several APIs come close. The most obvious candidate is type embedding. Very similar is shadowing by a pointer-composed type, but statically type-safe transformation functions, interface abstraction, and generic functions are also candidates.

Refactored types: chiefly and most obviously, Go's type composition (embedding) is the usual way to go. Less obvious, and in large part accomplishing the essence of the functionality of interest, is implementing the SubType as a struct of pointers, matching in type, number, and order the fields that compose the SubType/SuperType relationship. Examples:

type SubTypeComposition struct {
    X, Y float64
}
type SubTypePointerShadowing struct {
    X, Y *float64
}
type SuperType struct {
    Name string
    W, H float64
    SubTypeComposition
}
func Example(super *SuperType) (*SubTypeComposition, *SubTypePointerShadowing) {
    return &super.SubTypeComposition, 
        &SubTypePointerShadowing{&super.SubTypeComposition.X, &super.SubTypeComposition.Y}
}

The chief effective shortcoming of either option is that no method of a SubType can interact with a value of its parent, and any fields the parent interacts with must be part of the exported API. In this context it makes sense that they are exported fields, but in many cases that will not be so. Further, the API of interest can provide SubTypes of varying extent as to overlapping fieldset: e.g. hypothetical structs type textBox, type layoutContainer, and type drawableArea all suggest, by name, that they might have been defined with width and height properties, making them SuperTypes of both a type Rectangle and a type Point. Supposing that naming-suggested relationship to be true, the API of interest can support interaction as both possible SubTypes, while embedding in this contrived example presents mostly ergonomic issues, such as accomplishing the transform for a list of SuperType, and will likely compile to a program similar to that of the API of interest. When used carefully, shadowing at a field level will be most similar to the API of interest.

Interfaces are to be wholly rejected as a possible solution for a performance-sensitive application seeking to vectorize operands. Interfaces necessarily involve a dynamically dispatched call for any method, and preclude use of fields on elements of the interface type. For applications that weigh development iteration time at parallel importance to performance considerations, interfaces are absolutely a solid option, being designed to simplify development of systems that have the mild variations with clear familial structure I describe, except that the performance implications are mildly appalling in the context of an ETL pipeline or a render job. The API of interest provides means by which data can be subdivided post hoc, with existing machinery, with admittedly idealistic expectations of minimal performance overhead.

Finally, and the reason this proposal appears only now, there was a long-held expectation that the generics system would accomplish the API of interest through a parallel mechanism. In any response, I ask that attention be paid here, as I would love a discussion of how generics might possibly cover the parallelism and performance concerns motivating this proposal. That said, generics presently has several limitations, by necessity of mechanism, that leave this proposed API a salient opportunity, whether or not it's believed worth developing.

As I understand it, generics will allow for most method calls to be made statically, most of the time. I'm unclear on whether vtable lookups will ever be necessary, solving one of the enduring shortfalls of interfaces for high-performance applications. Definitions of constraints as an interface suggests that exporting is an expected capability, and excepting difficulties around generically interacting with fields, many potential applications for the API of interest are covered.

The shortcomings I expect are the mixing of underlying types, possibly extending to generic method signatures, and the non-trivial operations required to satisfy the constraining interface.

The issue of mixing underlying types that implement the generic constraint limits the level of parallelism possible, and suggests that cofactors in a signature would be best implemented as essentially a repeated definition of the type symbol, with identical constraints, for each parameter. I'm unclear whether it will be possible to coerce distinct underlying types satisfying a given constraint into a monomorphic structure, except by explicit allocation and population of values into a type intended to serve as a monomorphic SubType. Ultimately, this would eliminate generics as a possible candidate for operations which expect to use a given SubType as if it were monomorphic, across instances of diverse SuperTypes. In the worst case, generics could serve as a loading dock for the vectorization of inputs, using strategies entirely available pre-1.17, just with less boilerplate.

Additionally, if it is necessary to arbitrate all operations, trivial and complex, with method calls, then even assuming no other issue, there will be little opportunity for manual inlining, and a performance-conscious interface to use as a constraint would specify simple accessors and setters on all relevant attributes, to be implemented by each of the various types. The possibility that these method calls are statically dispatched as methods of concrete types is a very important benefit, one I have been awaiting for almost 2 years, and it does suggest that most or all of those simple accessor and setter calls can become inlined. Nonetheless, it leaves the API presented to an implementor roughly where it was yesterday.

Proposed API

Ironically, the syntax I would propose is very similar to that of the constraint API, but, A, in a non-overlapping syntactical context, and, B, with important peculiarities that make the two forms quite distinct. Further, should the functionality be found to be the improvement I believe it offers to those in fields similar to mine, let not the details of how such a capability is expressed stand in the way.

As a clarifying comment, the above represents taking a reference to a slice of fields internal to a different type, that slice being compatible between SuperType and SubType. This facilitates queuing into a monomorphic type that can be transformed by functions expecting exactly and only that type, with the transformations propagated, by reference, to memory held by a variable that pre-exists the current clause and outlives it, the SubType acting as a slice on that memory. Currently, there is no strong guarantee that memory leaks, unexpected frees, and other corruptions will not occur in connection with my naive implementation using raw pointers. The essence of this proposal is the ability to use a structured type as a slice on another structured type: a region which is entirely assignable from the perspective of the SubType, and which, for the SuperType, is assignable for the size of the SubType from the designated offset.

For an identifier or call expression x, and for N type expressions, the following statement can be used to reinterpret the memory representation of the value of x in x.{T, ...N}, provided that the type expressions satisfy the assignability standard of the underlying types, as compared by whatever types would lie at the offset indicated by the sequence of type expressions.

For x as a call expression, the following rules describe structural assignment assertions applied to values supplied as arguments, which must be assignable to the parameters composing the input signature. Additionally, values supplied to a return clause must be assignable to the return signature, and structural assignment assertions applied to the output of a function must be assignable from the types composing the function signature.

For x as an identifier, the syntax rules apply to either side of the assignment operator independently, and structural assignment assertions must be internally compatible: each field composing the types on the left-hand side must be assignable to the declared variables, if any, to the left of the assignment operator; variables on the right side of the operator must first be evaluated for assignability against any structural assignment assertions, then for assignability to left-hand structural assignment assertions, if supplied, and finally against any predeclared variables on the left-hand side.

The following describes the constraint between a set of values and any structural assignment assertions supplied. These rules are being proposed as applicable to the following situations:

  • the set of values to the left of an assignment operator
  • the set of values to the right of an assignment operator
  • the set of values being sent on a channel
  • the set of values being received on a channel
  • the set of values being supplied to a function parameter
  • the set of values being supplied to a return clause
  • the set of values being provided by a call expression as an assignment, including when variadic.
  • the set of values being supplied as the unkeyed values for a compound type.

where the set of values is composed of a single primitive value, the type list may only include one type expression which is one of:

  • A, a primitive type that is assignable from the type returned by x
  • B, a single-field struct whose field is either a simple value of a type satisfying A, or a value satisfying C
  • C, a length-1 array of a type satisfying A or B

where the set of values is composed of no values or many values:

  • a list of type expressions must be supplied equal in length to the number of output values, except for a variable-output expression supporting a syntax that supplies a number of values matching the length of the type expression list
  • each type expression must specify a type that is assignable from the type of the value returned by x at the ordinally matching position in the list of returned values.
  • returned values may be collected by a single or by multiple compound types, though a single value cannot be split across different primitives or different compound types.

where the set of values is composed of one or more compound types:

  • as a default, the rules for primitives do apply, e.g. that the type expression list must correlate, and ultimately must be compatible as assignment from the type of the returned value. Generally this constitutes the right-hand component of a direct assignment.
  • returned values may be subdivided as multiple distinct type expressions, provided each specified type expression is compatible under assignability standards with the memory offsets in the values returned by x, based on the cumulative offset of the type expression in the list of type expressions. This describes the right-hand form of a destructuring assignment.
  • returned values may be collected into fewer compound type expressions, provided that each value that would be collected into the compound type would be assignable to the corresponding memory offset in the collecting compound type, based on the cumulative offset of the type expression in the list of type expressions. This describes a right-hand form of a composing assignment
  • mixing of type expressions in the same list of type expressions, variously constituting direct, destructuring, and composing assignments should be permitted.

The following represents a list of cases conceived as being incompatible with this scheme:

  • interfaces having only an internal pointer to value and type information, it is unclear what a structural assignment assertion attached to an interface would mean, and it would be confusingly similar to present type assertion syntax.
  • functions cannot conceivably be deconstructed or constructed for storage, and if such an expression were to apply, it must have to do with the value of the function, being the stream of instructions at the referenced address. As author of this proposal, I am intrigued by the notion of instruction-level composition and decomposition of functions in Go, but such a feature lacks any orthogonality whatsoever, and I am unaware of even any languages implementing such an abstraction.
  • maps don't fit into a clean division of values, having a pseudo-value as key. I have no notion of what a structural assignment assertion would look like, let alone suggestions about how to implement one. Further, given that they can be resized it's possible that an implementation allowing such access on a map would either impose performance limitations on the map type, or else that unpredictable behavior would be unavoidable.
  • assignment assertions do not seem applicable to channels, much as with functions. While it's possible some kind of syntax might be useful for saving a channel or function with memoized structural assertion for one or both directions, it is a feature without orthogonality in Go.
  • slices represent a structured type under the hood, but could also be expressed as a pointer to an array of current size-value, except much like with a map, it is difficult to imagine an implementation that upholds the capabilities of the type, while not causing unpredictable memory states for SubTypes.

Aside from subdivision (destructuring) and aggregation (composing) of the cumulative allocations representing the left-hand and right-hand components of an assignment, which are new in the proposed API, individual primitive values should not be permitted to be subdivided, and normal constraints around implicit conversions should still apply: where a type conversion occurs as part of a destructuring or composing assignment, it should be explicit in a way that is not covered by this proposal. No such facility is suggested here, and any statement in this proposal in which a field or element appears to be type-cast beyond what is positively suggested should be considered a typo.

Further, I do not address two edge cases: when non-exported fields are covered by such an assignment, and when the left-hand component of an assignment performs a destructuring of more than one structured value into a right-hand value composed from now-separate values, drawing from more than one destructuring expansion. I observe no hardware limitation or need to enforce this, other than that, A, in my naive implementation using raw pointers, I lack a convenient facility to separate pointers to type information from the data stream, and, B, by enforcing only one cardinality of expansion, destructuring or composing, per compound type, per side of the assignment, the rules for parsing such expansions are simplified. Further, I am unclear whether the proposed limit on expansion cardinality should apply to a compound field in a struct, or whether all fields in a compound type should be considered together. The target constraint is that at no point is a given slice of memory ambiguously held in part, and if it is, that such a case is in the interest of the Go team to support.

The proposal suggests that following expressions be valid:

zero_th, one_th, two_th := [3]int{10,20,30}.{int, int, int}
zero_th, one_th, two_th := struct{One,Two,Three int}{10,20,30}.{int, int, int}
converted := [3]int{10,20,30}.{struct{One,Two,Three int}}
converted := struct{One,Two,Three int}{10,20,30}.{[3]int}
partial, two_th := [3]int{10,20,30}.{[2]int, int}
partial, two_th := struct{One,Two,Three int}{10,20,30}.{struct{One,Two int}, int}
partial, _ := [3]int{10,20,30}.{[2]int, int}
_, two_th := struct{One,Two,Three int}{10,20,30}.{struct{One,Two int}, int}

The following represents statements that should not be valid:

var zero_th, one_th, two_th uint = [3]int{10,20,30}
var zero_th, one_th, two_th *int = [3]int{10,20,30}

If that is dealt with by providing an explicit cast with memory compatible types, the following should be legal:

var zero_th, one_th, two_th uint =  [3]int{10,20,30}.{[3]uint}
var zero_th, one_th, two_th uint =  [3]int{10,20,30}.{uint, uint, uint}

Because this class of statement represents a reinterpretation of that memory, constrained by type-memory assignability (not convertibility), the following also should be legal:

var zero_th, one_th, two_th *int; 
*zero_th, *one_th, *two_th = [3]int{10,20,30}

A point I expect to be controversial would be the following related statement (literal values are illustrative):

var r *struct{X,Y int}
*r = struct{X,Y,Z int}{1, 2, 3}.{struct{X,Y int}, int}

The following is a two-clause statement that I suggest should also be valid, constituting assignment to a dereferenced pointer to a type which is assignable from the right-hand anonymous struct:

var r *struct{X,Y int}
*r = struct{x,y,z int}{1, 2, 3}.{struct{X,Y int}, int}

An important class of statements that I believe would be immensely convenient, but which I expect to be controversial is in a situation as follows. Given the following definitions:

type Point struct{ X, Y float64 }
type Box struct{ X, Y, W, H float64 }
type Circle struct{ X, Y, R float64 }
func Move(p *Point) { /* ... */ }
boxes := []*Box{{1.0, 1.0, 1.0, 1.0}, {0.0, 1.0, 1.0, 1.0}}
circles := []*Circle{{1.0, 1.0, 1.0}, {0.0, 1.0, 1.0}}

Essentially, this is the entire point of this proposal, for me at least:

var p *Point
var box *Box
for _, box = range boxes {
    *p, _ = box.{Point, Point}
    Move(p)
}

var p *Point
var circle *Circle
for _, circle = range circles {
    *p, _ = circle.{Point, float64}
    Move(p)
}

This change is entirely backwards compatible. It involves a symbol combination (".{") which is valid nowhere else except in constraints, and there are no cases where the two forms would be co-valid. Per the template's request for example code before and after the change, I gave a number of examples in the proposal section. In most cases, the change simply provides improved expressiveness, but it also introduces distinctly new capabilities, as in the case of taking a reference to a struct representing a subset of the fields from instances of one or more other types, and acting on that struct reference as a proxy for the described subset. This does resemble interaction with interfaces, but promises to be considerably more svelte in code weight.

Tooling impact of this change isn't something I've spent much time considering, but I am confident it affects vet and gofmt. Otherwise, I don't expect much change. Although it is a focused change using existing symbols, it still represents a proposal that touches the memory model. As such, go vet would need to be enhanced to validate a new kind of expression, and gofmt would need to properly format expressions bearing the new syntax. This proposal was designed with symbol resolution as an important factor, and was actually inspired by going over the instruction representation of compiled code. As such, I'm fairly confident that the changes would be unobservable past IR resolution, and the runtime cost would be nonexistent. I'm intentionally not arguing for performance benefits, but I'm inclined to expect better performance from code using the proposed syntax than from interfaces or raw pointers.

I prepared a rudimentary example using raw pointers to demonstrate the concept for a few of my suggested cases. I noted the possible issue of resolving unexported fields, and I am taking no steps to resolve it. I'm also not taking steps to validate convertibility; this is intended purely for demonstration. It (a) requires compiler flags to use generics, and (b) is the buggy product of some early-AM hacking. Again, it serves as a demonstration of the kind of operation that I hope will be made safe and checkable at compile time.

https://play.golang.org/p/mDp04vPCRvt

@gopherbot gopherbot added this to the Proposal milestone Sep 20, 2021
@ianlancetaylor ianlancetaylor changed the title Proposal: syntax to express destructuring and structuring assignments using existing memory model proposal: Go 2: syntax to express destructuring and structuring assignments Sep 20, 2021
@ianlancetaylor ianlancetaylor added v2 A language change or incompatible library change LanguageChange labels Sep 20, 2021
@ianlancetaylor
Contributor

For language change proposals, please fill out the template at https://go.googlesource.com/proposal/+/refs/heads/master/go2-language-changes.md .

When you are done, please reply to the issue with @gopherbot please remove label WaitingForInfo.

Thanks!

@ianlancetaylor ianlancetaylor added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 20, 2021
@beoran

beoran commented Sep 20, 2021

I don't quite see the point of this proposal, because it seems to me you can already do what you want in Go with a few supporting methods. Here is your "entire point" example as it works today:

https://play.golang.org/p/GaaBp-vMUU1

package main

import (
	"fmt"
)

type Point struct{ X, Y float64 }

func (p Point) String() string {
	return fmt.Sprintf("Point{%f,%f}", p.X, p.Y)
}

func PointAt(X, Y float64) Point {
	return Point{X, Y}
}

type Box struct {
	Point
	W, H float64
}

func BoxAt(X, Y, W, H float64) *Box {
	return &Box{PointAt(X, Y), W, H}
}

type Circle struct {
	Point
	R float64
}

func CircleAt(X, Y, R float64) *Circle {
	return &Circle{PointAt(X, Y), R}
}

func (p *Point) Move(delta Point) {
	p.X += delta.X
	p.Y += delta.Y
}

func MoveCirclesAndBoxes() {
	boxes := []*Box{BoxAt(1.0, 1.0, 1.0, 1.0), BoxAt(0.0, 1.0, 1.0, 1.0)}
	circles := []*Circle{CircleAt(1.0, 1.0, 1.0), CircleAt(0.0, 1.0, 1.0)}
	fmt.Printf("boxes: %v\ncircles: %v\n", boxes, circles)

	// Essentially, this is the entire point of this proposal, for me at least:
	delta := Point{1.0, 2.0}
	for _, box := range boxes {
		box.Move(delta)
	}
	// Can also iterate like this:
	for i := 0; i < len(circles); i++ {
		circles[i].Move(delta)
	}

	fmt.Printf("boxes: %v\ncircles: %v\n", boxes, circles)
}

func main() {
	MoveCirclesAndBoxes()
}

This uses no interfaces and the calls to Point.Move are statically dispatched.

@sammy-hughes
Author

@gopherbot please remove label WaitingForInfo

@gopherbot gopherbot removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 21, 2021
@sammy-hughes
Author

@beoran, yeah. I did spend a significant chunk of the template dealing with what alternatives exist today, and after expressing the analogous unsafe expression that this functionality targets, an example like yours was literally my first example. I'm not claiming this can't be done currently. Quite the opposite: it is most effectively done with raw pointers, and several less drastic solutions also exist.

I did give a few reasons why such a solution would be better than the above, even if it is all hypothetical next to your concrete example, but this isn't some sob story about how Go doesn't have the tools I need. This isn't some demand that I'm gonna take my custom to Python, but rather: "Hey, this seems like it'd slot nicely into Go's memory model. What'd ya think?"

@martisch
Contributor

martisch commented Sep 21, 2021

Example from proposal:
converted := struct{One,Two,Three int}{10,20,30}.{[3]int}

One thought I had while quickly glancing over the proposal: if those conversions are to be made without copying to new memory, then this will introduce a memory ordering on struct fields in Go. As far as I remember, the Go spec does not currently guarantee field ordering; the thought was that the Go compiler could optimise it for the user under the hood in the future (unless there is some annotation telling the compiler not to).

@beoran

beoran commented Sep 21, 2021

Ok, now that you filled in the template I have a better idea of what you are trying to do.

The unsafe code you use now,
subber := (*SubType)(unsafe.Add(unsafe.Pointer(&super), unsafe.Offsetof(super.X))), is similar to C code with C's offsetof(), where we can create a pointer to somewhere in the middle of an existing struct and treat it as if it were an embedded struct. In C we do this because we do not have struct embedding like we have in Go, and also for performance. Of course, in Go this is unsafe, and in C it may break strict aliasing, so in both cases it is not really safe to use.

But in Go we do have struct embedding, so I would say the first reason for this feature is debatable. The main problem you seem to face is performance. Rather than proposing a rather complex new language feature, I think we should probably try to improve the performance of the compiled code for embedded structs, so it works better in your use case.

@sammy-hughes
Author

sammy-hughes commented Sep 21, 2021

All-in-all, I appreciate the attention given thus far, and if I must walk away being told by the Go team, "We care a lot more about X API working well, and we think your suggestion is a distraction from it. Try using X, it'll get you close but Safely.", I could accept that. I am hoping to push the issue a bit, if there's willingness to consider it.

@martisch, honestly I was thinking that the possibility that memory might have multiple owners was a bigger deal. I didn't realize that field-ordering wasn't a stable thing. I generally see the same data in the same order, and while explicitly not covered by a compatibility guarantee, I assumed this was fairly stable. If you meant from an API perspective, matching the precise order and type between structs absolutely would be the biggest barrier to a user.

Either way, I wrote this proposal on the assumption that field order was stable, even if it didn't carry a guarantee. Field number doesn't correspond to offset, embedded interfaces are uintptr-width, and embedded structs likewise have their own subsection of the total offset. If this is a misunderstanding, I would love a pointer to a reference/source that would help me build a better mental model of allocations. If you simply meant that you don't think the feature compelling enough to officially support something that depends on field order, I would find that frustrating but wholly acceptable.

@beoran, I appreciate your interest in skipping past any XY problems posed. I find embedding an inappropriate tool for my use, on account of behavior composition versus structural composition. Again, I will readily admit that I'm comparing Go to Rust, and I am in want of a specific functionality that may very well be of no interest for the language design of Go. As above, it'd be an obnoxious finding, but I could accept that outcome.

Embedding is currently the only effective way to compose behaviors, because it is the only way to compose structure. The proposal for struct fields holding a function reference being able to satisfy interfaces would help me somewhat. The core problem domain is performance pathways over types that aren't mutually convertible. Possibly saying more about our ingestion phase and data design than about any deficiency in Go, I'm interested in a means of expressing that each of those types satisfies a contract based on fields, while being non-exclusive with other contracts that include those same fields. Embedding is a way to accomplish this, but it fails on the non-exclusion point.

I understand that this is literally what interfaces are for, but their implementation is one I find unsuitable on the whole in my domain. Today, there are absolutely ways to accomplish the pattern I want: a contract constraining the reduction of instances of various types to a common interchange form, where behaviors can be defined that apply to instances in that form, where mutations to instances in that form carry over to the instance in its original form, and where overlapping contracts are non-exclusive. The further goal of limiting allocations, ideally keeping all allocations localized and/or on the stack, is satisfied by each of the strategies below.

Given that my problem is one of reducing varying types to the subset on which they overlap, I could totally see this being handled by the generics system at some point in the future. Setting aside the subtle issue of defining field-based constraints, and the API by which a constraint could be defined over types from other packages, I still have the problem of type-mixing, and then of migrating changes back to the original after a series of mutations. If accessing X as a field of a generic pointer to T, i.e. (*T)(t).X, were permissible at some point, and if there were no overhead in selecting that function (mitigated to some extent by ([]T)[i].X), I could become content with that. Even with any possible issues, that at least guarantees compile-time checking.

The solution I'm playing with as a viable, production-ready approach to refactor toward is to use a slice of fixed arrays of uint8, serializing with a fixed-field schema, and passing those chunks around as "datasleds", with a husky-dog of a driver function whose en-closure holds an incrementing index and a reference to the sled, iteratively returning the fields needed for that pipeline stage. You may have noticed my lobster claws; I do know I'm intending a pattern that's not really idiomatic Go.

Having done a fair bit of hacking in Go, I observed that my solution above resembled the Go memory model, and that I could get similar behavior by using a raw pointer to convert the array of uint8 to an array of structs. In my test model it worked, but I take the comments from @martisch to indicate that was a happy accident.

The part that embedded structs can't accomplish, by design, is mutation outside of their own fields. So if you want, say in that Point example, to operate on a Point with height, you have to define that operation on the next-highest unit above Point, rather than on a slightly different definition of Point that still satisfies a hypothetical contract around, yes, field ordering. Instead, to merge pipelines having different element types, even if they all satisfy a common hypothetical contract, I'm back at reallocation as a new type, or else refactoring to wrap Point in a dimensioned other type and then embed that. I get that in this contrived example you could absolutely handle arbitrarily dimensioned drawables as a struct with a Point, a slice of Point, and an enum type, but the point is rather that Go doesn't yet have a compile-checked performance path offering a contract that supports that use case. It's obviously not my place to claim it should, but the observation stands.

My second alternative was to create a type where each of the fields is internally a pointer. Because I don't have any guarantee that len(datasled)%Sizeof(item) == 0, it could not be viable unless I create that guarantee with a runtime panic at the start. With those allowances, that pattern almost works, until you want to merge pipelines, and then there's the type-mixing problem again. A possible solution is for that type to be packaged with a closure that mutates its internal pointers on its behalf, so that the type itself can be monomorphic. This is a solution that actually (mostly) works for me, as a (mostly) ergonomic way to diversely monomorphize a collection without reallocation.

Finally, not suggested above because I think it has more to do with my use case than with the original problem, is a struct wrapper holding an empty interface, where the interface is treated as a multipurpose pointer, and where each method is a return or mutation on the type-switched field via interface.(type). I haven't tested that, and once generic structs are up, I expect that would be better on all counts, except possibly for type-mixing.

@beoran

beoran commented Sep 21, 2021

First of all, I am not on the Go team, I'm just a regular punter here who likes to comment and evaluate Go issues.

Well, I thought about what you are trying to do, and reading the generics proposal I felt it should be possible to do something like this with Go generics. I tried it out but it didn't work, so I filed #48522.

I would like to kindly advise you to "forget" Rust and other C-like languages when programming Go. Your previous knowledge is actually likely to lead you astray, since Go is semantically much more of a Wirth-style language like Oberon or Modula 3. I would try to figure out first how to write your code in a Go-like and readable way, and then think of how to improve the performance.

Edit: a classic trick would be to turn your structs inside out and use separate fixed-length arrays of fields instead.

@sammy-hughes
Author

I actually just saw that, and I went "Huh, this person must have read my proposal -- oh! Hey, it's that guy with the name that sounds like the bear-dude from LoTR who got totally left out of the movies!" I guess I had disregarded the possibility that there was interest in supporting that API. Not having dug into the stenciling machinery: unless different versions of a generic function are created for each type it is used with, the use of fields on a generic is fairly likely to remain off the table.

In the assembly the Go compiler produces for a function call with a struct parameter, any use of fields on that struct is rendered as pointer+offset from the struct pointer. I could see field access being handled in assembly, most obviously, by rendering a different function for each distinct type used as a call argument, but it could also be done with a pointer to an array of the offsets to be used, one per type passed as a call argument, the array being the length of the distinct fields referenced in the function body. Now that I think about it, that could be OK in the end, but I'd have to do a mockup, ensure it resembles what I just described, and compare it to some static calls.

And you speak truth on the "forget that stuff" bit. One of my favorite patterns in Go as of late has been the slice of arrays with a driving closure iterator. It isn't idiomatic Go, but it's generally fairly close to the equivalent for i := 0; i < l; i++ {} pyramids in performance, while being more fun to write and emphatically easier to refactor or repurpose. Plus, most of the utility of what I'm suggesting is had by swapping iterators anyway, and it's covered by static analysis right up until the point where I do the raw-pointer cast.

Nonetheless, a facility for structured type division and supervision, with static analysis from core tooling, would be a benefit to the language, would reduce use of the empty interface, and would provide a decent alternative to unsafe. And maybe you're right, and enough of the possible applications are already covered by struct embedding, and now generics, that it's just not advantageous enough to introduce memory-model constraints for.

@beoran

beoran commented Sep 22, 2021

Well, I think that allowing access to fields in generics is probably going to be a good way to help solve your problem, and actually I am surprised that it is not supported yet, since it doesn't really make sense for it not to be supported. I don't care too much about the implementation details, rather I think we should push to keep it "on the table" , as it were, and get it implemented somehow. Then if that implementation were to have some performance downsides, we can always suggest improvements. "First make it right. Then make it fast." (Jeffrey Palermo).

And yes, Go also makes closures easier than most C-like languages do, which allows them to be used in various creative ways. So if it's easier to read, write, and refactor, then by all means, go for it.

And I fully agree that Go needs a new language feature to move away from using the empty interface and unsafe as much as possible. But generics is that language feature, if fully implemented as now documented in the proposal. Also, the feature you are proposing seems to be of too limited use, focused only on your own use case. That is a problem that allowing field access in generics doesn't have. So if you agree, I'd like your support for #48522, especially by providing examples from your current projects of how allowing access to struct fields in generic types would help you.

@3bodar

3bodar commented Sep 22, 2021

@sammy-hughes
The toy examples in your proposal are not hard to read, but in the real world it is not uncommon for structs to have way more than 3 fields. Could you discuss what your examples would look like if I wanted to destructure this struct like so: fields LowerC ... LowerV into a subtype Lower, and fields Percent and CompilingRuntime into a subtype SpecialChars? In particular, would I be expected to create dummy struct types for the preceding/succeeding fields (relative to each desired subtype) with no purpose other than avoiding writing out ~5 dozen fields, and, whenever I change the main struct, make sure I also update the dummies?

With the current facilities that Go provides, something like

(*Lower) (unsafe.Add(
    unsafe.Pointer(super),
    unsafe.Offsetof(super.LowerC),
))

can be hidden inside a method with a descriptive name, e.g. func (super *CmdFlags) AsLower() *Lower, and I think use cases that would utilize more than a handful of the n*(n+1)/2 possible destructurings of a struct with n top-level fields are not common, so you won't need many of these methods. All pretty readable and maintainable, imo.

So, do you think there will be an overall gain in readability and maintainability for real-world programs over what can be achieved presently?

@sammy-hughes
Author

As to the final question, @3bodar, I do. My last professional position was maintaining the data-delivery portion of a web application, and my current position is with a data warehousing team, part of an Ag/Chem org. Not using Go, I worked with a team responsible for ingesting assessor data for a company that offers a suite of financial projection products. I will agree that 2 of 3 of those positions wouldn't have seen the workload impacted.

In the first case, my last position working with Go, I would completely agree with @beoran's recommendation of struct embedding, as the need was simply to represent data for CRUD operations between a remote client and the persisted data layer, on RDS/MySQL. We did some reporting, but a lot of the actual logic in the transforms involved was done in SQL, at the data layer. This feature would provide limited benefit, and any polymorphism needed would be completely served by struct embedding. We dealt with concretes like "Location" embedded in "Structure". The concepts represented generally had clearly defined relationships in both the persistence representation and the view, needing no redefinition, none being possible.

The latter-mentioned position, using a slew of Microsoft GUI tools with C# as the main language for actual code, had an internal flat file that would be compatible with the "datasled" technique I mentioned in an earlier comment, where we're essentially just marshalling into an array of bytes. Were I to rewrite it in Go, I would not reach for anything more complicated than a custom serializer framework: struct types for each distinct region, with well-established serialization logic, and then the transform logic from the original sources (database dumps, fixed-length formats like our own, horrifying messes of GB-sized CSV files with no text delimiter), again with likely no possible use case for a feature such as I've proposed above.

In my current position, we've had a stable persistence data model of the "general enough that it sucks only a little, for everything" kind, but we routinely get new output definitions for existing transforms, and the number of transform stages a set undergoes varies with the nature of the subject/project. Some transforms are universally applied and some apply only to a specific experiment, but we have a fixed persistence/input surface with data serializing into a range of distinct forms, varying by the value domain of the measurement, how self-referential the data is, the weighting of associated observations, the experiment constraints that apply, etc. In the current form, we have over 40 fields on the primary entity, with several lists of associated entities. Given that we are processing and delivering GBs of data per request, and that the specific form of the serialized data is based on the request spec, being one of many, our options are:

  1. Write each stage for a reduced, interpreted "interchange" form; loop through the experimental finding, then any applicable related findings, loading them into a slice of that reduced form; execute the battery of contiguous stages that use that form; nil the slice after looping; then move on to the next transform. If two transforms are separated by one or more transforms on a different interchange form, you have to choose whether you can afford to retain the transformation, or whether you must manually free it to ensure sufficient memory for the next transform.
  2. Just eat the cache locality issues, soak the constant cache-line fills, and write the thing to properly select logic for a primary finding type or the appropriate related finding type, on every stage, for every distinct operation in that stage.
  3. Optimize using struct embedding. Work up a new common model to use every time a new spec requires a change, wrapping some types, expanding others, and just changing methods on still more. This presents risk to compatibility across existing pipelines, and requires just accepting that, while you can optimize for throughput, you're still stuck in a constant maintenance phase.
  4. Beg, borrow, or bribe management into allowing unsafe in production code, with sufficient tests to guarantee the output for the conditions you foresee, but knowing that you're giving up compiler assistance in the process and, as unlikely as a memory-model change is, risking an update to Go invalidating your entire waterworks.

With what I'm proposing there are still a number of headaches and problematic logistical elements, but a lot of those problems become much simpler to handle.

  1. The proposed functionality comes from an interest in re-expressing the existing data in a form optimized for records-per-cache-fill, without paying for it by allocating more memory than the necessary pointers, and without two copies per field included in the interchange re-expression.
  2. The proposed functionality allows interchange forms to be mutually compatible, provided each satisfies a contract around field ordering on the common model. Changes to that model carry the risk of invalidating contracted interchange types, but because the proposal stipulates type checking, with the same rigidity of assignability as any other assignment in Go, such invalidations would fail even without unit tests for that specific eventuality, because pre-compile static analysis can flag the assignment as invalid.
  3. The proposed functionality would allow the common model to continue unchanged; adjustments for each new spec that comes through necessitate, at most, definition of new re-expressing interchange types, and would require change far less often than solution strategy 3 above would.
  4. The proposed functionality, supported by clear expectations on the part of any user of the syntax, and by whatever restrictions on expression combination the Go toolchain needs to guarantee safety, would allow changes which otherwise might have caused outages to instead fail the build at compile time. Additionally, with the support of the Go toolchain, projects that used the (hypothetically accepted and implemented) proposed feature instead of unsafe would be easier to work with and debug, not having a blind spot in static analysis at every call into unsafe.

So yes, the final result, assuming the feature becomes accepted and supported by the language, would be that vectorized calculations on datasets too large to fit in cache for a single processor become much easier to accomplish in Go, and some of the routine optimizations in such projects, currently done with unsafe, gain the support of Go's excellent compile toolchain.

@3bodar

3bodar commented Sep 22, 2021

@sammy-hughes
Thanks for the detailed reply, but except for the first sentence, I couldn't understand much of what you are saying. Not even whether you are trying to answer my other questions (the essential ones for me to better understand how the proposal would affect the language). But nevermind.

I would just add one thing - the feature as described doesn't appear to me as fitting smoothly into Go's design philosophy of simplicity and clarity of design, and, I think, would be of no more than niche use, so I'd predict it's going to be rejected. It's very possible I don't know what I am talking about, as I consider myself a novice gopher (having written < 5k lines of Go code in all) and I've yet to fully read and understand the spec.

If you prove me wrong, more power to you. Good luck.

@sammy-hughes
Author

sammy-hughes commented Sep 23, 2021

@3bodar, yeah, no worries.

I don't aim to convince you that the backing machinery isn't complex, but I do object to the suggestion that it would be arcane to use. The syntax is rigid, but it follows, essentially, one core rule: account for all fields/elements, in exactly the correct order. There are other rules, yes, but mainly that.

Such a syntax would have to be that rigid if it is intended for inclusion outside of the package unsafe. I believe it to be very simple for the Gopher using it, even if it takes fiddling to arrive at a proper invocation. That fiddling would be adjusting the call to conform to a static, readable, exists-in-code requirements list, e.g. the struct or array (not slice) being destructured.

Meanwhile the syntax itself borrows from existing syntax elements. First, it was directly inspired by the type-union clause of a constraint declaration: x.{w|y|z}, itself quite similar to the syntax for asserting the underlying type of an interface: x.(y). As a type expression list, it matches such syntax roughly.

I am proposing a feature and syntax which is a combination of special cases for the following Go features:

  1. Slices. I'm proposing a feature and syntax to slice on instances of structured types, using an intentionally rigid API. I expect my proposal to assist mostly research and computational workloads, where it offers optimizations over what is already safely possible, and replaces the need for unsafe in most of the cases I'm thinking of for what is unsafely possible. The API does need to be rigid enough to allow compile-time guarantees to be possible, and the core functionality likely is not served by anything less rigid.
  2. Embedding: In the comments above, I pointed out why embedding is cumbersome in my specific workload, and I would expect that to extend to anyone dealing with (a) a constantly evolving output specification and (b) extremely demanding performance expectations. Outside of that, I expect existing machinery completely satisfies the list of applications I can imagine, with this proposed feature possibly not providing significant benefit.

As far as who will benefit from it, the following cases are examples of use-cases completely supported by the existing Go ecosystem, without resorting to unsafe, or needing any new syntax.

  • Front end for a data-persistence layer (database, redis, filestore, whatever): design for code-gen consuming the database DDL.
  • Data processing in a concrete domain (like property-value data from county assessors): special cases exist, but "structure" is inherent to the domain.
  • Developing a rendering engine (using a structured, planned approach, with good management): struct embedding is likely all you'll need.

Contrarily, the following are examples of application-targets that will benefit from such a syntax, already often resorting to unsafe, for raw pointers to get the performance optimizations described in this proposal.

  • Developing a rendering engine, except taking an agile approach
  • Experimental modeling for a research project or natural resource discovery
  • Embedded application development
  • Any project with management that throws new requirements at you every time they watch CSI
  • Tool development having rigid goals, but with an appetite for any possible incremental performance gain

That said, and acknowledging that this could be a bit of a hassle to implement (mostly in the improvements to precompile analysis tools like go vet), and that the demands of the proposed syntax could be difficult to satisfy in certain cases, it would be a feature that is quite accessible. The proposal offers the possibility of consolidating dozens of lines of runtime safety checks around raw pointers into a single, compiler-checked assignment.

@sammy-hughes
Author

sammy-hughes commented Sep 24, 2021

@3bodar, I failed to answer a specific question not once but twice.

TLDR:

  1. The feature as proposed requires all fields to be accounted for, so yes: either 5 dozen fields or a couple of dummy structs. It's intended to allow safety at speed, not convenience.
  2. Your unsafe example yields a pointer that is potentially invalidated by the GC if either the SuperType instance or the SubType instance goes out of scope. The proposal as written is for exactly that pattern, but safe.
  3. This proposal is a win if the tricks I can already accomplish with unsafe can happen with tooling support from the linter, the validator, the compiler, and the GC.

I am going to cover, as discussion, the question you cared about, include some snippets based on the example you suggested, and, while offering profuse apologies along the way, briefly explain why I thought I was answering your question while apparently quite spectacularly missing the point.

Paraphrasing for clarity (I'm one to talk):

Could you discuss what your examples would look like if I wanted to destructure this struct? Does the creation of "spacer" struct types, existing exclusively to provide structural offsets, constitute an anti-pattern?

Yeah. I didn't answer that. I answered twice in good faith, but I didn't address a specific question, and for that I am sorry. After rereading this thread, I went "Aw, Py**on!" (Sorry for the colorful language. Bit of a sailor).

The chief ends of this proposal should be considered to be, I, performance improvement; II, static-analysis and safety improvement; and III, very distantly, ergonomics improvement. As the goal is, crassly put, to push "unsafe hacks" into the realm of safe and dependable code, supported by Go's tooling, by providing non-ambiguous contracts, I-then-II or II-then-I amounts to the same thing.

I. The proposal is intended to give a developer a facility for writing a contract around the structure of a compound type, either to divide it or to assemble it, whereby the garbage collector can be aware of the effective lifetime of component fields, the compiler can be aware of incorrect type conversions, and linters/syntax formatters can be aware of the progressively asserted structure of values subject to the proposed feature.

II. Considered as a safety improvement, the syntax proposed intends that an utterly unambiguous contract may be established, one enforceable by the compiler and subsequently by the runtime GC. Further, versus raw pointers, it would simplify static analysis tooling: tracing values across their various identifiers, recognizing when a value used under several identifiers in concurrent contexts produces race conditions, and various quality-of-life matters such as linter context highlighting and in-editor tracing.

III. The syntax, in the proposed form or any other, must be able to offer its guarantees unambiguously; guarantees that can only be offered ambiguously are better not made at all. Still,

  • if an external library, having a stable API, has struct types which are cumbersome to use, the feature as proposed would permit almost-free slicing on the internal structure, with some preparation, but with full tooling support.
  • As proposed, the feature allows writing types to meet external surfaces, without needing a dozen different custom serializers for different services.
  • It could allow skipping the odd extra line for operations wherein some combination or separation occurs.
  • It could possibly be used to enable patterns like folding an error return value into a field on the returned struct (mildly senseless, but possibly enabled).

There are some examples where I would rather like to have this syntax, for reasons that are purely ergonomic. For most such cases, interfaces and generics are suitable and effective.
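One hypothetical sketch of that interface route: an interface can expose a "view" of any struct that provides the matching accessors, at the cost of method boilerplate rather than new syntax. The names here (LowerView, flagSet) are made up for illustration:

```go
package main

import "fmt"

// LowerView exposes a narrowed "view" of a flag struct through accessors.
type LowerView interface {
	LowerC() int
	LowerO() string
}

// flagSet is a hypothetical stand-in for a larger flags struct.
type flagSet struct {
	lowerC int
	lowerO string
}

func (f *flagSet) LowerC() int    { return f.lowerC }
func (f *flagSet) LowerO() string { return f.lowerO }

// process only sees the narrowed interface, not the whole struct.
func process(v LowerView) {
	fmt.Println(v.LowerC(), v.LowerO())
}

func main() {
	process(&flagSet{lowerC: 4, lowerO: "out.o"}) // prints: 4 out.o
}
```

This gets the narrowing and the type safety, but not the zero-cost layout aliasing the proposal is after: each access goes through a method call unless the compiler inlines it.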

The struct referenced, the CmdFlags struct from the Go compiler codebase, was suggested as having two ranges of interest: LowerC through LowerV, and the two adjacent fields Percent and CompilingRuntime. With the definitions given further below, an example would be one of:

func MakeCmdFlagsA(
  in0 *cmdFlagsSpacerPrelogue, 
  in1 *Lower, 
  in2 *SpecialChars, 
  in3 *cmdFlagsSpacerEpilogue,
) CmdFlags {
    var out CmdFlags.{cmdFlagsSpacerPrelogue, Lower, SpecialChars, cmdFlagsSpacerEpilogue} = *in0, *in1, *in2, *in3
    return out
}

func MakeCmdFlagsB(
  in0 *cmdFlagsSpacerPrelogue, 
  in1 *Lower, 
  in2 *SpecialChars, 
  in3 *cmdFlagsSpacerEpilogue,
) CmdFlags {
    //This is hypothetical and does not represent a part of my proposal; this specific syntax is mentioned nowhere in it.
    //I simply noted the question "How might that look?" and suggest a possibility here.
    out.{CmdFlags.{cmdFlagsSpacerPrelogue, Lower, SpecialChars, cmdFlagsSpacerEpilogue}} = *in0, *in1, *in2, *in3
    return out
}

func MakeCmdFlagsC(
  in0 *cmdFlagsSpacerPrelogue, 
  in1 *Lower, 
  in2 *SpecialChars, 
  in3 *cmdFlagsSpacerEpilogue,
) CmdFlags {
    //This is hypothetical and does not represent a part of my proposal; this specific syntax is mentioned nowhere in it.
    //I simply noted the question "How might that look?" and suggest a possibility here.
    out := CmdFlags.{cmdFlagsSpacerPrelogue, Lower, SpecialChars, cmdFlagsSpacerEpilogue}(*in0, *in1, *in2, *in3)
    return out
}

func ProcessCmdFlags(in *CmdFlags) {
    _, lower, specialChars, _ := (*in).{cmdFlagsSpacerPrelogue, Lower, SpecialChars, cmdFlagsSpacerEpilogue}
    //use things as destructured. Did not need the prelogue or epilogue, so dumped into placeholders.
}

For reference, that's based on the following struct definition, from the Go compiler flags:

type cmdFlagsSpacerPrelogue struct {
    _, _ CountFlag //alternatively [2]CountFlag
    _ string
    _, _ CountFlag //alternatively [2]CountFlag
    _ func(string)
    _, _, _, _ CountFlag //alternatively [4]CountFlag
    //I didn't chase that long enough to know if 1. A different form has V, 2. it's a derived property, or 3. A form derived from this has V
    _ CountFlag
}
type cmdFlagsSpacerEpilogue struct {
    _, _, _, _, _  string
    _, _, _, _ bool
    _, _, _ *bool
    _, _ func(string)
    _, _, _, _ string
    _ CountFlag
    _ bool
    _ string
    _ int64
    _ string
    _, _, _ bool
    _ *bool
    _ bool
    _ string
    _ bool
    _, _, _ string
    _ bool
    _ struct { // I am unclear if listing these fields flattened to the outside struct would have the same meaning
        _ struct {
            _ map[string][]string 
            _ map[string]string
        }
        _ []string
        _, _ map[string]string
        _, _ bool 
    }
}
type Lower struct {
    LowerC int
    LowerD func(string)
    LowerE, LowerH, LowerJ, LowerL, LowerM CountFlag
    LowerO string
    LowerP *string
    LowerR CountFlag
    LowerT bool
    LowerW CountFlag
    LowerV *bool
}
type SpecialChars struct {
    Prozent int
    CompilingRuntime bool
}
type CmdFlags struct {
	// Single letters
	B CountFlag    "help:\"disable bounds checking\""
	C CountFlag    "help:\"disable printing of columns in error messages\""
	D string       "help:\"set relative `path` for local imports\""
	E CountFlag    "help:\"debug symbol export\""
	G CountFlag    "help:\"accept generic code\""
	I func(string) "help:\"add `directory` to import search path\""
	K CountFlag    "help:\"debug missing line numbers\""
	L CountFlag    "help:\"show full file names in error messages\""
	N CountFlag    "help:\"disable optimizations\""
	S CountFlag    "help:\"print assembly listing\""
	// V is added by objabi.AddVersionFlag
	W CountFlag "help:\"debug parse tree after type checking\""

	LowerC int          "help:\"concurrency during compilation (1 means no concurrency)\""
	LowerD func(string) "help:\"enable debugging settings; try -d help\""
	LowerE CountFlag    "help:\"no limit on number of errors reported\""
	LowerH CountFlag    "help:\"halt on error\""
	LowerJ CountFlag    "help:\"debug runtime-initialized variables\""
	LowerL CountFlag    "help:\"disable inlining\""
	LowerM CountFlag    "help:\"print optimization decisions\""
	LowerO string       "help:\"write output to `file`\""
	LowerP *string      "help:\"set expected package import `path`\"" // &Ctxt.Pkgpath, set below
	LowerR CountFlag    "help:\"debug generated wrappers\""
	LowerT bool         "help:\"enable tracing for debugging the compiler\""
	LowerW CountFlag    "help:\"debug type checking\""
	LowerV *bool        "help:\"increase debug verbosity\""

	// Special characters
	Percent          int  "flag:\"%\" help:\"debug non-static initializers\""
	CompilingRuntime bool "flag:\"+\" help:\"compiling runtime\""

	// Longer names
	AsmHdr             string       "help:\"write assembly header to `file`\""
	Bench              string       "help:\"append benchmark times to `file`\""
	BlockProfile       string       "help:\"write block profile to `file`\""
	BuildID            string       "help:\"record `id` as the build id in the export metadata\""
	CPUProfile         string       "help:\"write cpu profile to `file`\""
	Complete           bool         "help:\"compiling complete package (no C or assembly)\""
	ClobberDead        bool         "help:\"clobber dead stack slots (for debugging)\""
	ClobberDeadReg     bool         "help:\"clobber dead registers (for debugging)\""
	Dwarf              bool         "help:\"generate DWARF symbols\""
	DwarfBASEntries    *bool        "help:\"use base address selection entries in DWARF\""                        // &Ctxt.UseBASEntries, set below
	DwarfLocationLists *bool        "help:\"add location lists to DWARF in optimized mode\""                      // &Ctxt.Flag_locationlists, set below
	Dynlink            *bool        "help:\"support references to Go symbols defined in other shared libraries\"" // &Ctxt.Flag_dynlink, set below
	EmbedCfg           func(string) "help:\"read go:embed configuration from `file`\""
	GenDwarfInl        int          "help:\"generate DWARF inline info records\"" // 0=disabled, 1=funcs, 2=funcs+formals/locals
	GoVersion          string       "help:\"required version of the runtime\""
	ImportCfg          func(string) "help:\"read import configuration from `file`\""
	ImportMap          func(string) "help:\"add `definition` of the form source=actual to import map\""
	InstallSuffix      string       "help:\"set pkg directory `suffix`\""
	JSON               string       "help:\"version,file for JSON compiler/optimizer detail output\""
	Lang               string       "help:\"Go language version source code expects\""
	LinkObj            string       "help:\"write linker-specific object to `file`\""
	LinkShared         *bool        "help:\"generate code that will be linked against Go shared libraries\"" // &Ctxt.Flag_linkshared, set below
	Live               CountFlag    "help:\"debug liveness analysis\""
	MSan               bool         "help:\"build code compatible with C/C++ memory sanitizer\""
	MemProfile         string       "help:\"write memory profile to `file`\""
	MemProfileRate     int64        "help:\"set runtime.MemProfileRate to `rate`\""
	MutexProfile       string       "help:\"write mutex profile to `file`\""
	NoLocalImports     bool         "help:\"reject local (relative) imports\""
	Pack               bool         "help:\"write to file.a instead of file.o\""
	Race               bool         "help:\"enable race detector\""
	Shared             *bool        "help:\"generate code that can be linked into a shared library\"" // &Ctxt.Flag_shared, set below
	SmallFrames        bool         "help:\"reduce the size limit for stack allocated objects\""      // small stacks, to diagnose GC latency; see golang.org/issue/27732
	Spectre            string       "help:\"enable spectre mitigations in `list` (all, index, ret)\""
	Std                bool         "help:\"compiling standard library\""
	SymABIs            string       "help:\"read symbol ABIs from `file`\""
	TraceProfile       string       "help:\"write an execution trace to `file`\""
	TrimPath           string       "help:\"remove `prefix` from recorded source file paths\""
	WB                 bool         "help:\"enable write barrier\"" // TODO: remove

	// Configuration derived from flags; not a flag itself.
	Cfg struct {
		Embed struct { // set by -embedcfg
			Patterns map[string][]string
			Files    map[string]string
		}
		ImportDirs   []string          // appended to by -I
		ImportMap    map[string]string // set by -importmap OR -importcfg
		PackageFile  map[string]string // set by -importcfg; nil means not in use
		SpectreIndex bool              // set by -spectre=index or -spectre=all
		// Whether we are adding any sort of code instrumentation, such as
		// when the race detector is enabled.
		Instrumenting bool
	}
}

If you end up reading this, @3bodar, I do quite hope you find this a better answer. As a quick explanation of why I think I was answering you, accepting that you clearly didn't agree, consider my perspective on my two replies to you:

I initially heard from you, "Hey your thingy looks complicated. Why? Would it even be worth it in the real world?" My response was intended to nicely say, "Yeah. That kind of means I don't care about you," but in a way that made it clear that I also don't care about past me, for much of my career. I wanted to show how it actually doesn't make sense for two real-world examples, and what made them different from now.

After you replied, I replied in the context of my memory of your first reply and then the text of your second reply. The core of it hit me as "Meh. Doesn't look useful," which is a conclusion I support you arriving at, and it challenged my use of jargon (sorry; if we were friends, this would have been a "let's stop texting and do a phone call" thing). In my reply, I set out to plead "no contest" on those points, as I agree on the spectrum of applicability, but not on the scheme of the syntax or the strength of its appeal. I absolutely think it "looks like Go", and I fully intend that this proposal feel like Go, too.

Meanwhile, thanks for expressing interest. I hope it was all at least food for thought! One of the many things we agree on is the likely outcome of this proposal, but if I've done nothing other than spun some mental gears, I'd be quite content.

@3bodar
Copy link

3bodar commented Sep 24, 2021

@sammy-hughes
Now that's to the point, and I appreciate you taking the time to write it. And it's what I expected - with the "Pr[o]logue", "Epilogue", etc. And it's the segue I needed in order to add a couple more thoughts.

The essential problem I see is the syntax. The notions of simplicity and clarity being a contentious matter, I feel pretty adamant in my assessment that the code in those "MakeCmdFlags..." functions is horrendous. Just imagine coming across that snippet without having a clue and trying to parse its purpose.

And the fundamental reason for that complexity is, I believe, that you are trying to achieve something out of thin air without a proper abstraction for it. Similar problems have already been solved in Go - you're basically in need of something that is to a struct what a slice is to an array. (Or, in a similar vein, what the underlying []byte is to a string, ignoring issues of immutability and conversion semantics.) I wouldn't call that something a "struct slice", as I think it would be natural to allow for holes in it (contributing to sizeof, but otherwise inaccessible), so let's call it a "struct view". So, just for illustration, suppose you could make up some syntax for creating a "struct view" type based on an existing struct type, e.g.

type MyStruct struct {
    A int
    ...
    K float64
    L float64
    M float64
    ...
    Z string
}

type MyStructView struct(MyStruct) {
    // only MyStruct fields allowed to the right, optional new names to the left
    F
    C    Circle(K, L, M) // aggregation 
    NewZ Z
}

then you could write simple and elegant code that feels Go-like:

// downcasting:
myStructView = MyStructView(myStruct)
// upcasting:
myStruct = MyStruct(myStructView)
// field access:
var c *Circle = &myStructView.C
z := myStructView.NewZ

The compiler would have all the info it needs - parent struct type, field types and offsets - to ensure type safety, similar to the array/slice case, and would know that the conversions are to/from a "struct view" type, and act appropriately, i.e. no need for new syntax for the destructuring.
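Lacking such a feature, about the closest today's Go gets is a hand-rolled view holding explicit pointers into the parent. A sketch, with hypothetical types echoing the MyStruct example:

```go
package main

import "fmt"

type MyStruct struct {
	A    int
	K, L float64
	Z    string
}

// MyStructView approximates the proposed "struct view" with explicit
// pointers into the parent: it aliases without copying the fields, but
// each viewed field costs a pointer, and nothing ties the view's shape
// to MyStruct's layout.
type MyStructView struct {
	K    *float64
	NewZ *string // renaming is trivial here, since each field is independent
}

func ViewOf(s *MyStruct) MyStructView {
	return MyStructView{K: &s.K, NewZ: &s.Z}
}

func main() {
	s := MyStruct{A: 1, K: 2.5, Z: "hi"}
	v := ViewOf(&s)
	*v.K = 9.5 // writes through the view are visible on the parent
	fmt.Println(s.K, *v.NewZ)
}
```

The proposed language-level view would collapse this to a single pointer (like a slice header) with the field offsets known to the compiler, instead of one pointer per viewed field.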

My point here is just to provide an example of what I would consider a feature that is not obviously unworthy of Go's "ideals" - I have no idea whether something like the above satisfies all your requirements, or whether it would hold any water once you've analyzed all the intricacies related to its precise specification and interaction with the rest of the language.

This whole time, I wasn't implying there's no value in your proposal - only that it's of little use if even one aspect of it feels like a not-so-pretty patch on the face of Go.

Cheers. 🍺 🍺

@ianlancetaylor
Copy link
Contributor

var out CmdFlags.{cmdFlagsSpacerPrelogue, Lower, SpecialChars, cmdFlagsSpacerEpilogue} = *in0, *in1, *in2, *in3

Why would I write this instead of just using a composite literal?

@ianlancetaylor
Copy link
Contributor

I understand that you have a use case for this, but it seems quite specialized to your code. Can you point to any places in the Go standard library, or in popular Go packages, where this new syntax would be desirable? Thanks.

@3bodar
Copy link

3bodar commented Sep 24, 2021

@ianlancetaylor
A key point of @sammy-hughes's is to not make a new value, similar to the way slicing an array reuses the underlying array. So a composite literal is not an option. And he wants type safety too, so unsafe is not an option either. In my previous post I expounded on what I think is the best he can realistically hope for - a "struct view" type such that the "struct view"/struct pair mirrors the semantics of the slice/array pair.

@ianlancetaylor
Copy link
Contributor

In the var out CmdFlags example, we are creating a new value: CmdFlags. Using a composite literal is going to wind up producing exactly the same executable code.
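The composite-literal form Ian refers to, sketched with small hypothetical stand-ins (not the real CmdFlags) for the parts being assembled:

```go
package main

import "fmt"

// Hypothetical stand-ins for the Lower/SpecialChars field ranges.
type Lower struct{ C int }
type Special struct{ Percent int }

// Flags embeds the parts, so a plain composite literal assembles a new
// value in one expression, with no new syntax needed.
type Flags struct {
	Lower
	Special
}

func main() {
	lo := Lower{C: 4}
	sp := Special{Percent: 1}
	out := Flags{Lower: lo, Special: sp} // copies the parts into a new value
	fmt.Println(out.C, out.Percent)     // embedded fields are promoted
}
```

Since `var out CmdFlags ... = *in0, ...` also produces a fresh value, a composite literal like this should compile to equivalent code; the distinction the thread turns on is whether a *copying* construction or a *non-copying* alias was intended.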

@3bodar
Copy link

3bodar commented Sep 24, 2021

@ianlancetaylor
I see. But from what @sammy-hughes has written so far, I've understood that his "destructuring" idea is all about using type-safe sub-structs without creating copies. I hope he chimes in on that, but I suspect my idea about "struct views" might serve his purposes.

And just for what it's worth, could you comment briefly on whether you think such a feature could be generally useful - a "struct view" type, tied to a struct type, whose value is a "struct view" header (similar to a slice header) made up of just a pointer to the actual struct - basically doing for structs what slices do for arrays? Would it be worth it to open a separate proposal issue, or would you outright dismiss it?

@ianlancetaylor
Copy link
Contributor

From my perspective slices are basically safe pointers. For every case where in C you can safely use pointer addition, in Go you can use a slice. (For cases where in C you can unsafely use pointer addition, in Go you can use unsafe.Pointer.) The connection between slices and arrays is that the array is the range safely covered by the slice-as-pointer. (With this perspective we see that Go append is basically C realloc.)

My understanding of the struct view idea is that it is not that. A struct view is a way of picking out certain fields of a struct and accessing them directly. It's a form of narrowing and aliasing. And to some extent it's a form of renaming, although in practice, due to the way that pointers work in Go, that can only work for very limited cases, namely when the layout of the fields in the underlying struct precisely matches the layout of the fields in the renaming struct view.

The struct view is an interesting idea but I don't see a lot of benefit for real code. If we look back at https://go.dev/blog/go2-here-we-come we see that the first criteria for a language change is "address an important issue for many people." I don't myself see a lot of code that would benefit from struct views, and I don't see people expressing a need for anything like struct views. So I don't see it as a likely language addition. Sorry.

@3bodar
Copy link

3bodar commented Sep 25, 2021

@ianlancetaylor
That's right - the struct case is different from the array case as structs don't ever change in size. What you're saying makes perfect sense. Thanks.

@sammy-hughes
Copy link
Author

sammy-hughes commented Sep 27, 2021

So, looong weekend dogsitting some very active Australian Shepherds. A few quick points:

  1. @ianlancetaylor, I would like a quick yes/no on whether examples from other languages would be considered as support. Meanwhile, I do note your pooh-poohing (not derisive or dismissive). What I'm proposing is a very low-level optimization, and while I'll definitely dig for some candidates, I don't expect to find much in the Go ecosystem. I do believe such a feature would be prolifically useful in the projects where it is applicable, but at best those would be in niche domains, and few in number.
  2. An alternative that might completely satisfy me would be some way to guarantee that methods/functions are inlined, potentially even extending that guarantee to functions "of arbitrary complexity". I described earlier a solution I'm presently using to maximize cache operations, but which relies on a closure.
  3. @3bodar, ignoring the sounds-like-a-verdict from Ian for a sec: on specifying that the compiler infer the segments requested, whenever it is ambiguous which fields you intended, a compile-time error would be fitting. I agree with you that such syntax would be more convenient if that behavior were possible; however, it means A, a big chunk of additional complexity, and B, it becomes less clear what is occurring. That said, your proposed syntax is as good as mine, with the one exception that I'm not clear on how one would define a view across multiple types that are effectively the same type.
  4. I find the comments above generally show an understanding of what I'm suggesting, and there's nothing I feel warrants specific correction. (You're good, @3bodar)

@ianlancetaylor
Copy link
Contributor

Examples of other languages that provide similar features would be considered as support.

(If you mean examples of other languages that use something like unsafe pointer casting to do a similar operation, that is less interesting. Other languages have different idioms and requirements, and in particular C and C++ specify plain-old-data structure layouts more precisely than Go does, when combined with relevant processor ABI docs. And one can do the unsafe pointer casting in Go, too.)

@sammy-hughes
Copy link
Author

@ianlancetaylor, I mean examples whose context is explicitly "compiler-guaranteed safe aliasing of structured types based on a contract-in-code", and examples of how that's used.

@3bodar
Copy link

3bodar commented Sep 28, 2021

@sammy-hughes

...how one would define a view across multiple types, being effectively the same type.

The straightforward way to achieve something like that would be to explicitly define views per struct type and factor out the common part into a new pair of struct/view:

type A struct { ... }
type ViewA struct(A) { ... }
type B struct { ... }
type ViewB struct(B) { ... }
type Common struct { ... }
type CommonView struct(Common) { Common } // the view is the whole struct

In this setting the compiler would have no problem doing the narrowing conversions from ViewX to CommonView if those are "effectively the same type", so a func can accept a CommonView. Going in the other direction using just a conversion is obviously impossible as the original type is lost, but I'm not sure that's a problem.
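Factoring out the common part can be approximated today with plain embedding; a hedged sketch, with hypothetical types mirroring the A/B/Common shape above:

```go
package main

import "fmt"

// Common is the factored-out part shared by A and B.
type Common struct {
	Name string
}

type A struct {
	Common
	Extra int
}

type B struct {
	Common
	Other float64
}

// rename accepts the common part by pointer, aliasing either parent;
// this plays the role the narrowing conversion to CommonView would.
func rename(c *Common, to string) { c.Name = to }

func main() {
	a := A{Common: Common{Name: "a"}}
	b := B{Common: Common{Name: "b"}}
	rename(&a.Common, "A")
	rename(&b.Common, "B")
	fmt.Println(a.Name, b.Name) // prints: A B
}
```

The limitation, relative to the view idea, is that embedding fixes the common fields to one contiguous position chosen up front; a view could in principle pick them out of an existing layout after the fact.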

Anyway, I've practically lost interest in this issue as I think Go simply isn't the right language for what you're trying to do, so I see myself bailing out of the discussion. But still, good luck...

@sammy-hughes
Copy link
Author

@3bodar, I also am mostly at your conclusion. I'll be spending the weekend satisfying the request for examples of projects which could/do benefit from such a feature, but after that I think I'll be ready to close the issue.

If ViewA and ViewB are both described by a field list which is convertible to Common, then I think most of my wishlist could be satisfied by such a scheme. One would still need to convert from ViewA/ViewB to Common, but that presents no more difficulty than my suggested syntax does, and given a little more complexity in the tooling changes, your suggested syntax presents significantly less difficulty to its user.

I think the only real difference in capability would be whether the syntax could be used to alias field ranges in types from other packages. I'm also not clear on how easily non-contiguous ranges of fields could be described, or whether any reduction in vectorized cache-resident values could be achieved (granted, I'm not 100% sure that benefit is attainable without a copy under my version of the syntax either).

That said, I expect to still be writing my own static-analysis tools for these kinds of projects, approval or no, for quite a while.

@ianlancetaylor
Copy link
Contributor

Based on the discussion above, this is a likely decline. Leaving open for four weeks for final comments.

@sammy-hughes
Copy link
Author

@ianlancetaylor, I did indicate that I planned to dig for some examples. I haven't been able to spend meaningful time on that yet. I accept the "Proposal-FinalCommentPeriod" tag.

I know of several structures that are like what I'm talking about, in C# and Rust, but I'm not clear how much of what I'm proposing is actually included there, and time I've spent has mostly been there. Again, though, I haven't really added meaningful research into existing features of other languages and uses of such, and I have spent no time on projects in the Go ecosystem.

@sammy-hughes
Copy link
Author

I recognize that this is still in the final comment period, but nearly a month after my initial intent to explore, I still haven't gotten to it. I have found that Rust's traits and C++'s "concepts" have some overlap here, but they don't hit all the points I'd want. While I still think my proposal is feasible, if no small labor to implement, I recognize the concerns over broad applicability.

I'm going to close this ticket, accepting that the proposal is already in "Likely decline" status. Should I feel continued need, I may develop the idea into a distinct code-gen tool, but as of now, I agree with the consensus that it does not fit naturally as a Go feature...yet.

Thank you @ianlancetaylor and @griesemer for the due consideration, and to @beoran and @3bodar, I enjoyed the pushback and suggestions around this feature proposal.

@sammy-hughes
Copy link
Author

sammy-hughes commented Oct 25, 2021

As a post-close update, this exact functionality was implemented in Rust as a library. It is healthy, having been actively maintained for over a year, and it is used by one other package. That is a point in favor of usefulness, yes, but it is also a positive case of functionality that is not (at least yet) directly suitable for a core language implementation being implemented independently instead.

Further, the approach taken requires that views be implemented in the same module (Rust's "mod", the rough analogue of a Go package) as the type subject to the view. To implement this in Go, a function call is still required, as Go has yet to adopt a feature like Rust's trait construct. Additionally, as this implementation depends on declarative, privileged prep-work, a responsible implementation would likely follow that same pattern; as @3bodar suggested, with the view being declaratively prepared in the same owning package as the type subject to the view.

This was implemented using Rust's unsafe keyword, and Rust has a relaxed view of "unsafe" usage as compared to Go, having syntax rules that control scope of "unsafe"ness, and specific community conventions around "unsafe" use. Go currently has a stop-gap approach, wherein some "unsafe" functionality simply requires that "unsafe" is imported, even if as _, and applies silently to the entire file. I believe this provides more confirmation that Go is not yet at a state that would support core tooling support for such library or special syntax for that functionality, without a preceding declarative element.

In other words, one of the more progressive languages I can think of has a similar functionality that only fits as a library, and then requires specific preparation work and direct ownership of types being subject to such a view. This is solid confirmation that "Decline" was the correct choice.

The referenced crate (roughly the Rust equivalent of a Go module) is structview: https://crates.io/crates/structview/1.1.0
