Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: unions as sigma types #54685

Open
zephyrtronium opened this issue Aug 26, 2022 · 34 comments
Open

proposal: Go 2: unions as sigma types #54685

zephyrtronium opened this issue Aug 26, 2022 · 34 comments
Labels
LanguageChange Proposal v2 A language change or incompatible library change
Milestone

Comments

@zephyrtronium
Copy link
Contributor

zephyrtronium commented Aug 26, 2022

Author background

  • Would you consider yourself a novice, intermediate, or experienced Go programmer? Experienced.
  • What other languages do you have experience with? C, Java, Rust, Python, MATLAB, Haskell, TypeScript, various flavors of assembly

Related proposals

Proposal

Introduce a restricted version of sigma types into Go. Formally, sigma types are pairs (a: T, b: B(a)) where a is a value of type T and B is a function that constructs the type of b from the value of a. Less formally, they can be considered tagged unions where the tag is an accessible value of a type the programmer can select.

Sigma types are a central feature of dependent type systems, which make it possible to statically prove many facts about the safety of programs. Such systems allow type metaprogramming, which allows the type constructor B to depend on variable values of a. Because Go does not have metaprogramming, I propose the restriction that a must be one of a statically enumerated set of values. I believe that sigma types will be useful despite that restriction.

Definitions

Preferring a name more familiar to programmers than logicians, I propose to name these types union types. Formally, there are two categories of union types, depending on the form of T in the definition (a: T, b: B(a)). The first category is value-discriminated. A value-discriminated union type has the following:

  • A discriminator type, which is any type that can represent constants.
  • A discriminator set, which is a fixed set of constant values of the same type (i.e. not untyped) as the discriminator type.
  • A fixed set of dependent types.
  • A fixed surjective mapping from the discriminator set to the dependent types.

A value of a value-discriminated union type U has:

  • A discriminator, which is a variable of U's discriminator type. The discriminator is one of the elements of U's discriminator set.
  • A dependent value, which is a variable of the type to which U maps the discriminator.

The second category of union types are type-discriminated. A type-discriminated union type has the following:

  • A discriminator set, which is a fixed set of types, plus a nil case. That is, the nil case is not a type, but it always appears in the discriminator set.
  • Dependent types, which is a fixed set of types.
  • A fixed surjective mapping from the discriminator set to the dependent types.

A value of a type-discriminated union type W has:

  • A discriminator, which is a type in W's discriminator set.
  • A dependent value, which is a variable of the type to which W maps the discriminator.

The distinction between value-discriminated and type-discriminated union types is that T in (a: T, b: B(a)) is a type in the former and a universe of types in the latter. This distinction is necessary because types are not terms in Go.

Syntax

Writing union types

The syntax I propose is as follows. We add a new kind of type literal with the following EBNF:

UnionType     = "switch" (Type [Tag] | "type") "{" { UnionTypeElem ";" } "}" .
UnionTypeElem = UnionTypeCase ":" Type [Tag] .
UnionTypeCase = "case" ExpressionList | "default" .

For a value-discriminated union type, the Type following switch specifies the discriminator type. Each case gives a list of values and the dependent type to which those values map. The values in the case clauses must be constants representable by the discriminator type. These form the discriminator set. The default clause specifies the dependent type when the discriminator is the zero value of the discriminator type. It is an error to specify the same value or type multiple times, including to both specify the zero value and to include a default clause. It is also an error to neither specify the zero value nor to include a default clause. That is, the zero value must appear exactly once in the discriminator set.

If the keyword type rather than a type name follows switch, then the union type is instead type-discriminated. Each case gives a list of types and the dependent types to which those values map. These form the discriminator set. The default clause specifies the dependent type for the nil case; if it is omitted, then the dependent type is struct{}. It is an error to specify the same type multiple times.

Union types may have tags on dependent types accessible through reflection, similar to tags on struct fields. Value-discriminated union types may additionally have a tag on the discriminator type.

An additional shorthand syntax can be used to specify type-discriminated union types where the discriminator types are each identical to their dependent types.

ShortUnionType = Type { "|" Type }

With this proposal, the syntax T1 | T2 | ... | Tn where a type is expected becomes a type-discriminated union type, shorthand for:

switch type {
	case T1: T1
	case T2: T2
	...
	case Tn: Tn
}

Union type literals are composite literals containing zero or one keyed elements. If a keyed element appears, the key must be a constant or type in the discriminator set of the union type, or literal nil to indicate the nil case of a type-discriminated union type, and the element must be a value of the type to which the union type maps the key.

Inspecting

In order to inspect the dynamic state of a union-typed value, we use syntax analogous to the same operations on interface types. These come in three flavors. First, we can use union assertions, a primary expression:

UnionAssertion = "." "(" Expression ")" .

The expression in a UnionAssertion production must be a constant or type in the discriminator set of the union type, or literal nil to indicate the nil case of a type-discriminated union type. Like type assertions on interfaces, union assertions can be used in contexts expecting one or two values. When there is only one result, the assertion panics if the discriminator is not equal or identical to the expression. When there are two, the first is the dependent value if the discriminator is equal or identical and the zero value of the dependent type otherwise, and the second is a bool indicating whether the assertion succeeded.

Second, we can use switch statements that mirror type switches, called discriminator switches:

DiscrimSwitchStmt  = "switch" [ SimpleStmt ";" ] DiscrimSwitchGuard "{" { DiscrimCaseClause } "}" .
DiscrimSwitchGuard = [ identifier ":=" ] PrimaryExpr "." "(" "switch" ")" .
DiscrimCaseClause  = DiscrimSwitchCase ":" StatementList .
DiscrimSwitchCase  = "case" ExpressionList | "default" .

The expressions in the case clauses in a discriminator switch must each be in the discriminator set of the union type of the primary expression of the guard. Analogous to type switches on interfaces, if an assignment appears in the guard and all expressions in the case which matches the discriminator appear in the same case of the union type definition, then the assigned variable has the type to which that case maps; otherwise, it has the union type.

Lastly, specific to value-discriminated union types, the special form x.(switch) evaluates to the current value of the discriminator. Note that this is the same syntax as the guard of a discriminator switch. Whenever this syntax appears in a switch guard, the switch statement is a discriminator switch having the semantic requirement that the case values must be members of the discriminator set. If the programmer wishes to relax this requirement, they may wrap the entire switch guard in parentheses.

Syntactic ambiguity

A minor syntactic ambiguity arises with this definition. This occurs where a union type literal without a type name is written in a statement context. Because Go requires that every literal must be used, the only otherwise valid use of a union type literal in statement context is, ostensibly, to call a method or perform a channel send on a value of one of the dependent types:

func anime() {
	switch string {
		case "anime": int
		default:      error
	}{}.("").Error()
}

Considering the awkwardness and uselessness of this expression, it is unlikely that it would ever arise in practice. Therefore, the parser assumes a switch statement when encountering the switch keyword in statement context. If it is necessary to write such an expression, the workaround is to wrap at least the type in parentheses.

Properties

The zero value of a value-discriminated union type has the zero value of the discriminator type as its discriminator and the zero value of the dependent type as its dependent value. The zero value of a type-discriminated union type has nil as its discriminator and the zero value of the dependent type of the nil case as its dependent value. Note that a type-discriminated union type may have a nil discriminator and a nonzero dependent value.

As an additional property, operations which can be used with all dependent types of a union type can additionally be used with the union type itself, mapping dynamically to the corresponding operation on the dependent value. When the operation on the dependent value is not a conversion but creates a result of the same type, the use of the operation on the union type produces a new value of the union type with the same discriminator and the operation result as the dependent value. The definition of "operation" here is as for constraints in generic functions. (This applicative property serves to mirror the semantics of type elements in generic functions.)

Two value-discriminated union types are identical if their discriminator sets are equal and both map each discriminator value to identical dependent types. Two type-discriminated union types are identical if every type in each is identical to one type in the other and both map each discriminator to identical types.

Neither the discriminator nor the dependent value of a union type is addressable.

Union types are comparable when each dependent type is comparable (all types which can represent constants are comparable, so the discriminator of a value-discriminated union type is as well) and are equal when their discriminators are equal or identical or both nil and their dependent values are equal.

The alignment requirement of a union type is the maximum of the alignments of all dependent types and the discriminator type, with an implementation-defined minimum alignment for type-discriminated union types. No guarantees are made about the maximum size or layout of a union type; a compiler may choose to put the discriminator first, last, in the middle, or spread across bits of the dependent types' storage, and it may or may not choose to rearrange or share storage for different dependent types.

Standard library

Updates to package reflect are extensive. Kind needs a new Union value. A new type DependentType has the following definition:

type DependentType {
	// Discriminator is the case which maps to this dependent type. If the
	// union is type-discriminated, then Discriminator is a Type or nil.
	Discriminator any
	// Type is the dependent type.
	Type Type
	// Tag is the tag string on the dependent type.
	Tag StructTag
}

On Type, a method NumDependent returns the number of discriminators on the type, and Dependent(int) returns a DependentType according to an implementation-defined enumeration. For value-discriminated union types, a new Discriminator method returns the type of the discriminator, and a DiscriminatorTag method returns the tag of the discriminator. Lastly, on Value, a new Discriminator method returns the current discriminator as a Value | Type, and the existing Elem method returns the dependent value.

Packages go/ast and go/types will need new types to describe union types.

Examples

Sum types

Before

type Sum interface {
	isSum()
}

type AlphaVariant struct{}

func (AlphaVariant) isSum() {}

var _ Sum = AlphaVariant{}

type BetaVariant struct {
	err error
}

func (*BetaVariant) isSum() {}

var _ Sum = (*BetaVariant)(nil)

func Make(bad bool) Sum {
	if bad {
		return &BetaVariant{errors.New("beta")}
	}
	return AlphaVariant{}
}

After

type Sum switch bool {
	case false: struct{}
	case true:  error
}

func Make(bad bool) Sum {
	if bad {
		return Sum{true: errors.New("beta")}
	}
	return Sum{}
}
Sum type with type parameters and methods

Before

type Maybe[T any] struct {
	just T
	ok bool
}

func (m Maybe[T]) Just() (T, bool) {
	return m.just, m.ok
}

func (m Maybe[T]) String() string {
	if just, ok := m.Just(); ok {
		return fmt.Sprintf("Just(%v)", just)
	}
	return "Nothing"
}

After

const Just = true

type Maybe[T any] switch bool {
	case Just: T
	default:   struct{}
}

func (m Maybe[T]) String() string {
	switch t := m.(switch) {
	case Just:
		return fmt.Sprintf("Just(%v)", t)
	default:
		return "Nothing"
	}
}
Type element syntax

Before

type Bytesy interface {
	string | []byte
}

var b Bytesy // ERROR: cannot use type Bytesy outside a type constraint: interface contains type constraints

After

type Bytesy string | []byte

var b Bytesy // OK: Bytesy is a type-discriminated union type with discriminators string, []byte, and nil
Unmarshaling JSON with a variant record

Before

type variantJSONResponse struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

type Response interface {
	isResponse()
}

type AlphaResponse string
type BetaResponse int

func (AlphaResponse) isResponse()
func (BetaResponse) isResponse()

func Decode() (Response, error) {
	var v variantJSONResponse
	err := json.Unmarshal(data, &v)
	if err != nil {
		return nil, err
	}
	var r Response
	switch v.Type {
	case "alpha":
		r = new(AlphaResponse)
	case "beta":
		r = new(BetaResponse)
	default:
		return nil, fmt.Errorf("unknown type %q", v.Type)
	}
	err = json.Unmarshal(v.Data, r)
	if err != nil {
		return nil, err
	}
	return r, nil
}

func Use() {
	r, err := Decode()
	if err != nil {
		panic(err)
	}
	switch r := r.(type) {
	case AlphaResponse: // Needs to be *AlphaResponse, but this can't be detected.
		doStringStuff(string(r))
	case BetaResponse: // Needs to be *BetaResponse.
		doIntStuff(int(r))
	default:
		panic("forgot a type")
	}
}

After

type Response switch string `json:"type,omitempty"` {
	case "alpha": string `json:"data"`
	case "beta":  int    `json:"data"`
	// We can express a JSON error field that exists if and only if the data
	// field is absent.
	default: string `json:"error"`
}

func Decode() (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r) // Assuming encoding/json support for union types.
	if err != nil {
		return Response{}, err
	}
	return r, nil
}

func Use() {
	r, err := Decode()
	if err != nil {
		// Note that this can include the case where the JSON specifies a
		// variant not in the union type.
		panic(err)
	}
	switch r := Decode().(switch) {
	case "alpha":
		doStringStuff(r)
	case "beta":
		doIntStuff(r)
	// case "anime": // Invalid because "anime" is not in the discriminator set.
	default:
		panic(errors.New(r.("")))
	}
}

Additional info

  • Who does this proposal help, and why? This proposal allows implementing the full behavior of typed enums and discriminated unions, so all of the uses described in proposal: spec: add sum types / discriminated unions #19412 and proposal: spec: add typed enum support #19814 are addressed (except uses depending on exhaustive checking of cases). It expands upon that with the flexibility to describe types varying with arbitrary values, notably strings, which arises from time to time in some JSON APIs. By repurposing the T1 | T2 syntax, it also takes a large step toward unifying type constraints with the rest of Go's type system.
  • Please also describe the change informally, as in a class teaching Go. This proposal adds the ability to write types similar to Rust enums that can have values of several different types, but with the added advantage that the dynamic type can be controlled by an arbitrary value that can be any constant.
  • Is this change backward compatible? Insofar as changes to the type system can be, yes. A compiler for a version of Go that implements the proposed changes will also compile code written for versions that do not, with no changes in semantics. Other tools will need updates to support the changes to the AST and type system.
  • Orthogonality: how does this change interact or overlap with existing features? While it is technically possible to approximate sum types using sealed interfaces, doing so requires a large amount of code to define new types and methods for each variant, the interface zero value of nil is always a variant, and it is impossible to discover the set of variants dynamically. While this proposal addresses all of these problems, the last is, in my opinion, the most significant: reflection-based unmarshaling algorithms cannot decode interface-typed variables in general. Union types in particular would allow reflection to decode a particular variant based on the value of another field, which is helpful for certain APIs which return JSON like {madoka: "homura", kyuubei: 9} or {error: "anime"} from a single endpoint. This is in addition to the benefits of sum types, such as making applications like parsers much simpler to describe in types.
  • Is the goal of this change a performance improvement? No.

Costs

  • Would this change make Go easier or harder to learn, and why? Harder. I don't think dependent types are a particularly easy concept to learn. Typescript has the ability to encode dependent types through conditional types, so advanced users of that language may find the idea of union types accessible. Once users learn them, they may find certain problems easier and more natural to express in safe code, lowering the cost of learning different approaches like sealed interfaces or custom unmarshaling for those situations.
  • What is the cost of this proposal? (Every language change has a cost). The main cost is in the addition to the language, which increases the burden associated with learning Go. Supporting union types would also add a significant amount of code to the compiler, runtime, and any external tools which process Go code, as well as any code which uses reflect to handle all possible types.
  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected? Yes.
  • What is the compile time cost? I do not expect that adding a new type kind would dramatically increase compilation costs for any programs, although there may be some small increase in compilation time and object sizes. The cost would arise in the amount of compiler code required to support union types, changes to export and object formats to represent them, &c.
  • What is the run time cost? The existence of union types should not substantially impact the execution of most programs, except those which use reflection to handle types broadly, since the proposed implementation of reflection of union types may be expensive. The runtime itself will need substantial updates to account for the new type kind.

Exclusions

This proposal intentionally excludes several related features, especially pattern matching, underlying type terms (~T), and exhaustive switches. I will explain my rationale for these exclusions.

Pattern matching is orthogonal. It would hypothetically be possible to propose pattern matching which works with the existing types in Go, especially interface types. If we decide to accept union types, and pattern matching is a thing we want specifically for them, it is straightforward to propose it separately.

Underlying type terms are orthogonal. One can envision a proposal to add underlying type terms as proper types, e.g. such that every type which has T as its underlying type is in subtype relationship with ~T, allowing a "shortcut" for functions which are generic over a single type per formal parameter. Again, it is straightforward to propose them specifically for union types separately.

Exhaustive switches on sum types have been discussed in #41716, and the conclusion at the time was that it would be a mistake to include them. The rationale there is based on that proposal allowing underlying type terms, which is not proposed here, so perhaps it would be reasonable to include them. However, adding one creates a strong reason not to add the other, and I feel that syntactic unification with type terms of general interfaces is a better goal.

@gopherbot gopherbot added this to the Proposal milestone Aug 26, 2022
@thediveo
Copy link

Preferring a name more familiar to programmers than logicians, I propose to name these types union types.

Shouldn't this be reflected in the issue title, such as "proposal: Go 2: sigma (union) types"?

@seankhliao seankhliao added LanguageChange v2 A language change or incompatible library change labels Aug 26, 2022
@zephyrtronium zephyrtronium changed the title proposal: Go 2: sigma types proposal: Go 2: unions as sigma types Aug 26, 2022
@beoran
Copy link

beoran commented Sep 5, 2022

I think that even with generics, Go is sorely lacking some kind of union types. This proposal seems one of the better ones I have seen so far.

@atdiar
Copy link

atdiar commented Sep 5, 2022

Is "union" the right name for this feature? It looks like a dependent type i.e. the type of a variable depends on its value.

If so, isn't it something that is more complex and the theory is still making progress/ in flux?

As opposed to making constraint interfaces usable as regular interfaces which is more akin to discriminated unions where the discriminant would be the type pointer.

It's interesting nonetheless. (and I will come back to it once I have more time to delve into it.)

@beoran
Copy link

beoran commented Sep 5, 2022

Especially for modeling JSON data a value discriminated union would be extremely useful as well. So i like that this proposal covers both cases.

@zephyrtronium
Copy link
Contributor Author

@atdiar

Is "union" the right name for this feature? It looks like a dependent type i.e. the type of a variable depends on its value.

If so, isn't it something that is more complex and the theory is still making progress/ in flux?

Correct. The types I am proposing are exactly sigma types, a major feature of dependent types, with the limitation that types can only vary with static terms. I don't know whether dependent types are substantially more "in flux" than classical types – I have less time than I'd like to read literature on type systems – but dependent types are at least less common.

That difference of frequency is why I am proposing to call these union types rather than sigma types, even though formally they are the latter. The number of people I can say "sigma types where the dependent term varies only with static terms" to without substantially more explanation is very small. On the other hand, I can say "tagged unions where the tag is an enum" and communicate fairly precisely what is going on to many more people, probably including all of those who would understand "sigma types." So, it seems pragmatic to choose a name which evokes unions.

@ianlancetaylor
Copy link
Contributor

Is it possible to access fields of a sigma type without using a switch? Does that question even make sense?

One of the main reasons that we rejected union types long ago was that there seemed to be too much overlap with interface types. An interface type seems fairly similar to the sigma types described here, except that the key must itself be a type, not a value. Does this add sufficient power to the language to justify including it?

@zephyrtronium
Copy link
Contributor Author

@ianlancetaylor

Is it possible to access fields of a sigma type without using a switch? Does that question even make sense?

switch and assertions are the only ways to access the dependent type of a sigma type. This is the intent: in order to use a variant of a sigma type, you must first verify that that is the variant that was originally assigned to it. Interfaces are similar in this respect.

(The proposal does currently describe an applicative property that would allow "operations" on sigma types when all dependent types support that operation, intended to mirror the semantics of type parameters. In retrospect, it is the wrong property, and it is hard to implement in a compiler, and it is out of scope for what I intend this proposal to be. I will retract that paragraph.)

One of the main reasons that we rejected union types long ago was that there seemed to be too much overlap with interface types. An interface type seems fairly similar to the sigma types described here, except that the key must itself be a type, not a value. Does this add sufficient power to the language to justify including it?

The formal expressive power that sigma types would add lies primarily in reflection. Having a fixed set of types enumerable through reflection is especially beneficial in dynamic marshaling and unmarshaling; the "Unmarshaling JSON with a variant record" example in the proposal shows how sigma types would enable decoding JSON efficiently in one shot where it currently requires substantial additional code using json.RawMessage or entirely custom unmarshaling. This is in addition to the ability to handle individual fields which are e.g. either string or number, which classical union types address. The proposal's value-discriminated union types also provide a way to express a choice between types without necessarily including nil, which was a sticking point in #19412 and #41716.

Logically, sigma types are entirely distinct from interfaces. Rather than an open type describing a set of polymorphic behaviors, sigma types as proposed describe a fixed relationship between a finite set of constants and corresponding types. It is possible to implement a finite set of types using interfaces with unexported methods, as long as you can define those unexported methods (particularly requiring the types to be in the same package). It is much more natural to me to describe a list of AST node types or a list of assembly instruction types as a list of types.

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing. If every (discriminator and dependent) type in a given sigma type is scalar, it is feasible for a compiler to treat the entire sigma type as scalar. There is no way to achieve this with interfaces, barring some optimization that changes the entire representation of interfaces with provably finite implementations. So, sigma types could be admitted even in performance-sensitive contexts where interfaces are usually avoided.

In addition to all of this, perhaps the most important aspect of sigma types is not the code it enables us to write, but the code it prevents. Since the compiler rejects switches and assertions not in the discriminator set, there is no risk of mistaking the dynamic type, especially where sigma types could replace any or where T implements an interface but a programmer can check for *T. Sigma types can prevent bugs, and this comes with all the other advantages of static types for the developer experience.

So, I feel that the addition is justifiable.

@beoran
Copy link

beoran commented Dec 9, 2022

@ianlancetaylor After ten years of experience with Go, it seems to me the decision made then to not include unions into Go because of interfaces was wrong.

While it is possible to use all sorts of workarounds using interfaces to get something similar to unions, these workarounds are annoying and require a lot of extra work and boilerplate. I end up needing to simulate union types in almost any program Go I write.

As @zephyrtronium says, json serialization is one point where this proposal would help a lot, but also for GUI applications, or anything where unions are traditionally used in other programming languages.

@merykitty
Copy link

Logically, sigma types are entirely distinct from interfaces. Rather than an open type describing a set of polymorphic behaviors, sigma types as proposed describe a fixed relationship between a finite set of constants and corresponding types.

Currently type parameter constraint allows union of types, it seems to be a natural evolution for me to allow this syntax to specify a usable interface type. Additionally, it allows the interface to naturally have common behaviours across all implementations with methods.

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing.

I don't see any reason why this would not be possible with simple sealed interface, if the set of implementation is known, the compiler can inline the implementations directly into the interface instance.

Additionally, sealed interface allows more flexible hierarchies with some branches being cut off and some branches open for implementation. For example, consider a function that takes a list-like object:

type ListLike[T any] interface {
    []T | List[T]
}

type List[T any] interface {
    // get, append, etc
}

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Dec 21, 2022

@merykitty

Currently type parameter constraint allows union of types, it seems to be a natural evolution for me to allow this syntax to specify a usable interface type. Additionally, it allows the interface to naturally have common behaviours across all implementations with methods.

That was #41716, for the old type list syntax for constraint interfaces. I do not believe another proposal exists for using the current type union syntax for sum types, but it would have many of the same problems, and it would require defining what ~T means outside a constraint. That said, one of my goals for this proposal is to make it possible to one day rewrite the entire section describing general interfaces as something like, "An interface may embed a non-interface type, in which case only types assignable to that type are assignable to the interface type."

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing.

I don't see any reason why this would not be possible with simple sealed interface, if the set of implementation is known, the compiler can inline the implementations directly into the interface instance.

It is possible in principle, but it has substantial disadvantages. The compiler would need to run this check for every interface (although, to be fair, a check of "has an unexported method or a method with an unexported parameter type, or embeds such an interface" shouldn't be too expensive) and contain additional code paths to generate unboxed interfaces. Moreover, the runtime and any unsafe code aware of the layout of interfaces would need to be aware that sometimes interfaces are special, if the compiler has chosen to use the unboxed representation. That was already a bar that excluded changes to interfaces in the past, in #8405.

Additionally, sealed interface allows more flexible hierarchies with some branches being cut off and some branches open for implementation. For example, consider a function that takes a list-like object:

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed. It is strictly less flexible than what union types would enable, since you could easily define a union type with []T and List[T] as dependent types.

@DeedleFake
Copy link

DeedleFake commented Dec 21, 2022

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed. It is strictly less flexible than what union types would enable, since you could easily define a union type with []T and List[T] as dependent types.

They're saying that they want to be able to do that. They're essentially talking about using the type list syntax to turn interfaces into union types so that you could define an interface that could be used to accept any of a list of types. In their example, I assume it would work something like

type ListLike[T any] interface {
  []T | List[T] // This would become legal.
}

type List[T any] interface {
  Len() int
  At(i int) T
}

func At[T any](list ListLike[T], i int) T { // This would also become legal.
  // The compiler could now enforce only legal branches and go vet or something could check completeness.
  switch list := list.(type) {
  case []T:
    return list[i]
  case List[T]:
    return list.At(i)
  case int: // Compile-time error, presumably.
    return *new(T)
  }
}

@merykitty
Copy link

@zephyrtronium Thanks for your reply,

That was #41716, for the old type list syntax for constraint interfaces. I do not believe another proposal exists for using the current type union syntax for sum types, but it would have many of the same problems, and it would require defining what ~T means outside a constraint.

We don't need to unlock the whole syntax, it is entirely possible to just say that interface { A | B | ... } is a type (as opposed to being only a constraint) only if all of A, B, ... are also types. I don't see this having any difference in comparison with your proposal. In fact, an interface value is a discriminated union that contains a type and a pointer to the dependent value already, so creating an entirely new construct seems confusing to me.

It is possible in principle, but it has substantial disadvantages. The compiler would need to run this check for every interface (although, to be fair, a check of "has an unexported method or a method with an unexported parameter type, or embeds such an interface" shouldn't be too expensive) and contain additional code paths to generate unboxed interfaces.

The compiler just needs to do it for unions only, for non-sealed interfaces this optimisation may be added later. Adding another construct to the language also needs a huge amount of additional code paths so I don't find that reason so compelling.

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed.

The decision to disallow behavioural interfaces in unions is not fundamental and can be lifted if the need arises (#49054). The union ListLike[T] is sealed in the mean that it can be implemented only by a []T or a List[T].

@zephyrtronium
Copy link
Contributor Author

@merykitty My mistake, I misunderstood a few parts of your previous comment.

We don't need to unlock the whole syntax, it is entirely possible to just say that interface { A | B | ... } is a type (as opposed to being only a constraint) only if all of A, B, ... are also types. I don't see this having any difference in comparison with your proposal.

That's true, and it is exactly the approach I'm taking with this proposal: suggest a meaning for A | B now and leave the question of ~A for later. One difference with my proposal is that I'm proposing a strictly more powerful mechanism, which includes types dependent on arbitrary constants or types rather than just sum types. Another difference is that I am explicitly trying to address the concerns raised in #19412 and #41716, especially with regard to the zero value, whereas your suggestion seems like it either does not address those concerns or is making a choice that people disagreed with in those issues.

In fact, an interface value is a discriminated union that contains a type and a pointer to the dependent value already, so creating an entirely new construct seems confusing to me.

I think that's a very narrow view of interfaces, or perhaps better described as an implementation detail. In terms of the type system, interface types have subtypes that can substitute for the interface. That's the point of interfaces, but it is exactly the opposite of what I expect a discriminated union to do. That's why I said above that "sigma types are entirely distinct from interfaces." I want an explicit mechanism to express the idea I want, rather than having to exploit the interaction with an orthogonal concept to get less than half the advantages of that idea.

The compiler just needs to do it for unions only, for non-sealed interfaces this optimisation may be added later. Adding another construct to the language also needs a huge amount of additional code paths so I don't find that reason so compelling.

There is a substantial difference between adding a new representation for an existing language element and adding a new language element, and I think the former is more subtle in projects as large as Go compilers. That's especially informed by my attempts to implement #45494 which deals with the representation of interfaces. I could be wrong, though.

In general, I think you are talking about a different proposal than this one, although that proposal hasn't been written yet. I think it would be helpful to have it written, so that we can explore its advantages and implications in a focused place. I would also support such a proposal if it includes the ability to enumerate the variants through reflection – what I'm proposing is more powerful, but what you describe still does most of what I want.

@ianlancetaylor
Copy link
Contributor

As discussed in the last few comments, it seems that we can get a similar effect by permitting ordinary interfaces to include a union of embedded types (basically, allow constraint types as ordinary types, perhaps excluding the ~T notation). That would have the significant advantage of making the language slightly more orthogonal. If we do that, then is there is a strong reason to also have this proposal? And is there a strong reason to have this proposal instead of doing that?

Note that such an interface could perhaps be implemented without boxing, without affecting how the language behaves. Though perhaps in some cases an implicit type switch would be required when assigning to any or some other basic interface type.

@mrg0lden
Copy link

mrg0lden commented Jan 5, 2023

@ianlancetaylor
I know this might not be the best place to discuss this nor is it well established to be discussed, but I got curious. (IDK where to ask about this TBH, if there's a better place, please let me know.) Regarding what's between the parenthesis "perhaps excluding the ~T notation", so if a type constraint includes a ~T, it can still function as intended when used as a constraint, but is the ~ (only the base type is permitted) part only excluded or the whole ~T (the base type is excluded as well). This also brings many questions, like what actual type constraints are useful as both generic constraints and union types.

@ianlancetaylor
Copy link
Contributor

@mrg0lden I'm sorry, I don't understand the question.

Using ~T is fine in a type constraint, but in a union type it would permit any type whose underlying type is T. That would mean that type switches wouldn't be useful, and it would be effectively impossible to pull the value out of the union type. So we could only permit ~T if we added additional language mechanisms, such as permitting ~T as a case in a type switch.

@zephyrtronium
Copy link
Contributor Author

@ianlancetaylor

As discussed in the last few comments, it seems that we can get a similar effect by permitting ordinary interfaces to include a union of embedded types (basically, allow constraint types as ordinary types, perhaps excluding the ~T notation). That would have the significant advantage of making the language slightly more orthogonal. If we do that, then is there is a strong reason to also have this proposal? And is there a strong reason to have this proposal instead of doing that?

As I mentioned in my previous comment, it's difficult to evaluate this proposal against just allowing values of non-basic interface types because there is no formal proposal for the latter. Per #19412 and #41716, that proposal would need to make the decision about whether nil is a value of such types, or else find a way to obviate the decision. This proposal does so by having union types explicitly describe their zero value.

Deferring the zero value question, there are practical patterns enabled by the dependence of a type on constants. The union types I propose are roughly equivalent to a pattern I've seen in TypeScript whereby one combines literal types (e.g. false or 0 or '' as a type rather than as a value) with unions to restrict the types of other properties of an object depending on which arm of the union has a matching literal. The fact that I've seen this in TypeScript implies to me that we would see it in Go as well if it were possible. (I think narrowing by the discriminator would even work better with this proposal than it does in TS.) Whether that's "a strong reason to have this proposal" is subjective, but I like being able to reject incorrect programs without having to run them.

@DeedleFake
Copy link

This comment was originally about how I don't like that using interfaces with type lists as unions would require new top-level types in order to name the options, but after I wrote out examples of code I didn't like it wound up actually being O.K. Here's the examples to show how it would work in case someone else is having the same hesitations:

type Option[T any] interface {
  Some[T] | None
}

// This is necessary because a type parameter can't be used directly in a type list. It also
// fixes a potential issue from someone trying to do, for example, Option[None].
type Some[T any] struct {
  Val T
}

type None struct{}

func Example(arg string) Option[fs.File] {
  // ...
  if (condition) {
    return None{} // This feels strange to me. It's got no real connection to `Option[T]`.
  }
  // ...
}

type Result[T any] interface {
  Ok[T] | Err
}

type Ok[T any] struct {
  Val T
}

// This is necessary because an interface with methods can't be used directly in a type list.
type Err struct {
  Err error
}

func (err Err) Error() string { return err.Err.Error() }
func (err Err) Unwrap() error { return err.Err }

I would still prefer that the types in the union be more directly attached to the union type itself, but with proper package naming it should be fine most of the time, probably.

This mechanism also makes it possible to attach nicely specialized methods to the union by adding them to the interface:

type Option[T any] interface {
  Some[T] | None

  Or(T) Option[T]
}

// Same type definitions as before.

func (s Some[T]) Or(v T) Option[T] {
  return s
}

func (n None) Or(v T) Option[T] {
  return Some[T]{Val: v}
}

@Merovius
Copy link
Contributor

Merovius commented Jan 5, 2023

FWIW personally I very much like this proposal. I think it is one of the most well-designed proposals to add sum types to Go I've seen and IMO the result would fit pretty nicely into the language. And in a world where we didn't have union-elements for interfaces for constraints, I would probably be in favor of this proposal. But given that we do have union-elements, it just seems too confusing to have two so similar, but distinct mechanisms in the language. I think this is a better mechanism for sum/union types in themselves, but we should only have one and there's one which is already in.

@zephyrtronium
Copy link
Contributor Author

@DeedleFake
With that example, what is the value of var o Option[fs.File]? Is it nil? Then your option type really has Some[T], None, and nil as options, and I have to check for nil before I can safely call Or. If it's non-nil, which variant is it? How does the compiler initialize it; will the zero value be all zeros in memory like it is for every other type? Is it simply an illegal statement, i.e. there is no zero value? That would imply that it can't be the element type of a pointer, slice, map, or channel, and that reflect would have to introduce new panics in currently fine methods.

I really think a separate proposal for values of non-basic interface types would be helpful.

@DeedleFake
Copy link

The zero value of interfaces as unions has been discussed elsewhere, especially in #41716. I don't think that a good solution was ever decided on. I believe the main options it got narrowed down to were either that yes, it can be nil, or that it defaults to the first type listed. The latter option would be less surprising from a usage standpoint, I think, but it wouldn't correspond to all zeros in memory like other types so it feels a bit cheaty. It makes sense with most hypothetical types I can think of, though:

type Option[T any] interface {
  None | Some[T] // None as a default is good.
}

type JSONType interface {
  Null | String | Number | ... // Null makes sense as a default.
}

type Result[T any] interface {
  // I'm less sure about this one, but I guess that a nil error as a default isn't a terrible option.
  Err | Ok[T]
}

@zephyrtronium
Copy link
Contributor Author

@Merovius
One of my goals with this proposal, in particular with defining a meaning for T1 | ... | Tn outside interfaces, is to allow one day rewriting the spec on general interfaces exclusively in terms of other orthogonal language elements. In particular, I'd love to see a sequence like:

  1. define T1 | ... | Tn to be a type-discriminated union type (what I feel is an important part of this proposal);
  2. define ~T to be a new kind of type representing a supertype of T that acts like an unboxed interface but supports all the operations of T;
  3. add a clause to assignability of union types that implicitly chooses the right discriminator when exactly one dependent type is identical to the assigned value's type;
  4. say that a type is assignable to the interface type when it has all the interface's methods and is assignable to all the non-interface types embedded in it;
  5. remove everything about different kinds of interfaces, because now we can just say "interfaces can list methods and embed types" and "to instantiate a type parameter P with a type T, T must be assignable to P."

I'm not certain how viable all of those steps are, and most of them are out of scope for this proposal. I'm choosing to keep the proposal close to minimal, especially considering it is already very long.

Allowing values of non-basic interfaces is probably a shorter path to the goal of "unifying" interfaces at least when the measure is word count, but I also feel the disagreements in #19412 and #41716, so I'm not convinced that path can be taken at all.

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Jan 5, 2023

@DeedleFake

The zero value of interfaces as unions has been discussed elsewhere, especially in #41716. I don't think that a good solution was ever decided on.

That is my point. Hence why it would be helpful to have a separate proposal instead of continuing to discuss a somewhat vague idea in (I hope) a very specific and concrete proposal about a related but different thing.

@DeedleFake
Copy link

I wasn't clear, but my original comment was meant as a response to @ianlancetaylor's comment a few before where he mentioned the idea of using interfaces as unions. I agree that the nil question is something major and it somewhat tilts me in favor of this proposal over simple reuse of interfaces, but I also think the orthogonality of reusing interfaces is a major draw and that it should be discussed relative to this proposal. It may also be possible to combine some of the ideas and thereby fix the nil problem while still reusing interfaces, or maybe even by adding a new feature, independent of usage as sum types, to interfaces that modifies their zero value behavior and removes their ability to be nil.

As these are two possible ways to solve the same problem, I don't think discussing their relative merits in the same comments section is that strange.

@ianlancetaylor
Copy link
Contributor

@zephyrtronium I filed #57644 for using type constraints as ordinary interface types. It's basically #41716 updated for the current language.

@zephyrtronium
Copy link
Contributor Author

@DeedleFake
Sorry, I misunderstood, and I was rather brusque in my last reply. I agree it makes sense to discuss the relative merits of different approaches to a similar problem. My concern was that people seemed to be suggesting another idea and evaluating this proposal against that idea without actually investigating its challenges and consequences. That's why I was suggesting opening another proposal for non-basic interface values. Now that we have #57644 (thanks @ianlancetaylor, sorry for having you placate my whining) and hence have a specific proposal to evaluate against, it makes sense to discuss the advantages of each; but I would still object to using this issue to discuss, say, sum types with no zero value.

@zephyrtronium
Copy link
Contributor Author

Now that we've had some time to evaluate #57644, I think I can provide a more concrete answer to @ianlancetaylor's earlier question, "is there a strong reason to have this proposal instead of doing that?" Because #57644 defines sum types as a variety of interface types, some significant problems arise.

The first problem is the zero value. Searching through the page for nil, I count currently five people who seem to disagree with the choice of that proposal with opinions ranging from "a different choice should be made" to "this choice is sufficient reason to reject the proposal." Since #57644 is "basically #41716 updated for the current language," I think it's reasonable to count four people with similar opinions there (interestingly with no overlap). Since this proposal chooses to make the zero value explicit, it largely avoids that problem; a zero value is required, but it can be made something meaningful and useful.

Another problem is the inability to define methods. Perhaps it would be reasonable to re-evaluate #39799 or #23185 if interfaces gain the ability to function as sum types, but they were rejected for good reasons, and it seems awkward to propose that they could only be used on interfaces with union elements if we wanted to circumvent those reasons. Without methods, it ends up that sum types don't replace any code I'd write today – the ability to restrict types to a finite list is still useful on its own for safety guarantees, but any kind of polymorphic behavior using them becomes easily dozens of lines of boilerplate. The union types of this proposal do not have any restrictions on defining methods.

A third problem is that interface-based sum types can't have method-bearing interfaces as variants; interface { string | fmt.Stringer } is illegal and probably infeasible to make legal in the general case. This arises because the dynamic type of an interface is never an interface. Conversely, union types as proposed have no such issue and are free to contain any types, including interfaces, even in type-discriminated unions.

@gophun has voiced an additional complaint that using interfaces for sum types may confuse the purpose of interfaces. In Go today, interfaces generally represent polymorphism; they are open types that are supertypes of infinitely many non-interface and interface types. Interfaces containing unions are suddenly finite, especially because they cannot contain interfaces. While both kinds of interface share the property of "a static type that can be one of multiple dynamic types," they are in other senses antithetical. The union types proposed here create an entirely separate thing to represent this logical disparity.

One advantage I can see to #57644 over this proposal is that the definition of a sum type can be factored out into several independent interfaces and composed afterward into one larger one without a semantic change, a la #57644 (comment). On the other hand, this proposal's union types can always be written across multiple lines, obviating the motivation to do so. I'm not sure I would call that "a strong reason" regardless. The strong reasons to prefer #57644 are the uniformity with today's Go and the lower pedagogical load.

So, we have in my opinion four strong reasons to have this proposal at least compete with #57644:

  • There is no contention over the zero value (at least not yet).
  • These types can have methods and therefore can be assigned to interfaces even when they include primitive types.
  • These types can include variants that are interfaces and so can provide an additional layer of polymorphism.
  • The concept is a new one, rather than repurposing an old concept for a logically different idea.

@DeedleFake
Copy link

DeedleFake commented Jan 10, 2023

It seems to me that the main draw of #57644 is that it attempts to reuse the constraint type lists for something that looks very similar, but it's actually quite different. The constraint type lists exist primarily to allow for a way to deal with operators in constraints. That usage doesn't apply to union types because there would be no way to guarantee at compile-time that the underlying types are the same. For example:

// Works fine because v1 and v2 are the same type.
func add[T ~int | ~uint](v1, v2 T) T { return v1 + v2 }

// Who knows what types v1 and v2 actually are relative to each other?
func add(v1, v2 interface{ int | uint }) (interface{ int | uint }) { return v1 + v2 }

I think it's a mistake to try to cram union behavior into type lists just because it looks similar. If a way can be found to make it work the way unions actually should with a similar syntax that isn't confusing, that would be great, but I'd rather have a completely new type and leave type lists only for constraints than wind up with a half-implemented union type that isn't very useful in practice just for the sake of what looks like consistency.

@DmitriyMV
Copy link
Contributor

@zephyrtronium

Since this proposal chooses to make the zero value explicit, it largely avoids that problem; a zero value is required, but it can be made something meaningful and useful.

Your Response type, specifically

type Response switch string `json:"type"` {
	case "alpha": string `json:"data"`
	case "beta": int    `json:"data"`
}

Does look like it have an implicit nil value.


Also, I think the last example is wrong and should be:

func Use() {
	r, err := Decode()
	if err != nil {
		// Note that this can include the case where the JSON specifies a
		// variant not in the union type.
		panic(err)
	}
	switch r := r.(switch) {
	case "alpha":
		doStringStuff(r)
	case "beta":
		doIntStuff(r)
	// case "anime": // Invalid because "anime" is not in the discriminator set.
	default:
		panic("r is nil")
	}
}

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Jan 11, 2023

@DmitriyMV Good catch. The type in that example is invalid according to the proposal's own rules, because it needs to specify either default: or case "":. I'll update that when I get a chance. Note that it is not an implicit nil, though; the zero value is Response{}. Unlike a nil interface, you can call methods on it and use the dependent value of the zero variant.

Edit: Updated.

@yordis
Copy link

yordis commented Apr 21, 2024

In my work with event sourcing and event-driven systems, I frequently encounter the necessity of defining numerous empty interfaces to represent union types.

Each stream involves a repetitive process of declaring these interfaces to manage the polymorphism inherent in Commands and Events.

All Decide, Evolve, and Event Processors consequently rely heavily on type-casting. In some cases, it isn't a big deal; in others, it is high-risk and error-prone.

//Every stream in the system will do this!
type state struct{}
type Event interface{}
type Command interface{}

func decide(s state, command Command) (Event[], error) {
  switch c := command.(type) {
    // ...
  }
}
func evolve(s state, event Event) state {
  switch e := event.(type) {
    // ...
  }
}
//Every event handler
func handleEvent(ctx context.Context, event any) error {
 switch e := event.(type) {
    // ...
 }
}

Although I appreciate the simplicity of Go, the repetitive nature of this pattern in professional practice can be frustrating. The frequent need to handle type-casting adds complexity, especially the nuances between pointers, values, and actual registered types. Since Event Processors interact with generic interface{} types rather than concrete ones, there's an inherent risk of runtime type errors.

Event-driven systems typically feature loosely coupled components; the teams/people responsible for defining events often differ from those implementing the Processors. Therefore, Event Processors can not dictate how to implement a given interface to deal with polymorphism, practically speaking, and extremely high risk due to coupling; Conway's Law is not in our favor here.


A possible syntax improvement inspired by Rust might involve using brackets from Generics Syntax, which could offer a more intuitive and error-reducing approach:

enum Foo {
    Stringy[String],
    Numerical[u32]
}

@zephyrtronium
Copy link
Contributor Author

A possible syntax improvement inspired by Rust might involve using brackets from Generics Syntax, which could offer a more intuitive and error-reducing approach

Any new keyword added to Go is inherently not backward compatible. It isn't viable to add an enum keyword because that would cause any code using enum as a variable name to cease to compile. This proposal specifically serves as a way to reconcile unions with the keywords and syntax already used in Go. I don't think my proposed syntax is especially amazing, so alternative suggestions are good, but those alternatives need to be viable.

@Merovius
Copy link
Contributor

Merovius commented Apr 22, 2024

@zephyrtronium We now have a mechanism to add new keywords, if we want to: They can be only a keyword, if the go directive in go.mod has the right version. Of course not any added keyword is possible - in particular, we can't allow code to be valid before and after its introduction, with different meanings. I'm not sure whether the given suggestion is such a case (I don't think so, but there might be non-obvious ambiguities).

(Also, a proposal requiring a keyword will still have a higher cost, so a correspondingly higher burden of proof that this cost is justified. So it'll still be better not to need a new keyword. Just pointing out, that it's no longer categorically true that we can't have new keywords).

@atdiar
Copy link

atdiar commented Apr 26, 2024

@zephyrtronium I've been rereading the proposal again.

I seem to understand that the whole point is to introduce type constructors parametrizable by value.
Somewhat as opposed to typed value constructors which create a value of a given type. (dual)

This is the same need that is expressed in proposals that tackle the same issue, e.g. #67064 and #65555.

However, I think that implementing what you are describing may introduce complexity that can be avoided. For instance, it adds a new, non-obvious type kind.
I don't think it is necessary.
I'd rather see some additional form of subtyping/refinement types that would help us tackle the issue of const value/types.
With the preexisting facilities added when introducing parametric polymorphism in Go, I believe that this would integrate more seamlessly into the language as a form of dependent type.

For example:

type Deux int={2}
Should be a subtype of int.

I think it should require less changes at the compiler level (to be checked). And then could work as a type argument to generic functions, essentially taking care of the dependent typing needs.

That might be a better route for typed enums, unions and heterogenous collections.

In any case, that is an interesting take.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LanguageChange Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests