Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: add spaceship operator <=> #45319

Closed
ianlancetaylor opened this issue Mar 31, 2021 · 54 comments
Closed

proposal: Go 2: add spaceship operator <=> #45319

ianlancetaylor opened this issue Mar 31, 2021 · 54 comments
Labels
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

It's moderately common in any programming language to write code like (from compress/bzip2/huffman.go):

	if pairs[i].length < pairs[j].length {
		return true
	}
	if pairs[i].length > pairs[j].length {
		return false
	}

Perl has a special operator for this kind of code: <=>, known as the spaceship operator. The spaceship operator is also available in C++20, Ruby, and PHP.

In Go, the new operator <=> would be permitted for any two values a and b. It would follow the same type rules as for any ordered comparison. The expression a <=> b would evaluate to an untyped integer value. The value would be -1 if a < b, 1 if a > b, and 0 if a == b. For code like the above it would be used as

	if cmp := pairs[i].length <=> pairs[j].length; cmp < 0 {
		return true
	} else if cmp > 0 {
		return false
	}

or as

	switch pairs[i].length <=> pairs[j].length {
	case -1:
		return true
	case 1:
		return false
	}

The operator lets code capture all aspects of a comparison in a single expression. The cost is that in the typical case additional comparisons are required to use the result, but those additional comparisons are simpler, and the code is easier to read because there is no need to verify that the different comparisons are using exactly the same values.

@ianlancetaylor ianlancetaylor added LanguageChange v2 A language change or incompatible library change Proposal labels Mar 31, 2021
@ianlancetaylor ianlancetaylor added this to the Proposal milestone Mar 31, 2021
@seebs
Copy link
Contributor

seebs commented Mar 31, 2021

lessThan, greaterThan := a <=> b
if lt, gt := pairs[i].length < pairs[j].length; lt {
  return true
} else if gt {
  return false
}
// they're equal

// or alternatively:
switch { // implicit-true
case lt: return true
case gt: return false
}
// they're equal

I suggest this because, while I actually find the thing where true and false aren't numeric values and can't be |d together somewhat frustrating at times, I think it also makes for clearer code.

Alternatively:

lessThan, greaterThan := a <> b
lessThan, equalTo, greaterThan := a <=> b

@robpike
Copy link
Contributor

robpike commented Mar 31, 2021

Isn't this sort of thing what the years of work on parametric types is supposed to solve? :)

@seh
Copy link
Contributor

seh commented Apr 1, 2021

Why mandate that the nonzero return values be -1 or 1, as opposed to just negative or positive respectively? With the latter approach, <=> implementations can use subtraction.

@atdiar
Copy link

atdiar commented Apr 1, 2021

The most straightforward version without the operator is much easier to understand for me.
Just a data point.

Probably stems from the fact that it uses regular mathematical operators.

It seems harder to human-parse the version with the spaceship operator.

A generic comparison function if can be defined would be better.

@seebs
Copy link
Contributor

seebs commented Apr 1, 2021

I actually sort of like <> as a less-than-greater-than operator, and i know roughly what it means.

if this didn't exist, and i showed someone a program using it, and they otherwise knew Go, i don't think they'd be confused by it.

i was also tempted at one point by proposing something like a , ok form for comparison operators, but if you change the semantics of the existing operators, you have broken code probably.

oh, and i think the reason to mandate -1/1 is so you can switch on them. (and now i want to be able to use -1 as an index into a thing, but i see problems here.)

if you did it with arbitrary ranges, the switch would be:

switch {
case cmp < 0:
case cmp > 0:
}

but at that point, you might as well just write

switch {
case a < b:
case b > a:
}

and i just realized that's wrong and i'm leaving the error here to point out: yes, the spaceship operator DOES prevent mistakes actual programmers can make even while thinking about the problem.

@cespare
Copy link
Contributor

cespare commented Apr 1, 2021

@seebs <> is used in SQL to mean !=.

@sbinet
Copy link
Member

sbinet commented Apr 1, 2021

as syntax is always a bit contentious, I'd propose to recognize that fact as a primordial law of human nature and call it a tie.

actually, make that the tie starship: |=|.

(an improved version of gopls could also add some extra FX, playing the sound[1] of a TIE fighter as soon as such a symbol would appear on your favorite editor. adding extra harmonics for each extra TIE operator appearing.)

@Integralist
Copy link

I'm not sure if the example given is the most common use case, because if it is then I think for the sake of one line of code (the original was 6 loc the new operator version is 5 loc) the code clarity is reduced for not much benefit. I personally much prefer the current go implementation with explicit conditional blocks.

@martisch
Copy link
Contributor

martisch commented Apr 1, 2021

One upside I can see to an explicit <=> is that it is much easier for e.g. the Go GC compiler to recognise and optimize for then trying to figure out if a switch or if else chain is optimizeable (and keeping observable side effects). A simple case that can be replaced would be and it could just be made a direct function call to the runtime internal cmpstring:

func Compare(a, b string) int {

@atdiar
Copy link

atdiar commented Apr 1, 2021

Is time.Time at least partially ordered if we take into account timezones?
Because it just hit me that I didn't pay attention to this... 🙃

@yiyus
Copy link

yiyus commented Apr 1, 2021

What advantage does the proposed solution offer with respect to:

if cmp := pairs[i].length - pairs[j].length; cmp < 0 {
	return true
} else if cmp > 0 {
	return false
}

?

@josharian
Copy link
Contributor

Or we could also introduce a ternary logic predeclared type. (Also useful for optional bools.) <=> would return one of those.

@martisch
Copy link
Contributor

martisch commented Apr 1, 2021

What advantage does the proposed solution offer with respect to:

if cmp := pairs[i].length - pairs[j].length; cmp < 0 {
	return true
} else if cmp > 0 {
	return false
}

?

That the proposed solution doesnt return false when pairs[i].length == 0 and pairs[j].length == 1 because length is an unsigned integer (see original code linked) and cmp < 0 thereby is never true in the code quoted.

@robpike
Copy link
Contributor

robpike commented Apr 1, 2021

Another option is to have an operator, perhaps <> or <=>, which randomly chooses whether the comparison succeeds or fails. This could be used to drive high-quality tests for fault-tolerant programming.

@yiyus
Copy link

yiyus commented Apr 1, 2021

@martisch Ok. That, and being able to say that Go has a spaceship operator, I think are reasons enough to support this proposal.

@Dynom
Copy link

Dynom commented Apr 1, 2021

Readability is not always an easy thing to reason about. And as such I currently haven't seen an improvement that <=> could bring:

switch pairs[i].length <=> pairs[j].length {
case -1:
	return true
case 1:
	return false
}

versus:

switch {
case pairs[i].length < pairs[j].length:
	return true
case pairs[i].length > pairs[j].length:
	return false
}

@seebs
Copy link
Contributor

seebs commented Apr 1, 2021

Consider:

switch {
case pairs[i].length < pairs[j].length:
	return true
case pairs[j].length > pairs[i].length:
	return false
}

It could take a while to spot the bug there. That's the only real benefit of the thing, I think -- lets you do the comparison once and get both results, making it impossible to accidentally use two slightly incompatible/different comparison terms.

@ghost
Copy link

ghost commented Apr 1, 2021

I'd like to see the spaceship operator hidden behind a new option to the Go compiler. As I see it, any program that uses the operator MUST be compiled as

go build -spaceship

otherwise it is a compile-time error.

@josharian
Copy link
Contributor

Not to bikeshed, but should it perhaps be written 🚀?

@sbinet
Copy link
Member

sbinet commented Apr 1, 2021

@seebs indeed that bug is quite hard to catch.
perhaps time to introduce the "dumbo-perator": ( |<=>| )

return (i|<=>|j).length

@ianlancetaylor
Copy link
Contributor Author

Jokes aside, I think that if we do add this to Go we should spell it as <=> because that is how it is spelled in other languages. We would get no benefit from being different here.

@zephyrtronium
Copy link
Contributor

zephyrtronium commented Apr 1, 2021

Why mandate that the nonzero return values be -1 or 1, as opposed to just negative or positive respectively? With the latter approach, <=> implementations can use subtraction.

<=> returning an element of {-1, 0, 1} is particularly important, I feel. This allows a beautiful expression for min and max:

func Min(a, b int) int {
	return (a&(a<=>b)^(b<=>a)&b^1)&((a<=>b)|(b<=>a)) |^ ((a<=>b)|(b<=>a))&a
}

func Max(a, b int) int {
	return (a&(b<=>a)^(a<=>b)&b^1)&((a<=>b)|(b<=>a)) |^ ((a<=>b)|(b<=>a))&a
}

Or, if writing more lines of code is forgivable, an option that works for any ordered type:

type T = int // or float64, string, ...

func v0(a, b T) T { return a }
func v1(a, b T) T { return b }

func Min(a, b T) T {
	// Every ASCII bracket character on one line -- stunning!
	return [3](func(a, b T) T){v0, v0, v1}[1+(a<=>b)](a, b)
}

func Max(a, b T) T {
	return [3](func(a, b T) T){v0, v0, v1}[1-(a<=>b)](a, b)
}

Notably, if <=> is constant-time, then both of these implementations are as well.


A question, though: what is the result of a <=> b when a, b, or both are NaN?

@seebs
Copy link
Contributor

seebs commented Apr 1, 2021

i would expect x, y := a <=> b to be identical to x, y := a < b, b > a

@egonelbre
Copy link
Contributor

Not to bikeshed, but should it perhaps be written 🚀?

Unicode has 🛸, which might be more appropriate given the similarity.

@DmitriyMV
Copy link

So... was this a real proposal or an April fools joke? 🧐

@ianlancetaylor
Copy link
Contributor Author

Returning multiple values from the operator is a possibility, but it makes it considerably harder to use the operator in an expression.

@seebs
Copy link
Contributor

seebs commented Apr 2, 2021

Much though I like it as a new thing for operators to do, for it to be usable in general you'd need a way to generally handle expressions producing multiple values in expressions... Although come to think of it, I'm not sure how you'd use it in an expression, in general. Like, in an expression, you'd presumably want one or the other of the values.

This gets into the same space as quo, rem := a /% b, or bits, carry := a ** b or whatever else for operations that really do naturally produce two results.

@zephyrtronium
Copy link
Contributor

I did indeed interpret this as an April fools' joke. But, as my previous comment demonstrates and @DmitriyMV alluded to, I think that the proposal is very abusable. If I'm not mistaken, it makes it possible to implement any combinational circuit. While this is not quite the same power as the C family's ? : ternary operator providing Turing-complete expressions, I think the same justifications against including it apply.

Also, my question from my previous comment still stands. What is the result of a <=> b when a, b, or both are NaN, considering that all of a < b, a > b, and a == b are false in any of those cases? Is it even legal to use <=> with floating-point operands? Does the operation panic when either operand is NaN (analogous to using u == v when either is an interface type with non-comparable dynamic type)? I'm not familiar with how other languages handle this.

@Dynom
Copy link

Dynom commented Apr 4, 2021

[..]
It could take a while to spot the bug there. That's the only real benefit of the thing, I think -- lets you do the comparison once and get both results, making it impossible to accidentally use two slightly incompatible/different comparison terms.

I'm sorry, but I can't get behind that rationale with something this trivial. It's still possible to make a mistake, even with <=>. It's a programming language, if you want guarantees, write a test that verify correctness.

@josharian
Copy link
Contributor

do the comparison once and get both results, making it impossible to accidentally use two slightly incompatible/different comparison terms.

If it happens frequently enough, this might be a good candidate for a vet (or staticcheck) check.

@bcmills
Copy link
Contributor

bcmills commented Apr 5, 2021

I am absolutely a fan of replacing pairs of < and > comparisons with switch-friendly ordered comparisons.

However, I am not a fan of the <=> syntax, especially given that it does not appear to enable any sort of short-circuit evaluation. Is this really worth adding a whole new token, rather than a predeclared generic function and

For example, with a predeclared cmp function, the original examples might read as:

	if ord := cmp(pairs[i].length, pairs[j].length); ord < 0 {
		return true
	} else if ord > 0 {
		return false
	}

or (my preferred idiom):

	if ord := cmp(pairs[i].length, pairs[j].length); ord != 0 {
		return ord < 0
	}

@bcmills
Copy link
Contributor

bcmills commented Apr 5, 2021

@yiyus, @martisch

What advantage does the proposed solution offer

It is also guaranteed not to overflow — even if the things being compared may include large negative values.

It is also less prone to transposition bugs. When you want the semantics of x < y, checking for cmp(x, y) < 0 is arguably easier to remember than x - y < 0.

@fzipp
Copy link
Contributor

fzipp commented Apr 5, 2021

It's moderately common in any programming language to write code like (from compress/bzip2/huffman.go):
[...]
those additional comparisons are simpler, and the code is easier to read because there is no need to verify that the different comparisons are using exactly the same values.

The code in compress/bzip2/huffman.go is difficult to read because the repetition of pairs[X].length obfuscates the boolean expressions, and because the branches alternate between "return true" and "return false". This code would be clearer:

sort.Slice(pairs, func(i, j int) bool {
	a := pairs[i].length
	b := pairs[j].length
	if a == b {
		return pairs[i].value < pairs[j].value
	}
	return a < b
})

It makes clear that this is a less function, and the special case is a == b.

For reference, this is the original code:

sort.Slice(pairs, func(i, j int) bool {
	if pairs[i].length < pairs[j].length {
		return true
	}
	if pairs[i].length > pairs[j].length {
		return false
	}
	if pairs[i].value < pairs[j].value {
		return true
	}
	return false
})

So the motivating example from the real world doesn't convince me that a spaceship is necessary for clarity.

@seebs
Copy link
Contributor

seebs commented Apr 5, 2021

I actually sort of like the built-in function. Downside, the name is probably clashy with stuff, but we already have that with predeclared identifiers all the time. (It took me at least a year to stop writing len := len(s) and then being confused when, five lines later, i couldn't call an integer as though it were a function taking a slice as a parameter.)

@josharian
Copy link
Contributor

This operation comes up most often in an ordered sequence of comparisons: if a1 != a2 return a1 < a2, else if b1 != b2 return b1 < b2, else return false.

Perhaps there's a cleaner way to write that entire construction using generics, thus obviating the need for new language support. I'm imagining a callsite roughly like: return seqcmp(a1, a2, b1, b2, false). The final value is the default value if all previous arg pairs compare equal.

I'm not entirely sure how to actually write that function signature though. Hopefully I'm missing something obvious.

@seebs
Copy link
Contributor

seebs commented Apr 5, 2021

You could almost do this with a hypothetical returnIfNotZeroValue x statement, except that that in turn would only work with the -1/0/1 comparators, not the < and > comparators.

@josharian
Copy link
Contributor

returnIfNotZeroValue would also work nicely with errors (and #21182). But that's a bit of a distraction for the purposes of this conversation. :)

@bcmills
Copy link
Contributor

bcmills commented Apr 5, 2021

@josharian, I don't think a generic seqcmp is viable without a whole lot of extra machinery — at least, not in a way that is efficient as the number of pairs scales up.

(Short-circuiting is often very helpful for comparators, and short-circuiting in a function like that requires some form of lazy evaluation, which I don't see Go ever acquiring.)

@randall77
Copy link
Contributor

Instead of seqcmp, we could allow < of structs which do the obvious thing:

type S struct {
    x, y int
}
S{x:5, y:6} < S{x:3, y:4}

So < on S compares the x field and returns the result if not equal, otherwise compares the y field. It's equivalent to the proposed seqcmp but fits the language better IMO.
It does enshrine field ordering in code order, which is something we've been avoiding so far.

Not sure if that would cover most of the intended uses of <=>.

@ianlancetaylor
Copy link
Contributor Author

@bcmills

(Short-circuiting is often very helpful for comparators, and short-circuiting in a function like that requires some form of lazy evaluation, which I don't see Go ever acquiring.)

Well, there is #37739, but I agree that it's not particularly likely.

@josharian
Copy link
Contributor

josharian commented Apr 5, 2021

@randall77 as you know, there are some subtleties around short-circuiting and interfaces. Do they panic if short-circuited? If not, there's an asymmetry vs struct equality. Not a showstopper, but unfortunate.

Struct field ordering matters for type identity, reflection, etc. So I'm not sure adding a dependency on it for ordering hurts too much.

There's also a proposal from @martisch to support array ordering. Can't find it at the moment.

@guodongli-google
Copy link

do the comparison once and get both results, making it impossible to accidentally use two slightly incompatible/different comparison terms.

If it happens frequently enough, this might be a good candidate for a vet (or staticcheck) check.

This is going to be a pure syntactic matching check, and a StaticCheck checker seems more appropriate. The frequency may not be high since such typos will not pass the unit tests.

@randall77
Copy link
Contributor

@randall77 as you know, there are some subtleties around short-circuiting and interfaces. Do they panic if short-circuited? If not, there's an asymmetry vs struct equality. Not a showstopper, but unfortunate.

I think if we defined struct ordering with <, we would require that each field be orderable. Interfaces are not orderable (only comparable), so I think the issues you mentioned would not surface here.

There's also a proposal from @martisch to support array ordering. Can't find it at the moment.

#39355

@griesemer
Copy link
Contributor

Syntactic ambiguities aside, if there was a <=> or equivalent, one could define

a, b, c <=> e, f, g

such that the result is a <=> e unless that is zero, in which case it would be b <=> f unless that is zero, in which case it would be c <=> g (etc.).

For instance, given two source "positions" p and q, one might sort them just so:

p.filename, p.line, p.column <=> q.filename, q.line, q.column

But @randall77 's suggestion (comparing of structs) would do the same thing.

@ianlancetaylor
Copy link
Contributor Author

The advantage of a, b, c <=> e, f, g over a struct comparison is that the former could short-circuit: if a and b are not equal, there would be no need to evaluate b, c, f, or g.

@deanveloper
Copy link

deanveloper commented Apr 15, 2021

I don't see why the original proposal can't just be something like math.Compare. It is much more clear, although I guess a disadvantage is that it wouldn't be able to return an untyped constant. But, Go doesn't have many uncommon operators, and it would be nice if it stayed that way.

In math/v2 (which will hopefully use generics) there also wouldn't need to be different functions for different types:

package math

func Compare[T constraint.Ordered, R constraint.Real](t1, t2 T) R {
    switch {
    case t1 < t2:
        return -1
    case t1 > t2:
        return 1
    }
    return 0
}

And the original example with math.Compare:

	if cmp := math.Compare(pairs[i].length, pairs[j].length); cmp < 0 {
		return true
	} else if cmp > 0 {
		return false
	}

@ianlancetaylor
Copy link
Contributor Author

Based on the discussion and the emoji voting, this is a likely decline. Leaving open for four weeks for final comments.

@ianlancetaylor
Copy link
Contributor Author

No further comments.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests