Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: make array types ordered when their element type is ordered #39355

Closed
martisch opened this issue Jun 2, 2020 · 16 comments
Closed

proposal: make array types ordered when their element type is ordered #39355

martisch opened this issue Jun 2, 2020 · 16 comments
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Milestone

Comments

@martisch
Copy link
Contributor

martisch commented Jun 2, 2020

According to the Go spec array types are comparable but not ordered. Which is odd for e.g. [...]byte array values since strings are ordered and comparable.

As arrays values dont have a capacity and the same number of elements for the same type the problems of having multiple ways to consistently define an ordering or comparable with or without taking capacity into account such as for slices is not the case for arrays types. (#38377, #23725)

I propose to make array types ordered if their element types are ordered so it is possible to use comparison operators like < on two array values.

Semantics

Array values are ordered if values of the array element type are ordered.
An array a is less than an array b if there exists and index i with a[i] < b[i] and there is no index j with j < i and a[i] != b[i].

Note: This does not require the array element values to be sorted. Just for the type of the elements having a defined sort ordering according to the Go specification. e.g. int, uint8, float64... . Arrays of type [2]bool or [3]interface{}wont be orderable.

Examples

func lessVector(a,b [16]byte) bool {
  return a < b
}

Or to avoid a copy:

func lessVector(a,b *[16]byte) bool {
  return *a < *b
}

For the above the compiler could then optimise this to an inlined 128bit comparison on e.g. amd64.

What are the current alternatives that work and have the same semantics?

For byte arrays we have:

func lessVector(a,b [16]byte) bool {
  return string(a[:]) < string(b[:])
}
func lessVector(a,b [16]byte) bool {
  return bytes.Compare(a[:], b[:]) < 0
}

Others array types need a manually crafted function looping over the elements for each type.

Motivation

This came up when writing a function that finds a minimum from a slice of array values efficiently. For [16]byte array values one can use string conversions to tap into the compilers optimisation for comparing known length byte sequences. The Go Gc compiler inlines known length short string comparisons. This currently doesn't work for string(a[:]) < string(b[:]) but could be detected by the compiler. It however instead be better if the code was more concise and had the same optimisations applied to e.g. [8]uint which cant use the string conversion pattern.

Complexity and Overhead

Go code that doesn't use array ordering wont be affected as this proposal is backwards compatible and doesn't add runtime overhead for other array operations. A new spec compatible Go Compiler would need an adjustment of the type checker and compiler.

In contrast this can avoid complexity by adding a concise expressible way to order two large memory regions in form of arrays thats easy to detect and optimise vectorised by the compiler instead of trying to detect a for loop that does element wise comparison.

The computational work done for array comparison is similar to string comparisons or to already allowed array equality checks. Therefore the work involved in a simple operator such as < should not be surprisingly large for a Go programer already using strings and arrays.

A minimal implementation could implement one generic array compare runtime function similar to the generic runtime hashing function that is used with compiler generated calls:

func typehash(t *_type, p unsafe.Pointer, h uintptr) uintptr {

A compiler could then specialise functions for some types and inline and optimise small array compares to avoid using a generic function as an optimisation.

Alternatives

Go users can write out their compare function as currently and the go compiler can detect such a loop and optimise matching ones for some types like uint/byte/... to inlined memory compares.

This would be a special case of a potential general future autovectorisation optimization.
Upsides:

  • does not require a language change
  • keeps the language consistent in that neither arrays, slices or structs support ordering
  • no need to support e.g. floats or complex that have NaNs in the implementation
  • leaves the possibility open for a future language change when more evidence can be collected based on vectorisation optimization if this case is used often enough

Other considerations

In context of the addition of generics this likely also allow array types to be used in containers/functions that expect the passed in type to be ordered. On the otherside generics will
even make it easier to write a generic array compare function (based on converting to slices first).

@gopherbot gopherbot added this to the Proposal milestone Jun 2, 2020
@martisch martisch added LanguageChange v2 A language change or incompatible library change labels Jun 2, 2020
@martisch martisch changed the title proposal: make array types ordered proposal: make array values ordered Jun 2, 2020
@martisch martisch changed the title proposal: make array values ordered proposal: make array values ordered when they have ordered elements Jun 2, 2020
@martisch martisch changed the title proposal: make array values ordered when they have ordered elements proposal: make array values ordered when they have orderable elements type Jun 2, 2020
@martisch martisch changed the title proposal: make array values ordered when they have orderable elements type proposal: make array values ordered when they have orderable element type Jun 2, 2020
@mvdan
Copy link
Member

mvdan commented Jun 2, 2020

I think this is reasonable - having to convert a byte array to a slice, and then to a string, is unfortunate even if the compiler knows to optimize those away.

I'm curious to understand why you want to draw the line at arrays, though. The proposal makes [...]byte or [...]int ordered, and one can still convert []byte or []rune to string to use < on them. But other slice types like []int16 or []float32 are left without a good way to use <. With this proposal one could somehow copy the contents into an array, but that feels wrong.

That is, why not allow slice values to be ordered when their element type is also ordered, just like you propose for arrays?

@martisch martisch changed the title proposal: make array values ordered when they have orderable element type proposal: make array types ordered when their element type is ordered Jun 2, 2020
@martisch
Copy link
Contributor Author

martisch commented Jun 2, 2020

Ordering is a property of values of the same type. The type of a []byte slice with capacity 16 or 17 is the same even if the length is 16. However when ordering these two different slices of same type its unclear if capacity and values past the length should be part of the ordering or not. See for previous discussions on this in #38377 which already needs to be considered for equality. Strings dont have the same problem since they have more semantical context in which they are used and no capacity.

I dont think this proposal further restricts the possible ways slice equality could be defined as array equality is already defined. It may restrict the ways in which a consistent slice ordering could be defined for the subset of slice values with same length and capacity. However I do not think the ordering proposed for these is controversial as the string ordering already also is consistent with the proposal here.

@jimmyfrasche
Copy link
Member

What about structs?

@bcmills
Copy link
Contributor

bcmills commented Jun 2, 2020

@jimmyfrasche, that seems like a matter for a separate proposal.

@martisch
Copy link
Contributor Author

martisch commented Jun 2, 2020

I intenionally left out structs as they would have their own additional complications to discuss: ordering of fields (future go versions might pack them differently in order into memory), blank fields... (not saying discussions on these might be controversial but its a broader discussion than this proposal) and I wanted to avoid getting making ordering for arrays work defocused by how it would work for structs.

On the implementation side I expect supporting array ordering to be much simpler than arbitrary structs. So there might be some new per type alg functions necessesary for structs like we have for equality.

@ianlancetaylor
Copy link
Contributor

For an array of a floating point type, I'm not sure how NaN values should be handled. For a single floating point value, a comparison involving NaN always evaluates to false. Does this mean that we should simply ignore any array elements where either side of the comparison is NaN? And if there is NaN at every index the comparison evaluates to false? What do other languages do? Or am I just missing something obvious?

@cherrymui
Copy link
Member

An array a is less than an array b if there exists and index i with a[i] < b[i] and there is no index j with j < i and a[i] != b[i].

If I understand this definition correctly, if the first non-equal element is NaN (in either array), a < b will evaluate to false. If there is an i such that a[i] < b[i], later NaN elements don't matter.

@martisch
Copy link
Contributor Author

martisch commented Jun 2, 2020

For an array of a floating point type, I'm not sure how NaN values should be handled. For a single floating point value, a comparison involving NaN always evaluates to false. Does this mean that we should simply ignore any array elements where either side of the comparison is NaN?

Since Go does not ignore NaN in equality comparison they should not be ignored.
I think it is only consistent if [1]float64 and other float arrays behave like their element type. Go does not ignore math.NaN < X (variable X) but always evaluates to false. Therefor it makes sense for [1]float64{math.Nan()} < [1]float64{X} to be also false. Then [...]float64{math.Nan(), ...} < [...]float64{X, ...}. Which leaves the question what happens if there are element wise comparisons before the math.Nan() which are not all equal. In that case the array comparison is whatever the first non equal element wise comparison is.

Python:

>>> (float('nan'),0) < (2,1)
False
>>> (float('nan'),2) < (2,1)
False
>>> (1,2) < (float('nan'),1)
False
>>> (1,float('nan')) < (2,1)
True

C++ seems to follow the same lexicographical comparison for std::array.
https://en.cppreference.com/w/cpp/container/array/operator_cmp

@bcmills
Copy link
Contributor

bcmills commented Jun 3, 2020

The NaN question seems closely related to #8606, in that it pertains to order-of-evaluation for comparisons.

However, unlike #8606, comparing a NaN doesn't generally cause a Go program to panic, so an implementation that uses pipelining or vectorization wouldn't change its behavior depending on how far ahead it evaluated.

@bradfitz
Copy link
Contributor

The only contrived downside I can think of for this is that there might be existing types in the wild with opaque underlying types that are arrays that would now be orderable when the author doesn't want that, and they'd have no way to prevent that without a breaking API change. (For instance, the type might already provide a Less/Compare method that works correctly, and < might not.)

@bradfitz
Copy link
Contributor

Aside: this is how https://golang.org/pkg/internal/fmtsort/#Sort works today.

@jimmyfrasche
Copy link
Member

This would complicate the latest generics draft. You wouldn't be able to use a type list to find all ordered types. There would need to be an additional predeclared constraint for ordered like there is for comparable.

You would, however, be able to write a func Less(type T constraints.Ordered)(a, b []T) bool predicate and use it with arrays like slices.Less(arr1[:], arr2[:]). Ideally that would eventually get optimized down to be as efficient as arr1 < arr2.

@martisch
Copy link
Contributor Author

martisch commented Jun 21, 2020

This would complicate the latest generics draft. You wouldn't be able to use a type list to find all ordered types. There would need to be an additional predeclared constraint for ordered like there is for comparable.

Yes that is right. Thanks for pointing it out. I do not think that adds much complexity as there already is a predeclared comparable. Having ordered as a frequently used constraint that is closely related to comparable also predeclared even without this proposal seems a good idea to me. This will leave the question open to have more types orderable in the future when those types cant be enumerated anymore in a package constraints.

@jimmyfrasche
Copy link
Member

Another thing to consider is that the constraints in the draft could later be defined to work as regular interfaces:

  • comparable would be interface{} except that you could only assign comparable types to it
  • an interface with a type list would only let you assign types matching that type list to it

That would limit what you could put in the interface but not change anything fundamental about interfaces. They'd still work exactly as ordinary interfaces in every other regard.

If there were a predeclared ordered constraint, does that mean that you can use < and co. with such interfaces but not other interfaces? Otherwise, would an ordered interface never be usable except as a type constraint? Would it mean that you can only put comparable values in it but you can't actually use the comparison? Used as a constraint ordered would be different than a type list containing the now fixed list of ordered types.

None of that is insurmountable but it has the potential to knock over a few dominoes and this would all need to be worked out at the same time as generics.

@martisch
Copy link
Contributor Author

martisch commented Jun 21, 2020

  • comparable would be interface{} except that you could only assign comparable types to it

That would be odd and backwards incompatible I think since there should be the possibility to have all types with no constraint. e.g. for storing values into an interface{}.

If there were a predeclared ordered constraint, does that mean that you can use < and co. with such interfaces but not other interfaces?

This proposal does not define ordering on regular interfaces. Regular interface values are not ordered even for existing types. Since the hypothetical new proposal is "They'd still work exactly as ordinary interfaces in every other regard." the answer would be no one can not use '<' with such interfaces by definition of them working exactly as ordinary interfaces.

There seem to be other aspects that dont work out if using constraints as regular interfaces. A type from a type list with slices/string can currently be ranged over but slices/strings as an interface value cant.

None of that is insurmountable but it has the potential to knock over a few dominoes and this would all need to be worked out at the same time as generics.

The current generics draft does not define constraints to be ordinary interfaces so I do not see it knocking over dominoes in the current one. Changing constraints to regular interfaces with interfaces{} not meaning no constraints and no type being able to be used with '<' if specified in place of generic type is the bigger knock over to begin with in my opinion.

Taking array ordering into account for generics is fine but I do not see it making generics (in a way that doesnt break existing backwards compatibility) much more complex if it would be introduced before generics (which I do not mean needs to happen). The complexity is already there with comparable types. Ordering is another relation that can be treated like comparable. Both need to be distinguishable with syntax but that is solvable.

@martisch
Copy link
Contributor Author

Retracting this proposal. Doing my part to lessen the backlog of proposals that need decline rationales so the ones that are likelier to be approved can be picked up by the proposal committee. #33892 (comment)

Having thought more about this and implementation especially with complex element types in mind (float, arrays, structs, interfaces ...) I think arrays being ordered on their own is not useful enough to be a language feature.

In general if ordering is extended it should likely be adding with ordering of slices and structs at the same time to make a more compelling case for consistent language addition.

Having e.g. [2]int be ordered but [2]int[:] or struct{int, int} not makes ordering even more odd in the sense vs currently no composite types being ordered. Especially once generics are supported.

To keep binary sizes small there needs to be a generic implementation for more complex cases like element types that e.g. are arrays themself and e.g. can contain floats. This generic implementation adds complexity, is not performant and not likely to be used often.

For performance it is easier to detect loops that try to compare element wise (ordering or equality) and vectorise them in the compiler for simple types e.g. uint that benefit the most from this. I will try to make a CL (when I find the time) for just that to see how complex that is and how often it would trigger. This is in line with other Go compiler optimisations such as range clear idioms. It does not need a language change and other compilers and tools can choose not to optimize this and dont dont need to be extended.

@golang golang locked and limited conversation to collaborators Jul 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests

8 participants