Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: Multi-dimensional slices #41129

Closed
RaananHadar opened this issue Aug 29, 2020 · 15 comments
Closed

proposal: Go 2: Multi-dimensional slices #41129

RaananHadar opened this issue Aug 29, 2020 · 15 comments
Labels
Milestone

Comments

@RaananHadar
Copy link

RaananHadar commented Aug 29, 2020

This proposal was originally denied back in 2014, not because it was unjustified, but because the go team said that go1 was, justifiably, stable, with the argument that consideration of this should be deferred to go2.

Its 2020 and data science, machine learning and AI are more important today than ever.
There are some really compelling arguments for why go can be the next disruptive language for data science.

I would like to thank @btracey and the gonum team for their originally written proposal.

Please do reconsider.

@gopherbot gopherbot added this to the Proposal milestone Aug 29, 2020
@RaananHadar RaananHadar changed the title proposal: Go 2 : Multi-dimensional slices proposal: Go 2: Multi-dimensional slices Aug 29, 2020
@seankhliao
Copy link
Member

Unless the concerns in #6282 are addressed, the mere passage of time should not change the decision that was made before

@RaananHadar
Copy link
Author

RaananHadar commented Aug 29, 2020

#6282 is a long discussion so it deserves a recap.

There are no critical standing concerns to be addressed, the go team acknowledged that this is a valid point and though the proposal is not 100% perfect, it is justified and should be brought up to more mature discussion when people will start discussing go2 more seriously.

Here is the original decision, to quote @griesemer :

We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2.

This issue fell between the cracks and was frozen due to age. It should be revived.

@ianlancetaylor
Copy link
Contributor

But someone still needs to come up with a new and improved design. Opening this issue doesn't help without that.

@ianlancetaylor ianlancetaylor added v2 A language change or incompatible library change LanguageChange labels Aug 29, 2020
@RaananHadar
Copy link
Author

RaananHadar commented Aug 30, 2020

This is a difficult feature. Somewhat analogous to generics, a good design is not something that appeared over night. It required significant engineering efforts, including involvement also from the core go dev team. And a since a good design is going to require a community effort, this will not be solved unless a problem is categorized, discussed and tracked, which it was not.

I am arguing that this is a valid problem that was put aside because it was a tough problem that came in a time where the go language was not open to changes. Though I have the capability to present a design, I think this would be presumptuous and I believe that bringing everyone to discuss this is the important first step.

It is true that I should involve the gonum team in this. I am hopeful.

@ianlancetaylor
Copy link
Contributor

I agree with what you say. My point is a different one. This isn't how we use the issue tracker, and, in general, this kind of discussion does not work on the issue tracker. The problem is that the issue tracker is not threaded, and the GitHub interface starts hiding comments when there are too many of them. A better place for this kind of discussion is a mailing list.

So by all means have the discussion. Just don't do it here. Thanks.

@RaananHadar
Copy link
Author

RaananHadar commented Aug 31, 2020

@ianlancetaylor I completely agree with you on this one. I think the original issue staled because it was too big of a discussion. My plan for now is to make an Experience Report which was not submitted for the original issue, because experience reports did not exist at the time of original proposal. And move with the community to a refined proposal.

EDIT: having gone more deeper into all of the prior proposals, I just noticed now that @ianlancetaylor was behind #13253, I apologize for not noticing that and crediting your work in the original post. I was not fully aware of the alternative proposal at the time.

@griesemer
Copy link
Contributor

For the sake of discussion, let's assume Go had support for multi-dimensional slices. Such slices would need to support a certain set of operations, such as creation/allocation of those slides, the ability to index elements, possibly iteration over the elements, maybe element-wise operations (adding two matrices), etc.

Also for the sake of discussion, I am going to make a few claims:

  1. Multi-dimensional slices are nice for small problems but don't scale. Real problems require detailed control over the layout of matrices: There is a reason why numerical libraries such as LINPACK, LAPACK, etc. but also gonum support various representations of matrices. There are dense and sparse matrices. There may be distributed matrices. The representations of these matrices can be very different from a basic multi-dimensional slice - at best they may use multi-dimensional slices underneath.

  2. AI applications, while operating on matrices, depend on maximum throughput and (typically) specialized hardware. To achieve the throughput or drive the hardware, the data must be brought into a suitable form which may be different from the "built-in" representation of matrices/multi-dimensional slices of a programming language. Again, real applications require detailed control over the layout of such data structures.

In summary, what we need is full programability of (memory) layout. And with programmability of layout we also need programmability of operations, as they depend on the layout.

  1. Go already provides support for that: it's called user-defined data types. With generics, we can also parameterize these data types (vectors, dense and sparse matrices, etc.) by element type. Using methods, we also can implement the necessary operations.

So if anything is missing it's perhaps convenience of notation, i.e., syntax.

  1. Operator methods are not sufficient. Simply permitting operator "names" such as +, *, etc. as method names (as in func (x *Vector) + (y *Vector) *Vector, so that we can write x + y which would mean x.+(y)) is a fairly easy language change but unfortunately it also doesn't scale. First, we don't just need operator methods, we probably also want operator method overloading: given a matrix m and a scalar value x, m + m would invoke the + method with a matrix receiver and argument, while m + x would have to invoke a method with a matrix receiver and a scalar argument. That is, we need two + methods (same name, different signatures), or perhaps a generic + method with method-specific type parameters. Operator overloading is not something we would be willing to add to Go (it would cause serious damage to the simplicity of method lookups); and generic methods are not sufficiently understood to be considered (see the generics draft design). One could make the 2nd operator parameter an empty interface and use a type switch inside the operator method, but that seems also not very satisfactory.

  2. But it gets worse: for performance, we would not write (for example) a matrix multiplication as B = A*x (translated to B = A.*(x)) because it would require a (possibly large) allocation for the result B. For performance, we would want to pass in the result memory as an argument (perhaps the receiver) to the operations. That is what math/big and the gonum package do. In short, unless there's yet another mechanism to somehow control how results are allocated, operator methods fall flat in real applications.

To make progress here we need to have strong arguments as to why these claims are wrong.

There's one thing we could do on the syntax side: we could perhaps support indexing methods. If we declare methods such as [] and []=

type Matrix[T Numeric] struct ...
func (m *Matrix[T]) [] (i, j int) T ... // getter: x = m[i, j] translates to x = m.[](i, j)
func (m *Matrix[T]) []= (i, j int, x T) ... // setter: m[i, j] = x translates to m.[]=(i, j, x)

we could implement multi-dimensional data structures in libraries yet provide notationally convenient access to the elements. Such methods would only accept a prescribed signature (n integer arguments and a result or argument of a given element type). (See also https://www.dotconferences.com/2016/10/robert-griesemer-prototype-your-design.)

@ianlancetaylor
Copy link
Contributor

Based on the discussion above, this proposal is a likely decline. Discussing the topic is fine but there is no detailed proposal here. Leaving open for four weeks for final comments.

@RaananHadar
Copy link
Author

RaananHadar commented Sep 23, 2020

Thank you for giving me the time to do the required homework on this. The post by @griesemer and your past works gave me extremely valuable food for thought. I have written and published an experience report on this issue and linked it to the official experience reports page. I could not have done this work without all the collective past contributions of the Go community and the core Go dev team.

I became convinced, only after the completing it, that the solution proposed by @griesemer:

  1. Solves the right problem.
  2. Solves it properly.

I am proposing that you consider index operator methods as proposed by @griesemer as the right solution to this problem and I do hope that you reach the same conclusion after you read my experience report.

Thanks again for this opportunity.

@changkun
Copy link
Member

changkun commented Sep 24, 2020

A 15 Years of array programming experience report (and what makes python success today):

Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2

@chewxy
Copy link

chewxy commented Sep 28, 2020

Hello, author of gorgonia.org/tensor here. Hope you don't mind me chiming in.

The tensor package has a very numpy-ish API, and scales indefinitely with number of dimensions (i.e. you can have a 4/5/6/...128 dimensional array.

I think @griesemer merely scratches the surface of the issues with making multidimensional slices. I also think there may be some benefits to additional accessor syntax. Just not in the way most would imagine it.

Here's an example:

Assume T is a 3 dimensional array with shape (5, 10, 10). With the tensor package, you can write something like this to slice the array:

T.Slice(S(0,3), nil, S(5,9))

In Python/Numpy, it'd look like this:

T[0:3, :, 5:9]

The latter is very much cleaner than the former.

Ideally with the [] and []= syntax extensions, they should be parameterized over a given tensor rank (or dimension, if you are coming from the numpy world).

Just having it as a plain accessor (as suggested) isn't going to do much though. You could currently do something like this:

T.SetAt(value, i,j,k)
x, err := T.At(i,j,k)

Going from that to this:

T[i][j][k] = value
x := T[i][j][k]

brings little utility. When programming with arrays, you seldom set values like this. Instead you typically set values in batches. So, using the same T as an example, you need to be able to syntactically support doing something like this:

T[i] = ... // multidimensional value here

This is doable with some slight modification to the extension that @griesemer proposed, e.g.

type Dense[T any] struct ...
func (d *Dense[T]) [] (coords ...type U) ... // getter or slicer
func (d *Dense[T]) []= (val T, coords ...type U) ... // setter

where U is defined

type U interface{
    int, Slice
}

Numpy's killer feature is the ability to slice with syntax and I think Go should have them too.

Of course, the problem is if you allow this, then abuses of syntax extensions will be abound. So we gotta balance that as well.

@RaananHadar
Copy link
Author

RaananHadar commented Sep 28, 2020

Thanks @chewxy. I completely agree with you. I apologize for not being clear on this.
My experience report stresses that the index operators require parametrized arguments for many common use-cases to really bring significant benefit to the user.

Whether this is:

type Dense[T any] struct ...
func (d *Dense[T]) [] (i, j int)

or

func (d *Dense[T]) [] (coords ...type U)

Doesn't change much on my part. As long as we are adding [] and []= with a declared signature, the rest is all functionality that already exists today in Go and can be implemented by the user.

As for syntax abuse, you can abuse interface{} with function calls today. I think Go programmers have learned not to abuse it and I don't think its any different.

@RaananHadar
Copy link
Author

RaananHadar commented Oct 2, 2020

You can find a proposal which is based on the proposal template here.

Thanks again for your time.

@RaananHadar
Copy link
Author

RaananHadar commented Oct 3, 2020

Please note, I have made some significant additions to the proposal to better address the ability to add custom slicing operations. This better addresses my experience report and now fully supports the use-case detailed by @chewxy .

EDIT: For a discussion of the proposal please follow this link.

@ianlancetaylor
Copy link
Contributor

There was further discussion, but no change in consensus on this particular proposal.

@golang golang locked and limited conversation to collaborators Oct 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

7 participants