-
Notifications
You must be signed in to change notification settings - Fork 18k
proposal: math/rand: add Shuffle #20480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
And Starting to get a little heavy. |
I see no clear need for Yes, |
I was really hoping to keep the swapper experiment limited to sort. If we do this (if), Shuffle should take an interface that is the obvious subset of sort.Interface (no Less). It seems like then we'd also need ShuffleSlice for convenience, though. But maybe instead of thinking about analogies to sort.Sort we should think about analogies to sort.Search. What about:
where you'd write
Maybe that's OK? |
Any thoughts on comment above? |
I looked through my own code and the Go code at work and all the shuffling I can find operates on slices. So from a user perspective, the original proposal (ShuffleSlice which uses swapper) is more convenient since the swap function will always be the same Any of these proposals seem reasonable and would be useful for me, though. |
I agree with @cespare.
|
I'm with @rsc, in general. I've often found that I either want to track the elements got shuffled, or I want to shuffle multiple different slices (say, the inputs and outputs). The more general function would also make it easy to shuffle rows or columns of a matrix (for me). This would be hard with just the slice formulation. |
Thanks, @btracey. The "shuffle multiple slices" and row/column use cases are pretty compelling. I'm convinced. |
Does anyone have any numbers on how often this even comes up? I mean, sorting data happens all the time; I can't remember the last time I needed to shuffle data. |
In Gonum, we currently have two uses of Searching a large chunk of my personal research code over the past couple of years, I have about 10 distinct uses (more if you count mulitple implementations of similar functionality). Loosely, I've used it in machine learning for stochastic gradient descent variations, in cross validation for training/testing partitioning (probably would not reimplement with shuffle), for choosing a unique set of bits to flip in a bit string (not shuffle), and for dealing with algorithms that depend on data order, but you want to remove that bias. I also have a file that appears to read in a csv, shuffle the order of the rows, and then write a new csv. I do not remember why I thought that was necessary. [0] https://godoc.org/gonum.org/v1/gonum/stat/samplemv#LatinHypercube |
Josh are you saying you're now convinced to do: rand.Shuffle(n int, swap func(i, j int)) ? |
Yes. I've not had a chance yet to gather empirical data about demand. |
I gathered some quick and dirty empirical data from GitHub searches, restricted with For a baseline:
Perm/shuffling:
Off the topic of my head, in the standard library, in addition to the two cases above from the compiler, we also do a shuffle of cases in the select implementation, and might do one in the sort tests. My pulse from this is that rand.Shuffle would be of low-to-medium popularity among math/rand APIs. If we decide not to add |
I don't think it'll affect your results too much, but is it easy to search for |
Let's start with rand.Shuffle and see about rand.ShuffleSlice after that. There should be no crypto/rand.ShuffleSlice (mentioned above, but I can't see why). |
Assuming @josharian wants to do this, but if not speak up and anyone else can. |
Change https://golang.org/cl/51891 mentions this issue: |
Shuffle uses the Fisher-Yates algorithm. Since this is new API, it affords us the opportunity to use a much faster Int31n implementation that mostly avoids division. As a result, BenchmarkPerm30ViaShuffle is about 30% faster than BenchmarkPerm30, despite requiring a separate initialization loop and using function calls to swap elements. Fixes golang#20480 Updates golang#16213 Updates golang#21211 Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5
Shuffle uses the Fisher-Yates algorithm. Since this is new API, it affords us the opportunity to use a much faster Int31n implementation that mostly avoids division. As a result, BenchmarkPerm30ViaShuffle is about 30% faster than BenchmarkPerm30, despite requiring a separate initialization loop and using function calls to swap elements. Fixes golang#20480 Updates golang#16213 Updates golang#21211 Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5
CL 33909 re-implements Fisher-Yates shuffling--simple and easy but notoriously error prone. CL 41973 used rand.Perm to avoid re-implementing it, but at the cost of an alloc+copy.
Random shuffling of a slice is a very common need. Now that we have reflect.Swapper, however, we could implement a reasonably efficient Shuffle function in math/rand:
Or ShuffleSlice?
The text was updated successfully, but these errors were encountered: