Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: math/rand: add Shuffle #20480

Closed
josharian opened this issue May 24, 2017 · 17 comments
Closed

proposal: math/rand: add Shuffle #20480

josharian opened this issue May 24, 2017 · 17 comments

Comments

@josharian
Copy link
Contributor

CL 33909 re-implements Fisher-Yates shuffling--simple and easy but notoriously error prone. CL 41973 used rand.Perm to avoid re-implementing it, but at the cost of an alloc+copy.

Random shuffling of a slice is a very common need. Now that we have reflect.Swapper, however, we could implement a reasonably efficient Shuffle function in math/rand:

// Shuffle pseudo-randomizes the order of elements in slice.
func Shuffle(slice interface{})

Or ShuffleSlice?

@josharian josharian added this to the Proposal milestone May 24, 2017
@bradfitz
Copy link
Contributor

And crypto/rand.ShuffleSlice, and math/rand.(*Rand).ShuffleSlice?

Starting to get a little heavy.

@josharian
Copy link
Contributor Author

I see no clear need for crypto/rand.Shuffle; there's no crypto/rand.Perm.

Yes, math/rand.(*Rand).Shuffle. The "all methods are duplicated" pattern is easy to notice as a user, and thus isn't really the same as doubling the API surface with new methods.

@rsc
Copy link
Contributor

rsc commented Jun 5, 2017

I was really hoping to keep the swapper experiment limited to sort. If we do this (if), Shuffle should take an interface that is the obvious subset of sort.Interface (no Less). It seems like then we'd also need ShuffleSlice for convenience, though.

But maybe instead of thinking about analogies to sort.Sort we should think about analogies to sort.Search. What about:

rand.Shuffle(n int, swap func(i, j int))

where you'd write

rand.Shuffle(len(x), func(i, j int) { x[i], x[j] = x[j], x[i] })

Maybe that's OK?

@rsc rsc changed the title proposal: add math/rand.Shuffle proposal: math/rand: add Shuffle Jun 5, 2017
@rsc
Copy link
Contributor

rsc commented Jun 12, 2017

Any thoughts on comment above?

@cespare
Copy link
Contributor

cespare commented Jun 12, 2017

I looked through my own code and the Go code at work and all the shuffling I can find operates on slices. So from a user perspective, the original proposal (ShuffleSlice which uses swapper) is more convenient since the swap function will always be the same func(i, j int) { x[i], x[j] = x[j], x[i] } boilerplate.

Any of these proposals seem reasonable and would be useful for me, though.

@josharian
Copy link
Contributor Author

I agree with @cespare.

  • Though a rand.Shuffle that accepts a swap function is more general, I suspect that the "sort a slice" case so dominates usage that it'd be better to optimize for it at the cost of generality.
  • It is worth double-checking whether the two would be equivalent, or at least close, in terms of performance. If there's a big gap, I'd favor the more efficient one.
  • In the end, I'd be perfectly happy with either API.

@btracey
Copy link
Contributor

btracey commented Jun 13, 2017

I'm with @rsc, in general. I've often found that I either want to track the elements got shuffled, or I want to shuffle multiple different slices (say, the inputs and outputs). The more general function would also make it easy to shuffle rows or columns of a matrix (for me). This would be hard with just the slice formulation.

@josharian
Copy link
Contributor Author

Thanks, @btracey. The "shuffle multiple slices" and row/column use cases are pretty compelling. I'm convinced.

@rsc
Copy link
Contributor

rsc commented Jun 13, 2017

Does anyone have any numbers on how often this even comes up? I mean, sorting data happens all the time; I can't remember the last time I needed to shuffle data.

@btracey
Copy link
Contributor

btracey commented Jun 13, 2017

In Gonum, we currently have two uses of rand.Perm (well, one use that occurs twice), for Latin Hypercube sampling [0], where you want samples in random buckets. This could be implemented with shuffle instead of Perm. We also have a use in our testing suite for generating a matrix in a particular form [1](name not our fault, copied from the old fortran days).

Searching a large chunk of my personal research code over the past couple of years, I have about 10 distinct uses (more if you count mulitple implementations of similar functionality). Loosely, I've used it in machine learning for stochastic gradient descent variations, in cross validation for training/testing partitioning (probably would not reimplement with shuffle), for choosing a unique set of bits to flip in a bit string (not shuffle), and for dealing with algorithms that depend on data order, but you want to remove that bias.

I also have a file that appears to read in a csv, shuffle the order of the rows, and then write a new csv. I do not remember why I thought that was necessary.

[0] https://godoc.org/gonum.org/v1/gonum/stat/samplemv#LatinHypercube
[1] https://godoc.org/gonum.org/v1/gonum/lapack/testlapack#DgetriTest

@rsc
Copy link
Contributor

rsc commented Jun 19, 2017

Josh are you saying you're now convinced to do:

rand.Shuffle(n int, swap func(i, j int))

?

@josharian
Copy link
Contributor Author

Yes. I've not had a chance yet to gather empirical data about demand.

@josharian
Copy link
Contributor Author

I gathered some quick and dirty empirical data from GitHub searches, restricted with language:Go.

For a baseline:

Perm/shuffling:

  • rand.Perm: 9,502 results. Of the first two pages of results, 5 were implementing shuffling, 6 were simple demo/experimentation code, 2 were wrappers around math.Rand, 1 didn't call rand.Perm, 3 were used for random input for tests/benchmarks. Extrapolation suggests ~2,500 uses of rand.Perm to implement shuffling.
  • fisher-yates: 479 results
  • shuffle: 10,304 results

Off the topic of my head, in the standard library, in addition to the two cases above from the compiler, we also do a shuffle of cases in the select implementation, and might do one in the sort tests.

My pulse from this is that rand.Shuffle would be of low-to-medium popularity among math/rand APIs.

If we decide not to add rand.Shuffle, I would suggest considering having rand.Shuffle replace rand.Perm as the shuffling primitive in math/rand for Go 2.

@btracey
Copy link
Contributor

btracey commented Jun 20, 2017

I don't think it'll affect your results too much, but is it easy to search for *rand.Rand types that call those methods? In gonum we make sure to generate explicit streams so we have reproducible tests.

@rsc
Copy link
Contributor

rsc commented Jul 17, 2017

Let's start with rand.Shuffle and see about rand.ShuffleSlice after that. There should be no crypto/rand.ShuffleSlice (mentioned above, but I can't see why).

@rsc rsc modified the milestones: Go1.10, Proposal Jul 17, 2017
@rsc
Copy link
Contributor

rsc commented Jul 17, 2017

Assuming @josharian wants to do this, but if not speak up and anyone else can.

@josharian josharian self-assigned this Jul 21, 2017
@gopherbot
Copy link

Change https://golang.org/cl/51891 mentions this issue: math/rand: add Shuffle

josharian added a commit to josharian/go that referenced this issue Aug 12, 2017
Shuffle uses the Fisher-Yates algorithm.

Since this is new API, it affords us the opportunity
to use a much faster Int31n implementation that mostly avoids division.
As a result, BenchmarkPerm30ViaShuffle is
about 30% faster than BenchmarkPerm30,
despite requiring a separate initialization loop
and using function calls to swap elements.

Fixes golang#20480
Updates golang#16213
Updates golang#21211

Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5
josharian added a commit to josharian/go that referenced this issue Nov 13, 2017
Shuffle uses the Fisher-Yates algorithm.

Since this is new API, it affords us the opportunity
to use a much faster Int31n implementation that mostly avoids division.
As a result, BenchmarkPerm30ViaShuffle is
about 30% faster than BenchmarkPerm30,
despite requiring a separate initialization loop
and using function calls to swap elements.

Fixes golang#20480
Updates golang#16213
Updates golang#21211

Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5
@golang golang locked and limited conversation to collaborators Sep 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants