proposal: math/rand: rework for Go 2 #26263

josharian · 2018-07-07T15:12:21Z

Introduction

I propose that math/rand be overhauled for Go 2.

Rob Pike has proposed some aspects of a better math/rand for Go 2 in #21835. This proposal is intended to complement and extend that proposal. It may make sense to merge them, although their scopes are (intentionally) mostly disjoint.

This proposal is not fully polished, but I have been sitting on it for months. It past time to post it. I look forward to a vigorous discussion.

A re-think of math/rand should include:

compatibility guarantees and reproducibility
the Source interface
the core PRNG
the concurrency strategy
the routines that convert raw PRNG output to other forms and distributions
the API surface
migration and interoperability between Go 1 and Go 2

This proposal will discuss all of these, in turn, and then knit the results of those discussions into a single proposal.

Please note that I am not a subject matter expert. Input from experts (and non-experts) is welcomed.

Compatibility guarantees

I begin with compatibility guarantees, as decisions here impact almost every aspect of the rest of the proposal.

The Go 1 Compatibility Guarantee left some ambiguity about when the values produced math/rand should remain stable. For the life of Go 1, math/rand has remained extremely stable. To my knowledge, only a single change to the value stream has ever occurred (#16124). Many opportunities to break stream stability have been declined. There's a fair amount of discussion in #8013; it is recommended background reading. More discussion can also be found in: #13215, #14416, #11871, #8731, #12290, #6721.

There are plausible use cases in which stability matters, such as test cases discovered using testing/quick. There are also plausible use cases in which stability is irrelevant, such as adding jitter to exponential backoff code. Stability adds significant brittleness; there's not much more to math/rand beyond the exact value steam.

I propose a compromise.

Top-level convenience functions, like Intn, will offer no stability guarantees. They will be documented accordingly. The top level Seed function will be removed; the PRNG will be seeded on startup. (See #11871 for discussion of how to seed; probably package time.) Using a new seed on every program execution will prevent accidental dependence on a particular value stream. People who just want some random numbers can just get them, and the implementation remains free to improve over time.

The Rand methods will provide stability for the lifetime of Go 2 and will be documented as such. Notably, to use Rand methods, you must provide a Source, which may be Seeded. This explicitness fits well with the stability requirement.

This means that any given Source implemented in math/rand must provide stability. However, since Source is an interface, if there are significant advances in PRNGs during Go 2, they can be added as new Sources, or provided by third party libraries.

This also means that the Rand methods must be stable. This is the biggest source of brittleness, since the Rand methods are not easily swappable the way Sources are. And we learned during Go 1 that these methods will be found over time to have bugs, to have non-optimal implementations, and to use non-optimal algorithms (#16213). We could decide to make these converter methods pluggable, but that adds significant API overhead.

Taken together, it is likely that the top level convenience functions will diverge over time from the stable Rand methods. That is OK. Among other things, it means that if/when the time for Go 3 arrives, we will have well-tested, state of the art functions that we can promote to Rand methods. (This is the opposite approach to the one advocated in #25551.)

The `Source` interface

As Rob Pike notes in #21835, Source64 should become the default interface (and possibly renamed to Source); Source should be deprecated and removed.

I propose further that Seed be modified to accept a uint64 seed instead of an int64 seed. This better matches Source64, and highlights the opacity of the bits involved.

The core PRNG

The Go 1 core PRNG has several shortcomings. They are ably discussed by Rob Pike in #21835. He proposes adopting PCG as a replacement, and that (after a series of compatibility steps) its constructor be renamed to NewSource.

I propose that we adopt PCG, but that the end goal be for it retain the name PGC and be implemented as a struct rather than using a constructor:

type PCG struct {
	// unexported fields
}

func (*PCG) Uint64() uint64 { /* ... */ }
func (*PCG) Seed(uint64)     { /* ... */ }

Giving it an non-generic name makes it easier to add new PRNGs during Go 2, should there be significant improvements in the state of the art, or if it becomes clear that it would be valuable to have multiple PRNGs with different characteristics.

Exposing it as a struct reduces API surface area and allows performance-sensitive clients to make direct calls to its methods.

Concurrency

The Go 1 top-level convenience functions are concurrency-safe. The mutex they contain is a hidden source of performance pain. See #24121, #25057, #20387, and #21393 for discussion. CLs 43611 and 109817 attempt to work around this while preserving stability; they have both, to my mind, met with failure.

If you relax the stability requirement, as I propose, then there are many ways to improve scalability while retaining concurrency-safety, including: (1) sharded mutexes, as in CL 43611; (2) a sync.Pool, as in CL 109817; and (3) per-CPU storage, as proposed in #18802.

One could also imagine adding a sync.Locker field to Rand, to be used only when non-nil, to assist with concurrency safety when using Rand methods. However, if the convenience functions are performant and scale well, the primary motivation to use Rand methods is reproducibility, and concurrent access is inimical to reproducibility. This suggests that asking developers to manage their own locking in such cases is very reasonable, and that we are better off avoiding the bloat.

See also recent related discussion in #25988.

Conversion routines

Discussed elsewhere in this proposal.

API surface

Go 2 provides an opportunity to improve the API. I have already proposed some modifications.

I further propose:

Remove Perm and Rand.Perm, as it is trivially implemented using Shuffle. See also discussion in proposal: math/rand: add Shuffle #20480.
Make NewZipf a top-level function and add Rand.NewZipf, so that it matches the usage pattern of the rest of the package.
Change Int31n and Rand.Int31n to Uint32n and Rand.Uint32n. Similarly so for Int63n and Intn. These are better primitives; you can implement Int31n in terms of Uint32n, but not vice versa, because of the lost bit.
Considering adding normal distribution generators that accept a stddev and mean. This is a common use case. Also, I suspect (but do not actually know) that by pushing stddev and mean into the generation, we might be able to reduce floating point error, using ratio-of-uniforms generation instead of ziggurat. Maybe something similar for ExpFloat64 and rate parameter? Expert input on this point would be appreciated.
Remove Uint32 and Uint64, as they are trivially implemented in terms of the new Source interface--the former as a cast, the latter by calling the Uint64 method directly.

Migration and interoperability between Go 1 and Go 2

Rob Pike laid out a graceful migration plan for the PCG source in #21835. I will not repeat it here.

The changes in this proposal are more drastic. They involve changing APIs and changing reproducibility guarantees.

The modifications are gofix-able, assuming that you are OK losing reproducibility. But that is a large assumption. Given that backdrop, I'm inclined to lean on vgo (or whatever versioning support gets added to the toolchain) to allow Go 1 math/rand and Go 2 math/rand to co-exist. Code can be updated using an automated tool, but with the user's understanding that the value streams will change. The challenge, then, is primarily a documentation one, with some assists from gofix.

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2018-07-08T20:05:12Z

Do you have any thoughts on the relationship between the math/rand package and caching of test results?

josharian · 2018-07-08T22:56:06Z

Do you have any thoughts on the relationship between the math/rand package and caching of test results?

That seems more of a cmd/go question than a math/rand question.

I can see the argument for disabling caching for tests in which a random element is a key input to the test, as with testing/quick (#26276). But marking a test as not cacheable because any math/rand function has been called seems like a bad idea, since they may be called as an unrelated side-effect of what you’re actually testing, e.g. by some init function in some dependency. I’m not sure how to draw the line, but I think it’s unlikely that the line is “uses math/rand”.

In any case, I think this question is probably orthogonal to this proposal.

zephyrtronium · 2019-12-31T20:24:21Z

This is a somewhat older proposal that doesn't have much activity, but I'd still like to add one thing I've learned: there is no one-size-fits-all PRNG. The classic example is cryptographic generators, but there are occasionally subtler considerations; e.g. I've been bitten several times by using a PRNG with uniformity in too few dimensions before learning how that can matter. I believe the best approach is to provide multiple generators that target different goals, along with some elementary advice on how to choose one – which could be as simple as "try [highly uniform generator] if package-level functions or [very fast generator] produce unexpected results," or it could be a reference to a blog post with explanations and examples from experts.

Offering multiple generators also provides an opportunity for a clearer API. Rather than the name rand.PCG, which is only particularly relevant to those who read about PRNGs, the same algorithm could be exposed as rand.Fast. Something like 64-bit Mersenne Twister, MELG-64, or an implementation of the extended generator technique outlined in the PCG paper could then be added under a name like rand.VectorUniform. Even better would be to provide type Fast = PCG (and likewise) and state that the particular generator aliased by that name may change between releases.

josharian · 2019-12-31T21:54:21Z

@zephyrtronium thanks for the input. Fortunately the standard library doesn’t need to be everything to everybody. It should be a good default for most use cases. It’s ok for (say) folks doing statistical modeling to reach for something outside the standard library. (Although input from folks like @kortschak is always welcome.) And we do have crypto/rand. I’d rather err on the side of a simpler API.

kortschak · 2019-12-31T22:14:50Z

We provide alternative PRNGs at gonum.org/v1/gonum/mathext/prng. These satisfy the x/exp/rand interfaces and have state serialisation. We'd also be happy to add others if they are needed (PRs welcome).

zephyrtronium · 2020-01-01T01:22:18Z

It's hard to argue against a simple API. My main concern is that inappropriate choice of RNG can lead to incredibly subtle bugs, and the impact of dimensional uniformity in particular is an issue I didn't understand until I started implementing and researching PRNG algorithms, years after it affected programs I'd written. Providing multiple PRNGs would help make it clear that there are reasons to choose a particular one, which might mitigate that type of problem for others.

kortschak · 2020-01-01T01:54:44Z

There are multiple choices, just not provided by the standard library. I agree with @josharian that putting additional implementations in the math/rand (or now the x/exp/rand) package is probably not a good idea; having numerous choices is as likely to result in incorrect choices when there is a need for precise choices, and more likely in the case that precise choices are not needed. Having a single PRNG that works reasonably well for most cases is I think the correct approach here. Whether there should be documentation to explicitly point the fact that other PRNGs available may be worth considering, though standard library packages generally don't name third party packages by name, and I'm not suggesting that Gonum's implementations should be noted as special.

flyingmutant · 2022-07-28T08:50:23Z

This might be a bit too far-fetched for the "standard" PRNG package, but is probably worth mentioning: Source interface is a relatively costly abstraction, and I think that the vast majority of math/rand users don't benefit from it (as they only use the default Source provided). From my experience implementing a Source-free PRNG package, there is probably around 2x-4x possible performance to gain in exchange for a mild API breakage.

Merovius · 2022-09-29T13:33:42Z

As discussing math/rand is hip again and because I wanted to have this written down somewhere:

I propose further that Seed be modified to accept a uint64 seed instead of an int64 seed. This better matches Source64, and highlights the opacity of the bits involved.

I would instead remove Seed from the Source interface altogether. Instead, every concrete source type should decide itself how it is to be seeded. A non-deterministic, periodically re-seeded one probably does not want to have a Seed method at all. Or a crypto/rand one, if you think that's a good idea. Meanwhile, a PCG source probably wants func (*PCG) Seed(hi, lo uint64).

The only reason to have Seed in Source is so that *Rand can have a Seed method. So I'd remove that as well. If you want to re-seed, use the Source itself. It does mean that you have to keep the Source around separately, if you want to re-seed. But… oh well.

To me, this also follows from @josharian's observation of separating "the core PRNG" from "the routines that convert raw PRNG output to other forms and distributions". If they are separate, let's keep them separate. And seeding is definitely part of the former concern.

kortschak · 2022-09-29T20:08:23Z

It does mean that you have to keep the Source around separately, if you want to re-seed. But… oh well.

All the Gonum code that uses a PRNG takes a Source as an argument and constructs a new *Rand if needed, for these reasons.

DeedleFake · 2022-09-29T22:08:36Z

@Merovius

Meanwhile, a PCG source probably wants func (*PCG) Seed(hi, lo uint64).

I think that makes sense. That would remove the need for things like #49934.

cespare · 2022-09-29T22:41:29Z

I would instead remove Seed from the Source interface altogether.

Yeah, this sounds right to me.

A while ago I played around with a fast contention-avoiding (CPU-local) rand package. I ended up adding Seed to satisfy the interface but made it a no-op. That's one context where Seed doesn't make sense.

zephyrtronium · 2022-09-30T03:15:55Z

Meanwhile, a PCG source probably wants func (*PCG) Seed(hi, lo uint64).

As @kortschak mentioned on #49934, I wonder whether even this is redundant with UnmarshalBinary. Indeed, in (an unpublished version of) my own randomness package, the thing that distinguishes a PRNG from a generic random sequence is implementing BinaryMarshaler and BinaryUnmarshaler along with providing the state size, and I have a number of convenience functions for seeding through that.

rsc · 2023-06-13T02:00:46Z

See #60751 for a GitHub Discussion that I hope will lead to a math/rand/v2. Closing as duplicate of that.

gopherbot added this to the Proposal milestone Jul 7, 2018

gopherbot added the Proposal label Jul 7, 2018

josharian added the v2 A language change or incompatible library change label Jul 7, 2018

ianlancetaylor mentioned this issue Sep 12, 2018

proposal: Go 2: math/rand: make globalRand public and drop package methods #25551

Closed

ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 2, 2018

mpx mentioned this issue Nov 28, 2019

proposal: x/exp/rand: add support for PRNG state serialisation and deserialisation #35886

Closed

zephyrtronium mentioned this issue Dec 4, 2020

proposal: use PCG Source in math/rand for Go 2 #21835

Closed

This comment has been minimized.

Sign in to view

josharian mentioned this issue Dec 2, 2021

math/rand: reconsider lock-free source #49892

Closed

rsc mentioned this issue Sep 6, 2022

math/rand: seed global generator randomly #54880

Closed

bakkot mentioned this issue Dec 4, 2022

Thoughts on choice of algorithm tc39/proposal-seeded-random#19

Open

oguzhand95 mentioned this issue Apr 20, 2023

chore(ci): Fix data race in store test cerbos/cerbos#1537

Merged

rsc closed this as not planned Won't fix, can't repro, duplicate, stale Jun 13, 2023

rsc mentioned this issue Aug 2, 2023

math/rand/v2: revised API for math/rand #61716

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: math/rand: rework for Go 2 #26263

proposal: math/rand: rework for Go 2 #26263

josharian commented Jul 7, 2018

ianlancetaylor commented Jul 8, 2018

josharian commented Jul 8, 2018

zephyrtronium commented Dec 31, 2019

josharian commented Dec 31, 2019

kortschak commented Dec 31, 2019

zephyrtronium commented Jan 1, 2020

kortschak commented Jan 1, 2020

This comment has been minimized.

This comment has been minimized.

flyingmutant commented Jul 28, 2022

Merovius commented Sep 29, 2022 •

edited

kortschak commented Sep 29, 2022

DeedleFake commented Sep 29, 2022

cespare commented Sep 29, 2022

zephyrtronium commented Sep 30, 2022

rsc commented Jun 13, 2023

proposal: math/rand: rework for Go 2 #26263

proposal: math/rand: rework for Go 2 #26263

Comments

josharian commented Jul 7, 2018

Introduction

Compatibility guarantees

The Source interface

The core PRNG

Concurrency

Conversion routines

API surface

Migration and interoperability between Go 1 and Go 2

ianlancetaylor commented Jul 8, 2018

josharian commented Jul 8, 2018

zephyrtronium commented Dec 31, 2019

josharian commented Dec 31, 2019

kortschak commented Dec 31, 2019

zephyrtronium commented Jan 1, 2020

kortschak commented Jan 1, 2020

This comment has been minimized.

This comment has been minimized.

flyingmutant commented Jul 28, 2022

Merovius commented Sep 29, 2022 • edited

kortschak commented Sep 29, 2022

DeedleFake commented Sep 29, 2022

cespare commented Sep 29, 2022

zephyrtronium commented Sep 30, 2022

rsc commented Jun 13, 2023

The `Source` interface

Merovius commented Sep 29, 2022 •

edited