math/rand: Read should return consistent results, regardless of buffer size #16124

pin · 2016-06-20T04:14:49Z

Read returns a byte sequence that depends on the buffer length. That's because Rand uses a single Int63 value to produce up to 7 random bytes and throws away the unused bits of that Int63 when the call is done. The next call always starts with a fresh Int63, leaving a "gap" after the previous Read if it did not finish on a 7-byte boundary.

Therefore, you can't fix the seed and read pseudo-random bytes deterministically without also fixing the buffer size.

Proposed fix

My proposal is to keep the unused Int63 remainder in the Rand struct. This lets us fix the issue and still return the same bytes from the first Read call as we do now. The performance hit is also minimal. The downside is extra state in the Rand struct.

Another option is to use only a single byte from each Int63 call. That needs no state in Rand, but comes with a 2x-4x performance hit.

What version of Go are you using (go version)?
go version devel +f3d5478 Fri Jun 17 19:15:29 2016 +0000 linux/amd64
What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?
This is a unit test, adapted for the Playground and using testing/iotest, that fails: https://play.golang.org/p/dbz-VlFPCa
What did you expect to see?
The same stream of random bytes, no matter what read buffer size is used.


adg · 2016-06-20T04:47:42Z

Seems like something that's easily fixed.

The one concern I see is compatibility. If someone relies on this broken behavior, their program may cease to function correctly. However, I don't think we have made any promises about the stability of math/rand as a pseudo-random source. And this is certainly a bug. So the compatibility concern is probably not relevant.

randall77 · 2016-06-20T04:53:33Z

I don't see the point behind this proposal. If you're giving math/rand a seed (and the same seed multiple times in different runs), it is because you're trying to make some nondeterministic code deterministic. But surely you're also issuing the same sized reads in that case?

> Therefore, you can't fix seed and read pseudo-random bytes deterministically without fixing buffer size.

In other words, why would you not fix the buffer size if you were trying to make a computation deterministic?

gopherbot · 2016-06-20T05:00:20Z

CL https://golang.org/cl/24251 mentions this issue.

dsnet · 2016-06-20T10:22:49Z

Issue #8013 has some discussion about whether we could change the output of the rand package. The Read method was introduced in Go1.6, so the number of users dependent on it may be low.

That said, if we do change it, should we consider this change for Go1.7 to reduce the amount of time the faulty behavior stays present? I'm almost certain that this change will break some code. The longer people are using and relying on the current behavior, the worse it will be later.

josharian · 2016-06-20T14:43:42Z

> The same random bytes stream no matter read buffer size used.

I'm with @randall77; it's not clear to me why this needs to be the case. Is there a concrete context where you need this?

pin · 2016-06-20T20:56:57Z

@randall77 I would like to have a deterministic sequence with a known distribution, but at the same time the luxury of dealing with an io.Reader, which lets me take the buffer size out of the equation and combine/nest/wrap it without worrying about buffer size.

Say, io.CopyN one instance into a hash.Hash implementation, then send the stream produced by another instance (made with the same seed) over the network and write it to the same hash.Hash implementation on the other end. The buffer size will differ between the two cases, but I would expect to get the same hash sum.

This is the natural way of thinking for me:

PRNG generates a sequence of random values that is determined by a seed
io.Reader is a way to read/receive a sequence of bytes, a buffer is just the way to get the data
io.Reader implemented by rand.Rand is a way to generate sequence of bytes determined by a seed

@josharian The reasonable expectation, I believe, is that, for instance, io.LimitReader(rand.New(rand.NewSource(seed)), length) is the way to produce a pseudo-random byte stream of a given length whose content is determined by the seed only.

So one could send that stream over the network, store it on disk, etc. Then afterwards read it back (i.e. get an io.Reader with the same stream, but now coming from the network, disk, etc.), create a second stream using io.LimitReader over a rand.Rand with the same seed and length, and validate the data (read both streams in turn and compare bytes).

This is convenient because one can use arbitrary-sized streams that may exceed RAM size, have many of them, and keep just the seed and length in order to be able to validate the content. It is a sound way to do some kinds of testing of a network device, storage service, etc. When doing such testing one can't always fix the buffer size, because it may be determined by, say, network buffers (or the RPC mechanism and the validator implementation may have different buffers).

One real-world example is the TFTP library unit tests; see https://github.com/pin/tftp/blob/master/tftp_test.go#L126 for instance. There is a roll-your-own io.Reader wrapper around rand.Source exactly because of this issue: when I implement a validator, I don't want to take the TFTP library buffer size into the equation.

In general, I would expect that, given the same seed and calling no other methods on Rand except Read (wrapping it in something like io.LimitReader actually enforces that), I would get the same stream of bytes no matter how the io.Reader is read.

adg · 2016-06-21T01:03:14Z

I agree with @dsnet that this might be worth getting in for 1.7.

randall77 · 2016-06-21T14:51:25Z

I see. It does sound useful to do io.LimitReader(rand.New(rand.NewSource(seed)), length) to generate a deterministic random-looking stream.

One way to get the determinism you want would be to wrap the rand.Rand with an iotest.OneByteReader, and then probably a bufio.Reader for efficiency. But you've convinced me; I'm OK with doing this directly in rand.

josharian · 2016-06-21T15:08:00Z

I am a bit concerned about the backwards compatibility still. We've declined to do reasonable things with rand in the past. I wonder whether it would be preferable to do this in 1.8 and fix those other bugs as well, all in one release. If this were introduced in 1.7 the case for changing it now would be clear.

randall77 · 2016-06-21T15:35:55Z

Another case where we decided not to change math/rand because of compatibility issues:

#13215

pin · 2016-06-22T05:29:48Z

@josharian, @randall77, I remember reading through #13215 some time ago because of the backwards compatibility concern, trying to figure out to what extent math/rand can be changed now.

My assumption is that fixing the existing io.Reader implementation without an API change is a good trade-off between breaking things and fixing things:

Leaving Read as it is now is the only fully backward-compatible approach, but the longer it stays like this, the more code will rely on the inconsistent results. The same reasoning could be applied to some extent to the case above, but:
On the other hand, code that depends on consistent results currently has to be written in a more complex roll-your-own way, in contrast with the case above.
And the current implementation is somewhat deceptive in my experience. I was trying to find a bug in my code before I realized that the streams differ when sending data over the network (small buffer) vs. ioutil.ReadAll. It looks like you have some weird bug in your own code, because the data at the start of the stream (from the first read) matches! o_O

Other, more radical changes I was thinking about are:

Pulling the io.Reader implementation into a separate struct (say rand.Reader) would eliminate the not-so-obvious interference between Read and the other methods, and leave rand.Rand a simple convenience wrapper around rand.Source without any additional state. But it is a bigger breaking change. On the other hand, it is not that surprising that calling other methods on an instance affects Read: that happens to File, for example, when one calls Seek.
I'm curious why the rand source is not just an io.Reader itself. I suspect that is because it can't be just an io.Reader, but something more like a ReadSeeder, so other io.Reader implementations would not satisfy it anyway, and therefore it is not that beneficial.

gopherbot · 2016-06-28T07:00:21Z

CL https://golang.org/cl/24520 mentions this issue.

Updates #16124 Change-Id: Ib58f2bb37fd1559efc512a2e3cba976f09b939a0 Reviewed-on: https://go-review.googlesource.com/24520 Reviewed-by: Andrew Gerrand <adg@golang.org>

adg changed the title ~~math/rand Read implements io.Reader incorrectly~~ math/rand: Read should return consistent results, regardless of buffer size Jun 20, 2016

adg added this to the Go1.8 milestone Jun 20, 2016

adg added the help wanted label Jun 20, 2016

adg modified the milestones: Go1.7Maybe, Go1.8 Jun 21, 2016

gopherbot closed this as completed in 8d966ba Jun 27, 2016

gopherbot pushed a commit that referenced this issue Jun 28, 2016

doc/go1.7.html: mention recent changes to Rand.Read

3001334


josharian mentioned this issue Jun 29, 2016

proposal: math/rand: speed up Int31n with multiply/shift instead of modulo #16213

Closed

dsnet mentioned this issue Jul 9, 2016

math/rand: use of Read is racy #16308

Closed

ianlancetaylor mentioned this issue Aug 4, 2016

crypto/rand: Reader is internally buffered inconsistently on different platforms #16593

Closed

josharian mentioned this issue May 17, 2017

math/rand: use a sharded lock for the default shared Source #20387

Closed

golang locked and limited conversation to collaborators Jun 28, 2017

gopherbot added the FrozenDueToAge label Jun 28, 2017
