Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/rand: Read should return consistent results, regardless of buffer size #16124

Closed
pin opened this issue Jun 20, 2016 · 12 comments
Closed

math/rand: Read should return consistent results, regardless of buffer size #16124

pin opened this issue Jun 20, 2016 · 12 comments

Comments

@pin
Copy link
Contributor

pin commented Jun 20, 2016

Read returns byte sequence that depends on the buffer length. That's because Rand uses Int63 value to produce 7 random bytes and throws away unused bits from Int63 when done with the particular call. Next call always starts with fresh Int63 leaving "gap" from previous Read if it was not finished on 7-bytes boundary.

Therefore, you can't fix seed and read pseudo-random bytes deterministically without fixing buffer size.

Proposed fix

My proposal is to keep unused Int63 reminder in the Rand struct. It let us fix the issue and still have the same bytes from the first Read call as we have now. The performance hit is minimal also. The downside is an extra state in Rand struct.

One of the other options is to use only single byte from each Int63 call. No state in Rand but the 2x-4x performance hit.

  1. What version of Go are you using (go version)?
    go version devel +f3d5478 Fri Jun 17 19:15:29 2016 +0000 linux/amd64
  2. What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOOS="linux"
  1. What did you do?
    This is play-adopted unit-test using testing/iotest that fails: https://play.golang.org/p/dbz-VlFPCa
  2. What did you expect to see?
    The same random bytes stream no matter read buffer size used.
@adg adg changed the title math/rand Read implements io.Reader incorrectly math/rand: Read should return consistent results, regardless of buffer size Jun 20, 2016
@adg adg added this to the Go1.8 milestone Jun 20, 2016
@adg
Copy link
Contributor

adg commented Jun 20, 2016

Seems like something that's easily fixed.

The one concern I see is compatibility. If someone relies on this broken behavior, their program may cease to function correctly. However, I don't think we have made any promises about the stability of math/rand as a pseudo-random source. And this is certainly a bug. So the compatibility concern is probably not relevant.

@randall77
Copy link
Contributor

I don't see the point behind this proposal. If you're giving math/rand a seed (and the same seed multiple times in different runs), it is because you're trying to make some nondeterministic code deterministic. But surely you're also issuing the same sized reads in that case?

Therefore, you can't fix seed and read pseudo-random bytes deterministically without fixing buffer size.

In other words, why would you not fix the buffer size if you were trying to make a computation deterministic?

@gopherbot
Copy link

CL https://golang.org/cl/24251 mentions this issue.

@dsnet
Copy link
Member

dsnet commented Jun 20, 2016

Issue #8013 has some discussion about whether we could change the output of the rand package. The Read method was introduced in Go1.6, so the number of users dependent on it may be low.

That said, if we do change it, should we consider this change for Go1.7 to reduce the amount of time the faulty behavior stays present? I'm almost certain that this change will break some code. The longer people are using and relying on the current behavior, the worse it will be later.

@josharian
Copy link
Contributor

The same random bytes stream no matter read buffer size used.

I'm with @randall77; it's not clear to me why this needs to be the case. Is there a concrete context where you need this?

@pin
Copy link
Contributor Author

pin commented Jun 20, 2016

@randall77 I would like to have deterministic sequence with known distribution but in the same time luxury of dealing with io.Reader which allows me to take buffer size out of the equation and combine/nest/wrap it without worrying about buffer size.

Say io.CopyN one instance to hash.Hash implementation, then send stream produced with other instance (made with the same seed) over the network and write it to the same hash.Hash implementation on the other end. Buffer size will differ for both cases, but I would expect to get the same hash sum.

This is the natural way of thinking for me:

  • PRNG generates a sequence of random values that is determined by a seed
  • io.Reader is a way to read/receive a sequence of bytes, a buffer is just the way to get the data
  • io.Reader implemented by rand.Rand is a way to generate sequence of bytes determined by a seed

@josharian The reasonable expectation I believe is that, for instance, io.LimitReader(rand.New(rand.NewSource(seed), length) is the way to produce pseudo-random byte stream of given length and that's content is determined by the seed only.

So that one could send that stream over the network, store on disk, etc. Then afterwards read it back (i.e. get io.Reader with the same stream but now coming from the network, disk, etc), create the second stream using io.LimitReader/rand.Reader with the same seed/length and validate data (read both streams in turn and compare bytes).

It is convenient because one can use arbitrary-sized streams that could exceed RAM size, have many of them and just keep seed and length in order to be able to validate content. It is sound way to do some kinds of testing of network device, storage service, etc. And doing such testing one can't always fix buffer size because it could be determined by say network buffers (or say RPC mechanism and validator implementation could have different buffers).

One of the real-world examples is TFTP library unit-tests, see https://github.com/pin/tftp/blob/master/tftp_test.go#L126 for instance. There is roll-your-own io.Reader wrapper around rand.Source exactly because of the this issue: when I implement validator I don't want to take TFTP library buffer size into the equation.

In general, I would expect that provided the same seed and calling no other methods on Rand except Read (and wrapping it into something like io.LimitReader actually enforces that) I would get the same stream of bytes no matter how io.Reader is read.

@adg
Copy link
Contributor

adg commented Jun 21, 2016

I agree with @dsnet that this might be worth getting in for 1.7.

@adg adg modified the milestones: Go1.7Maybe, Go1.8 Jun 21, 2016
@randall77
Copy link
Contributor

I see. It does sound useful to do io.LimitReader(rand.New(rand.NewSource(seed), length) to generate a deterministic random-looking stream.

One way to get the determinism you want would be to wrap rand.NewSource with a iotest.OneByteReader. And then probably a bufio.Reader for efficiency. But you've convinced me, I'm ok with doing this directly in rand.

@josharian
Copy link
Contributor

I am a bit concerned about the backwards compatibility still. We've declined to do reasonable things with rand in the past. I wonder whether it would be preferable to do this in 1.8 and fix those other bugs as well, all in one release. If this were introduced in 1.7 the case for changing it now would be clear.

@randall77
Copy link
Contributor

Another case where we decided not to change math/rand because of compatibility issues:

#13215

@pin
Copy link
Contributor Author

pin commented Jun 22, 2016

@josharian, @randall77, I remember reading through #13215 some time ago because of the concern related to backwards compatibility and trying to figure out the extent it can be changed now.

My assumption is that the fix of existing io.Reader implementation without the API change is the good trade-off between breaking things and fixing things:

  • Leaving Read as it is now is the only fully backward compatible approach but the longer it like this the more code rely on inconsistent results. The same reasoning could be applied to some extent to the case above but:
  • On the other side code which would depend on consistent results now written in more complex roll-your-own way. In contrast with the case above.
  • And current implementation is somewhat deceptive in my experience. I was trying to find a bug in my code before I realized there are different streams when sending data over the network (small buffer) vs ioutil.ReadAll. And it looks like you have some wired bug in your code because data at the start of the stream (from a first read) matches! o_O

Other more radical changes I was thinking about are:

  • Pulling io.Reader implementation into a separate struct (say rand.Reader) would eliminate not so obvious interference between Read and other methods and leave rand.Rand simple convenience wrapper around rand.Source without any additional state. But it is a bigger breaking change. And on the other hand is not that surprising that calling other methods on instance would affect Read: that happens to File for example when one calls Seek.
  • I'm curious why rand source is not just io.Reader itself? I suspect that might be related to the fact it can't be just io.Reader, but more like ReadSeeder so other io.Reader implementations would not satisfy it anyway, therefore, it is not that beneficial.

@gopherbot
Copy link

CL https://golang.org/cl/24520 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants