Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: return EOF early in Reader.Read #21852

Closed
destel opened this issue Sep 12, 2017 · 18 comments
Closed

bytes: return EOF early in Reader.Read #21852

destel opened this issue Sep 12, 2017 · 18 comments

Comments

@destel
Copy link

destel commented Sep 12, 2017

I have a bytes.Reader reader and I want copy it contents to bytes.Buffer using ReadFrom method. I create buffer with preallocated space and expect that additional allocations will not happen.

Instead buffer allocates twice more memory than required to store data. This happens because bytes.Reader doesn't return EOF immediately when it reaches end of stream. Instead it returns EOF only on the next call to Read causing bytes.Buffer to double it's underlying storage.

Link to example code https://play.golang.org/p/j7S3hNkEB5

Code consists of 3 blocks.
First one demonstrates bytes.Reader behaviour.
Second one demonstrates bytes.Buffer.ReadFrom behaviour with preallocated space
Third one avoids reallocation by preallocating slightly bigger buffer

My go version is 1.9

@mdempsky mdempsky changed the title Double memory allocation when copying data from bytes.Reader to bytes.Buffer bytes: double memory allocation when copying data from Reader to Buffer Sep 12, 2017
@dsnet dsnet changed the title bytes: double memory allocation when copying data from Reader to Buffer bytes: return EOF early in Reader.Read Sep 12, 2017
@dsnet
Copy link
Member

dsnet commented Sep 12, 2017

We can change bytes.Reader to return io.EOF the moment it hits the end. This is entirely within the contract of io.Reader.

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Sep 12, 2017
@destel
Copy link
Author

destel commented Sep 12, 2017

And by the way strings.Reader behaves the same

@destel
Copy link
Author

destel commented Sep 12, 2017

And the last thing. buffer.ReadFrom always does reallocation when there is less than 512 bytes of free space. So currently it's absolutely useless to preallocate small amounts of memory manually.

buffer.ReadFrom function has loop inside, What do you guys think about skipping reallocation on the first iteration of the loop?

Just to give some context, I'll describe my use case.
I have function which does some stream processing (for example encryption) and I know how to calculate output size by input size. I want to create additional function which does the same thing but uses byte slices instead of streams. And I'd like to preallocate needed memory manually and avoid reallocations.
Code below:

func EncryptStream(data io.Reader) io.Reader {
	//do some useful stuff here and return reader of the known (same) size
	return ...
}

func Encrypt(data []byte) []byte {
	res := bytes.NewBuffer(make([]byte, 0, len(data)))

	res.ReadFrom(EncryptStream(bytes.NewReader(data)))

	return res.Bytes()
}

@dsnet
Copy link
Member

dsnet commented Sep 12, 2017

If we fix Reader.Read to return io.EOF eagerly, Buffer.ReadFrom should be allocation free for most cases since b.grow(MinRead) only ensures that there is at least MinRead bytes available (which will be true most of the time). This is only problematic when io.EOF is not returned eagerly, and it must go through the loop a second time.

This will of-course always allocate if Reader.Len < MinRead, but I believe that situation is rare.

@cznic
Copy link
Contributor

cznic commented Sep 13, 2017

We can change bytes.Reader to return io.EOF the moment it hits the end. This is entirely within the contract of io.Reader.

Even though this is true, I'd think twice before fixing what's not broken. I guess there are/may be programs in the wild depending incorrectly on the current behavior.

@bradfitz
Copy link
Contributor

I tried to change bytes and strings Reader once to do early EOF and a gazillion tests broke. I didn't deem it worth it. Hopefully the io.Reader signature would be different in Go 2

@bradfitz
Copy link
Contributor

(that was years ago. Maybe it's gotten better but I doubt it. I suspect more failures)

@dsnet
Copy link
Member

dsnet commented Sep 13, 2017

Sigh... incorrect usages of io.Reader is fairly annoying. I'll do a regression test and see what breaks if we change this.

@dsnet dsnet self-assigned this Sep 13, 2017
@dsnet dsnet added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 13, 2017
@bradfitz
Copy link
Contributor

No, it wasn't incorrect usages. It was just tests expecting a certain golden behavior output from things.

@dsnet
Copy link
Member

dsnet commented Sep 19, 2017

In my test, there were about 40 failures. By far, the most common failure is code doing the equivalent of:

b := make([]byte, n)
if _, err := r.Read(b); err != nil {
    return err // Assume err != nil implies something went drastically wrong.
}
// Use b assuming it read n bytes...

This is a straight up wrong use of io.Reader.

There were a small handful that I guess you could call "golden behavior" changes, and they have to deal with the use of iotest.DataErrReader(bytes.NewReader(...)).

@peterGo
Copy link
Contributor

peterGo commented Oct 1, 2017

@bradfitz

I don't understand why this issue is still open. It's a golang-nuts question.

The issue is an XY problem: "I have a bytes.Reader reader and I want to copy its contents to bytes.Buffer using the ReadFrom method." "The XY problem is asking about your attempted solution rather than your actual problem (http://xyproblem.info)."

The issue is a performance question, but there are no Go benchmarks, only example code (https://play.golang.org/p/j7S3hNkEB5).

The solution to the actual problem, I have a bytes.Reader reader and I want to efficiently copy its contents to bytes.Buffer, is to use io.Copy (an implicit io.WriteTo) or io.WriteTo. Don't use io.ReadFrom.

For example, see BenchmarkCopy and BenchmarkWriteTo,

BenchmarkReadRaw-8            	 2000000     791 ns/op	   10240 B/op    1 allocs/op
BenchmarkReadFromExact-8      	  500000    2493 ns/op	   32160 B/op    4 allocs/op
BenchmarkReadFromExactMin-8   	 1000000    1001 ns/op	   11040 B/op    3 allocs/op
BenchmarkCopy-8               	 2000000     923 ns/op	   10400 B/op    3 allocs/op
BenchmarkWriteTo-8            	 2000000     858 ns/op	   10352 B/op    2 allocs/op

rf_test.go: https://play.golang.org/p/ZB-CtPqJHJ

@peterGo
Copy link
Contributor

peterGo commented Oct 1, 2017

@dsnet

fix Reader.Read to return io.EOF eagerly

I disagree.

@dsnet
Copy link
Member

dsnet commented Oct 1, 2017

@peterGo, I don't understand what your objection actually is.

The use of io.WriterTo is one possible solution, and I think it's a good one, but it doesn't invalidate other approaches that may be taken. There is a more general problem that this is demonstrating, and that is returning io.EOF the moment the end of a stream has been hit is a very useful signal for performance purposes (examples: [1], [2], [3]). Given that bytes.Reader is a very popular implementation of io.Reader, there are many applications that can benefit from this given that the
change is entirely within the contracts of io.Reader and how simple the change is.

That being said, I'm leaning against making any change to bytes.Reader, only because making the change (even though it should be legal) is going to result in breaking a bunch of code that makes poor assumptions about io.Reader. I haven't gotten around to doing an analysis of the cost of fixing these and classifying the extent of the "damage".

@dsnet
Copy link
Member

dsnet commented Nov 8, 2017

Having analyzed the failures, most of them are obviously wrong similar to my example above:

b := make([]byte, n)
if _, err := r.Read(b); err != nil {
    return err // Assumes err != nil always means failure, and forgets to handle io.EOF specially
}

If making this change breaks code that does something similar to the above, then that's a good thing. It's exposing a bug that was always there.

Given that it is after the freeze and this is not a bug fix, I'm pushing this to the next milestone.

@dsnet dsnet modified the milestones: Go1.10, Go1.11 Nov 8, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Nov 8, 2017

SGTM

@dsnet dsnet removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 9, 2017
@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 29, 2018
@pongad
Copy link
Contributor

pongad commented Sep 19, 2018

It looks like no one is working on this, so I'll give it a go. I have a draft CL; will mail after polishing it a little more.

I've been thinking perhaps we should randomly pick whether to return EOF with the last Read. Somehow that idea sounds simultaneously terrible and terribly tempting.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/138588 mentions this issue: bytes: make Reader.Read return EOF early

@rsc
Copy link
Contributor

rsc commented Nov 13, 2018

Sorry but I don't think we can reasonably change this. The bytes and strings Reader.Read behavior is what it is, and that is not returning EOF until there's nothing else to return (just like bufio.Reader.Read). Like Brad said back in 2017.

The original report was not about EOF. It was about trying to an efficient copy from a bytes.Reader into a bytes.Buffer. Don't use ReadFrom for that. Use WriteTo, which can do a better job. Even better, use io.Copy, which is clearer and knows tricks like "WriteTo is better than ReadFrom".

@rsc rsc closed this as completed Nov 13, 2018
@golang golang locked and limited conversation to collaborators Nov 13, 2019
@rsc rsc unassigned dsnet Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants