Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: File should implement io.ByteReader and io.ByteWriter #22771

Closed
j-delaney opened this issue Nov 16, 2017 · 12 comments
Closed

proposal: os: File should implement io.ByteReader and io.ByteWriter #22771

j-delaney opened this issue Nov 16, 2017 · 12 comments

Comments

@j-delaney
Copy link

j-delaney commented Nov 16, 2017

File already implements multiple io interfaces[0]. Implementing more lets users use File without having to wrap it in another reader.

This is inspired by hitting a bug where gob.Decoder requires an io.Reader but if it does not implement io.ByteReader as well it will wrap it in a bufio.Reader which broke using File.Seek. This was a bad example, please see my newer examples at #22771 (comment)

I already have a patch ready that would add this should this proposal be approved.

[0] Specifically io.Reader, io.Close, io.ReadAt, io.Seeker, io.Writer, and io.WriteAt

@gopherbot gopherbot added this to the Proposal milestone Nov 16, 2017
@j-delaney j-delaney changed the title Proposal: os.File should implement io.ByteReader, io.ByteWriter, and io.ByteScanner Proposal: os.File should implement io.ByteReader and io.ByteWriter Nov 16, 2017
@ianlancetaylor ianlancetaylor changed the title Proposal: os.File should implement io.ByteReader and io.ByteWriter proposal: os: File should implement io.ByteReader and io.ByteWriter Nov 16, 2017
@ianlancetaylor
Copy link
Contributor

My hesitation would be that if you want to read or write a single byte from/to a file it is in fact normally the right thing to do to wrap the os.File in a bufio.Reader, since for os.File the ReadByte and WriteByte implementations will unavoidably be fairly inefficient.

Can you clarify what broke in the case you mentioned?

@ianlancetaylor
Copy link
Contributor

Thanks. It's not the most convincing example since in general gob does not permit bouncing around in the stream. As the package doc explains, gob encodes types only once in the stream, so seeking in the stream while decoding from it will not work in the general case.

@j-delaney
Copy link
Author

Reposting my comment that Ian is replying to (I deleted it since I posted it with the wrong GitHub account):

I see what you mean, it would definitely be less efficient than using bufio. I think it solves a different problem though since File is for immediately writing to disk while bufio is for buffered writes.

I built a minimum-viable-example of the issue I encountered at https://play.golang.org/p/WuDB9zZczd

@j-delaney
Copy link
Author

Yeah, I realize my example use-case is not a supported or normal one. I did some digging on GitHub and found that there are a fair amount of people that are pretty much re-implementing this
proposal:

In general though, I think expanding File to implement more io interfaces is a good idea. I don't think it's out-of-scope for a File to be able to read and write single bytes.

@dsnet
Copy link
Member

dsnet commented Nov 17, 2017

As Ian mentioned, io.ByteReader and io.ByteWriter have significant performance detriments. Since they are almost always going through an interface, they can't be inlined and will always be a separate method call. The compress/flate.Reader goes through a ReadByte for every byte of input and it is one of the reasons why the Go implementation is drastically slower than the C implementation.

I should also note that flate.NewReader is similar to gob.NewDecoder in that it special-cases io.ByteReader. The fact that most things do not implement io.ByteReader is good thing, so that internal buffer occurs. This allows flate to be able to call bufio.Reader.Peek and gain a significant performance benefit. I imagine the same optimization trick can be applied to gob as well.

@itstehkman
Copy link

@ianlancetaylor hmm i read the gob package documentation, and it did not seem clear that it encodes types only once in the stream, or that doing a file.Seek would break gob.Decode.

From the documentation:
"Decode reads the next value from the input stream and stores it in the data represented by the empty interface value. If e is nil, the value will be discarded. Otherwise, the value underlying e must be a pointer to the correct type for the next data item received. If the input is at EOF, Decode returns io.EOF and does not modify e."

Is it implied that since it reads from an input stream, that the seek pointer of whatever implements that io.Reader, must point to the top of the stream? Doesn't seem like it to me, but I may have misread or glanced over something. Maybe it is the documentation that needs changing then.

@dsnet
Copy link
Member

dsnet commented Nov 17, 2017

Anytime buffering is involved, it is almost always a bad idea to be seeking the underlying reader.

Secondly, I don't think of that as a reasonable default to assume. If gob.NewDecoder takes in an io.Reader which is fundamentally a streaming API, I would expect gob.Decoder to also operate in a streaming manner.

@j-delaney
Copy link
Author

I'm learning a lot here, but I think this is going off into a bit of a tangent. I recognize that my gob.NewDecoder example is not a good use-case for this change and I think the discussion around if a user should be able to call Seek on the underlying reader for a Decoder is an entirely separate one.

@cespare
Copy link
Contributor

cespare commented Nov 17, 2017

To focus on the specific proposal: I definitely don't think we should do this because one-byte reads and writes are not efficient. I believe that the valid use cases for this is rare enough that it's better to not provide this functionality which would be ripe for misuse.

I see what you mean, it would definitely be less efficient than using bufio. I think it solves a different problem though since File is for immediately writing to disk while bufio is for buffered writes.

Note that bufio.Writer has Flush for those times that you want to ensure that everything has been written to the underlying writer.

@j-delaney
Copy link
Author

Let me preface this with saying this is my first proposal to Go and I am definitely not anywhere near an expert on the language like the others here are. I've looked for documentation on the design philosophy behind Go but the best I've found is https://golang.org/doc/faq#principles so if anyone has a better link I'd love to read it. I really appreciate all the Golang developers and other experienced Gophers being so patient with me here.

My reasoning was that developers using os.File should already be aware of the performance penalty they are incurring. If there were worries about users being inefficient with File.WriteByte then os.File shouldn't even have Write but instead users should be made to use another (buffered) Writer and then explicitly Flush. There is already an abstraction with File.Write that users are expected to understand.

Is the point against this that File.Write can be used in an efficient or inefficient manner while File.WriteByte will most likely be used inefficiently and will encourage people to default to it rather than using a superior buffered writer?

@ianlancetaylor
Copy link
Contributor

I would say:

  1. os.(*File).ReadByte and WriteByte are not particularly efficient.
  2. There is code that tests whether a value supports ReadByte and, if not, uses bufio.NewReader to get a type that supports ReadByte.
  3. If we add a ReadByte method to os.File, this code will change from being efficient to being inefficient.
  4. Therefore, this change will likely cause real code to silently become less efficient.
  5. That doesn't mean we can't do it, but it means we need a clear benefit.
  6. The examples given above could be supported by the standard library, if we choose, with a simple function that reads a single byte from an io.Reader, using ReadByte if available.
  7. Therefore, I see no clear benefit to this change.

@rsc
Copy link
Contributor

rsc commented Nov 20, 2017

I see what you mean, it would definitely be less efficient than using bufio. I think it solves a different problem though since File is for immediately writing to disk while bufio is for buffered writes.

This inefficiency can't be swept under the rug. Many packages assume that if an io.Reader implements io.ByteReader then using the ReadByte method is about as efficient as buffering the original reader. They would get much much slower if we added an inefficient ReadByte to os.File (for example, compress/bzip2, compress/flate, compress/zlib, encoding/gob, encoding/xml all do this).

@rsc rsc closed this as completed Nov 20, 2017
@golang golang locked and limited conversation to collaborators Nov 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants