Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: Add io.ReaderFromBuffer interface #58331

Open
fredwangwang opened this issue Feb 5, 2023 · 7 comments
Open

proposal: io: Add io.ReaderFromBuffer interface #58331

fredwangwang opened this issue Feb 5, 2023 · 7 comments

Comments

@fredwangwang
Copy link

When calling io.CopyBuffer(writer, reader, buf) , it tries to use the io.ReaderFrom interface on writer if it is available. For example net.TcpConn implemets the io.ReaderFrom and will try to use syscalls like splice or sendfile to avoid additional copy in the user space.
This is all well-intended and provides better performance in the case of copying from one net conn to another. However. If the reader is not a real file, but just some struct implements io.Reader, it actually falls back to using io.Copy.

This is problem-some, as the very intention of using io.CopyBuffer is that we get to control the buffer and potentially pool the buffers to avoid GCs. But the generic fallback to io.Copy makes the buffer passed in useless and makes an additional allocation.

This behavior is clearly shown in the following heap profile (left is the buffer passed in, right is the allocation made by genericReadFrom fallback (io.Copy)):
Screenshot from 2023-02-04 21-16-11

This is a very surprising behavior to the end developers without the understanding of TcpConn readfrom internals. As an obvious optimization of using io.CopyBuffer in the hot-path turns out to be even worse than just call io.Copy.

So would propose to add an interface similar to io.ReaderFrom, but allows to take additional buffer, thus the name io.ReaderFromBuffer. This way nice optimization regarding fd copies can still happen, but in case none applies, the genericReadFrom can fallback to io.CopyBuffer instead of io.Copy, to take advantage of the buffer passed in, if there is any.

@gopherbot gopherbot added this to the Proposal milestone Feb 5, 2023
@ianlancetaylor
Copy link
Contributor

See also #16474.

@earthboundkid
Copy link
Contributor

The core issue is that the extended interface is always called if it exists. What one wants is a mechanism like errors.Unwrap or http.ResponseController.Unwrap or fs.ErrNotImplemented to signal that the method doesn't actually implement a direct ReaderFrom/WriterTo and shouldn't be used in a particular case.

@fredwangwang
Copy link
Author

Thanks for the context @ianlancetaylor!

just thinking out..

What one wants is a mechanism like errors.Unwrap or http.ResponseController.Unwrap or fs.ErrNotImplemented to signal that the method doesn't actually implement a direct ReaderFrom/WriterTo

@carlmjohnson It does make sense to me. However, agree with the comment #16474 (comment) here:

... such as io.ErrNotSupported, but it would likely break callers on a massive scale

Considering v2 might not exist ever, to me at least, having something that is non-breaking yet making io.CopyBuffer less surprising has its merit.

Also interestingly mentioned here #16474 (comment)

Conceptually, it seems that ReaderFrom should have a buffer argument and ignore it if it is not needed or nil

this is actually what this proposal about, but in a non API breaking way.


there are two usecase of io.CopyBuffer that I am aware from reading the linked issue:

  1. To optimize.
    I suspect this is the reason most people use io.CopyBuffer. By adding io.ReaderFromBuffer interface and implements it in things like os.File, io.TcpConn, etc. The kernel fast path can still happen behind scene, and if it does not apply, it will gracefully fallback to io.CopyBuffer instead of io.Copy.
    The net result is at least as fast as the unintuitive hack, with the exact same memory footprint
io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{src}, buf)
  1. To shape.
    I am not positive that anyone actually relies on this. But it is mentioned here proposal: io: CopyBuffer should avoid ReadFrom/WriteTo #16474 (comment) in a test case:

... another case where the behavior of io.CopyBuffer was unexpected. The test relied upon copy being done in specific chunk sizes

Now there are two scenarios to this,

  • The struct implements io.ReaderFromBuffer wraps a file and fast path is available (e.g. io.TcpConn). Then shaping is not supported. But I doubt how much value there is to shape the chunk size when it is handling at the kernel level. It is important for kernel drivers for sure, but not so much with go at user space level.
  • The struct implements io.ReaderFromBuffer is just some code. Then shaping can still happen -- the struct can just take the passed in buffer and do whatever it needs with it.

Bottomline, having io.ReaderFromBuffer still does NOT guarantee the passed buffer will be used (if faster path available), but it will not result in unexpected additional allocations like io.ReaderFromdoes.

@earthboundkid
Copy link
Contributor

Just starting returning io.ErrNotSupported in existing methods would break existing clients, yeah. What you'd want to do is to make a new interface, to avoid bikeshedding call it io.ReaderFromV2 and io.WriterToV2, and then those would be allowed to return io.ErrNotSupported, which io.Copy/io.CopyBuffer would know about and be able to deal with.

@ianlancetaylor
Copy link
Contributor

ErrNotSupported is #41198, which was accepted but I've always feel a bit leery about implementing it.

@earthboundkid
Copy link
Contributor

Yes. I wrote on #41198 that the proposal is flawed because “ErrNotSuported” is domain specific. Not supporting http.FlushError is different than not supporting io.ReaderFrom. An overhaul of io.Copy should have io-specific ErrNotSupported variables.

@gopherbot
Copy link

Change https://go.dev/cl/475575 mentions this issue: io: optimize copyBuffer to make use of the user-provided buffer for fallbacks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Incoming
Development

No branches or pull requests

5 participants