
proposal: bufio: add Reader.Buffer #50960

Open
dsnet opened this issue Feb 1, 2022 · 12 comments

@dsnet
Member

dsnet commented Feb 1, 2022

The Reader type is initialized with a buffer size. Once set, there is no API to adjust this.

This makes it challenging to parse a file using a combination of Peek and Discard calls. One such pattern looks like:

var r *bufio.Reader = ...
for {
    // Peek at the fixed-size header, then advance past it.
    buf, err := r.Peek(headerSize)
    if err != nil {
        return err
    }
    r.Discard(len(buf))
    hdr := parseHeader(buf)

    // Peek at the variable-size payload described by the header.
    buf, err = r.Peek(hdr.Size)
    if err != nil {
        return err
    }
    r.Discard(len(buf))
    processData(buf)
}

The code above works fine until hdr.Size exceeds the size of Reader's internal buffer, at which point Peek fails with ErrBufferFull.

The current workaround is to check for ErrBufferFull, allocate a separate buffer, and read the expected number of bytes into that buffer. The workaround looks like:

  var r *bufio.Reader = ...
  for {
      ...

      buf, err := r.Peek(hdr.Size)
-     if err != nil {
-         return err
-     }
-     r.Discard(len(buf))
+     switch {
+     case err == nil:
+         r.Discard(len(buf))
+     case err == bufio.ErrBufferFull:
+         buf = make([]byte, hdr.Size)
+         if _, err := io.ReadFull(r, buf); err != nil {
+             return err
+         }
+     default:
+         return err
+     }
      processData(buf)
  }

It is unfortunate that I need to allocate a relatively large buffer in order to read large records and that I'm not able to reuse that buffer for subsequent Reader operations.

I propose adding Reader.Buffer, with semantics similar to Scanner.Buffer:

Buffer sets the buffer to use when reading and the maximum size of buffer that may be allocated during reading. The maximum Peek or ReadSlice size is the larger of max and cap(buf). If max <= cap(buf), Reader will use this buffer only and do no allocation.

Buffer panics if it is called after reading has started.

Using this new API, the code above would look like:

  var r *bufio.Reader = ...
+ r.Buffer(nil, 64<<20) // we never expect records to exceed 64MiB
  for {
      ...
  }

/cc @josharian

@dsnet dsnet added the Proposal label Feb 1, 2022
@gopherbot gopherbot added this to the Proposal milestone Feb 1, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 2, 2022
@dsnet
Member Author

dsnet commented Feb 2, 2022

The workaround does not work in situations where temporary errors are expected, because you can't "unread" the buffer once io.ReadFull has consumed it. This is one of the reasons Peek+Discard is a useful pattern: Peek does not mutate the state of the bufio.Reader until you're ready to advance it.
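For illustration, a minimal sketch of that property, where isTemporary is a hypothetical error classifier (e.g., checking net.Error.Timeout): since a failed Peek consumes nothing, the same call can simply be retried.

for {
    buf, err := r.Peek(hdr.Size)
    if err != nil {
        if isTemporary(err) { // hypothetical check, e.g. a net.Error timeout
            continue // Peek consumed nothing, so retrying is safe
        }
        return err
    }
    r.Discard(len(buf))
    processData(buf)
}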

@rsc
Contributor

rsc commented Feb 2, 2022

Reader has a fixed-size buffer set at the beginning of time. I can't quite tell which of a few things you are asking for:

  1. The ability to make bufio.Reader expand its buffer as needed up to a given maximum.
  2. The ability to change the buffer size of bufio.Reader on the fly, by providing a new buffer.
  3. Something else?

I am assuming you don't want to use bufio.NewReaderSize(r, 64<<20) followed by r.Reset to reuse the buffer, because you don't want every Reader to be sitting on 64 MB?
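For reference, that alternative would look something like this (src and inputs are hypothetical placeholders):

// Pre-allocate the maximum once and reuse it across inputs via Reset.
// This works with the existing API, but every such Reader then holds
// a 64 MB buffer for its entire lifetime.
r := bufio.NewReaderSize(src, 64<<20)
for _, next := range inputs {
    r.Reset(next) // switch underlying readers, keeping the same buffer
    ...
}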

@dsnet
Member Author

dsnet commented Feb 2, 2022

  1. The ability to make bufio.Reader expand its buffer as needed up to a given maximum.

Yes.

  2. The ability to change the buffer size of bufio.Reader on the fly, by providing a new buffer.

No. The proposed Buffer method would not allow this after reading has started, similar to how Scanner.Buffer panics if called after scanning has started.

I am assuming you don't want to use bufio.NewReaderSize(r, 64<<20) followed by r.Reset to reuse the buffer, because you don't want every Reader to be sitting on 64 MB?

Exactly. Start with something small (e.g., 4 KiB) and permit growing up to some specified maximum.

@rsc
Contributor

rsc commented Feb 2, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 2, 2022
@rsc
Contributor

rsc commented Feb 9, 2022

Do you think you need the ability to set the initial slice yourself, or are you OK with bufio always allocating it?
I've always found the Scanner API a bit difficult to remember and am not sure about mimicking it here.

If we don't need the ability to set the slice ourselves, maybe we could

  1. Make NewReaderSize(64<<20) start out with a smaller allocation, so that you get the auto-growing as needed?
  2. Add a new SetMaxSize(int) method to enable growing?

Do either of those work for you? Any thoughts on which is better?
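Neither exists today; as a sketch, the two options would read roughly like this (shown as alternatives, with SetMaxSize being the hypothetical method named above):

// Option 1: the size passed to NewReaderSize becomes a ceiling; the
// Reader starts with a small buffer and grows on demand up to it.
r := bufio.NewReaderSize(src, 64<<20)

// Option 2: construct as usual, then raise the ceiling explicitly.
r := bufio.NewReader(src)
r.SetMaxSize(64 << 20)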

@dsnet
Member Author

dsnet commented Feb 9, 2022

I've always found the Scanner API a bit difficult to remember

Our use case didn't care about setting the slice. I've personally never used the "set the slice" feature in bufio.Scanner.

Do either of those work for you? Any thoughts on which is better?

Both options work for us. Option 2 is more flexible, since it allows modifying the buffer size after the fact, but then you'll need to decide what happens when the maximum is set smaller than what's currently buffered.

@rsc
Contributor

rsc commented Feb 16, 2022

Let's figure out whether we can do without API changes, since that's always preferable.
Suppose we make NewReaderSize(N) for any N > 4k start with a 4k buffer and then double up to N
(the last step might not be a complete doubling).
This could possibly trip up new programs on old releases, but there are probably not many of those,
since this is not going to be widely used.

Any reasons not to take this approach?
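As a sketch, the growth policy described above is just repeated doubling with a clamp (illustrative only, not an implementation):

// Starting from 4 KiB, double the buffer until it reaches the size
// requested in NewReaderSize; the final step is clamped, so it may
// be a partial doubling (e.g. N = 48<<20 ends ..., 16M, 32M, 48M).
func grow(cur, max int) int {
    if cur*2 > max {
        return max
    }
    return cur * 2
}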

@dsnet
Member Author

dsnet commented Feb 16, 2022

That approach SGTM, but this behavior is potentially observable through Read, based on the number of bytes it returns. It's not valid for programs to depend on that artifact.

I can make a prototype CL and we can run through global testing and see how many things break and whether it's legitimate.
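To make the observability concrete: the io.Reader contract already allows short reads, but a program written against today's behavior could still notice the difference.

// How many bytes one Read returns depends on the internal buffer:
// how much a single fill can hold, and when bufio bypasses its
// buffer entirely for a large p. A lazily grown buffer changes those
// sizes, so n may differ from today. Correct code treats any n in
// [0, len(p)] as valid.
p := make([]byte, 1<<20)
n, err := r.Read(p)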

@rsc
Contributor

rsc commented Feb 23, 2022

Sounds good, thanks Joe.

@rsc
Contributor

rsc commented Mar 2, 2022

Waiting on a prototype CL from Joe to see if we can get away without an API change here.

@rsc
Contributor

rsc commented Mar 23, 2022

Placing on hold for prototype CL. Please feel free to move out of hold once it is ready.

@rsc rsc moved this from Active to Hold in Proposals (old) Mar 23, 2022
@rsc
Contributor

rsc commented Mar 23, 2022

Placed on hold.
— rsc for the proposal review group
