Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: add Buffer.Available and Buffer.AvailableBuffer #53685

Closed
sirkon opened this issue Jul 5, 2022 · 23 comments
Closed

bytes: add Buffer.Available and Buffer.AvailableBuffer #53685

sirkon opened this issue Jul 5, 2022 · 23 comments

Comments

@sirkon
Copy link

sirkon commented Jul 5, 2022

Hi!

I propose to introduce a new method to bytes.Buffer:

// Extend grows buffer by n bytes, moves write position at the end of 
// just grown data and return a part of buffer what extended the
// previous content.
func (b *Buffer) Extend(n int) []byte {
    b.Grow(n)
    l := len(b.buf)
    b.buf = b.buf[:l + n]
    return b.buf[l:]
}

Rationale.

Grow combined with Write is nice, but some APIs, binary.PutUvarint for instance, works with slices of bytes. This means memory copying will be needed in order to write uvarint-encoded data. Lots of memory copies in my case where I have manual serialization for some data structures.

I guess this method would be handy in cases like mine, where I would:

  1. Extend buffer by the length of the serialized data.
  2. Will serialize right into the returned slice.
@sirkon sirkon added the Proposal label Jul 5, 2022
@gopherbot gopherbot added this to the Proposal milestone Jul 5, 2022
@sirkon sirkon changed the title proposal: affected/package: bytes proposal: affected/package: bytes, extend buffer. Jul 5, 2022
@mdlayher mdlayher changed the title proposal: affected/package: bytes, extend buffer. proposal: bytes: add Buffer.Extend Jul 5, 2022
@dsnet
Copy link
Member

dsnet commented Jul 5, 2022

I agree that we need a better way to have bytes.Buffer interact with []byte, but I believe it is wrong to target Put-like operations where you need to know the size of the buffer up-front †. Doing so would greatly limit the ways this API can be used. Put-like operation are much rarer compared to append-like APIs (e.g.,
strconv.AppendXXX, fmt.AppendXXX, binary.AppendXXX, utf8.AppendRune, hash.Sum, etc.)

In #47527, we added bufio.Writer.AvailableBuffer so that it could be used together with append-like APIs.
I believe it would be better to provide a method like:

// AvailableBuffer returns an empty buffer with non-zero capacity.
// This buffer is intended to be appended to and
// passed to an immediately succeeding Write call.
func (b *Buffer) AvailableBuffer() []byte {
    if cap(b.buf) - len(b.buf) {
        b.grow(1)
    }
    return b.buf[len(b.buf):]
}

Thus, usage would look like:

var bb *bytes.Buffer = ...
b := bb.AvailableBuffer()
b = binary.AppendUvarint(b, ...)
bb.Write(b)

Note that this operation is allocation-free and copy-free in the case where the Buffer has enough capacity. The Write operation uses copy under the hood, which is a noop if the source and destination are the same.

Aside: If we want to be consistent with bufio.Writer, we could also add:

func (b *Buffer) Available() int {
    return cap(b.buf) - len(b.buf)
}

† Also, most Put-like operations have had an append-like equivalent added in recent years. Examples:

  • utf8.AppendRune instead of utf8.EncodeRune
  • binary.AppendUvarint instead of binary.PutUvarint
  • binary.LittleEndian.AppendUint64 instead of binary.LittleEndian.PutUint64

If other Put-like operations lack an append-like equivalent, we should propose that one be added.

@lmb
Copy link
Contributor

lmb commented Sep 29, 2022

I've often wished for a way to get at the underlying buffer so I support this proposal. It seems like this has fallen through the cracks maybe, since there has been no update since July 6th?

@dsnet could you explain why you'd have AvailableBuffer guarantee non-zero capacity? Seems like that is different from bufio.Writer.AvailableBuffer.

@dsnet
Copy link
Member

dsnet commented Sep 29, 2022

I can't remember why it guarantees non-zero capacity. It isn't strictly necessary.

@ianlancetaylor
Copy link
Contributor

This proposal is in the list of incoming proposals, of which there are many. It hasn't fallen through the cracks.

@dsnet
Copy link
Member

dsnet commented Sep 29, 2022

I remember now. If the usage pattern is:

var bb *bytes.Buffer = ...
b := bb.AvailableBuffer()
b = binary.AppendUvarint(b, ...)
bb.Write(b)

we would want the allocation to occur within AvailableBuffer so that the internal buffer of Buffer is grown first. If the allocation occurs in binary.AppendUvarint, then it will allocate again when we call bb.Write.

It's a small optimization to reduce the amount of allocations when an empty Buffer is first used.

@lmb
Copy link
Contributor

lmb commented Oct 3, 2022

Ian: I didn't mean this as a criticism, sorry if it came across that way. It's not clear from the outside whether a proposal has been evaluated to go into active or not.

To add something to the discussion, I've been playing around with AvailableBuffer and friends and came up with the following two functions:

func readBuffer(r io.Reader, n int) ([]byte, error) {
	switch r := r.(type) {
	case *bufio.Reader:
		buf, err := r.Peek(n)
		if errors.Is(err, bufio.ErrBufferFull) {
			// We've requested a buffer than is larger than the reader. Fall
			// back to allocating a new slice.
			break
		}
		if err != nil {
			return nil, err
		}

		// Advance the internal state of r by reading into the already peeked
		// buffer. There is no additional copy, since dst == src.
		_, err = r.Read(buf)
		return buf, err

	case *bytes.Buffer:
		buf := r.Next(n)
		if len(buf) == 0 {
			return nil, io.EOF
		} else if len(buf) < n {
			return nil, io.ErrUnexpectedEOF
		}

		return buf, nil
	}

	buf := make([]byte, n)
	_, err := io.ReadFull(r, buf)
	return buf, err
}

func writeBuffer(w io.Writer, n int) []byte {
	switch w := w.(type) {
	case *bufio.Writer:
		if w.Available() < n {
			break
		}

		return w.AvailableBuffer()

	// XXX: Need bytes.Buffer.AvailableBuffer here.
	}

	return make([]byte, 0, n)
}

With these two functions it's possible to wrap append-style encoding APIs into functions that take io.Reader / io.Writer which don't allocate in the common case. One example where I would like to see this is binary.Read and binary.Write.

Some more observations:

  • Having both bytes.Buffer.AvilableBuffer and bufio.Writer.AvailableBuffer with the same semantics would allow switching on interface{AvailableBuffer() []byte; io.Writer } in writeBuffer and would allow users to benefit from the same optimization with custom readers.
  • readBuffer is more complicated since Peek() and Next() have distinct semantics. Using Peek + Read this way is counterintuitive, since one has to know that the copy in Read is elided for this sequence to make any sense. OTOH Next() requires synthesising io.EOF / io.UnexpectedEOF in the caller.

P.S.: Should we split AvailableBuffer into a new, distinct proposal?

@dsnet
Copy link
Member

dsnet commented Oct 3, 2022

@lmb:

  • In writeBuffer, it is probably more efficient to call w.Grow if the available buffer is too small.
  • In readBuffer, it is generally more common to use r.Peek with r.Discard.

@ianlancetaylor
Copy link
Contributor

@lmb I didn't take it as criticism, I was just explaining the current status.

@lmb
Copy link
Contributor

lmb commented Oct 4, 2022

  • In writeBuffer, it is probably more efficient to call w.Grow if the available buffer is too small.

Something like this (if and when bytes.Buffer supports AvailableBuffer())?

	case *bytes.Buffer:
		w.Grow(n)
		return w.AvailableBuffer()

@gopherbot
Copy link

Change https://go.dev/cl/440135 mentions this issue: encoding/binary: avoid allocations for common Readers / Writers

@rsc
Copy link
Contributor

rsc commented Feb 1, 2023

Given that we added https://pkg.go.dev/bufio#Writer.AvailableBuffer in Go 1.18, it sounds like we should use the same pattern in bytes.Buffer to allow the same idioms?

@rsc
Copy link
Contributor

rsc commented Feb 1, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@dsnet dsnet changed the title proposal: bytes: add Buffer.Extend proposal: bytes: add Buffer.AvailableBuffer Feb 7, 2023
@rsc
Copy link
Contributor

rsc commented Feb 8, 2023

It sounds like people are happy with AvailableBuffer() []byte like in bufio.Writer.
Do we also need to add Available() int for alignment between bufio.Writer, or is that not important?
If there's not a strong argument for including Available() int, we can leave it out to start.

@dsnet
Copy link
Member

dsnet commented Feb 8, 2023

I have a use for Available to compute whether I should manually grow the buffer to reduce the probability of allocations later on with append.

Something like:

// Always ensure at least 25% capacity remains before the next batch of append operations.
if b.Available() < b.Len()/4 {
    b.Grow(b.Available()+1)
}

That said, this is no different than:

if cap(b.AvailableBuffer()) < b.Len()/4 {
    b.Grow(cap(b.AvailableBuffer())+1)
}

but isn't as readable.

@gopherbot
Copy link

Change https://go.dev/cl/469558 mentions this issue: encoding/json: use append-like operations for encoding

@rsc rsc changed the title proposal: bytes: add Buffer.AvailableBuffer proposal: bytes: add Buffer.Available and Buffer.AvailableBuffer Feb 22, 2023
@rsc
Copy link
Contributor

rsc commented Feb 22, 2023

Have all concerns about this proposal been addressed?

@hherman1
Copy link

This is more of an aside and not a specific concern: I find it surprising that bytes.Buffer has been fine without this method for so long (and it seems to me like a frequently used struct) and so I’m curious if people have an opinion about why this is necessary now/what changed.

@dsnet
Copy link
Member

dsnet commented Feb 23, 2023

This feature was technically always possible where:

b := bb.AvailableBuffer()

is semantically identical to:

b := bb.Bytes()[bb.Len():]

but doing so depended upon undocumented guarantees of Buffer.Bytes.

Years ago, I tried modifying the documentation on Bytes to guarantee that this was safe. I never followed through on that CL, but have relied on this property for performance in several applications over the years. It's perhaps the only performant way to use append-like APIs with the Buffer type.

@lmb
Copy link
Contributor

lmb commented Feb 24, 2023

I’m curious if people have an opinion about why this is necessary now/what changed.

Pure anecdata / speculation, I think that people who found performance with Buffer to be lacking switched to []byte. That has the downside of bifurcating APIs into "takes a byte slice" and "takes an io.Writer". With AvailableBuffer on the two common stdlib buffers it we can write wrappers that bridge Reader / Writer to []byte efficiently for the common case.

gopherbot pushed a commit that referenced this issue Feb 24, 2023
As part of the effort to rely less on bytes.Buffer,
switch most operations to use more natural append-like operations.
This makes it easier to swap bytes.Buffer out with a buffer type
that only needs to support a minimal subset of operations.

As a simplification, we can remove the use of the scratch buffer
and use the available capacity of the buffer itself as the scratch.
Also, declare an inlineable mayAppendQuote function to conditionally
append a double-quote if necessary.

Performance:

    name              old time/op    new time/op    delta
    CodeEncoder          405µs ± 2%     397µs ± 2%  -1.94%  (p=0.000 n=20+20)
    CodeEncoderError     453µs ± 1%     444µs ± 4%  -1.83%  (p=0.000 n=19+19)
    CodeMarshal          559µs ± 4%     548µs ± 2%  -2.02%  (p=0.001 n=19+17)
    CodeMarshalError     724µs ± 3%     716µs ± 2%  -1.13%  (p=0.030 n=19+20)
    EncodeMarshaler     24.9ns ±15%    22.9ns ± 5%    ~     (p=0.086 n=20+17)
    EncoderEncode       14.0ns ±27%    15.0ns ±20%    ~     (p=0.365 n=20+20)

There is a slight performance gain across the board due to
the elimination of the scratch buffer. Appends are done directly
into the unused capacity of the underlying buffer,
avoiding an additional copy. See #53685

Updates #27735

Change-Id: Icf6d612a7f7a51ecd10097af092762dd1225d49e
Reviewed-on: https://go-review.googlesource.com/c/go/+/469558
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
@rsc
Copy link
Contributor

rsc commented Mar 1, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Mar 8, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: bytes: add Buffer.Available and Buffer.AvailableBuffer bytes: add Buffer.Available and Buffer.AvailableBuffer Mar 8, 2023
@rsc rsc modified the milestones: Proposal, Backlog Mar 8, 2023
@gopherbot
Copy link

Change https://go.dev/cl/474635 mentions this issue: bytes: add Buffer.Available and Buffer.AvailableBuffer

@gopherbot
Copy link

Change https://go.dev/cl/499617 mentions this issue: doc/go1.21: document new bytes.Buffer methods

gopherbot pushed a commit that referenced this issue May 31, 2023
For #53685

Change-Id: I237297d19afeb36ad738074d0c61caa7012f65ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/499617
Reviewed-by: Eli Bendersky <eliben@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
@dmitshur dmitshur modified the milestones: Backlog, Go1.21 Jun 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

8 participants