Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio: set default buffer size from *bytes.Reader #39332

Closed
josharian opened this issue May 31, 2020 · 21 comments
Closed

bufio: set default buffer size from *bytes.Reader #39332

josharian opened this issue May 31, 2020 · 21 comments
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Suggested Issues that may be good for new contributors looking for work to do.
Milestone

Comments

@josharian
Copy link
Contributor

josharian commented May 31, 2020

I just came across some code that does this rather natural thing:

r := bufio.NewReader(bytes.NewReader(buf))
//  use r to do bufio-ish things

However, len(buf) is typically low, much smaller than the default buffer size of 4096. I think that bufio.NewReader should check whether r is a *bytes.Reader, and if so, use r.Len() to size the buffer (assuming r.Len() is less than 4096).

Implementation is easy, and would be a good first contribution for someone.

Any objections?

cc @griesemer @bradfitz @ianlancetaylor per golang.org/s/owners

@josharian josharian added help wanted Performance Suggested Issues that may be good for new contributors looking for work to do. labels May 31, 2020
@josharian josharian added this to the Go1.16 milestone May 31, 2020
josharian added a commit to josharian/sdp that referenced this issue May 31, 2020
The default bufio.Reader buffer size is 4096 bytes.
That's far larger than most SDPs.
Size it to match the input.
We could do with smaller yet, but this is simple
and easy and pretty high impact. I also filed
golang/go#39332.

name         old time/op    new time/op    delta
Unmarshal-8    9.44µs ± 5%    8.32µs ± 0%  -11.83%  (p=0.001 n=4+13)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    7.50kB ± 0%    4.04kB ± 0%  -46.10%  (p=0.000 n=4+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       133 ± 0%       133 ± 0%     ~     (all equal)
josharian added a commit to josharian/sdp that referenced this issue May 31, 2020
The default bufio.Reader buffer size is 4096 bytes.
That's far larger than most SDPs.
Size it to match the input.
We could do with smaller yet, but this is simple
and easy and pretty high impact. I also filed
golang/go#39332.

name         old time/op    new time/op    delta
Unmarshal-8    9.44µs ± 5%    8.32µs ± 0%  -11.83%  (p=0.001 n=4+13)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    7.50kB ± 0%    4.04kB ± 0%  -46.10%  (p=0.000 n=4+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       133 ± 0%       133 ± 0%     ~     (all equal)
@bradfitz
Copy link
Contributor

Sounds reasonable. The dependency is already there anyway.

We could in theory do even better though and use the same underlying array so bufio doesn't need to allocate its own of any size.

@josharian
Copy link
Contributor Author

In theory yes but I worry about Hyrum’s Law and people modifying the buffer underfoot. Maybe we should start there early in cycle and fall back to the less aggressive optimization if needed.

@qingyunha
Copy link
Contributor

Should strings.Reader add here ?

@josharian
Copy link
Contributor Author

@qingyunha i don’t think there’s a bufio->strings dependency yet, and I’d be reluctant to add one.

@bcmills
Copy link
Contributor

bcmills commented Jun 1, 2020

If the optimization is based on r.Len(), would it make sense to check for a Len() int method in general rather than *bytes.Reader in particular? (Or are we worried about Len() int methods with other semantics?)

@bradfitz
Copy link
Contributor

bradfitz commented Jun 1, 2020

Or are we worried about Len() int methods with other semantics?

We have been in the past. This same issue came up earlier for http.NewRequest's type switch:

case *bytes.Buffer:
and we decided just looking for a method wasn't enough.

@bradfitz
Copy link
Contributor

bradfitz commented Jun 1, 2020

(But if we wanted, we could use a combination of reflect + interface assertion to get the strings.Reader length without importing strings, and still verify it's only pkg strings and not something else with Len)

@bradfitz
Copy link
Contributor

bradfitz commented Jun 1, 2020

The comment from Bradford Lamson-Scribner has been deleted since I started replying, but FWIW, you can obtain the bytes.Reader's underlying buffer like: https://play.golang.org/p/t0o_t_lO3ck

That kinda violates the WriterTo contract at least in spirit, but limited shenanigans are generally permitted within the standard library when sufficiently documented & tested. Better than unsafe at least.

@bradford-hamilton
Copy link
Contributor

@bradfitz thank you! I saw @josharian post this issue, and was interested in putting together a CL. Not long after posting, I realized it could be a great first contribution for someone (maybe depending on which implementation is preferred) and sort of took my hat out of the ring. I will keep 👀 on this though, if no one else gains interest - happy to work on it.

@josharian
Copy link
Contributor Author

@bradford-hamilton I say go for it. I’d be tempted to do two CLs, first only using Len, then sharing the underlying buffer, so that we can roll back the second one independently if needed.

@bradfitz instead of WriteTo, seems like just Bytes would do it. What am I missing?

@bcmills yeah, I wish we could use a method, and we probably could have long ago. Probably not now, sadly.

@bcmills
Copy link
Contributor

bcmills commented Jun 1, 2020

@josharian, *bytes.Reader does not have a Bytes() []byte method. (Given that the slice can be extracted using other shenanigans, the lack of that method seems like an oversight, but perhaps that is for a separate issue.)

@josharian
Copy link
Contributor Author

@bcmills oops, indeed. Thanks.

@bradford-hamilton
Copy link
Contributor

@josharian sounds good, I'll start with a CL using the bytes.Reader's Len(), and follow it up with another that uses the same underlying byte slice.

Sounds like @bcmills suggestion of a Bytes() []byte method would be another straightforward follow up CL if we decide it makes sense to have it.

@josharian
Copy link
Contributor Author

Adding new API requires opening a new proposal issue first. Feel free to do that!

@bradford-hamilton
Copy link
Contributor

bradford-hamilton commented Jun 1, 2020

Sure, that makes sense to me. I will put some thought into it and maybe put something together after feeling it out a bit in these CLs. I'll keep this conversation posted.

@gopherbot
Copy link

Change https://golang.org/cl/235977 mentions this issue: bufio: update bufio's NewReaderSize default sizing logic

@bcmills
Copy link
Contributor

bcmills commented Jun 1, 2020

It occurs to me that the existence of Reset methods could make this a major regression for a fairly large class of programs.

Consider what happens if, say, the *bytes.Reader is reset after the bufio.Reader is initialized:
https://play.golang.org/p/NFSbXOU7GAq

Or what happens if the bufio.Reader itself is reset:
https://play.golang.org/p/NFSbXOU7GAq

Programs probably already assume that the buffer size of a *bufio.Reader does not change over time, so if we pick an inappropriate default initially, we can't change it.

In contrast, a poor size is trivial to fix on the caller side when appropriate:

	var r *bufio.Reader
	if br, ok := b.(*bytes.Reader); ok {
		r = bufio.NewReaderSize(br, br.Len())
	} else {
		r = bufio.NewReader(b)
	}

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 2, 2020
@josharian
Copy link
Contributor Author

@bcmills I’m still pondering.

I wonder whether we should track whether we auto-sizes the buffer originally and be prepared to switch to a large buffer if needed during Reset. Then we potentially have two allocations instead of one, but at most two. That’s for the “use Len” approach.

For the more invasive, “re-use the existing buffer” approach, i think it is clearer what to do in Reset—if we are working with a borrowed buffer, switch to a new borrowed buffer (or allocate a new one if there isn’t one to borrow).

If that’s all too clever, maybe we should just address this with documentation. (And a staticcheck check? cc @dominikh )

@cespare
Copy link
Contributor

cespare commented Jun 3, 2020

I think that's too clever. I'm not even convinced this is worth a staticcheck warning.

r := bufio.NewReader(bytes.NewReader(buf))

This is probably a common pattern inside tests, but how much does the wasted 4k matter in a test? And how often does it happen in non-test code?

Just looking at the Go tree, there are 4 instances of the string bufio.NewReader(bytes.NewReader( and they are all in test files. 23/26 instances of bufio.NewReader(strings.NewReader( are in test files. (And one of the three others is this questionable line.)

@josharian
Copy link
Contributor Author

FWIW, I came across it in real code by profiling an actual server.

@josharian
Copy link
Contributor Author

Sounds like the consensus is that this isn’t possibly to do now in a way that is simple and uniformly helpful.

josharian added a commit to josharian/sdp that referenced this issue Jun 6, 2020
The default bufio.Reader buffer size is 4096 bytes.
That's far larger than most SDPs.
Size it to match the input.
We could do with smaller yet, but this is simple
and easy and pretty high impact. I also filed
golang/go#39332.

name         old time/op    new time/op    delta
Unmarshal-8    9.44µs ± 5%    8.32µs ± 0%  -11.83%  (p=0.001 n=4+13)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    7.50kB ± 0%    4.04kB ± 0%  -46.10%  (p=0.000 n=4+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       133 ± 0%       133 ± 0%     ~     (all equal)
josharian added a commit to josharian/sdp that referenced this issue Jun 7, 2020
The default bufio.Reader buffer size is 4096 bytes.
That's far larger than most SDPs.
Size it to match the input.
We could do with smaller yet, but this is simple
and easy and pretty high impact. I also filed
golang/go#39332.

name         old time/op    new time/op    delta
Unmarshal-8    9.44µs ± 5%    8.32µs ± 0%  -11.83%  (p=0.001 n=4+13)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    7.50kB ± 0%    4.04kB ± 0%  -46.10%  (p=0.000 n=4+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       133 ± 0%       133 ± 0%     ~     (all equal)
Sean-Der pushed a commit to pion/sdp that referenced this issue Jun 9, 2020
The default bufio.Reader buffer size is 4096 bytes.
That's far larger than most SDPs.
Size it to match the input.
We could do with smaller yet, but this is simple
and easy and pretty high impact. I also filed
golang/go#39332.

name         old time/op    new time/op    delta
Unmarshal-8    9.44µs ± 5%    8.32µs ± 0%  -11.83%  (p=0.001 n=4+13)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    7.50kB ± 0%    4.04kB ± 0%  -46.10%  (p=0.000 n=4+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       133 ± 0%       133 ± 0%     ~     (all equal)
@golang golang locked and limited conversation to collaborators Jun 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Suggested Issues that may be good for new contributors looking for work to do.
Projects
None yet
Development

No branches or pull requests

8 participants