
encoding/json: incorrect usage of sync.Pool #27735

Open
dsnet opened this issue Sep 18, 2018 · 39 comments
Labels
NeedsFix (The path to resolution is known, but the work has not been done.), Performance
Comments

@dsnet
Member

dsnet commented Sep 18, 2018

https://golang.org/cl/84897 introduces a denial-of-service attack on json.Marshal via a live-lock situation with sync.Pool.

Consider this snippet:

package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"time"
)

type CustomMarshaler int

func (c CustomMarshaler) MarshalJSON() ([]byte, error) {
	time.Sleep(500 * time.Millisecond) // simulate processing time
	b := make([]byte, int(c))
	b[0] = '"'
	for i := 1; i < len(b)-1; i++ {
		b[i] = 'a'
	}
	b[len(b)-1] = '"'
	return b, nil
}

func main() {
	processRequest := func(size int) {
		json.Marshal(CustomMarshaler(size))
		time.Sleep(1 * time.Millisecond) // simulate idle time
	}

	// Simulate a steady stream of infrequent large requests.
	go func() {
		for {
			processRequest(1 << 28) // 256MiB
		}
	}()

	// Simulate a storm of small requests.
	for i := 0; i < 1000; i++ {
		go func() {
			for {
				processRequest(1 << 10) // 1KiB
			}
		}()
	}

	// Continually run a GC and track the allocated bytes.
	var stats runtime.MemStats
	for i := 0; ; i++ {
		runtime.ReadMemStats(&stats)
		fmt.Printf("Cycle %d: %dB\n", i, stats.Alloc)
		time.Sleep(time.Second)
		runtime.GC()
	}
}

This is a variation of #23199 (comment), a situation suggested by @bcmills.

Essentially, we have a 1-to-1000 ratio of goroutines that use 256MiB or 1KiB buffers, respectively. The 256MiB buffer occasionally inserted into the sync.Pool keeps getting held onto by the 1KiB routines. On my machine, after 300 GC cycles, the above program occupies 6GiB of heap, when I expect it to be 256MiB in the worst case.

\cc @jnjackins @bradfitz

@gopherbot

Change https://golang.org/cl/136035 mentions this issue: encoding/json: fix usage of sync.Pool

@bcmills bcmills added the Performance and NeedsFix labels Sep 19, 2018
@bcmills bcmills added this to the Go1.12 milestone Sep 19, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.12, Go1.13 Dec 12, 2018
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@flimzy
Contributor

flimzy commented Aug 25, 2019

It's been nearly a year with no real attention to this issue.

So what are the options for fixing this?

The simplest would be to set an arbitrary size limit for buffers to recycle. Whatever number is selected will hurt performance for someone, I'd wager.
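
For illustration, a minimal sketch of that first option, assuming encodeState still embeds a bytes.Buffer and reuses the package's encodeStatePool; the putEncodeState helper and the 64KiB threshold are hypothetical:

const maxRecycledBufferSize = 64 << 10 // arbitrary 64KiB cutoff

func putEncodeState(e *encodeState) {
	if e.Cap() > maxRecycledBufferSize {
		return // too large: let the GC reclaim it rather than pooling it
	}
	e.Reset()
	encodeStatePool.Put(e)
}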

When using a json.Encoder, an additional configuration parameter could be exposed (e.g. SetMaxBufferSize). That complicates sharing buffer pools across encoder instances, though.

Providing a package-level variable is an option, too, but an ugly one that should absolutely be avoided (I just mention it for completeness' sake).

Eliminating the need for internal buffering in the json encoder could be another option. It wouldn't solve this problem for everyone, but it would provide recourse for those who are affected enough to care. I believe this is the exact issue that was brought up in https://go-review.googlesource.com/c/go/+/135595, which makes it a bit ironic that this issue may be holding up that one :)

@bcmills
Contributor

bcmills commented Aug 26, 2019

@flimzy, one approach would be to keep statistics on the output size, and only return a buffer to the pool if it isn't a statistical outlier. The problem there is that the statistical computations need to be cheap enough that they don't swamp out the benefit from using sync.Pool in the first place.

Another (simpler?) approach would be to pull a buffer from the pool, use it, and abandon it instead of returning it to the pool according to some probability distribution computed from the final high-water mark and the actual buffer size. (One simple distribution would be: 1.0 if the buffer is within 2x of the final high-water mark, or 0.0 otherwise.)
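
For illustration, a rough sketch of that second approach with the simple 1.0/0.0 distribution, again assuming encodeState embeds a bytes.Buffer and using a hypothetical putEncodeState helper:

func putEncodeState(e *encodeState) {
	highWater := e.Len() // final length of the encoded output
	if e.Cap() > 2*highWater {
		return // capacity is an outlier for this output: abandon the buffer to the GC
	}
	e.Reset()
	encodeStatePool.Put(e)
}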

@dsnet
Member Author

dsnet commented Aug 26, 2019

@flimzy, are you running into an issue related to this in production? Seeing how it performs badly in production may be helpful data in coming up with a reasonable approach for this.

When I filed this issue, it was just a theoretical issue I saw. It was not a concrete problem I was experiencing.

@flimzy
Contributor

flimzy commented Aug 26, 2019

@dsnet: No, I'm not. I'm interested in resolving this issue because it seems to be a blocker for making headway on some other json-encoder related issues that are affecting me: https://go-review.googlesource.com/c/go/+/135595, and by extension #33714

@flimzy
Contributor

flimzy commented Aug 26, 2019

Thanks for the feedback, @bcmills

I like the simplicity of your second approach. I think we need to be careful not to discard buffers too aggressively, for applications that naturally have a large disparity in JSON payload sizes. A simple example comes to mind: a REST API that responds with {"ok":true} for PUT requests, but with 10s or 100s of KB in responses for a GET. Perhaps treating 10k (or another arbitrary limit) as the high-water mark whenever the actual mark is smaller would guard against this.

I can still imagine certain applications where this would be sub-optimal, but maybe this is a reasonable starting point, to be tuned later.

@flimzy
Contributor

flimzy commented Aug 26, 2019

As I consider this further, I have to wonder: is this really a DoS attack vector? If it were json.Unmarshal, I could see it being easily exploited (just send a huge payload to your favorite Go-powered REST API). But since this is json.Marshal, exploiting it would require several other validation opportunities to have already been disregarded.

I don't mean to suggest this doesn't warrant some amount of attention; I just suspect the danger is overstated.

@iand
Contributor

iand commented Aug 26, 2019

Another approach would be to remove the package level encodeStatePool and allow the caller to optionally supply one when creating an encoder. The reasoning here is that the caller has more knowledge about the distribution of payloads and the edge cases that a package-level statistical count would struggle with.
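
As a rough sketch of the shape such an API could take (all of these identifiers are hypothetical; none of them exist in encoding/json today):

// BufferPool is a hypothetical interface the caller could supply,
// letting the application decide which buffers are worth keeping.
type BufferPool interface {
	Get() *bytes.Buffer
	Put(*bytes.Buffer)
}

// defaultBufferPool preserves the current behavior for callers
// that don't supply a pool of their own.
type defaultBufferPool struct{ p sync.Pool }

func (d *defaultBufferPool) Get() *bytes.Buffer {
	if v := d.p.Get(); v != nil {
		return v.(*bytes.Buffer)
	}
	return new(bytes.Buffer)
}

func (d *defaultBufferPool) Put(b *bytes.Buffer) {
	b.Reset()
	d.p.Put(b)
}

A NewEncoder variant (or an option on Encoder) would then draw its working buffer from the supplied pool instead of the package-level encodeStatePool.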

@flimzy
Contributor

flimzy commented Aug 26, 2019

Another approach would be to remove the package level encodeStatePool and allow the caller to optionally supply one when creating an encoder.

This does seem like the only approach to cover all use cases optimally. Is there any precedent for something similar elsewhere in the stdlib to use for guidance?

@gopherbot

Change https://golang.org/cl/195217 mentions this issue: Add a simple, optional 'keep buffer' capability to the Encoder type

@flimzy
Contributor

flimzy commented Sep 13, 2019

I have created https://golang.org/cl/195217 as the simplest change I could think of to address the concerns raised in #27735. My main concern with this implementation is that it would become possible for a third-party library to starve the buffer pool by, for instance, never returning anything to the pool.

The other realistic approach, IMO, is to expose the encodeStatePool object in the public API. This has a larger footprint on the public API, and it requires every user of json.NewEncoder who wishes to use the new functionality to propagate their own pool throughout their code (including to any third-party libraries that do JSON encoding) to avoid ending up with lots of fragmented pools.

Even with those drawbacks, the latter may be the better approach, but I started with the former because it is so simple that it seemed worth the minor effort to put it in front of reviewers.

@dsnet
Member Author

dsnet commented Dec 5, 2020

I've been thinking of ways to guarantee bounds on the worst-case behavior, and this is a prototype idea:

// bufferPool is a pool of buffers.
//
// Example usage:
//
//	p := getBuffer()
//	defer putBuffer(p)
//	p.buf = appendFoo(p.buf, foo)        // may resize buffer to arbitrarily large sizes
//	return append([]byte(nil), p.buf...) // single copy of the buffer contents
//
var bufferPool = sync.Pool{
	New: func() interface{} { return new(pooledBuffer) },
}

type pooledBuffer struct {
	buf     []byte
	strikes int // number of times the buffer was under-utilized
}

func getBuffer() *pooledBuffer {
	return bufferPool.Get().(*pooledBuffer)
}

func putBuffer(p *pooledBuffer) {
	// Recycle large buffers if sufficiently utilized.
	// If a buffer is under-utilized enough times sequentially,
	// then it is discarded, ensuring that a single large buffer
	// won't be kept alive by a continuous stream of small usages.
	switch {
	case cap(p.buf) <= 1<<16: // always recycle buffers smaller than 64KiB
		p.strikes = 0
	case cap(p.buf)/2 <= len(p.buf): // at least 50% utilization
		p.strikes = 0
	case p.strikes < 4:
		p.strikes++
	default:
		return // discard the buffer; too large and too often under-utilized
	}
	p.buf = p.buf[:0]
	bufferPool.Put(p)
}

In the worst-case scenario, we can ensure a bound on the minimal utilization based on the % threshold (e.g., 50% in the example above) and maximum number of sequential recycles below that threshold (e.g., 4x in the example above).

For the constants chosen above, the worst case sequence of utilization would be:

  • 50%, 0%, 0%, 0%, 0%, 50%, 0%, 0%, 0%, 0%, 50%, 0%, 0%, 0%, 0%, ...

On average, that's a worst case utilization of 10%, which is far better than the theoretical worst-case of 0% if the code naively recycles buffers without any limits.

Advantages of the algorithm:

  • It's simple; easy to analyze and reason about.
  • It's fast; only requires basic integer arithmetic.
  • It doesn't depend on any global state where statistics are concurrently gathered.
  • For usages of all the same approximate size (regardless of how big or small), this ensures buffers are always recycled.

Happy to hear suggestions for improvements.

@iand
Contributor

iand commented Dec 5, 2020

@dsnet I like this. It's elegant and low cost. What is the reasoning behind dropping <64k buffers?

@josharian
Contributor

The idea of keeping statistics locally is a nice one. We do lose some amount of information every time we throw away an over-sized buffer. An alternative, which might be helpful if we kept more sophisticated statistics, would be to nil out buf and return the pooledBuffer object to the pool. Then on Get we could pre-allocate a new buf with a better initial size.

Even with these simple stats, it might be worth zeroing buf and returning the pooledBuffer to avoid an allocation. We could even include a small byte-array struct field in pooledBuffer to initialize buf from, providing an allocation-free path when a small buf is needed.

@dsnet
Member Author

dsnet commented Dec 5, 2020

@dsnet I like this. It's elegant and low cost. What is the reasoning behind dropping <64k buffers?

It's not dropping <64k buffers; it always chooses to recycle them in the pool. That specific code path isn't strictly necessary, but for buffers below a certain size it probably doesn't hurt to always recycle them regardless of utilization.

An alternative, which might be helpful if we kept more sophisticated statistics, would be to nil out buf and return the pooledBuffer object to the pool. Then on Get we could pre-allocate a new buf with a better initial size.

Interesting idea. I think a useful statistic to obtain would perhaps be the maximum length encountered over the last N usages. If we evict a buffer due to low utilization, we can allocate a new one based on a recently seen maximum length.
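
For illustration, a rough sketch combining the two ideas, building on the prototype above (it reuses that bufferPool and simplifies "the last N usages" to a single recently observed maximum; the field names are illustrative):

type pooledBuffer struct {
	buf     []byte
	strikes int // number of times the buffer was under-utilized
	maxLen  int // largest output length seen since the last eviction
}

func getBuffer() *pooledBuffer {
	p := bufferPool.Get().(*pooledBuffer)
	if p.buf == nil {
		// The previous buffer was evicted for under-utilization;
		// allocate a replacement sized by the recently seen maximum.
		p.buf = make([]byte, 0, p.maxLen)
	}
	return p
}

func putBuffer(p *pooledBuffer) {
	if len(p.buf) > p.maxLen {
		p.maxLen = len(p.buf)
	}
	switch {
	case cap(p.buf) <= 1<<16, cap(p.buf)/2 <= len(p.buf):
		p.strikes = 0
		p.buf = p.buf[:0]
	case p.strikes < 4:
		p.strikes++
		p.buf = p.buf[:0]
	default:
		p.buf, p.strikes = nil, 0 // drop only the large buffer; keep the stats
	}
	bufferPool.Put(p) // the pooledBuffer itself is always recycled
}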

@lavalamp

@dsnet It's been a while! Friendly ping: any chance we can get this change on track for Go 1.21? (https://go.dev/cl/455776)

@dsnet
Member Author

dsnet commented Feb 13, 2023

Thank you very much for implementing this and exploring it as a solution.

I was in the process of reviewing this a few weeks ago and was saddened by the amount of logic being added (300 LOC for the logic, and another 300 LOC for the tests). I believe that a segmented buffer is probably the right approach in the long term, but much of the complexity stems from the existing logic in json assuming that it's outputting to a bytes.Buffer. Thus, your segmented buffer is forced to implement non-trivial methods like Truncate, WriteByte, WriteString, and WriteTo. Had the rest of json been changed to rely only on append-to mechanisms (e.g., #53685), the segmented buffer implementation could be reduced to <200 LOC (for both logic and tests). This is no fault of your implementation, but rather of the rest of the package that it's trying to interface with.

I see several possible paths forward:

  1. Clean up the rest of json to stop depending on so many methods of bytes.Buffer and instead rely mostly on append-to operations. However, @mvdan, @johanbrandhorst, and I have been discussing completely replacing the underlying syntactical encoder/decoder with a different implementation in the future, so this may not be worth the effort.
  2. Just implement a simple capacity limit on buffers returned to the pool. Something like https://go-review.googlesource.com/c/go/+/136035, where we just accept the loss of performance for larger outputs.
  3. Still use tiered sync.Pools, but maintain a single contiguous buffer (instead of segmented ones). When we grow, we copy the contents into a larger buffer twice the size. With such a buffer it is much simpler to implement methods like Truncate, WriteByte, or WriteString. Also, since the unused capacity is never larger than the length, we could consider just returning the result of Bytes without cloning it. This would avoid the final clone that we do before we return the buffer to the user in Marshal. (A rough sketch follows below.)
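
A rough sketch of option 3's core mechanics (size-classed pools plus copy-on-grow into a single contiguous buffer); the names are illustrative, this is not the code of any CL, and it assumes math/bits is imported:

// pools[i] recycles buffers whose capacity is exactly 1<<i bytes.
var pools [30]sync.Pool

func classFor(n int) int {
	if n <= 64 {
		n = 64 // smallest class
	}
	return bits.Len(uint(n - 1)) // smallest i such that 1<<i >= n
}

func getBuf(n int) []byte {
	c := classFor(n)
	if c < len(pools) {
		if v := pools[c].Get(); v != nil {
			return v.([]byte)[:0]
		}
	}
	return make([]byte, 0, 1<<c)
}

func putBuf(b []byte) {
	c := classFor(cap(b))
	if c < len(pools) && cap(b) == 1<<c { // only exact size classes are recycled
		pools[c].Put(b[:0])
	}
}

// grow keeps the buffer contiguous: when out of room, copy the contents
// into a buffer of roughly twice the size and recycle the old one.
func grow(b []byte, need int) []byte {
	if cap(b)-len(b) >= need {
		return b
	}
	nb := getBuf(2 * (len(b) + need))
	nb = append(nb, b...)
	putBuf(b)
	return nb
}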

@lavalamp

  1. Seems much riskier (more logic changed, more chances for bugs) -- json output is much more complex than tracking how many bytes are in a buffer.
  2. Capping the size doesn't solve the problem in general; it helps a range of sizes but removes the benefits of the cache for other sizes.
  3. In running benchmarks to select the buffer sizes used in my change, it became obvious that allocating really large buffers is sometimes really slow. This would be substantially worse performance variance than what I have implemented. (Ironically this was my original suggestion and the current approach was your counter-suggestion :) )

I found that the existing json tests exercised ~every code path already and the specific tests I added are just for ease in debugging and peace of mind.

The code could be made shorter by using generics to cover both Write and WriteString with one function; it is also probably overly optimized and could be made simpler and only slightly slower. OTOH, the tests could be extended with a randomized test that proves the same behavior as the stock bytes.Buffer.

For my use case (kube-apiserver) I really need to keep all the speed from caching and also to not be 5xing memory usage for certain request patterns, as the current code does. Is there anything I can do to this change to make it acceptable? My other options aren't great.

@dsnet
Member Author

dsnet commented Feb 20, 2023

I discussed it with @mvdan, and we think option 1 is the best path forward. I don't think it's that hard to switch the encoder to use append-like APIs. I'll send out a CL for that.

Thus, the segmented buffer will only need to implement the following methods:

  • Grow, AvailableBuffer, Write, Reset, Bytes

You'll notice that AvailableBuffer is not currently present on bytes.Buffer, but may be added in #53685.

You could optionally implement the following wrappers:

func (b *buffer) WriteString(s string) (int, error) {
    return b.Write(append(b.AvailableBuffer(), s...))
}
func (b *buffer) WriteByte(c byte) error {
    _, err := b.Write(append(b.AvailableBuffer(), c))
    return err
}

or they can instead be methods on encodeState.

@gopherbot

Change https://go.dev/cl/469555 mentions this issue: encoding/json: use append for Compact and Indent

@gopherbot

Change https://go.dev/cl/469556 mentions this issue: encoding/json: unify encodeState.string and encodeState.stringBytes

@gopherbot

Change https://go.dev/cl/469558 mentions this issue: encoding/json: use append-like operations for encoding

@lavalamp

I'm not sold on using AvailableBuffer like that here, because if the thing being written doesn't fit in the buffer there will be an allocation. But there is a fairly easy way to use it that doesn't have that problem, I think.

@dsnet
Member Author

dsnet commented Feb 21, 2023

You're correct that there could be an allocation outside of the segmented buffer, but the number of allocations asymptotically approaches zero so long as the buffer grows exponentially (which it does).

For the cases where the appended bytes could be large, we call Grow beforehand, which guarantees sufficient capacity to prevent an allocation. I did this for the logic prior to calling appendCompact and the appending of base64 encoded data since we know those sizes. We could also consider doing it for strings. All other writes are small (e.g., bools and numbers), where the probability of an allocation is small and low cost.
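
For example, a sketch of that pattern for base64 data, assuming a buffer type with the Grow, AvailableBuffer, and Write methods discussed above (shown against bytes.Buffer, taking AvailableBuffer from #53685 as a given):

func appendBase64(b *bytes.Buffer, src []byte) {
	n := base64.StdEncoding.EncodedLen(len(src))
	b.Grow(n) // reserve capacity so the Write below cannot allocate
	dst := b.AvailableBuffer()[:n]
	base64.StdEncoding.Encode(dst, src)
	b.Write(dst) // dst aliases b's spare capacity, so this copies in place
}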

@lavalamp

lavalamp commented Feb 21, 2023

I don't think it's ideal but it can be made to work by truncating the existing buffer and allocating a one-off buffer of the requested size or bigger. Likewise when there are only a few bytes left in the current buffer, I would truncate it and make another when AvailableBuffer is called.

I found that the best performance was with many smallish buffers (< 64k IIRC) rather than continuing to double the size. So, the allocations are not exponentially rare when the output is big enough. (But they can be mostly prevented by wasting some bytes, as described.)

@lavalamp

OK, I read the above CLs -- is there more coming or shall I rebase on top of those?

@dsnet
Member Author

dsnet commented Feb 21, 2023

Even if the buffer is not exponentially grown, so long as we explicitly call Grow before large writes, I believe the performance will still be fine. Writes of single-byte delimiters, bools, and numbers are sufficiently small that even with a 4k buffer, the probability of an occasional small allocation is okay.

OK, I read the above CLs -- is there more coming or shall I rebase on top of those?

That's all for what's relevant for your change.

@lavalamp

I've rebased and it's working, but it's much slower than it was so I've got more work to do before I push a new revision.

@lavalamp

I have perf back to baseline, but git codereview mail does not like the fact that I've pulled your changes locally and rebased on top of them (it thinks I'm trying to impersonate you :) ), how do I appease it?

@lavalamp

OK I think I figured out my rebase problem and https://go.dev/cl/455776 is ready for a pass. It is not much smaller because the b.Write(append(b.AvailableBuffer(), s...)) style one-liners are significantly slower than the code I'd gotten to. WriteByte in particular is called so often that all the details matter a lot.

gopherbot pushed a commit that referenced this issue Feb 24, 2023
This is part of the effort to reduce direct reliance on bytes.Buffer
so that we can use a buffer with better pooling characteristics.

Avoid direct use of bytes.Buffer in Compact and Indent and
instead modify the logic to rely only on append.
This avoids reliance on the bytes.Buffer.Truncate method,
which makes switching to a custom buffer implementation easier.

Performance:

	name                old time/op    new time/op    delta
	EncodeMarshaler    25.5ns ± 8%    25.7ns ± 9%   ~     (p=0.724 n=10+10)

	name                old alloc/op   new alloc/op   delta
	EncodeMarshaler     4.00B ± 0%     4.00B ± 0%   ~     (all equal)

	name                old allocs/op  new allocs/op  delta
	EncodeMarshaler      1.00 ± 0%      1.00 ± 0%   ~     (all equal)

Updates #27735

Change-Id: I8cded03fab7651d43b5a238ee721f3472530868e
Reviewed-on: https://go-review.googlesource.com/c/go/+/469555
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Feb 24, 2023
This is part of the effort to reduce direct reliance on bytes.Buffer
so that we can use a buffer with better pooling characteristics.

Unify these two methods as a single version that uses generics
to reduce duplicated logic. Unfortunately, we lack a generic
version of utf8.DecodeRune (see #56948), so we cast []byte to string.
The []byte variant is slightly slower for multi-byte unicode since
casting results in a stack-allocated copy operation.
Fortunately, this code path is used only for TextMarshalers.
We can also delete TestStringBytes, which exists to ensure
that the two duplicate implementations remain in sync.

Performance:

    name              old time/op    new time/op    delta
    CodeEncoder          399µs ± 2%     409µs ± 2%   +2.59%  (p=0.000 n=9+9)
    CodeEncoderError     450µs ± 1%     451µs ± 2%     ~     (p=0.684 n=10+10)
    CodeMarshal          553µs ± 2%     562µs ± 3%     ~     (p=0.075 n=10+10)
    CodeMarshalError     733µs ± 3%     737µs ± 2%     ~     (p=0.400 n=9+10)
    EncodeMarshaler     24.9ns ±12%    24.1ns ±13%     ~     (p=0.190 n=10+10)
    EncoderEncode       12.3ns ± 3%    14.7ns ±20%     ~     (p=0.315 n=8+10)

    name              old speed      new speed      delta
    CodeEncoder       4.87GB/s ± 2%  4.74GB/s ± 2%   -2.53%  (p=0.000 n=9+9)
    CodeEncoderError  4.31GB/s ± 1%  4.30GB/s ± 2%     ~     (p=0.684 n=10+10)
    CodeMarshal       3.51GB/s ± 2%  3.46GB/s ± 3%     ~     (p=0.075 n=10+10)
    CodeMarshalError  2.65GB/s ± 3%  2.63GB/s ± 2%     ~     (p=0.400 n=9+10)

    name              old alloc/op   new alloc/op   delta
    CodeEncoder          327B ±347%     447B ±232%  +36.93%  (p=0.034 n=9+10)
    CodeEncoderError      142B ± 1%      143B ± 0%     ~     (p=1.000 n=8+7)
    CodeMarshal         1.96MB ± 2%    1.96MB ± 2%     ~     (p=0.468 n=10+10)
    CodeMarshalError    2.04MB ± 3%    2.03MB ± 1%     ~     (p=0.971 n=10+10)
    EncodeMarshaler      4.00B ± 0%     4.00B ± 0%     ~     (all equal)
    EncoderEncode        0.00B          0.00B          ~     (all equal)

    name              old allocs/op  new allocs/op  delta
    CodeEncoder           0.00           0.00          ~     (all equal)
    CodeEncoderError      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
    CodeMarshal           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    CodeMarshalError      6.00 ± 0%      6.00 ± 0%     ~     (all equal)
    EncodeMarshaler       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    EncoderEncode         0.00           0.00          ~     (all equal)

There is a very slight performance degradation for CodeEncoder
due to an increase in allocation sizes. However, the number of allocations
did not change. This is likely due to remote effects of the growth rate
differences between bytes.Buffer and the builtin append function.
We shouldn't overly rely on the growth rate of bytes.Buffer anyways
since that is subject to possibly change in #51462.
As the benchtime increases, the alloc/op goes down indicating
that the amortized memory cost is fixed.

Updates #27735

Change-Id: Ie35e480e292fe082d7986e0a4d81212c1d4202b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/469556
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
gopherbot pushed a commit that referenced this issue Feb 24, 2023
As part of the effort to rely less on bytes.Buffer,
switch most operations to use more natural append-like operations.
This makes it easier to swap bytes.Buffer out with a buffer type
that only needs to support a minimal subset of operations.

As a simplification, we can remove the use of the scratch buffer
and use the available capacity of the buffer itself as the scratch.
Also, declare an inlineable mayAppendQuote function to conditionally
append a double-quote if necessary.

Performance:

    name              old time/op    new time/op    delta
    CodeEncoder          405µs ± 2%     397µs ± 2%  -1.94%  (p=0.000 n=20+20)
    CodeEncoderError     453µs ± 1%     444µs ± 4%  -1.83%  (p=0.000 n=19+19)
    CodeMarshal          559µs ± 4%     548µs ± 2%  -2.02%  (p=0.001 n=19+17)
    CodeMarshalError     724µs ± 3%     716µs ± 2%  -1.13%  (p=0.030 n=19+20)
    EncodeMarshaler     24.9ns ±15%    22.9ns ± 5%    ~     (p=0.086 n=20+17)
    EncoderEncode       14.0ns ±27%    15.0ns ±20%    ~     (p=0.365 n=20+20)

There is a slight performance gain across the board due to
the elimination of the scratch buffer. Appends are done directly
into the unused capacity of the underlying buffer,
avoiding an additional copy. See #53685

Updates #27735

Change-Id: Icf6d612a7f7a51ecd10097af092762dd1225d49e
Reviewed-on: https://go-review.googlesource.com/c/go/+/469558
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
@gopherbot

Change https://go.dev/cl/471200 mentions this issue: encoding/json: improve Marshal memory utilization

@dsnet
Member Author

dsnet commented Feb 25, 2023

I'm still a bit saddened by the amount of code being added. I sent out https://go.dev/cl/471200 as a simpler alternative implementation.

Here's the performance comparison:

name \ time/op             v0                v1             v2
LargeOutput                1.85s ± 3%        1.56s ± 3%     1.35s ± 9%
CodeEncoder                394µs ± 1%        408µs ± 2%     379µs ± 1%
CodeEncoderError           437µs ± 1%        450µs ± 1%     423µs ± 1%
AllocationWastage          396µs ± 2%        719µs ± 3%     482µs ± 4%
CodeMarshal                534µs ± 1%        722µs ± 7%     633µs ± 4%
CodeMarshalError           694µs ± 2%        905µs ± 4%     789µs ± 3%
MarshalBytes/32            170ns ± 3%        173ns ± 3%     162ns ± 2%
MarshalBytes/256           387ns ± 2%        456ns ± 1%     378ns ± 2%
MarshalBytes/4096         3.93µs ± 2%       3.98µs ± 1%    3.89µs ± 1%
MarshalBytesError/32       372µs ± 1%        361µs ± 1%     370µs ± 1%
MarshalBytesError/256      372µs ± 1%        363µs ± 1%     371µs ± 1%
MarshalBytesError/4096     379µs ± 1%        369µs ± 3%     377µs ± 2%
EncodeMarshaler           22.8ns ± 2%       25.1ns ±12%    24.7ns ± 5%
EncoderEncode             12.5ns ± 3%       13.2ns ± 1%    15.0ns ±19%

name \ wasted_b/op         v0                v1             v2
AllocationWastage          2.25M ± 0%        0.16M ± 0%     0.16M ± 0%

where v0 is the existing implementation, v1 is https://go.dev/cl/455776, and v2 is https://go.dev/cl/471200.

We see that both v1 and v2 result in low allocation wastage compared to v0.
This is the problem we're trying to solve.

Comparing performance of v2 relative to v1, we get:

name                         old time/op      new time/op   delta
LargeOutput                  1.56s ± 3%       1.35s ± 9%   -13.49%  (p=0.000 n=8+10)
CodeEncoder                  408µs ± 2%       379µs ± 1%    -7.08%  (p=0.000 n=10+9)
CodeEncoderError             450µs ± 1%       423µs ± 1%    -5.92%  (p=0.000 n=8+9)
AllocationWastage            719µs ± 3%       482µs ± 4%   -32.87%  (p=0.000 n=9+10)
CodeMarshal                  722µs ± 7%       633µs ± 4%   -12.37%  (p=0.000 n=9+10)
CodeMarshalError             905µs ± 4%       789µs ± 3%   -12.80%  (p=0.000 n=10+9)
MarshalBytes/32              173ns ± 3%       162ns ± 2%    -6.37%  (p=0.000 n=10+9)
MarshalBytes/256             456ns ± 1%       378ns ± 2%   -17.03%  (p=0.000 n=10+10)
MarshalBytes/4096           3.98µs ± 1%      3.89µs ± 1%    -2.34%  (p=0.000 n=9+10)
MarshalBytesError/32         361µs ± 1%       370µs ± 1%    +2.59%  (p=0.000 n=9+10)
MarshalBytesError/256        363µs ± 1%       371µs ± 1%    +2.22%  (p=0.000 n=10+9)
MarshalBytesError/4096       369µs ± 3%       377µs ± 2%    +1.92%  (p=0.002 n=10+10)
EncodeMarshaler             25.1ns ±12%      24.7ns ± 5%      ~     (p=0.343 n=10+10)
EncoderEncode               13.2ns ± 1%      15.0ns ±19%   +13.39%  (p=0.003 n=8+10)

There's generally a performance improvement over the more complicated https://go.dev/cl/455776 implementation.
The apparent performance degradation of EncoderEncode seems to be largely in the noise.

I feel bad sending out an alternative CL after you (@lavalamp) clearly spent a lot of time on your CL. I'm okay with you taking attribution credit for the CL if you want. After all, my CL would not have come about until you spent the grunt work to prove that segmented buffers were a viable solution.

(The benchmarks for v2 were not run with the optimization to bytes.Join in https://go.dev/cl/456336)

@dsnet
Member Author

dsnet commented Feb 27, 2023

I found that the best performance was with many smallish buffers

I found this comment curious, so I spent some time investigating why this might be the case.

The cause is that as buffers grow larger, they are recycled through the sync.Pool more infrequently, with longer time periods between each use. Consequently, there's a higher probability that a larger buffer has been GC'd before it even gets a chance to be reused. At the same time, there are benefits to larger buffer sizes, as hardware is often optimized for long sequential memory accesses. Thus, there are two goals competing against each other: we do want to grow the buffer sizes over time, but not so quickly that larger buffers are likely to be GC'd too early.

@lavalamp

I put my benchmark results in a comment on the CL, they are directionally the same.

I am happy with any solution as long as the problem gets solved :)

@lavalamp

Thus, there are two goals that are competing against each other.

Yeah, the most striking thing was that variance was super high with large buffers, so it makes sense that there are competing effects.

@mitar
Contributor

mitar commented Oct 21, 2023

@dsnet Have you seen the approach taken by https://github.com/valyala/bytebufferpool? It also attempts to collect some stats.
