
io: optimize WriteString with a pool of buffer #28311

Open
pierrre opened this issue Oct 22, 2018 · 3 comments
Labels: NeedsDecision (feedback is required from experts, contributors, and/or the community before a change can be made), Performance
Milestone: Unplanned

Comments

pierrre commented Oct 22, 2018

The current implementation of io.WriteString allocates a []byte if the writer doesn't implement StringWriter.
I think we can avoid this memory allocation by using a pool of buffers.
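
For reference, the fallback path of io.WriteString is essentially the following; the []byte(s) conversion is the allocation in question:

// Essentially what io.WriteString does today (inside package io the string
// writer interface may be unexported rather than the exported StringWriter).
func WriteString(w Writer, s string) (n int, err error) {
	if sw, ok := w.(StringWriter); ok {
		return sw.WriteString(s)
	}
	return w.Write([]byte(s)) // allocates and copies s
}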

I wrote a proof of concept:

package writestring

import (
	"bytes"
	"io"
	"sync"
)

func WriteString(w io.Writer, s string) (n int, err error) {
	if sw, ok := w.(stringWriter); ok {
		return sw.WriteString(s)
	}
	if len(s) > maxBufSize {
		// Too large to keep in the pool; fall back to a direct allocation.
		return w.Write([]byte(s))
	}
	buf := getBuf()
	_, _ = buf.WriteString(s) // bytes.Buffer.WriteString always returns a nil error
	n, err = w.Write(buf.Bytes())
	putBuf(buf)
	return n, err
}

type stringWriter interface {
	WriteString(s string) (n int, err error)
}

const maxBufSize = 1 << 16

var bufPool = &sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func getBuf() *bytes.Buffer {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers may still contain data from a previous use
	return buf
}

func putBuf(buf *bytes.Buffer) {
	// Only return buffers up to maxBufSize to the pool, so a single large
	// write doesn't keep a big buffer alive indefinitely.
	if buf.Cap() <= maxBufSize {
		bufPool.Put(buf)
	}
}

A benchmark:

package writestring

import (
	"io"
	"strconv"
	"strings"
	"testing"
)

var nw io.Writer = new(nopWriter)

func BenchmarkWriteString(b *testing.B) {
	for p := 0; p <= 20; p++ {
		size := 1 << uint(p)
		s := strings.Repeat("a", size)
		b.Run(strconv.FormatInt(int64(size), 10), func(b *testing.B) {
			b.Run("Old", func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					_, _ = io.WriteString(nw, s)
				}
			})
			b.Run("New", func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					_, _ = WriteString(nw, s)
				}
			})
		})
	}
}

type nopWriter struct{}

func (w *nopWriter) Write(b []byte) (n int, err error) {
	return len(b), nil
}

The benchmark result:

➜  writestring go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: _test/writestring
BenchmarkWriteString/1/Old-12         	50000000	        31.1 ns/op	       8 B/op	       1 allocs/op
BenchmarkWriteString/1/New-12         	50000000	        33.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/2/Old-12         	50000000	        31.4 ns/op	       8 B/op	       1 allocs/op
BenchmarkWriteString/2/New-12         	50000000	        34.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/4/Old-12         	50000000	        30.1 ns/op	       8 B/op	       1 allocs/op
BenchmarkWriteString/4/New-12         	50000000	        36.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/8/Old-12         	50000000	        28.8 ns/op	       8 B/op	       1 allocs/op
BenchmarkWriteString/8/New-12         	50000000	        32.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/16/Old-12        	50000000	        49.7 ns/op	      16 B/op	       1 allocs/op
BenchmarkWriteString/16/New-12        	50000000	        33.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/32/Old-12        	50000000	        70.6 ns/op	      32 B/op	       1 allocs/op
BenchmarkWriteString/32/New-12        	30000000	        33.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/64/Old-12        	30000000	       110 ns/op	      64 B/op	       1 allocs/op
BenchmarkWriteString/64/New-12        	50000000	        34.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/128/Old-12       	30000000	       144 ns/op	     128 B/op	       1 allocs/op
BenchmarkWriteString/128/New-12       	50000000	        36.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/256/Old-12       	20000000	       198 ns/op	     256 B/op	       1 allocs/op
BenchmarkWriteString/256/New-12       	50000000	        39.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/512/Old-12       	10000000	       291 ns/op	     512 B/op	       1 allocs/op
BenchmarkWriteString/512/New-12       	30000000	        42.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/1024/Old-12      	 5000000	       490 ns/op	    1024 B/op	       1 allocs/op
BenchmarkWriteString/1024/New-12      	30000000	        45.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/2048/Old-12      	 3000000	       500 ns/op	    2048 B/op	       1 allocs/op
BenchmarkWriteString/2048/New-12      	20000000	        52.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/4096/Old-12      	 2000000	      1091 ns/op	    4096 B/op	       1 allocs/op
BenchmarkWriteString/4096/New-12      	20000000	        68.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/8192/Old-12      	 1000000	      1709 ns/op	    8192 B/op	       1 allocs/op
BenchmarkWriteString/8192/New-12      	20000000	       103 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/16384/Old-12     	 1000000	      3446 ns/op	   16384 B/op	       1 allocs/op
BenchmarkWriteString/16384/New-12     	 5000000	       328 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/32768/Old-12     	  500000	      3281 ns/op	   32768 B/op	       1 allocs/op
BenchmarkWriteString/32768/New-12     	 2000000	       653 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/65536/Old-12     	  300000	      5022 ns/op	   65536 B/op	       1 allocs/op
BenchmarkWriteString/65536/New-12     	 1000000	      1566 ns/op	       0 B/op	       0 allocs/op
BenchmarkWriteString/131072/Old-12    	  100000	     25294 ns/op	  131072 B/op	       1 allocs/op
BenchmarkWriteString/131072/New-12    	   50000	     25332 ns/op	  131072 B/op	       1 allocs/op
BenchmarkWriteString/262144/Old-12    	   30000	     50864 ns/op	  262144 B/op	       1 allocs/op
BenchmarkWriteString/262144/New-12    	   30000	     50697 ns/op	  262144 B/op	       1 allocs/op
BenchmarkWriteString/524288/Old-12    	   10000	    103603 ns/op	  524288 B/op	       1 allocs/op
BenchmarkWriteString/524288/New-12    	   10000	    104370 ns/op	  524288 B/op	       1 allocs/op
BenchmarkWriteString/1048576/Old-12   	   10000	    205753 ns/op	 1048576 B/op	       1 allocs/op
BenchmarkWriteString/1048576/New-12   	   10000	    206086 ns/op	 1048576 B/op	       1 allocs/op
PASS
ok  	_test/writestring	92.576s

I know we can't import bytes in io.
A real implementation would require copying a private version of bytes.Buffer into io.
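
A rough sketch of what that private copy could boil down to, pooling a plain []byte instead of a full bytes.Buffer (bytePool, writeStringPooled, and the 512-byte starting capacity are only illustrative; stringWriter and maxBufSize are reused from the proof of concept above):

// bytePool stores *[]byte rather than []byte so that putting a slice back
// into the pool doesn't allocate a new interface value for the slice header.
var bytePool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 512) // illustrative starting capacity
		return &b
	},
}

func writeStringPooled(w io.Writer, s string) (n int, err error) {
	if sw, ok := w.(stringWriter); ok {
		return sw.WriteString(s)
	}
	if len(s) > maxBufSize {
		return w.Write([]byte(s))
	}
	bp := bytePool.Get().(*[]byte)
	b := append((*bp)[:0], s...) // copy s into the pooled backing array
	n, err = w.Write(b)
	if cap(b) <= maxBufSize { // mirror putBuf: don't pool oversized slices
		*bp = b
		bytePool.Put(bp)
	}
	return n, err
}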

agnivade (Contributor) commented

Your new implementation degrades performance for small string sizes. And we must be careful not to use sync.Pool everywhere, as its benefit is very workload dependent. Maybe this can live as a separate library, for workloads that are a good fit for a sync.Pool?

/cc @ianlancetaylor @bradfitz for decision.

FYI - you can use benchstat to compare benchmarks.
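
For example, something like this produces two comparable result files (the sed step only strips the /Old and /New suffixes so the benchmark names match across files):

go get golang.org/x/perf/cmd/benchstat
go test -bench='WriteString/.*/Old' -benchmem -count=10 | sed 's#/Old##' > old.txt
go test -bench='WriteString/.*/New' -benchmem -count=10 | sed 's#/New##' > new.txt
benchstat old.txt new.txt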

@agnivade agnivade added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 22, 2018
@agnivade agnivade added this to the Unplanned milestone Oct 22, 2018

pierrre (Author) commented Oct 22, 2018

Your new implementation degrades performance for small string sizes.

Yes, slightly.
Maybe I can add a check:

	if len(s) < threshold || len(s) > maxBufSize {
		return w.Write([]byte(s))
	}

According to my benchmark, the threshold could be 16.
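
Folded into the proof of concept above, that would look roughly like this (smallThreshold and the value 16 are just the guess from this benchmark, not a tuned constant):

// smallThreshold is the length below which converting directly to []byte is
// at least as fast as going through the pool (about 16 in the benchmark above).
const smallThreshold = 16

func WriteString(w io.Writer, s string) (n int, err error) {
	if sw, ok := w.(stringWriter); ok {
		return sw.WriteString(s)
	}
	if len(s) < smallThreshold || len(s) > maxBufSize {
		// Pooling doesn't pay off for tiny strings and would pin too much
		// memory for huge ones, so allocate directly in both cases.
		return w.Write([]byte(s))
	}
	buf := getBuf()
	_, _ = buf.WriteString(s)
	n, err = w.Write(buf.Bytes())
	putBuf(buf)
	return n, err
}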

ianlancetaylor (Contributor) commented

Seems like this is taking code that the compiler could potentially optimize and turning it into code that the compiler can't optimize.

I also think that this should be postponed until after some decision is made on generics, as that may affect this part of the standard library.
