bytes: WriteString, WriteByte and Write vary in performance #26264

Closed
mvdan opened this issue Jul 7, 2018 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

@mvdan
Member

mvdan commented Jul 7, 2018

This is probably a compiler enhancement issue, but since I haven't done much digging, I'm conservatively filing it against the bytes package.

$ cat f_test.go
package p

import (
	"bytes"
	"testing"
)

func BenchmarkBytes(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.Write([]byte("ab"))
	}
}

func BenchmarkString(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.WriteString("ab")
	}
}

func BenchmarkByteByte(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.WriteByte('a')
		buf.WriteByte('b')
	}
}
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: mvdan.cc/p
BenchmarkBytes-4        100000000               18.0 ns/op             0 B/op          0 allocs/op
BenchmarkString-4       100000000               10.5 ns/op             0 B/op          0 allocs/op
BenchmarkByteByte-4     200000000                8.44 ns/op            0 B/op          0 allocs/op

The first surprise here is that WriteByte('a'); WriteByte('b') is slightly faster than WriteString("ab"). The latter is much nicer to read and write, so it would be nice if it were at least as fast. This was encountered while optimizing the encoding/json encoder: https://go-review.googlesource.com/c/go/+/122460/1/src/encoding/json/encode.go#650

The difference is just a few nanoseconds or ~20%, but in a hot loop like when JSON is encoding struct fields, this can be noticeable.

The second surprise is how WriteString is much faster than Write. I haven't come across this directly, but I'd also assume that both would be comparable in performance.

If I replace bytes.Buffer with strings.Builder the numbers are somewhat similar, but String is faster than ByteByte.

BenchmarkBytes-4        100000000               15.2 ns/op             7 B/op          0 allocs/op
BenchmarkString-4       300000000                4.49 ns/op            7 B/op          0 allocs/op
BenchmarkByteByte-4     200000000                6.87 ns/op            7 B/op          0 allocs/op

/cc @randall77 @josharian @kevinburke

@mvdan mvdan added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 7, 2018
@FMNSSun

FMNSSun commented Jul 11, 2018

One thing I've noticed is that sometimes inlining a variable manually makes a difference.
For your example, buf.Write([]byte("ab")) is much slower than declaring a variable s := []byte("ab") and calling buf.Write(s), while buf.Write([]byte{'a','b'}) is the fastest way to use buf.Write. We never considered doing multiple single-byte writes or using WriteString; as it turns out, buf.Write([]byte{'a','b'}) is comparable to buf.WriteString("ab"), although still a bit slower.

func BenchmarkBytes2(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.Write([]byte{'a', 'b'})
	}
}

is faster than doing:

func BenchmarkBytes3(b *testing.B) {
	var buf bytes.Buffer
	s := []byte("ab")
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.Write(s)
	}
}

Not by a lot, but it is consistently faster by about 3 to 5%, and of course it adds up if you do multiple writes.

@FMNSSun

FMNSSun commented Jul 11, 2018

The benchmark here may be flawed: []byte("ab") doesn't appear to get constant folded, so each iteration calls runtime.stringtoslicebyte. You're measuring not only the performance of the Write call but also the conversion of the string "ab" to []byte.

0x0054 00084 (f_test.go:14)	MOVQ	$2, 16(SP)
0x005d 00093 (f_test.go:14)	PCDATA	$0, $1
0x005d 00093 (f_test.go:14)	CALL	runtime.stringtoslicebyte(SB)
0x0062 00098 (f_test.go:14)	MOVQ	40(SP), AX
0x0067 00103 (f_test.go:14)	MOVQ	32(SP), CX
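
One way to keep the conversion out of the measured loop is to do it once, ahead of time. The following is only a sketch: `payload`, `fillConverted` and `fillHoisted` are hypothetical names, not part of the original benchmark, and whether the per-iteration conversion actually costs anything depends on the compiler version in use.

```go
package main

import (
	"bytes"
	"fmt"
)

// payload is a hypothetical package-level slice: the string-to-slice
// conversion runs once at initialization, not once per write.
var payload = []byte("ab")

// fillConverted writes a freshly converted string literal on every
// iteration, mirroring the loop body of BenchmarkBytes.
func fillConverted(n int) string {
	var buf bytes.Buffer
	for i := 0; i < n; i++ {
		buf.Write([]byte("ab"))
	}
	return buf.String()
}

// fillHoisted reuses the pre-converted slice, so the loop body contains
// only the Write call.
func fillHoisted(n int) string {
	var buf bytes.Buffer
	for i := 0; i < n; i++ {
		buf.Write(payload)
	}
	return buf.String()
}

func main() {
	// Both variants produce the same buffer contents.
	fmt.Println(fillConverted(3) == fillHoisted(3))
}
```

Both functions build the same bytes; only the place where the conversion happens differs, which is the part a benchmark of Write alone would want to exclude.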

It also doesn't look like WriteByte gets inlined:

0x003f 00063 (f_test.go:79)	MOVQ	AX, (SP)
0x0043 00067 (f_test.go:79)	MOVB	$97, 8(SP)
0x0048 00072 (f_test.go:79)	PCDATA	$0, $1
0x0048 00072 (f_test.go:79)	CALL	bytes.(*Buffer).WriteByte(SB)
0x004d 00077 (f_test.go:79)	MOVQ	bytes.b·1+40(SP), AX
0x0052 00082 (f_test.go:80)	MOVQ	AX, (SP)
0x0056 00086 (f_test.go:80)	MOVB	$98, 8(SP)
0x005b 00091 (f_test.go:80)	PCDATA	$0, $1
0x005b 00091 (f_test.go:80)	CALL	bytes.(*Buffer).WriteByte(SB)
0x0060 00096 (f_test.go:80)	MOVQ	"".i+32(SP), AX
0x0065 00101 (f_test.go:75)	LEAQ	1(AX), CX

@FMNSSun

FMNSSun commented Jul 11, 2018

Generally speaking:
[]byte("..") and string([]byte{...}) are not constant folded. The actual difference between .Write and .WriteString is tiny:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/FMNSSun/udpdb/d/t
BenchmarkBytes-8      	100000000	        15.6 ns/op
BenchmarkBytes2-8     	200000000	         8.86 ns/op
BenchmarkBytes3-8     	200000000	         8.86 ns/op
BenchmarkBytes4-8     	200000000	         8.35 ns/op
BenchmarkString-8     	200000000	         8.40 ns/op
BenchmarkString2-8    	200000000	         8.37 ns/op

func BenchmarkBytes3(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.Write([]byte{'a', 'b', 'c', 'd', 'e', 'f'})
	}
}

func BenchmarkBytes4(b *testing.B) {
	var buf bytes.Buffer
	s := []byte{'a', 'b', 'c', 'd', 'e', 'f'}
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.Write(s)
	}
}

On my machine, WriteString with two characters is 2-3% faster, but as soon as you increase this to "abcdef"/[]byte{'a','b','c','d','e','f'} it turns around and Write becomes slightly faster than WriteString. Whether it's faster to use s := ... and Write(s) or to call Write([]byte{...}) directly also seems to depend on the actual size, for some reason.

Edit: If you average across a lot of runs using different string/slice sizes the difference is practically "not there" (at least on my machine).

As for why two WriteByte calls are faster than a single Write: maybe the two len calls and the one copy within Write/WriteString are just costly enough that two WriteByte calls come out slightly ahead? It definitely no longer holds with three or more WriteByte calls in a row.
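
To make the comparison concrete, here is a simplified model of the two write paths. `miniBuffer` and its methods are hypothetical and are not the standard library source; the sketch only illustrates that a slice write goes through a length computation and a `copy`, while a byte write is a single indexed store.

```go
package main

import "fmt"

// miniBuffer is a hypothetical, stripped-down stand-in for bytes.Buffer.
// It is not the standard library implementation.
type miniBuffer struct {
	buf []byte
}

// grow extends the slice by n bytes and returns the index at which the
// new data should be written.
func (b *miniBuffer) grow(n int) int {
	m := len(b.buf)
	if m+n <= cap(b.buf) {
		b.buf = b.buf[:m+n] // fast path: reslice within existing capacity
		return m
	}
	b.buf = append(b.buf, make([]byte, n)...) // slow path: reallocate
	return m
}

// Write reserves len(p) bytes and fills them with a copy call.
func (b *miniBuffer) Write(p []byte) int {
	m := b.grow(len(p))
	return copy(b.buf[m:], p)
}

// WriteByte reserves one byte and fills it with a single store.
func (b *miniBuffer) WriteByte(c byte) {
	m := b.grow(1)
	b.buf[m] = c
}

func main() {
	var x, y miniBuffer
	x.Write([]byte("ab"))
	y.WriteByte('a')
	y.WriteByte('b')
	fmt.Println(string(x.buf) == string(y.buf))
}
```

In this model two WriteByte calls pay for two capacity checks but no `copy`, which fits the guess above without confirming it.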

BTW: for some reason -benchmem always reports 0 B and 0 allocs, even if I explicitly use make(). I'd expected some allocations, because for strings longer than 32 bytes stringtoslicebyte, as far as I can tell, can't use a tmpBuf and must allocate one via rawbyteslice, which invokes mallocgc. So I should see at least something >0 in the benchmark output :(.
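
Allocations can also be counted independently of the benchmark output with testing.AllocsPerRun. The sketch below is an illustration, not a diagnosis of the 0 B result above: `sink` and `allocsForConversion` are hypothetical names, and storing the result in a package-level variable is an assumption made here to force the slice to escape to the heap.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is a hypothetical package-level variable. Storing the result here
// makes the slice escape, so the compiler cannot keep it on the stack.
var sink []byte

// allocsForConversion reports the average number of heap allocations made
// by converting a string of the given length to []byte.
func allocsForConversion(size int) float64 {
	s := strings.Repeat("x", size)
	return testing.AllocsPerRun(100, func() {
		sink = []byte(s)
	})
}

func main() {
	fmt.Println(allocsForConversion(64))
}
```

If the count here is non-zero while the benchmark still reports zero, the difference most likely lies in what the benchmark loop lets the compiler optimize away.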

@mvdan
Member Author

mvdan commented Aug 22, 2018

See #26264 for a case where using Write was noticeably faster than WriteString. That CL wasn't merged though, as the win was too small to warrant the extra quirkiness.

@FMNSSun

FMNSSun commented Aug 22, 2018

@mvdan you've linked to this issue? My observation is still that there are differences between these functions in specific circumstances, and it even makes a difference whether you inline the argument or not. But the difference is tiny, and the benchmarks are so flaky that sometimes inlining is better and sometimes not inlining is better. It might have something to do with the size of the function that calls Write, or something else entirely. I haven't yet been able to pinpoint anything conclusive beyond what I've already mentioned above.

@FMNSSun

FMNSSun commented Aug 22, 2018

Flaky because look for example at:

func Foo(i int) int {
	return i * 2
}

func Bar(i int) int {
	return i * 2
}

func BenchmarkFoo(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.WriteByte(byte(Foo(3)))
	}
}

func BenchmarkBar(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.WriteByte(byte(Bar(3)))
	}
}

I can run this a dozen times on my machine and BenchmarkBar is consistently slower by 2-4%, even though both the benchmarks and the functions are completely identical. There's nothing inherently slower about Foo/Bar or these benchmarks. This might simply be some microarchitectural/layout/cache thing.

However, this also depends on whether there are other benchmarks or other code in the b_test.go file I'm using. Just adding an unused dummy function is already enough to suddenly make BenchmarkFoo consistently faster by a few percent. A difference of a few percent might be caused by pretty much anything.
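
A rough way to see how much a single benchmark moves between runs is to repeat it and compare the extremes. This is a sketch using testing.Benchmark from a standalone program; `benchWriteByte` and `spread` are hypothetical helpers. It only captures run-to-run noise within one binary and will not reproduce the layout-dependent differences described above.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// benchWriteByte is the loop from the benchmarks above, reduced to a
// single WriteByte call.
func benchWriteByte(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		if i%1024 == 0 {
			buf.Reset()
		}
		buf.WriteByte('a')
	}
}

// spread runs the same benchmark several times and returns the fastest
// and slowest ns/op it observed.
func spread(runs int, f func(*testing.B)) (lo, hi float64) {
	for i := 0; i < runs; i++ {
		r := testing.Benchmark(f)
		ns := float64(r.T.Nanoseconds()) / float64(r.N)
		if i == 0 || ns < lo {
			lo = ns
		}
		if i == 0 || ns > hi {
			hi = ns
		}
	}
	return lo, hi
}

func main() {
	lo, hi := spread(5, benchWriteByte)
	fmt.Printf("min %.2f ns/op, max %.2f ns/op\n", lo, hi)
}
```

If the gap between the fastest and slowest run is of the same order as the difference being investigated, that difference is hard to attribute to the code under test.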

@quasilyte
Contributor

quasilyte commented Sep 3, 2018

CL125796 does change the situation a little bit:

name        old time/op    new time/op    delta
Bytes-8       26.4ns ± 0%    14.9ns ± 0%  -43.56%  (p=0.002 n=7+8)
String-8      13.9ns ± 0%    14.2ns ± 0%   +1.87%  (p=0.000 n=10+10)
ByteByte-8    13.4ns ± 0%    13.4ns ± 0%     ~     (all equal)

That CL makes []byte("ab") identical to []byte{'a', 'b'}, removing the main reason the Bytes benchmark was slower than the other two. It makes the difference negligible for the given example.

Not sure why the String benchmark became slightly slower; I need to dig into it. Its generated code hasn't changed, though, so it might be a side effect.

@gopherbot

Change https://golang.org/cl/125796 mentions this issue: cmd/compile/internal/gc: optimize []byte(stringlit)

@mvdan
Member Author

mvdan commented Sep 9, 2018

@FMNSSun apologies that I haven't replied to your comments directly so far. I had read them, I just hadn't sat down to write proper replies.

You're right that my thinking about Write vs WriteString was flawed - and that's fixed by @quasilyte's CL above.

Two WriteByte calls are still faster, I think, but likely not by enough to warrant keeping this issue open. After all, I haven't done the legwork to investigate why, or whether the difference can be removed.

So all in all, happy to close the issue once the CL has been merged.

@ianlancetaylor
Contributor

@mvdan Seems like the CL is merged; should this issue be closed? Thanks.

@mvdan
Member Author

mvdan commented Nov 28, 2018

@ianlancetaylor thanks for the ping; I indeed think this can be closed. It doesn't look like there's much else to do for now.

@mvdan mvdan closed this as completed Nov 28, 2018
@golang golang locked and limited conversation to collaborators Nov 28, 2019