Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/textproto: the DotReader/DotWriter is extremely slow #56047

Open
rixtox opened this issue Oct 4, 2022 · 4 comments
Open

net/textproto: the DotReader/DotWriter is extremely slow #56047

rixtox opened this issue Oct 4, 2022 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@rixtox
Copy link

rixtox commented Oct 4, 2022

I didn't expect the bottle neck of my program to came from the standard lib, but this seems to be an exception.

The DotReader/DotWriter from the net/textproto library uses bufio.ReadByte/WriteByte on each input/output byte, which is over 300 times slower than bufio.Read/Write functions for the same amount of data. This is causing significant impact on data throughput for applications using these interfaces.

package my_test

import (
	"bufio"
	"io"
	"net/textproto"
	"testing"
)

type FillReader byte

func (r FillReader) Read(b []byte) (n int, err error) {
	n = len(b)
	if n > 0 {
		b[0] = byte(r)
		for i := 1; i < n; i *= 2 {
			copy(b[i:], b[:i])
		}
	}
	return n, nil
}

func BenchmarkIONull(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		io.Copy(io.Discard, io.LimitReader(FillReader('.'), size))
	}
}

func BenchmarkIOBufioWrite(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r := io.LimitReader(FillReader('.'), size)
		io.Copy(io.Discard, r)
	}
}

func BenchmarkIOBufioWriteByte(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		w := bufio.NewWriter(io.Discard)
		for j := int64(0); j < size; j++ {
			w.WriteByte(0)
		}
	}
}

func BenchmarkIODotWriter(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r := io.LimitReader(FillReader('.'), size)
		w := textproto.NewWriter(bufio.NewWriter(io.Discard)).DotWriter()
		io.Copy(w, r)
	}
}

func BenchmarkIOBufioRead(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r := bufio.NewReader(io.LimitReader(FillReader('.'), size))
		io.Copy(io.Discard, r)
	}
}

func BenchmarkIOBufioReadByte(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r := bufio.NewReader(io.LimitReader(FillReader('.'), size))
		for j := int64(0); j < size; j++ {
			r.ReadByte()
		}
	}
}

func BenchmarkIODotReader(b *testing.B) {
	size := int64(1024 * 1024 * 4)
	b.SetBytes(size)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r := textproto.NewReader(bufio.NewReader(io.LimitReader(FillReader('.'), size))).DotReader()
		io.Copy(io.Discard, r)
	}
}
❯ go test -cpuprofile cpu.prof -memprofile mem.prof -benchmem -bench IO
goos: windows
goarch: amd64
pkg: mytest
cpu: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
BenchmarkIONull-12                         13660             91626 ns/op        45776.23 MB/s         24 B/op          1 allocs/op
BenchmarkIOBufioWrite-12                   19095             55199 ns/op        75984.94 MB/s         24 B/op          1 allocs/op
BenchmarkIOBufioWriteByte-12                  99          10311620 ns/op         406.76 MB/s        4178 B/op          1 allocs/op
BenchmarkIODotWriter-12                       67          16815748 ns/op         249.43 MB/s       36984 B/op          6 allocs/op
BenchmarkIOBufioRead-12                    20808             58743 ns/op        71401.44 MB/s       4223 B/op          3 allocs/op
BenchmarkIOBufioReadByte-12                   99          10507806 ns/op         399.16 MB/s        4202 B/op          2 allocs/op
BenchmarkIODotReader-12                       60          19456187 ns/op         215.58 MB/s        4305 B/op          5 allocs/op
PASS
ok      mytest  11.788s
@panjf2000 panjf2000 added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 5, 2022
@panjf2000
Copy link
Member

Could you run go pprof and upload the profiling data? thank~

@rixtox
Copy link
Author

rixtox commented Oct 5, 2022

❯ go test -cpuprofile cpu.prof -memprofile mem.prof -benchmem -bench IO
goos: windows
goarch: amd64
pkg: mytest
cpu: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
BenchmarkIONull-12                         13660             91626 ns/op        45776.23 MB/s         24 B/op          1 allocs/op
BenchmarkIOBufioWrite-12                   19095             55199 ns/op        75984.94 MB/s         24 B/op          1 allocs/op
BenchmarkIOBufioWriteByte-12                  99          10311620 ns/op         406.76 MB/s        4178 B/op          1 allocs/op
BenchmarkIODotWriter-12                       67          16815748 ns/op         249.43 MB/s       36984 B/op          6 allocs/op
BenchmarkIOBufioRead-12                    20808             58743 ns/op        71401.44 MB/s       4223 B/op          3 allocs/op
BenchmarkIOBufioReadByte-12                   99          10507806 ns/op         399.16 MB/s        4202 B/op          2 allocs/op
BenchmarkIODotReader-12                       60          19456187 ns/op         215.58 MB/s        4305 B/op          5 allocs/op
PASS
ok      mytest  11.788s

all_perf.zip

❯ go test -cpuprofile dotreader_cpu.prof -memprofile dotreader_mem.prof -benchmem -bench DotReader
goos: windows
goarch: amd64
pkg: mytest
cpu: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
BenchmarkIODotReader-12               56          20478373 ns/op         204.82 MB/s        4770 B/op          5 allocs/op
PASS
ok      mytest  1.306s

dotreader_perf.zip

❯ go test -cpuprofile dotwriter_cpu.prof -memprofile dotwriter_mem.prof -benchmem -bench DotWriter
goos: windows
goarch: amd64
pkg: mytest
cpu: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
BenchmarkIODotWriter-12               75          17587541 ns/op         238.48 MB/s       37218 B/op          6 allocs/op
PASS
ok      mytest  1.511s

dotwriter_perf.zip

@panjf2000 panjf2000 added this to the Backlog milestone Oct 5, 2022
@rixtox
Copy link
Author

rixtox commented Oct 7, 2022

I optimized these routines here: https://github.com/go-textproto/textproto

BenchmarkDotReader/Legacy-12                  62          21610894 ns/op         194.08 MB/s
BenchmarkDotReader/Optimized-12             1486            821764 ns/op        5104.02 MB/s
BenchmarkDotWriter/Legacy-12                  56          19386652 ns/op         216.35 MB/s
BenchmarkDotWriter/Optimized-12             1694            700181 ns/op        5990.31 MB/s

But I'm not sure if it's suitable to be submitted to the Golang's stdlib. I used a hack to "rewind" arbitrary bytes read from a bufio.Reader and I'm not sure how to do that more safely.

I could however write a safer and more optimized version if we use ReaderFrom and WriterTo interfaces instead.

But anyway, this shows that the implementation in the stdlib has a lot of room for performance improvement.

@panjf2000
Copy link
Member

We encourage developers to send out high-quality CL(PR) for Go, if you find a way to improve the stdlib and it's acceptable, feel free to send a CL, following this tutorial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

2 participants