
io: ReadAll buffer resizing is inefficient #50774

Open · bboreham opened this issue Jan 24, 2022 · 11 comments

Labels: Performance; NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
Milestone: Backlog

Comments

@bboreham (Contributor)

What version of Go are you using (go version)?

$ go version
go version go1.17.5 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vagrant/.cache/go-build"
GOENV="/home/vagrant/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/vagrant/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vagrant"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/vagrant/src/github.com/cortexproject/cortex/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2696308846=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Called ioutil.ReadAll to read everything from an http response.

What did you expect to see?

Reasonable memory and CPU usage.

What did you see instead?

~60% more memory is allocated when using ioutil.ReadAll compared with bytes.Buffer.ReadFrom.

See prometheus/client_golang#976.
#46564 made the same complaint, although it was solved in a different way.

The underlying reason is that ReadAll() uses append to resize its buffer, which grows it by a factor of about 1.25 each time (slightly more in the latest Go, but asymptotically the same).
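
For reference, io.ReadAll as of Go 1.17 is essentially the following (a close paraphrase of the standard-library source, not the verbatim code); the append(b, 0)[:len(b)] trick delegates the growth policy to the runtime:

func readAll(r io.Reader) ([]byte, error) {
	b := make([]byte, 0, 512)
	for {
		if len(b) == cap(b) {
			// Buffer full: append one byte so the runtime picks a larger
			// capacity (~1.25x for big slices), then truncate back.
			b = append(b, 0)[:len(b)]
		}
		n, err := r.Read(b[len(b):cap(b)])
		b = b[:len(b)+n]
		if err != nil {
			if err == io.EOF {
				err = nil
			}
			return b, err
		}
	}
}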

One might measure these alternatives on various axes:

  • number of bytes allocated and freed
  • max size of heap required to read a given size
  • amount of copying while resizing
  • percentage waste in returned buffer

The current implementation is better on the last one, and much worse on all the others.

@mvdan added the Performance and NeedsInvestigation labels Jan 24, 2022
@ianlancetaylor changed the title from "ioutil: ReadAll buffer resizing is inefficient" to "os: ReadAll buffer resizing is inefficient" Jan 25, 2022
@ianlancetaylor added this to the Backlog milestone Jan 25, 2022

@randall77 changed the title from "os: ReadAll buffer resizing is inefficient" to "io: ReadAll buffer resizing is inefficient" Jan 25, 2022
@andig (Contributor) commented Feb 10, 2022

Is the underlying question here whether append should resize by 2x? It feels as if "memory-growing, slice-like structures" should behave similarly throughout Go.
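
One quick way to observe the current growth factor empirically (a throwaway sketch; the exact capacities printed depend on the runtime version):

package main

import "fmt"

func main() {
	// Fill the slice to capacity and append one more byte, forcing a
	// reallocation; print the capacity the runtime chose each time.
	b := make([]byte, 0, 512)
	for cap(b) < 1<<20 {
		b = append(b[:cap(b)], 0)
		fmt.Println(cap(b))
	}
}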

@earthboundkid (Contributor)

bytes.Buffer.grow(int) uses a different algorithm: it grows to 2*currentCap plus the requested n. It looks like this ends up being more efficient in certain cases.
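
That rule, paraphrased (the real bytes.Buffer.grow also tries to reslice or slide data in place before allocating):

// On reallocation, bytes.Buffer moves to a slice of capacity
// 2*cap(buf)+n, so its capacity at least doubles each time.
func grow(buf []byte, n int) []byte {
	newBuf := make([]byte, len(buf), 2*cap(buf)+n)
	copy(newBuf, buf)
	return newBuf
}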

@bboreham (Contributor, Author)

Consistency is a fine thing, but I would expect blogs, books, etc. to advise "never call ReadAll" if it remains as is.

Stretching my imagination, some cases can avoid memory-growing: for instance when reading the body of an http.Request with ContentLength, the implementation of ReadAll could use a special interface to obtain the expected size. Or, more simply, there could be a ReadAllWithExpectedSize() helper.
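
A sketch of what such a helper could look like (ReadAllWithExpectedSize is hypothetical; the name and signature are made up here for illustration):

// ReadAllWithExpectedSize is hypothetical: the caller supplies the size
// it expects (e.g. http.Response.ContentLength) so the buffer can be
// allocated once up front instead of grown incrementally.
func ReadAllWithExpectedSize(r io.Reader, expected int) ([]byte, error) {
	if expected < 0 {
		expected = 512 // same initial capacity io.ReadAll uses today
	}
	buf := bytes.NewBuffer(make([]byte, 0, expected))
	_, err := buf.ReadFrom(r)
	return buf.Bytes(), err
}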

@CAFxX (Contributor) commented May 18, 2022

We could maybe make ReadAll forward to Copy with something like this?

func ReadAll(r Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}

This would also add support for the WriterTo logic implemented in Copy (so something like a source bytes.Buffer or bufio.Reader should be able to just write directly to the destination buffer).

Alternatively, if forwarding to Copy is undesirable because of the larger temporary buffer¹, we could just add WriterTo forwarding in the current logic:

func ReadAll(r Reader) ([]byte, error) {
	if wt, ok := r.(WriterTo); ok {
		sw := new(sliceWriter)
		_, err := wt.WriteTo(sw)
		return *sw, err
	}
	// ...existing ReadAll logic...
}

Footnotes

¹ ReadAll does not really make any guarantee in this sense, so maybe this is not really a valid concern.

@bboreham (Contributor, Author) commented Aug 5, 2022

@CAFxX can you clarify how this helps with the resizing of the destination buffer?

@CAFxX (Contributor) commented Aug 6, 2022

@bboreham sure, it boils down to io.Copy being able to optimize the copy by forwarding to io.ReaderFrom on the destination io.Writer, or to io.WriterTo on the source io.Reader.

In this specific case, the source io.Reader implements io.WriterTo, so the copy is optimized - and this completely sidesteps the resizing issue in the "reading the whole response body" case. Incidentally, this is almost the same thing that you did manually by calling bytes.Buffer.ReadFrom.
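
The forwarding inside io.Copy looks roughly like this (a paraphrase of io.copyBuffer, not the exact source):

func copySketch(dst io.Writer, src io.Reader) (written int64, err error) {
	// Copy delegates to these interfaces when either end implements them,
	// letting that end manage the buffer itself.
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	if rf, ok := dst.(io.ReaderFrom); ok {
		return rf.ReadFrom(src)
	}
	// Otherwise fall back to a read/write loop through a temporary
	// buffer (32 KiB by default in io.Copy).
	buf := make([]byte, 32*1024)
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			wn, werr := dst.Write(buf[:n])
			written += int64(wn)
			if werr != nil {
				return written, werr
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}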

name         time/op
Original-16  236µs ± 1%
Patched-16   172µs ± 2%

name         alloc/op
Original-16  291kB ± 0%
Patched-16   153kB ± 0%

name         allocs/op
Original-16   78.3 ± 1%
Patched-16    65.0 ± 0%
Benchmark source code
package issue50774

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

func benchmark(b *testing.B, readall func(io.Reader) ([]byte, error)) {
	body := bytes.Repeat([]byte("x"), 1<<16)
	s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(body)
	}))
	defer s.Close()

	var buf []byte
	for i := 0; i < b.N; i++ {
		res, err := http.Get(s.URL)
		if err != nil {
			b.Fatal(err)
		}
		buf, err = readall(res.Body)
		if err != nil {
			b.Fatal(err)
		}
		res.Body.Close()
	}
	_ = buf
}

func BenchmarkOriginal(b *testing.B) {
	benchmark(b, io.ReadAll)
}

func BenchmarkPatched(b *testing.B) {
	benchmark(b, ReadAll)
}

func ReadAll(r io.Reader) ([]byte, error) {
	sw := new(sliceWriter)
	_, err := io.Copy(sw, r)
	return *sw, err
}

type sliceWriter []byte

func (sw *sliceWriter) Write(b []byte) (int, error) {
	*sw = append(*sw, b...)
	return len(b), nil
}

Update: just for confirmation, I also tried bytes.Buffer.ReadFrom, and it turns out my approach is even faster than that:

name               time/op
Original-16        248µs ± 4%
Patched-16         173µs ± 2%
BufferReadFrom-16  215µs ± 3%

name               alloc/op
Original-16        291kB ± 0%
Patched-16         153kB ± 0%
BufferReadFrom-16  266kB ± 0%

name               allocs/op
Original-16         79.0 ± 0%
Patched-16          65.0 ± 0%
BufferReadFrom-16   70.0 ± 0%
func BenchmarkBufferReadFrom(b *testing.B) {
	benchmark(b, func(r io.Reader) ([]byte, error) {
		// Named buf so the buffer doesn't shadow the *testing.B above.
		buf := &bytes.Buffer{}
		_, err := buf.ReadFrom(r)
		if err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	})
}

@bboreham (Contributor, Author) commented Aug 6, 2022

I think your approach is taking advantage of being able to read the entire buffer in one go.
This is not possible when reading from a socket, as in the problem that spawned this issue.

I pasted your ReadAll into that code here

Before:

% go test -run xxx -bench . -benchmem ./api            
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8              28846             35189 ns/op           20543 B/op         73 allocs/op
BenchmarkClient/50KB-8             17594             68666 ns/op          135782 B/op         99 allocs/op
BenchmarkClient/1000KB-8            1515            960378 ns/op         2111125 B/op        593 allocs/op
BenchmarkClient/2000KB-8             894           1343809 ns/op         4212548 B/op       1095 allocs/op
PASS
ok      github.com/prometheus/client_golang/api 6.333s

After:

% go test -run xxx -bench . -benchmem ./api
goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/api
BenchmarkClient/4KB-8              30478             36265 ns/op           43158 B/op         72 allocs/op
BenchmarkClient/50KB-8             16678             67617 ns/op          129620 B/op         96 allocs/op
BenchmarkClient/1000KB-8            1338            871225 ns/op         5466073 B/op        604 allocs/op
BenchmarkClient/2000KB-8             724           1783090 ns/op        11197990 B/op       1113 allocs/op
PASS
ok      github.com/prometheus/client_golang/api 6.187s

@CAFxX (Contributor) commented Aug 6, 2022

@bboreham My benchmark is using httptest.NewServer to simulate reading the http response body from a real socket. You can also confirm this by inspecting the benchmark CPU profile and noticing how it's spending most of its time in read syscalls deep into the net package.

[screenshot: benchmark CPU profile showing most time spent in read syscalls inside the net package]

On the other hand, your benchmark does not seem to make sense: you are instantiating a bytes.Buffer but not using it, and you're throwing away the actual results of ReadAll...

@bboreham (Contributor, Author) commented Aug 6, 2022

Sorry, going a bit too fast. I corrected the Prometheus client-library version to return the body, and the benchmark results are the same.

Maybe it's just that your test response is quite small at 64K (if I read that right)?

@CAFxX (Contributor) commented Aug 6, 2022

Yup. And you're right: with significantly larger response sizes (1 MB or more), the larger buffer growth of bytes.Buffer makes bytes.Buffer.ReadFrom faster. I'll see in the next few days whether something can be done.
