Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/binary: cache dataSize result across invocations of Read and Write for slice of structs #66253

Closed
kwakubiney opened this issue Mar 11, 2024 · 2 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kwakubiney
Copy link

kwakubiney commented Mar 11, 2024

This patch aims to extend this optimization c9d89f6 to slice of structs.
I have written some benchmarks in the encoding/binary package to measure the impact of the change when encoding a slice of structs.

Profiling those benchmarks show that there is an absurd number of allocations when writing a slice of structs caused by reflection. Allocations at reflect.(*structType).Field account for ~75% of the total allocs made as shown below.

Allocations without patch

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
warning: GOPATH set to GOROOT (/root/go) has no effect
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 8:09pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 130688, 100% of 130735 total
Dropped 4 nodes (cum <= 653)
      flat  flat%   sum%        cum   cum%
        94 0.072% 0.072%     130735   100%  encoding/binary.BenchmarkWriteSlice1000Structs
         0     0% 0.072%     130735   100%  testing.(*B).runN
         0     0% 0.072%     130659 99.94%  testing.(*B).launch
     32289 24.70% 24.77%     130641 99.93%  encoding/binary.Write
         0     0% 24.77%      98305 75.19%  encoding/binary.dataSize
         0     0% 24.77%      98305 75.19%  encoding/binary.sizeof
         0     0% 24.77%      98305 75.19%  reflect.(*rtype).Field
     98305 75.19%   100%      98305 75.19%  reflect.(*structType).Field

This patch essentially also caches the result of binary.dataSize in the same sync.Map being used in the struct case. This leads to huge allocation savings in writing a slice of structs.

Allocations with patch

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 6:31pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 32228, 99.81% of 32289 total
Dropped 4 nodes (cum <= 161)
      flat  flat%   sum%        cum   cum%
        56  0.17%  0.17%      32289   100%  encoding/binary.BenchmarkWriteSlice1000Structs
         0     0%  0.17%      32289   100%  testing.(*B).runN
     32172 99.64% 99.81%      32233 99.83%  encoding/binary.Write
         0     0% 99.81%      32224 99.80%  testing.(*B).launch
(pprof) 

Running benchstat on both implementations shows these savings for writes and reads.

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
                        │   old.txt   │            new.txt            │
                        │   sec/op    │   sec/op     vs base          │
WriteSlice1000Structs-2   846.7µ ± 4%   856.4µ ± 3%  ~ (p=0.602 n=20)

                        │   old.txt    │            new.txt             │
                        │     B/s      │     B/s       vs base          │
WriteSlice1000Structs-2   84.48Mi ± 4%   83.52Mi ± 3%  ~ (p=0.602 n=20)

                        │   old.txt    │               new.txt               │
                        │     B/op     │     B/op      vs base               │
WriteSlice1000Structs-2   80.18Ki ± 0%   80.06Ki ± 0%  -0.15% (p=0.000 n=20)

                        │   old.txt   │              new.txt               │
                        │  allocs/op  │ allocs/op   vs base                │
WriteSlice1000Structs-2   16.000 ± 0%   1.000 ± 0%  -93.75% (p=0.000 n=20)
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
                       │   old.txt   │              new.txt               │
                       │   sec/op    │   sec/op     vs base               │
ReadSlice1000Structs-2   847.4µ ± 4%   821.1µ ± 3%  -3.10% (p=0.012 n=20)

                       │   old.txt    │               new.txt               │
                       │     B/s      │     B/s       vs base               │
ReadSlice1000Structs-2   84.40Mi ± 4%   87.11Mi ± 3%  +3.20% (p=0.012 n=20)

                       │   old.txt    │               new.txt               │
                       │     B/op     │     B/op      vs base               │
ReadSlice1000Structs-2   80.12Ki ± 0%   80.00Ki ± 0%  -0.15% (p=0.000 n=20)

                       │   old.txt   │              new.txt               │
                       │  allocs/op  │ allocs/op   vs base                │
ReadSlice1000Structs-2   16.000 ± 0%   1.000 ± 0%  -93.75% (p=0.000 n=20)
@gopherbot
Copy link

Change https://go.dev/cl/570855 mentions this issue: encoding/binary: cache struct sizes to speed up Read and Write for slice of structs.

@ianlancetaylor
Copy link
Contributor

There is no API change here so taking this out of the proposal process.

@ianlancetaylor ianlancetaylor changed the title encoding/binary: proposal: cache dataSize result across invocations of Read and Write for slice of structs encoding/binary: cache dataSize result across invocations of Read and Write for slice of structs Mar 11, 2024
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Mar 11, 2024
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

3 participants