
x/net/ipv{4,6}: ReadBatch memory usage #26838

Closed
mhr3 opened this issue Aug 7, 2018 · 5 comments

mhr3 commented Aug 7, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package reader

import (
	"io"
	"net"
	"syscall"
	"testing"

	"golang.org/x/net/ipv4"
)

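// getConn listens on a local UDP socket and rewraps it as a net.PacketConn
// via a duplicated file descriptor (File/FilePacketConn).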
func getConn() net.PacketConn {
	if addr, err := net.ResolveUDPAddr("udp", net.JoinHostPort("127.0.0.1", "40006")); err != nil {
		panic(err)
	} else if l, err := net.ListenUDP("udp", addr); err != nil {
		panic(err)
	} else if f, err := l.File(); err != nil {
		panic(err)
	} else if conn, err := net.FilePacketConn(f); err != nil {
		panic(err)
	} else {
		return conn
	}
}

func BenchmarkMem(b *testing.B) {
	pc := ipv4.NewPacketConn(getConn())

	const numMsgs = 4
	msgs := [numMsgs]ipv4.Message{}
	for i := 0; i < numMsgs; i++ {
		msgs[i].Buffers = [][]byte{make([]byte, 4096)}
	}

	process := func(data []byte, addr net.Addr) {}

	for i := 0; i < b.N; i++ {
		n, packetErr := pc.ReadBatch(msgs[:], syscall.MSG_WAITFORONE)

		if packetErr == io.EOF {
			return
		} else if packetErr != nil {
			return
		}

		for j := 0; j < n; j++ {
			msg := msgs[j]
			process(msg.Buffers[0][:msg.N], msg.Addr)
		}
	}
}

What did you expect to see?

I was benchmarking the program above (with -benchmem) and expected 0 allocations per iteration, since using ReadBatch is supposed to improve performance on heavily utilized sockets.
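For reference, a benchmark like this is typically run with go test -bench=BenchmarkMem -benchmem.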

What did you see instead?

I saw that each extra message in the msgs array adds 2 extra allocations for each iteration of the benchmark, negating the performance advantage of ReadBatch() over a traditional Read().

gopherbot added this to the Unreleased milestone Aug 7, 2018

mikioh commented Aug 8, 2018

I agree that there are areas for performance improvement in the {Read,Write}Batch methods. Feel free to send a CL. Note that the attached benchmark is not self-contained; there is no traffic-generator code. You may use a benchmark in golang.org/x/net/internal/socket_go1_9_test.go.

expected 0 allocations per iteration [...] 2 extra allocations for each iteration

I guess that the allocations are for the iovec and sockaddr storage.


mhr3 commented Aug 8, 2018

Right, I used an external traffic generator so that it wouldn't skew the benchmark results (iperf -c 127.0.0.1 -p 40006 -u -b 100M -t 30).

I also ran with GODEBUG=allocfreetrace=1 to confirm your guess:

x/net/internal/socket/mmsghdr_unix.go:15 - vs := make([]iovec, len(ms[i].Buffers))
x/net/internal/socket/mmsghdr_unix.go:18 - sa = make([]byte, sizeofSockaddrInet6)
x/net/internal/socket/rawconn_mmsg.go:25 - var operr error
x/net/internal/socket/rawconn_mmsg.go:26 - var n int
x/net/internal/socket/rawconn_mmsg.go:27 - fn := func(s uintptr) bool { ...

As you suspected, it is the iovec and sockaddr storage that grows linearly with the number of items in the batch, but there are also a few other allocations (though at least those remain constant).
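
For illustration, here is a minimal sketch of how those two per-message allocations arise, simplified from the mmsghdr_unix.go lines traced above. The iovec type and the sockaddr size constant are stand-ins for the package's unexported internals, not its actual code.

```
package sketch

// iovec is a simplified stand-in for the package's unexported iovec
// type (mirroring C's struct iovec).
type iovec struct {
	Base *byte
	Len  uint64
}

const sizeofSockaddrInet6 = 28 // size of a raw sockaddr_in6 on Linux

// packMessage mimics the per-message setup: every call makes a fresh
// iovec slice and a fresh sockaddr buffer, so a batch of N messages
// costs at least 2*N allocations per ReadBatch call. Buffers are
// assumed non-empty.
func packMessage(buffers [][]byte) (vs []iovec, sa []byte) {
	vs = make([]iovec, len(buffers)) // allocation 1: iovec storage
	for i, b := range buffers {
		vs[i] = iovec{Base: &b[0], Len: uint64(len(b))}
	}
	sa = make([]byte, sizeofSockaddrInet6) // allocation 2: sockaddr storage
	return vs, sa
}
```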

Not really sure how to fix this without changing the signature of ReadBatch()

mikioh changed the title x/net: ReadBatch memory usage → x/net/ipv{4,6}: ReadBatch memory usage Aug 12, 2018

mikioh commented Aug 12, 2018

Not really sure how to fix this without changing the signature of ReadBatch()

You may think about a few ways to remove allocations on the I/O code path, for example keeping a free list for the temporary structures. However, it's better to address your issue in the I/O methods first; in general, comparing ReadBatch vs. Read doesn't make sense on connectionless, datagram-based transport protocols because of differences in transport characteristics, for example an any-to-any vs. one-to-one conversation style.
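
A minimal sketch of that free-list idea, recycling the per-call temporaries through a sync.Pool; the mmsgTmps type, its fields, and withTmps are hypothetical names, not the package's actual internals.

```
package sketch

import "sync"

// iovec as in the earlier sketch.
type iovec struct {
	Base *byte
	Len  uint64
}

// mmsgTmps is a hypothetical holder for the scratch storage a batched
// read needs: iovec entries plus one raw sockaddr buffer per message.
type mmsgTmps struct {
	iovecs    []iovec
	sockaddrs [][]byte
}

var tmpsPool = sync.Pool{
	New: func() interface{} { return new(mmsgTmps) },
}

// withTmps borrows the scratch storage around a batched syscall and
// returns it afterwards, so steady-state ReadBatch calls stop paying
// the 2*N allocations for iovec and sockaddr storage.
func withTmps(fn func(*mmsgTmps)) {
	tmps := tmpsPool.Get().(*mmsgTmps)
	defer tmpsPool.Put(tmps)
	fn(tmps)
}
```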

@gopherbot

Change https://golang.org/cl/315589 mentions this issue: internal/socket: reuse buffers in recv/sendMsgs

github-actions bot pushed a commit to matzf/scion that referenced this issue May 4, 2021
Avoid allocating scionPacketProcessor; reorganize the packet processing
functions. The scionPacketProcessor is now the main entry point for
packet processing and contains all the pre-allocated per-packet state.

Avoid allocation of message buffer for writing with WriteBatch.
We could just pre-allocate a buffer, but as we always write a batch of
size one, just use WriteTo instead.
For writing single packets, this is both more convenient and faster.
Additional benefit: Read/WriteBatch make rather large allocations
internally (see golang/go#26838); using WriteTo avoids this.

The largest remaining allocations when processing SCION packets are:
- ReadBatch allocates temporary storage internally (as mentioned above).
  This is by far the biggest culprit; it accounts for almost 80% (after
  the changes in this patch).
- slayers/path.NewPath called during slayers.SCION.DecodeFromBytes:
  Could be avoided by reusing Path objects, but requires larger changes.
- slayers/path/scion.Raw.GetCurrentHopField and GetCurrentInfoField, called during parsePath:
  Both return pointer types. Should be possible to change this to value
  types instead, but requires larger changes.
- DataPlane.resolveLocalDst: allocates IP and UDPAddr. Can only really
  be fixed with better address types.
- slayers/path.FullMAC. Allocates both input and output. The function
  interface could be changed to allow reuse of the input buffer.
  The output buffer cannot be reused with this CMAC library.
lukedirtwalker pushed a commit to lukedirtwalker/scion that referenced this issue May 6, 2021

marcfrei pushed a commit to netsec-ethz/scion that referenced this issue May 26, 2021

marcfrei pushed a commit to netsec-ethz/scion that referenced this issue Jun 3, 2021
matzf added a commit to matzf/net that referenced this issue Jan 26, 2022
@gopherbot

Change https://golang.org/cl/380934 mentions this issue: internal/socket: reuse closure in Recv/SendMmsg

gopherbot pushed a commit to golang/net that referenced this issue Jan 27, 2022
The closure for the callback to RawConn.Read/Write is responsible for
multiple allocations per call to RecvMmsg and SendMmsg.
The batched read and write are used primarily to avoid per-call
overhead, so any such overhead negates the advantage of using these
functions.

This change introduces a struct type holding all the variables
captured by the closure passed to RawConn.Read/Write. The struct is
reused to amortize the allocations by means of a sync.Pool.
A suitable global sync.Pool instance already existed, for buffers used
to pack mmsg headers.

This change allows all allocations in WriteBatch to be reused. In ReadBatch,
only the returned net.Addr instances still need to be allocated for each
message, which cannot be avoided without fundamental changes to the
package interface.

```
name             old time/op    new time/op    delta
UDP/Batch-1-8      5.34µs ± 1%    5.40µs ± 3%     ~     (p=0.173 n=8+10)
UDP/Batch-2-8      9.74µs ± 1%    9.24µs ± 9%   -5.21%  (p=0.035 n=9+10)
UDP/Batch-4-8      16.2µs ± 4%    16.2µs ± 1%     ~     (p=0.758 n=9+7)
UDP/Batch-8-8      30.0µs ± 4%    30.0µs ± 4%     ~     (p=0.971 n=10+10)
UDP/Batch-16-8     57.3µs ± 3%    60.9µs ±16%   +6.43%  (p=0.031 n=9+9)
UDP/Batch-32-8      115µs ± 5%     119µs ± 6%   +3.15%  (p=0.043 n=10+10)
UDP/Batch-64-8      234µs ±16%     237µs ± 4%     ~     (p=0.173 n=10+8)
UDP/Batch-128-8     447µs ± 4%     470µs ± 7%   +5.22%  (p=0.002 n=10+10)
UDP/Batch-256-8     960µs ±10%     966µs ±19%     ~     (p=0.853 n=10+10)
UDP/Batch-512-8    1.00ms ± 7%    0.99ms ± 7%     ~     (p=0.387 n=9+9)

name             old alloc/op   new alloc/op   delta
UDP/Batch-1-8        232B ± 0%       52B ± 0%  -77.59%  (p=0.000 n=10+10)
UDP/Batch-2-8        280B ± 0%      104B ± 0%  -62.86%  (p=0.000 n=10+10)
UDP/Batch-4-8        384B ± 0%      208B ± 0%  -45.83%  (p=0.000 n=10+10)
UDP/Batch-8-8        592B ± 0%      416B ± 0%  -29.73%  (p=0.000 n=10+10)
UDP/Batch-16-8     1.01kB ± 0%    0.83kB ± 0%  -17.46%  (p=0.000 n=10+10)
UDP/Batch-32-8     1.84kB ± 0%    1.66kB ± 0%   -9.57%  (p=0.002 n=8+10)
UDP/Batch-64-8     3.51kB ± 0%    3.33kB ± 0%   -5.00%  (p=0.000 n=10+8)
UDP/Batch-128-8    6.84kB ± 0%    6.66kB ± 0%   -2.57%  (p=0.001 n=7+7)
UDP/Batch-256-8    13.5kB ± 0%    13.3kB ± 0%   -1.33%  (p=0.000 n=10+10)
UDP/Batch-512-8    14.7kB ± 0%    14.5kB ± 0%   -1.19%  (p=0.000 n=8+8)

name             old allocs/op  new allocs/op  delta
UDP/Batch-1-8        8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)
UDP/Batch-2-8        10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.000 n=10+10)
UDP/Batch-4-8        14.0 ± 0%       8.0 ± 0%  -42.86%  (p=0.000 n=10+10)
UDP/Batch-8-8        22.0 ± 0%      16.0 ± 0%  -27.27%  (p=0.000 n=10+10)
UDP/Batch-16-8       38.0 ± 0%      32.0 ± 0%  -15.79%  (p=0.000 n=10+10)
UDP/Batch-32-8       70.0 ± 0%      64.0 ± 0%   -8.57%  (p=0.000 n=10+10)
UDP/Batch-64-8        134 ± 0%       128 ± 0%   -4.48%  (p=0.000 n=10+10)
UDP/Batch-128-8       262 ± 0%       256 ± 0%   -2.29%  (p=0.000 n=10+10)
UDP/Batch-256-8       518 ± 0%       512 ± 0%   -1.16%  (p=0.000 n=10+10)
UDP/Batch-512-8       562 ± 0%       556 ± 0%   -1.07%  (p=0.000 n=10+10)
```

Contributes to golang/go#26838

Change-Id: I16ecfc38dbb5a4d9b1ceacd1dd99fda38f346807
GitHub-Last-Rev: d1dda93
GitHub-Pull-Request: #126
Reviewed-on: https://go-review.googlesource.com/c/net/+/380934
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
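
The shape of the merged change can be sketched as follows: the variables the hot callback used to capture move into a struct, the closure is built once over that struct's fields, and the whole thing is recycled through a sync.Pool. The names here (recvOp, doRecvmmsg, readBatch) are illustrative, not the actual internal/socket identifiers.

```
package sketch

import "sync"

// recvOp holds everything the read callback needs, so each batched
// read no longer allocates a fresh closure plus captured variables.
type recvOp struct {
	n   int
	err error
	fn  func(s uintptr) bool
}

var recvOpPool = sync.Pool{
	New: func() interface{} {
		op := new(recvOp)
		// Bind the closure to the struct's fields once; pooling op
		// amortizes both the struct and the closure allocation.
		op.fn = func(s uintptr) bool {
			op.n, op.err = doRecvmmsg(s)
			// Real code would return false on EAGAIN so the runtime
			// poller waits for readiness; simplified here.
			return true
		}
		return op
	},
}

// doRecvmmsg stands in for the real recvmmsg(2) plumbing.
func doRecvmmsg(s uintptr) (int, error) { return 0, nil }

// readBatch shows the call shape against a syscall.RawConn-style Read:
// borrow an op, run the callback, return the op to the pool.
func readBatch(read func(f func(uintptr) bool) error) (int, error) {
	op := recvOpPool.Get().(*recvOp)
	defer recvOpPool.Put(op)
	if err := read(op.fn); err != nil {
		return 0, err
	}
	return op.n, op.err
}
```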
WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
golang locked and limited conversation to collaborators Jan 26, 2023