
x/net/ipv{4,6}: ReadBatch memory usage #26838

Closed
mhr3 opened this issue Aug 7, 2018 · 5 comments

mhr3 commented Aug 7, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package reader

import (
	"io"
	"net"
	"syscall"
	"testing"

	"golang.org/x/net/ipv4"
)

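// getConn listens on a local UDP socket and rewraps it as a net.PacketConn
// via a duplicated file descriptor (File/FilePacketConn).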
func getConn() net.PacketConn {
	if addr, err := net.ResolveUDPAddr("udp", net.JoinHostPort("127.0.0.1", "40006")); err != nil {
		panic(err)
	} else if l, err := net.ListenUDP("udp", addr); err != nil {
		panic(err)
	} else if f, err := l.File(); err != nil {
		panic(err)
	} else if conn, err := net.FilePacketConn(f); err != nil {
		panic(err)
	} else {
		return conn
	}
}

func BenchmarkMem(b *testing.B) {
	pc := ipv4.NewPacketConn(getConn())

	const numMsgs = 4
	msgs := [numMsgs]ipv4.Message{}
	for i := 0; i < numMsgs; i++ {
		msgs[i].Buffers = [][]byte{make([]byte, 4096)}
	}

	process := func(data []byte, addr net.Addr) {}

	for i := 0; i < b.N; i++ {
		n, packetErr := pc.ReadBatch(msgs[:], syscall.MSG_WAITFORONE)

		if packetErr == io.EOF {
			return
		} else if packetErr != nil {
			return
		}

		for j := 0; j < n; j++ {
			msg := msgs[j]
			process(msg.Buffers[0][:msg.N], msg.Addr)
		}
	}
}

What did you expect to see?

I was benchmarking the program above (with -benchmem) and expected 0 allocations per iteration, since using ReadBatch is supposed to improve performance on heavily utilized sockets.
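For reference, a benchmark like this is typically run with go test -bench=BenchmarkMem -benchmem.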

What did you see instead?

I saw that each extra message in the msgs array adds 2 extra allocations for each iteration of the benchmark, negating the performance advantage of ReadBatch() over a traditional Read().

gopherbot added this to the Unreleased milestone Aug 7, 2018

mikioh commented Aug 8, 2018

I agree that there are areas for performance improvement in the {Read,Write}Batch methods. Feel free to send a CL. Note that the attached benchmark is not self-contained; there is no traffic-generator code. You may use a benchmark in golang.org/x/net/internal/socket_go1_9_test.go.

expected 0 allocations per iteration [...] 2 extra allocations for each iteration

I guess that the allocations are for the iovec and sockaddr storage.


mhr3 commented Aug 8, 2018

Right, I used an external traffic generator so that it wouldn't skew the benchmark results (iperf -c 127.0.0.1 -p 40006 -u -b 100M -t 30).

I also ran with GODEBUG=allocfreetrace=1 to confirm your guess:

x/net/internal/socket/mmsghdr_unix.go:15 - vs := make([]iovec, len(ms[i].Buffers))
x/net/internal/socket/mmsghdr_unix.go:18 - sa = make([]byte, sizeofSockaddrInet6)
x/net/internal/socket/rawconn_mmsg.go:25 - var operr error
x/net/internal/socket/rawconn_mmsg.go:26 - var n int
x/net/internal/socket/rawconn_mmsg.go:27 - fn := func(s uintptr) bool { ...

As you suspected, it is the iovec and sockaddr storage that grows linearly with the number of items in the batch, but there are also a few other allocations (though at least those remain constant).
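
For illustration, here is a minimal sketch of how those two per-message allocations arise, simplified from the mmsghdr_unix.go lines traced above. The iovec type and the sockaddr size constant are stand-ins for the package's unexported internals, not its actual code.

```
package sketch

// iovec is a simplified stand-in for the package's unexported iovec
// type (mirroring C's struct iovec).
type iovec struct {
	Base *byte
	Len  uint64
}

const sizeofSockaddrInet6 = 28 // size of a raw sockaddr_in6 on Linux

// packMessage mimics the per-message setup: every call makes a fresh
// iovec slice and a fresh sockaddr buffer, so a batch of N messages
// costs at least 2*N allocations per ReadBatch call. Buffers are
// assumed non-empty.
func packMessage(buffers [][]byte) (vs []iovec, sa []byte) {
	vs = make([]iovec, len(buffers)) // allocation 1: iovec storage
	for i, b := range buffers {
		vs[i] = iovec{Base: &b[0], Len: uint64(len(b))}
	}
	sa = make([]byte, sizeofSockaddrInet6) // allocation 2: sockaddr storage
	return vs, sa
}
```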

Not really sure how to fix this without changing the signature of ReadBatch()

mikioh changed the title x/net: ReadBatch memory usage → x/net/ipv{4,6}: ReadBatch memory usage Aug 12, 2018

mikioh commented Aug 12, 2018

Not really sure how to fix this without changing the signature of ReadBatch()

You may think about a few ways to remove allocations on the I/O code path, for example keeping a free list for the temporary structures. However, it's better to address your issue in the I/O methods first; in general, comparing ReadBatch vs. Read doesn't make sense on connectionless, datagram-based transport protocols because of differences in transport characteristics, for example an any-to-any vs. one-to-one conversation style.
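
A minimal sketch of that free-list idea, recycling the per-call temporaries through a sync.Pool; the mmsgTmps type, its fields, and withTmps are hypothetical names, not the package's actual internals.

```
package sketch

import "sync"

// iovec as in the earlier sketch.
type iovec struct {
	Base *byte
	Len  uint64
}

// mmsgTmps is a hypothetical holder for the scratch storage a batched
// read needs: iovec entries plus one raw sockaddr buffer per message.
type mmsgTmps struct {
	iovecs    []iovec
	sockaddrs [][]byte
}

var tmpsPool = sync.Pool{
	New: func() interface{} { return new(mmsgTmps) },
}

// withTmps borrows the scratch storage around a batched syscall and
// returns it afterwards, so steady-state ReadBatch calls stop paying
// the 2*N allocations for iovec and sockaddr storage.
func withTmps(fn func(*mmsgTmps)) {
	tmps := tmpsPool.Get().(*mmsgTmps)
	defer tmpsPool.Put(tmps)
	fn(tmps)
}
```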

@gopherbot

Change https://golang.org/cl/315589 mentions this issue: internal/socket: reuse buffers in recv/sendMsgs

github-actions bot pushed a commit to matzf/scion that referenced this issue May 4, 2021
Avoid allocating scionPacketProcessor; reorganize the packet processing
functions. The scionPacketProcessor is now the main entry point for
packet processing and contains all the pre-allocated per-packet state.

Avoid allocation of message buffer for writing with WriteBatch.
We could just pre-allocate a buffer, but as we always write a batch of
size one, just use WriteTo instead.
For writing single packets, this is both more convenient and faster.
Additional benefit: Read/WriteBatch make rather large allocations
internally (see golang/go#26838); using WriteTo avoids this.

The largest remaining allocations when processing SCION packets are:
- ReadBatch allocates temporary storage internally (as mentioned above).
  This is by far the biggest culprit; it accounts for almost 80% (after
  the changes in this patch).
- slayers/path.NewPath called during slayers.SCION.DecodeFromBytes:
  Could be avoided by reusing Path objects, but requires larger changes.
- slayers/path/scion.Raw.GetCurrentHopField and GetCurrentInfoField, called during parsePath:
  Both return pointer types. Should be possible to change this to value
  types instead, but requires larger changes.
- DataPlane.resolveLocalDst: allocates IP and UDPAddr. Can only really
  be fixed with better address types.
- slayers/path.FullMAC. Allocates both input and output. The function
  interface could be changed to allow reuse of the input buffer.
  The output buffer cannot be reused with this CMAC library.
lukedirtwalker pushed a commit to lukedirtwalker/scion that referenced this issue May 6, 2021

marcfrei pushed a commit to netsec-ethz/scion that referenced this issue May 26, 2021

marcfrei pushed a commit to netsec-ethz/scion that referenced this issue Jun 3, 2021
matzf added a commit to matzf/net that referenced this issue Jan 26, 2022
@gopherbot

Change https://golang.org/cl/380934 mentions this issue: internal/socket: reuse closure in Recv/SendMmsg

gopherbot pushed a commit to golang/net that referenced this issue Jan 27, 2022
The closure for the callback to RawConn.Read/Write is responsible for
multiple allocations per call to RecvMmsg and SendMmsg.
The batched read and write are used primarily to avoid per-call
overhead, so any such overhead negates the advantage of using these
functions.

This change introduces a struct type holding all the variables
captured by the closure passed to RawConn.Read/Write. The struct is
reused to amortize the allocations by means of a sync.Pool.
A suitable global sync.Pool instance already existed, for buffers used
to pack mmsg headers.

This change allows all allocations in WriteBatch to be reused. In ReadBatch,
only the returned net.Addr instances still need to be allocated for each
message, which cannot be avoided without fundamental changes to the
package interface.

```
name             old time/op    new time/op    delta
UDP/Batch-1-8      5.34µs ± 1%    5.40µs ± 3%     ~     (p=0.173 n=8+10)
UDP/Batch-2-8      9.74µs ± 1%    9.24µs ± 9%   -5.21%  (p=0.035 n=9+10)
UDP/Batch-4-8      16.2µs ± 4%    16.2µs ± 1%     ~     (p=0.758 n=9+7)
UDP/Batch-8-8      30.0µs ± 4%    30.0µs ± 4%     ~     (p=0.971 n=10+10)
UDP/Batch-16-8     57.3µs ± 3%    60.9µs ±16%   +6.43%  (p=0.031 n=9+9)
UDP/Batch-32-8      115µs ± 5%     119µs ± 6%   +3.15%  (p=0.043 n=10+10)
UDP/Batch-64-8      234µs ±16%     237µs ± 4%     ~     (p=0.173 n=10+8)
UDP/Batch-128-8     447µs ± 4%     470µs ± 7%   +5.22%  (p=0.002 n=10+10)
UDP/Batch-256-8     960µs ±10%     966µs ±19%     ~     (p=0.853 n=10+10)
UDP/Batch-512-8    1.00ms ± 7%    0.99ms ± 7%     ~     (p=0.387 n=9+9)

name             old alloc/op   new alloc/op   delta
UDP/Batch-1-8        232B ± 0%       52B ± 0%  -77.59%  (p=0.000 n=10+10)
UDP/Batch-2-8        280B ± 0%      104B ± 0%  -62.86%  (p=0.000 n=10+10)
UDP/Batch-4-8        384B ± 0%      208B ± 0%  -45.83%  (p=0.000 n=10+10)
UDP/Batch-8-8        592B ± 0%      416B ± 0%  -29.73%  (p=0.000 n=10+10)
UDP/Batch-16-8     1.01kB ± 0%    0.83kB ± 0%  -17.46%  (p=0.000 n=10+10)
UDP/Batch-32-8     1.84kB ± 0%    1.66kB ± 0%   -9.57%  (p=0.002 n=8+10)
UDP/Batch-64-8     3.51kB ± 0%    3.33kB ± 0%   -5.00%  (p=0.000 n=10+8)
UDP/Batch-128-8    6.84kB ± 0%    6.66kB ± 0%   -2.57%  (p=0.001 n=7+7)
UDP/Batch-256-8    13.5kB ± 0%    13.3kB ± 0%   -1.33%  (p=0.000 n=10+10)
UDP/Batch-512-8    14.7kB ± 0%    14.5kB ± 0%   -1.19%  (p=0.000 n=8+8)

name             old allocs/op  new allocs/op  delta
UDP/Batch-1-8        8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)
UDP/Batch-2-8        10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.000 n=10+10)
UDP/Batch-4-8        14.0 ± 0%       8.0 ± 0%  -42.86%  (p=0.000 n=10+10)
UDP/Batch-8-8        22.0 ± 0%      16.0 ± 0%  -27.27%  (p=0.000 n=10+10)
UDP/Batch-16-8       38.0 ± 0%      32.0 ± 0%  -15.79%  (p=0.000 n=10+10)
UDP/Batch-32-8       70.0 ± 0%      64.0 ± 0%   -8.57%  (p=0.000 n=10+10)
UDP/Batch-64-8        134 ± 0%       128 ± 0%   -4.48%  (p=0.000 n=10+10)
UDP/Batch-128-8       262 ± 0%       256 ± 0%   -2.29%  (p=0.000 n=10+10)
UDP/Batch-256-8       518 ± 0%       512 ± 0%   -1.16%  (p=0.000 n=10+10)
UDP/Batch-512-8       562 ± 0%       556 ± 0%   -1.07%  (p=0.000 n=10+10)
```

Contributes to golang/go#26838

Change-Id: I16ecfc38dbb5a4d9b1ceacd1dd99fda38f346807
GitHub-Last-Rev: d1dda93
GitHub-Pull-Request: #126
Reviewed-on: https://go-review.googlesource.com/c/net/+/380934
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
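
The shape of the merged change can be sketched as follows: the variables the hot callback used to capture move into a struct, the closure is built once over that struct's fields, and the whole thing is recycled through a sync.Pool. The names here (recvOp, doRecvmmsg, readBatch) are illustrative, not the actual internal/socket identifiers.

```
package sketch

import "sync"

// recvOp holds everything the read callback needs, so each batched
// read no longer allocates a fresh closure plus captured variables.
type recvOp struct {
	n   int
	err error
	fn  func(s uintptr) bool
}

var recvOpPool = sync.Pool{
	New: func() interface{} {
		op := new(recvOp)
		// Bind the closure to the struct's fields once; pooling op
		// amortizes both the struct and the closure allocation.
		op.fn = func(s uintptr) bool {
			op.n, op.err = doRecvmmsg(s)
			// Real code would return false on EAGAIN so the runtime
			// poller waits for readiness; simplified here.
			return true
		}
		return op
	},
}

// doRecvmmsg stands in for the real recvmmsg(2) plumbing.
func doRecvmmsg(s uintptr) (int, error) { return 0, nil }

// readBatch shows the call shape against a syscall.RawConn-style Read:
// borrow an op, run the callback, return the op to the pool.
func readBatch(read func(f func(uintptr) bool) error) (int, error) {
	op := recvOpPool.Get().(*recvOp)
	defer recvOpPool.Put(op)
	if err := read(op.fn); err != nil {
		return 0, err
	}
	return op.n, op.err
}
```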
WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
golang locked and limited conversation to collaborators Jan 26, 2023