x/net/ipv{4,6}: ReadBatch memory usage #26838
Comments
I agree that there are areas for performance improvement in the {Read,Write}Batch methods. Feel free to send a CL. Note, though, that the attached benchmark is not self-contained; there is no traffic generator code. You may use a benchmark in golang.org/x/net/internal/socket_go1_9_test.go.
I guess that the allocations are for storage of iovec and sockaddr.
Right, I used an external traffic generator to make sure it doesn't skew the benchmark results. I also used allocfreetrace to confirm your guess:
As you suspected, it is the iovec and sockaddr storage which grows linearly with the number of items in the batch, but there are also a few other allocs (though at least those remain constant). Not really sure how to fix this without changing the signature of ReadBatch().
You may think about a few ways to remove allocations on the IO code path, for example, having a free list for temporary storage. However, it's better to address your issue in the IO methods first; in general, comparing ReadBatch vs. Read doesn't make sense on connectionless, datagram-based transport protocols because of the differences in transport protocol characteristics, for example, the any-to-any vs. one-to-one conversation style.
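The free-list idea above can be sketched with a sync.Pool that hands out reusable scratch storage. This is a minimal illustration, not x/net code: the type and function names (`tmpBuffers`, `withScratch`) are invented for the example, standing in for the per-call iovec/sockaddr storage discussed in this thread.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// tmpBuffers stands in for the per-call temporary storage (iovec- and
// sockaddr-style scratch space) that ReadBatch/WriteBatch would
// otherwise allocate fresh on every call. Illustrative only.
type tmpBuffers struct {
	iovs  [][]byte
	addrs [][]byte
}

var scratchPool = sync.Pool{
	New: func() interface{} { return new(tmpBuffers) },
}

// withScratch borrows scratch space for a batch of n messages and
// returns it to the pool, so steady-state calls reuse the previously
// grown capacity instead of allocating.
func withScratch(n int) {
	b := scratchPool.Get().(*tmpBuffers)
	b.iovs, b.addrs = b.iovs[:0], b.addrs[:0]
	for i := 0; i < n; i++ {
		b.iovs = append(b.iovs, nil)   // would point at the caller's buffers
		b.addrs = append(b.addrs, nil) // would hold raw sockaddr bytes
	}
	scratchPool.Put(b)
}

func main() {
	withScratch(8) // warm the pool once
	fmt.Println("allocs/op:", testing.AllocsPerRun(100, func() { withScratch(8) }))
}
```

After the warm-up call, the pooled scratch space is reused, so the measured allocations per call drop to (essentially) zero regardless of batch size.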
Change https://golang.org/cl/315589 mentions this issue:
Avoid allocating scionPacketProcessor; re-organized packet processing functions. The scionPacketProcessor is now the main entry point for packet processing and contains all the pre-allocated per-packet state.

Avoid allocation of the message buffer for writing with WriteBatch. We could just pre-allocate a buffer, but as we always write a batch of size one, just use WriteTo instead. For writing single packets, this is both more convenient and faster. Additional benefit: Read/WriteBatch make rather large allocations internally (see golang/go#26838), which is avoided by using WriteTo.

The largest remaining allocations when processing SCION packets are:
- ReadBatch allocates temporary storage internally (as mentioned above). This is by far the biggest culprit; it accounts for almost 80% (after the changes in this patch).
- slayers/path.NewPath, called during slayers.SCION.DecodeFromBytes: could be avoided by reusing Path objects, but requires larger changes.
- slayers/path/scion.Raw.GetCurrentHopField and GetCurrentInfoField, called during parsePath: both return pointer types. Should be possible to change this to value types instead, but requires larger changes.
- DataPlane.resolveLocalDst: allocates IP and UDPAddr. Can only really be fixed with better address types.
- slayers/path.FullMAC: allocates both input and output. The function interface could be changed to allow reuse of the input buffer. The output buffer cannot be reused with this CMAC library.

Depends on (or actually includes) scionproto#4029. Closes scionproto#4030.
GitOrigin-RevId: 72eb8918a54a25f74159ef7398ef3d98d756e501
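The "pre-allocated per-packet state" approach described in the commit message above can be demonstrated in miniature. This is a toy sketch, not actual SCION router code: `processor` is an invented stand-in for a scionPacketProcessor-style type whose buffers are allocated once and reused per packet.

```go
package main

import (
	"fmt"
	"testing"
)

var sink []byte // forces the naive buffer to escape to the heap

// processor is a toy stand-in for a scionPacketProcessor-style type:
// all per-packet state is allocated once and reused for every packet.
type processor struct {
	buf []byte
}

func (p *processor) process(pkt []byte) {
	// Reuse the retained capacity instead of making a fresh buffer.
	p.buf = append(p.buf[:0], pkt...)
}

func main() {
	pkt := make([]byte, 1024)

	// Naive variant: a fresh heap buffer per packet.
	naive := testing.AllocsPerRun(100, func() {
		sink = append(make([]byte, 0, 1024), pkt...)
	})

	p := &processor{buf: make([]byte, 0, 1024)}
	reused := testing.AllocsPerRun(100, func() { p.process(pkt) })

	fmt.Printf("per-packet buffer: %.0f allocs/op; pre-allocated state: %.0f allocs/op\n",
		naive, reused)
}
```

The naive variant pays one heap allocation per packet, while the pre-allocated processor pays none after construction; at router packet rates that difference is mostly GC pressure saved.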
Change https://golang.org/cl/380934 mentions this issue:

The closure for the callback to RawConn.Read/Write is responsible for multiple allocations per call to RecvMmsg and SendMmsg. The batched read and write are used primarily to avoid per-call overhead, so any such overhead negates the advantage of using these functions.

This change introduces a struct type holding all the variables captured by the closure passed to RawConn.Read/Write. The struct is reused to amortize the allocations by means of a sync.Pool. A suitable global sync.Pool instance already existed, for buffers used to pack mmsg headers.

With this change, all allocations in WriteBatch are reused. In ReadBatch, only the returned net.Addr instances still need to be allocated for each message, which cannot be avoided without fundamental changes to the package interface.

```
name             old time/op    new time/op    delta
UDP/Batch-1-8      5.34µs ± 1%    5.40µs ± 3%     ~     (p=0.173 n=8+10)
UDP/Batch-2-8      9.74µs ± 1%    9.24µs ± 9%   -5.21%  (p=0.035 n=9+10)
UDP/Batch-4-8      16.2µs ± 4%    16.2µs ± 1%     ~     (p=0.758 n=9+7)
UDP/Batch-8-8      30.0µs ± 4%    30.0µs ± 4%     ~     (p=0.971 n=10+10)
UDP/Batch-16-8     57.3µs ± 3%    60.9µs ±16%   +6.43%  (p=0.031 n=9+9)
UDP/Batch-32-8      115µs ± 5%     119µs ± 6%   +3.15%  (p=0.043 n=10+10)
UDP/Batch-64-8      234µs ±16%     237µs ± 4%     ~     (p=0.173 n=10+8)
UDP/Batch-128-8     447µs ± 4%     470µs ± 7%   +5.22%  (p=0.002 n=10+10)
UDP/Batch-256-8     960µs ±10%     966µs ±19%     ~     (p=0.853 n=10+10)
UDP/Batch-512-8    1.00ms ± 7%    0.99ms ± 7%     ~     (p=0.387 n=9+9)

name             old alloc/op   new alloc/op   delta
UDP/Batch-1-8        232B ± 0%       52B ± 0%  -77.59%  (p=0.000 n=10+10)
UDP/Batch-2-8        280B ± 0%      104B ± 0%  -62.86%  (p=0.000 n=10+10)
UDP/Batch-4-8        384B ± 0%      208B ± 0%  -45.83%  (p=0.000 n=10+10)
UDP/Batch-8-8        592B ± 0%      416B ± 0%  -29.73%  (p=0.000 n=10+10)
UDP/Batch-16-8     1.01kB ± 0%    0.83kB ± 0%  -17.46%  (p=0.000 n=10+10)
UDP/Batch-32-8     1.84kB ± 0%    1.66kB ± 0%   -9.57%  (p=0.002 n=8+10)
UDP/Batch-64-8     3.51kB ± 0%    3.33kB ± 0%   -5.00%  (p=0.000 n=10+8)
UDP/Batch-128-8    6.84kB ± 0%    6.66kB ± 0%   -2.57%  (p=0.001 n=7+7)
UDP/Batch-256-8    13.5kB ± 0%    13.3kB ± 0%   -1.33%  (p=0.000 n=10+10)
UDP/Batch-512-8    14.7kB ± 0%    14.5kB ± 0%   -1.19%  (p=0.000 n=8+8)

name             old allocs/op  new allocs/op  delta
UDP/Batch-1-8        8.00 ± 0%      2.00 ± 0%  -75.00%  (p=0.000 n=10+10)
UDP/Batch-2-8        10.0 ± 0%       4.0 ± 0%  -60.00%  (p=0.000 n=10+10)
UDP/Batch-4-8        14.0 ± 0%       8.0 ± 0%  -42.86%  (p=0.000 n=10+10)
UDP/Batch-8-8        22.0 ± 0%      16.0 ± 0%  -27.27%  (p=0.000 n=10+10)
UDP/Batch-16-8       38.0 ± 0%      32.0 ± 0%  -15.79%  (p=0.000 n=10+10)
UDP/Batch-32-8       70.0 ± 0%      64.0 ± 0%   -8.57%  (p=0.000 n=10+10)
UDP/Batch-64-8        134 ± 0%       128 ± 0%   -4.48%  (p=0.000 n=10+10)
UDP/Batch-128-8       262 ± 0%       256 ± 0%   -2.29%  (p=0.000 n=10+10)
UDP/Batch-256-8       518 ± 0%       512 ± 0%   -1.16%  (p=0.000 n=10+10)
UDP/Batch-512-8       562 ± 0%       556 ± 0%   -1.07%  (p=0.000 n=10+10)
```

Contributes to golang/go#26838

Change-Id: I16ecfc38dbb5a4d9b1ceacd1dd99fda38f346807
GitHub-Last-Rev: d1dda931f61bd08cab782fa50406574d5e227154
GitHub-Pull-Request: golang/net#126
Reviewed-on: https://go-review.googlesource.com/c/net/+/380934
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

What did you expect to see?

I was benchmarking the program above (with -benchmem) and expected 0 allocations per iteration, as using ReadBatch is supposed to help with performance on heavily-utilized sockets.

What did you see instead?

I saw that each extra message in the msgs array adds 2 extra allocations per iteration of the benchmark, negating the performance advantage of ReadBatch() over traditional Read().