runtime: memmove executes unaligned accesses on riscv64 #48248

Closed
mundaym opened this issue Sep 8, 2021 · 7 comments
Labels: arch-riscv (Issues solely affecting the riscv64 architecture.), compiler/runtime (Issues related to the Go compiler and/or runtime.), Performance

Comments

@mundaym
Member

mundaym commented Sep 8, 2021

The performance of memmove when copying more than ~16 bytes of unaligned data is very poor on the HiFive Unmatched. Looking at the code, it only attempts to align the source operand before using word-sized load and store operations, which means that stores to the destination operand can be unaligned. On the HiFive Unmatched, unaligned accesses result in a trap that is handled by the kernel, so performance is extremely poor (~10x slower than performing a byte-by-byte copy).

Benchmarks:

name                               speed
Memmove/0-4
Memmove/1-4                        37.0MB/s ± 5%
Memmove/2-4                        55.6MB/s ± 8%
Memmove/3-4                        67.2MB/s ± 5%
Memmove/4-4                        85.5MB/s ± 6%
Memmove/5-4                        98.1MB/s ± 6%
Memmove/6-4                         112MB/s ± 3%
Memmove/7-4                         127MB/s ± 4%
Memmove/8-4                         241MB/s ± 3%
Memmove/9-4                         229MB/s ± 7%
Memmove/10-4                        230MB/s ± 9%
Memmove/11-4                        198MB/s ± 5%
Memmove/12-4                        202MB/s ± 3%
Memmove/13-4                        206MB/s ± 4%
Memmove/14-4                        212MB/s ± 3%
Memmove/15-4                        213MB/s ± 6%
Memmove/16-4                        407MB/s ± 4%
Memmove/32-4                        577MB/s ± 3%
Memmove/64-4                        890MB/s ± 4%
Memmove/128-4                      1.28GB/s ± 6%
Memmove/256-4                      1.52GB/s ± 5%
Memmove/512-4                      1.67GB/s ± 2%
Memmove/1024-4                     1.81GB/s ± 2%
Memmove/2048-4                     1.91GB/s ± 1%
Memmove/4096-4                     1.94GB/s ± 1%
MemmoveOverlap/32-4                 485MB/s ± 5%
MemmoveOverlap/64-4                 694MB/s ± 6%
MemmoveOverlap/128-4                899MB/s ± 3%
MemmoveOverlap/256-4               1.06GB/s ± 3%
MemmoveOverlap/512-4               1.18GB/s ± 2%
MemmoveOverlap/1024-4              1.24GB/s ± 1%
MemmoveOverlap/2048-4              1.28GB/s ± 1%
MemmoveOverlap/4096-4              1.30GB/s ± 1%
MemmoveUnalignedDst/0-4
MemmoveUnalignedDst/1-4            31.6MB/s ± 5%
MemmoveUnalignedDst/2-4            54.1MB/s ±12%
MemmoveUnalignedDst/3-4            66.2MB/s ±10%
MemmoveUnalignedDst/4-4            79.0MB/s ± 7%
MemmoveUnalignedDst/5-4            95.3MB/s ± 5%
MemmoveUnalignedDst/6-4             104MB/s ± 7%
MemmoveUnalignedDst/7-4             115MB/s ± 5%
MemmoveUnalignedDst/8-4            11.9MB/s ± 1%
MemmoveUnalignedDst/9-4            13.2MB/s ± 1%
MemmoveUnalignedDst/10-4           14.5MB/s ± 2%
MemmoveUnalignedDst/11-4           16.0MB/s ± 0%
MemmoveUnalignedDst/12-4           17.3MB/s ± 1%
MemmoveUnalignedDst/13-4           18.7MB/s ± 1%
MemmoveUnalignedDst/14-4           20.0MB/s ± 0%
MemmoveUnalignedDst/15-4           21.3MB/s ± 0%
MemmoveUnalignedDst/16-4           12.2MB/s ± 2%
MemmoveUnalignedDst/32-4           12.5MB/s ± 1%
MemmoveUnalignedDst/64-4           12.6MB/s ± 1%
MemmoveUnalignedDst/128-4          12.7MB/s ± 0%
MemmoveUnalignedDst/256-4          12.8MB/s ± 0%
MemmoveUnalignedDst/512-4          12.8MB/s ± 1%
MemmoveUnalignedDst/1024-4         12.8MB/s ± 0%
MemmoveUnalignedDst/2048-4         12.8MB/s ± 1%
MemmoveUnalignedDst/4096-4         12.8MB/s ± 1%
MemmoveUnalignedDstOverlap/32-4    16.2MB/s ± 1%
MemmoveUnalignedDstOverlap/64-4    14.3MB/s ± 0%
MemmoveUnalignedDstOverlap/128-4   13.5MB/s ± 1%
MemmoveUnalignedDstOverlap/256-4   13.2MB/s ± 0%
MemmoveUnalignedDstOverlap/512-4   13.0MB/s ± 0%
MemmoveUnalignedDstOverlap/1024-4  12.9MB/s ± 1%
MemmoveUnalignedDstOverlap/2048-4  12.9MB/s ± 0%
MemmoveUnalignedDstOverlap/4096-4  12.9MB/s ± 0%
MemmoveUnalignedSrc/0-4
MemmoveUnalignedSrc/1-4            30.2MB/s ±10%
MemmoveUnalignedSrc/2-4            54.8MB/s ±15%
MemmoveUnalignedSrc/3-4            66.5MB/s ± 5%
MemmoveUnalignedSrc/4-4            75.5MB/s ± 7%
MemmoveUnalignedSrc/5-4            92.0MB/s ± 6%
MemmoveUnalignedSrc/6-4             100MB/s ± 4%
MemmoveUnalignedSrc/7-4             115MB/s ± 3%
MemmoveUnalignedSrc/8-4             110MB/s ± 4%
MemmoveUnalignedSrc/9-4             114MB/s ± 5%
MemmoveUnalignedSrc/10-4            116MB/s ± 5%
MemmoveUnalignedSrc/11-4            124MB/s ± 4%
MemmoveUnalignedSrc/12-4            127MB/s ± 3%
MemmoveUnalignedSrc/13-4            133MB/s ± 5%
MemmoveUnalignedSrc/14-4            144MB/s ± 4%
MemmoveUnalignedSrc/15-4           21.5MB/s ± 0%
MemmoveUnalignedSrc/16-4           22.4MB/s ± 2%
MemmoveUnalignedSrc/32-4           16.2MB/s ± 1%
MemmoveUnalignedSrc/64-4           14.3MB/s ± 1%
MemmoveUnalignedSrc/128-4          13.6MB/s ± 1%
MemmoveUnalignedSrc/256-4          13.1MB/s ± 1%
MemmoveUnalignedSrc/512-4          13.0MB/s ± 1%
MemmoveUnalignedSrc/1024-4         12.9MB/s ± 1%
MemmoveUnalignedSrc/2048-4         12.8MB/s ± 1%
MemmoveUnalignedSrc/4096-4         12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/32-4    12.5MB/s ± 0%
MemmoveUnalignedSrcOverlap/64-4    12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/128-4   12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/256-4   12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/512-4   12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/1024-4  12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/2048-4  12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/4096-4  12.8MB/s ± 1%
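
The alignment problem described in the report can be illustrated with a small sketch. This is not the Go runtime's assembly and not the code from any CL mentioned in this thread; `copyPlan` and `planCopy` are hypothetical names invented for illustration. The sketch assumes 8-byte words and only shows the decision logic: word-sized accesses are used only when the source and destination share the same misalignment, and otherwise the copy falls back to single bytes so that no access can trap.

```go
package main

import "fmt"

// wordSize is the width in bytes of the loads and stores used for bulk
// copying (8 on riscv64).
const wordSize = 8

// copyPlan (hypothetical) describes how a forward copy of n bytes is split up.
type copyPlan struct {
	head  uintptr // leading bytes copied one at a time
	words uintptr // aligned word-sized load/store pairs
	tail  uintptr // trailing bytes copied one at a time
}

// planCopy (hypothetical) decides how to copy n bytes from address src to
// address dst without ever issuing a misaligned word access.
func planCopy(dst, src, n uintptr) copyPlan {
	// If the low address bits differ, aligning one pointer leaves the other
	// misaligned, so this sketch copies everything byte by byte.
	if (dst^src)&(wordSize-1) != 0 {
		return copyPlan{head: n}
	}
	// Bytes needed to reach the next word boundary (0 if already aligned).
	head := (wordSize - dst&(wordSize-1)) & (wordSize - 1)
	if head > n {
		head = n
	}
	rest := n - head
	return copyPlan{head: head, words: rest / wordSize, tail: rest % wordSize}
}

func main() {
	fmt.Printf("%+v\n", planCopy(0x1003, 0x2003, 32)) // same misalignment
	fmt.Printf("%+v\n", planCopy(0x1001, 0x2000, 32)) // different misalignment
}
```

Aligning only the source, as the report describes, corresponds to computing `head` from `src` alone and skipping the first check, which is how word-sized stores can end up on a misaligned destination.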
mundaym added the arch-riscv label on Sep 8, 2021
mundaym added this to the Go1.18 milestone on Sep 8, 2021
mundaym self-assigned this on Sep 8, 2021
@gopherbot

Change https://golang.org/cl/348393 mentions this issue: runtime: use aligned loads and stores for memmove on riscv64

@mknyszek
Contributor

Unfortunately I think this needs to wait for the next release at this point.

mknyszek modified the milestones: Go1.18, Go1.19 on Nov 10, 2021
gopherbot added the compiler/runtime label on Jul 7, 2022
aclements modified the milestones: Go1.19, Go1.20 on Jul 12, 2022
@aclements
Member

cc @golang/riscv64

@mengzhuo
Contributor

cc @golang/riscv64

This CL is still in the "merge conflicted" state.

cc @mundaym Could you take a look? Thanks.

@gopherbot

Change https://go.dev/cl/426256 mentions this issue: runtime: optimise memmove on riscv64

@aclements
Member

@mengzhuo, if you'd like to +2 CL 426256, we can probably get the extra +1 quickly and go ahead and land it.

@randall77
Contributor

Someone just tripped over this bug over on golang-dev. They are probably using 1.19 and thus didn't pick up the fix for this bug, but it is worth watching just in case:
https://groups.google.com/g/golang-dev/c/5Om3lJcYxzA/m/UyhzduXTBgAJ?utm_medium=email&utm_source=footer


6 participants