runtime: memmove executes unaligned accesses on riscv64 #48248

Closed
mundaym opened this issue Sep 8, 2021 · 7 comments
Labels: arch-riscv, compiler/runtime, FrozenDueToAge, Performance

@mundaym
Member

mundaym commented Sep 8, 2021

The performance of memmove when copying more than ~16 bytes of unaligned data is very poor on the HiFive Unmatched. Looking at the code, it only attempts to align the source operand before using word-sized load and store operations. This means that stores to the destination operand will be unaligned whenever the destination does not share the source's alignment. On the HiFive Unmatched, unaligned accesses result in a trap that is handled by the kernel, so performance is extremely poor (~10x slower than performing a byte-by-byte copy).
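
To make the alignment constraint concrete, here is a minimal Go sketch of how a copy could be split so that every word-sized access is aligned for both operands. It is an illustration only, not the runtime's assembly: `copyPlan`, `plan`, and `wordSize` are hypothetical names, and the byte-by-byte fallback for mismatched alignments is an assumption chosen for simplicity.

```go
package main

import "fmt"

// wordSize is the width of a word-sized load/store on riscv64 (assumed 8 bytes).
const wordSize = 8

// plan describes how a copy of n bytes is split (hypothetical type).
type plan struct {
	head  int // bytes copied one at a time before the word loop
	words int // whole words copied with aligned loads and stores
	tail  int // bytes copied one at a time after the word loop
}

// copyPlan is a hypothetical helper. Word copies are used only when src and
// dst have the same offset within a word, so that aligning one operand also
// aligns the other. Otherwise every byte is copied individually.
func copyPlan(src, dst uintptr, n int) plan {
	if (src^dst)&(wordSize-1) != 0 {
		// Alignments differ: no head length can align both operands.
		return plan{head: n}
	}
	// Bytes needed to bring src (and therefore dst) up to a word boundary.
	head := int((wordSize - src%wordSize) % wordSize)
	if head > n {
		head = n
	}
	rest := n - head
	return plan{head: head, words: rest / wordSize, tail: rest % wordSize}
}

func main() {
	// Same offset within a word: both operands can be aligned together.
	fmt.Printf("%+v\n", copyPlan(0x1003, 0x2003, 32))
	// Different offsets: aligning the source leaves the destination unaligned.
	fmt.Printf("%+v\n", copyPlan(0x1000, 0x2001, 32))
}
```

The second call is the case described above: aligning only the source would still issue unaligned word stores, so the sketch avoids word accesses altogether.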

Benchmarks:

name                               speed
Memmove/0-4
Memmove/1-4                        37.0MB/s ± 5%
Memmove/2-4                        55.6MB/s ± 8%
Memmove/3-4                        67.2MB/s ± 5%
Memmove/4-4                        85.5MB/s ± 6%
Memmove/5-4                        98.1MB/s ± 6%
Memmove/6-4                         112MB/s ± 3%
Memmove/7-4                         127MB/s ± 4%
Memmove/8-4                         241MB/s ± 3%
Memmove/9-4                         229MB/s ± 7%
Memmove/10-4                        230MB/s ± 9%
Memmove/11-4                        198MB/s ± 5%
Memmove/12-4                        202MB/s ± 3%
Memmove/13-4                        206MB/s ± 4%
Memmove/14-4                        212MB/s ± 3%
Memmove/15-4                        213MB/s ± 6%
Memmove/16-4                        407MB/s ± 4%
Memmove/32-4                        577MB/s ± 3%
Memmove/64-4                        890MB/s ± 4%
Memmove/128-4                      1.28GB/s ± 6%
Memmove/256-4                      1.52GB/s ± 5%
Memmove/512-4                      1.67GB/s ± 2%
Memmove/1024-4                     1.81GB/s ± 2%
Memmove/2048-4                     1.91GB/s ± 1%
Memmove/4096-4                     1.94GB/s ± 1%
MemmoveOverlap/32-4                 485MB/s ± 5%
MemmoveOverlap/64-4                 694MB/s ± 6%
MemmoveOverlap/128-4                899MB/s ± 3%
MemmoveOverlap/256-4               1.06GB/s ± 3%
MemmoveOverlap/512-4               1.18GB/s ± 2%
MemmoveOverlap/1024-4              1.24GB/s ± 1%
MemmoveOverlap/2048-4              1.28GB/s ± 1%
MemmoveOverlap/4096-4              1.30GB/s ± 1%
MemmoveUnalignedDst/0-4
MemmoveUnalignedDst/1-4            31.6MB/s ± 5%
MemmoveUnalignedDst/2-4            54.1MB/s ±12%
MemmoveUnalignedDst/3-4            66.2MB/s ±10%
MemmoveUnalignedDst/4-4            79.0MB/s ± 7%
MemmoveUnalignedDst/5-4            95.3MB/s ± 5%
MemmoveUnalignedDst/6-4             104MB/s ± 7%
MemmoveUnalignedDst/7-4             115MB/s ± 5%
MemmoveUnalignedDst/8-4            11.9MB/s ± 1%
MemmoveUnalignedDst/9-4            13.2MB/s ± 1%
MemmoveUnalignedDst/10-4           14.5MB/s ± 2%
MemmoveUnalignedDst/11-4           16.0MB/s ± 0%
MemmoveUnalignedDst/12-4           17.3MB/s ± 1%
MemmoveUnalignedDst/13-4           18.7MB/s ± 1%
MemmoveUnalignedDst/14-4           20.0MB/s ± 0%
MemmoveUnalignedDst/15-4           21.3MB/s ± 0%
MemmoveUnalignedDst/16-4           12.2MB/s ± 2%
MemmoveUnalignedDst/32-4           12.5MB/s ± 1%
MemmoveUnalignedDst/64-4           12.6MB/s ± 1%
MemmoveUnalignedDst/128-4          12.7MB/s ± 0%
MemmoveUnalignedDst/256-4          12.8MB/s ± 0%
MemmoveUnalignedDst/512-4          12.8MB/s ± 1%
MemmoveUnalignedDst/1024-4         12.8MB/s ± 0%
MemmoveUnalignedDst/2048-4         12.8MB/s ± 1%
MemmoveUnalignedDst/4096-4         12.8MB/s ± 1%
MemmoveUnalignedDstOverlap/32-4    16.2MB/s ± 1%
MemmoveUnalignedDstOverlap/64-4    14.3MB/s ± 0%
MemmoveUnalignedDstOverlap/128-4   13.5MB/s ± 1%
MemmoveUnalignedDstOverlap/256-4   13.2MB/s ± 0%
MemmoveUnalignedDstOverlap/512-4   13.0MB/s ± 0%
MemmoveUnalignedDstOverlap/1024-4  12.9MB/s ± 1%
MemmoveUnalignedDstOverlap/2048-4  12.9MB/s ± 0%
MemmoveUnalignedDstOverlap/4096-4  12.9MB/s ± 0%
MemmoveUnalignedSrc/0-4
MemmoveUnalignedSrc/1-4            30.2MB/s ±10%
MemmoveUnalignedSrc/2-4            54.8MB/s ±15%
MemmoveUnalignedSrc/3-4            66.5MB/s ± 5%
MemmoveUnalignedSrc/4-4            75.5MB/s ± 7%
MemmoveUnalignedSrc/5-4            92.0MB/s ± 6%
MemmoveUnalignedSrc/6-4             100MB/s ± 4%
MemmoveUnalignedSrc/7-4             115MB/s ± 3%
MemmoveUnalignedSrc/8-4             110MB/s ± 4%
MemmoveUnalignedSrc/9-4             114MB/s ± 5%
MemmoveUnalignedSrc/10-4            116MB/s ± 5%
MemmoveUnalignedSrc/11-4            124MB/s ± 4%
MemmoveUnalignedSrc/12-4            127MB/s ± 3%
MemmoveUnalignedSrc/13-4            133MB/s ± 5%
MemmoveUnalignedSrc/14-4            144MB/s ± 4%
MemmoveUnalignedSrc/15-4           21.5MB/s ± 0%
MemmoveUnalignedSrc/16-4           22.4MB/s ± 2%
MemmoveUnalignedSrc/32-4           16.2MB/s ± 1%
MemmoveUnalignedSrc/64-4           14.3MB/s ± 1%
MemmoveUnalignedSrc/128-4          13.6MB/s ± 1%
MemmoveUnalignedSrc/256-4          13.1MB/s ± 1%
MemmoveUnalignedSrc/512-4          13.0MB/s ± 1%
MemmoveUnalignedSrc/1024-4         12.9MB/s ± 1%
MemmoveUnalignedSrc/2048-4         12.8MB/s ± 1%
MemmoveUnalignedSrc/4096-4         12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/32-4    12.5MB/s ± 0%
MemmoveUnalignedSrcOverlap/64-4    12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/128-4   12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/256-4   12.7MB/s ± 1%
MemmoveUnalignedSrcOverlap/512-4   12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/1024-4  12.8MB/s ± 1%
MemmoveUnalignedSrcOverlap/2048-4  12.8MB/s ± 0%
MemmoveUnalignedSrcOverlap/4096-4  12.8MB/s ± 1%
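
For readers who want to reproduce the unaligned-destination case outside the runtime test suite, the following is a rough sketch of such a measurement. It is not the source of the benchmarks above: `unalignedDst` and `benchUnalignedDst` are hypothetical names, and the sketch assumes the allocator returns word-aligned backing arrays (so slicing at offset 1 yields an unaligned destination), which should hold for larger allocations but is not guaranteed for very small ones.

```go
package main

import (
	"fmt"
	"testing"
)

// unalignedDst returns a destination slice that starts one byte into its
// backing array, plus a separately allocated source of the same length.
// Hypothetical helper for illustration.
func unalignedDst(n int) (dst, src []byte) {
	buf := make([]byte, n+1)
	return buf[1:], make([]byte, n)
}

// benchUnalignedDst measures copy (which is implemented with memmove for
// byte slices) into an unaligned destination of n bytes.
func benchUnalignedDst(n int) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		dst, src := unalignedDst(n)
		b.SetBytes(int64(n))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	})
}

func main() {
	// Two sizes only, to keep the run short; results vary by machine.
	for _, n := range []int{64, 4096} {
		fmt.Printf("UnalignedDst/%d: %s\n", n, benchUnalignedDst(n))
	}
}
```

Timings from this sketch are not comparable to the table above unless it is run on the same hardware and Go version.
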
mundaym added the arch-riscv label Sep 8, 2021
mundaym added this to the Go1.18 milestone Sep 8, 2021
mundaym self-assigned this Sep 8, 2021
@gopherbot
Contributor

Change https://golang.org/cl/348393 mentions this issue: runtime: use aligned loads and stores for memmove on riscv64

@mknyszek
Contributor

Unfortunately I think this needs to wait for the next release at this point.

mknyszek modified the milestones: Go1.18, Go1.19 Nov 10, 2021
gopherbot added the compiler/runtime label Jul 7, 2022
aclements modified the milestones: Go1.19, Go1.20 Jul 12, 2022
@aclements
Member

cc @golang/riscv64

@mengzhuo
Contributor

cc @golang/riscv64

This CL is still in the "merge conflict" state.

cc @mundaym Could you take a look? Thanks.

@gopherbot
Contributor

Change https://go.dev/cl/426256 mentions this issue: runtime: optimise memmove on riscv64

@aclements
Member

@mengzhuo, if you'd like to +2 CL 426256, we can probably get the extra +1 quickly and go ahead and land it.

Repository owner moved this from Triage Backlog to Done in Go Compiler / Runtime Nov 18, 2022
gopherbot moved this to Done in Release Blockers Nov 18, 2022
@randall77
Contributor

Someone just tripped over this bug over on golang-dev. They are probably using 1.19 and thus didn't pick up the fix for this bug, but worth watching just in case:
https://groups.google.com/g/golang-dev/c/5Om3lJcYxzA/m/UyhzduXTBgAJ?utm_medium=email&utm_source=footer

golang locked and limited conversation to collaborators Jul 29, 2024

6 participants