In memcpy, we separated the 3 and 4 byte cases on amd64 to allow load/store forwarding. Should we do the same in memclrNoHeapPointers? Should we generally audit the two routines to make sure they're in sync as needed, and cross-reference in docs?
I don't think so. I've tried recreating Issue18740 for memclr, but it proved to be quite difficult: memory is either cleared in bulk before allocation, or memclr is inlined into a single MOVL. More generally, it makes sense to have different implementations of memmove and memclr (at least for larger sizes), because their memory access patterns differ. E.g. memmove will thrash twice as much cache and will stop fitting into the uop cache earlier, because it needs twice as many uops (an extra read for each write).
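For readers unfamiliar with the two shapes being compared, here is a rough Go sketch of a combined 3-or-4 byte clear versus a split one. It is an illustration only, not the runtime's assembly: `clearOverlap` and `clearSplit` are hypothetical names, and the Go compiler is free to lower these stores differently. The concern behind the question is that a later 4-byte load typically cannot be forwarded from two separate 2-byte stores, while it can be forwarded from a single 4-byte store.

```go
package main

import "encoding/binary"

// clearOverlap zeroes a 3- or 4-byte slice with two 2-byte stores,
// one at the start and one at the end. For len(b) == 3 the two
// stores overlap on the middle byte. (Hypothetical helper; it
// panics if len(b) < 2 and only fully clears lengths 3 and 4.)
func clearOverlap(b []byte) {
	n := len(b)
	binary.LittleEndian.PutUint16(b[0:2], 0)
	binary.LittleEndian.PutUint16(b[n-2:n], 0)
}

// clearSplit handles the two sizes separately: a 2-byte store plus
// a 1-byte store for length 3, and a single 4-byte store for
// length 4. (Hypothetical helper; other lengths are left untouched.)
func clearSplit(b []byte) {
	switch len(b) {
	case 3:
		binary.LittleEndian.PutUint16(b[0:2], 0)
		b[2] = 0
	case 4:
		binary.LittleEndian.PutUint32(b, 0)
	}
}
```

Both helpers leave the slice in the same state; they differ only in the number and width of the stores, which is the part that would matter for forwarding.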
Somewhat related: #23306
cc @randall77 @TocarIP