cmd/compile: understand Go vs assembly rc4 results #27184
Comments
Again on linux/amd64 I see the following result for 1.10, 1.11rc2 and tip.
Taking the MacBook Pro (Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz) stats of CL-102255 into account, it seems the performance is reduced only on server chips (Xeon)...
It was suggested to me in private that the assembly might be penalized by a slowdown in accessing xmm registers.
Also see the replies to
However, I still find it confusing that the compiled code, which should not be doing anything special, is faster on the 1.2GHz CPU than it is on the 2.3GHz one.
The 1.2 GHz processor can turbo to 3.2 GHz.
Just to add another data point... I tried this on my work machine and got similar numbers to Brad's.
Looks like there are at least 2 issues:
I think I have a hypothesis that explains why the code runs much faster on Skylake. perf shows 1,466,664,692 resource_stalls.rs events, which means that for roughly 1.47 × 10^9 cycles the reservation station couldn't accept uops. Skylake has 50% higher reservation station capacity, which should allow it to interleave more iterations. However, I don't think this explanation provides actionable advice (having fewer instructions and shorter dependency chains is already a goal).
Punting to unplanned, too late for anything major in 1.12.
Tracking bug so somebody (@randall77?) looks into why we got totally opposite performance numbers on different Intel CPUs when we deleted the crypto/rc4 code's assembly in "favor" of the Go version in 30eda67 (https://golang.org/cl/130397):
Super mysterious, so we might want to understand it enough to decide whether we care and whether there's something the compiler might do better for more CPUs.
Maybe the benchmarks or benchstat are wrong? But then that'd be its own interesting bug.
/cc @josharian @aead @FiloSottile
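For anyone who wants to collect comparable numbers on another machine, the following is a rough sketch of a standalone throughput measurement for `crypto/rc4` using `testing.Benchmark`. The helper name `benchRC4`, the key, and the buffer sizes are assumptions made for illustration, not the benchmarks from the CL. Note the turbo-frequency caveat raised in the comments: a single run on a laptop chip can be skewed by clock changes, so repeated runs are advisable.

```go
package main

import (
	"crypto/rc4"
	"fmt"
	"testing"
)

// benchRC4 is a hypothetical helper: it measures XORKeyStream on a buffer of the given size.
func benchRC4(size int) testing.BenchmarkResult {
	testing.Init() // needed when using testing.Benchmark outside "go test"; no effect if already called
	return testing.Benchmark(func(b *testing.B) {
		c, err := rc4.NewCipher([]byte("hypothetical key"))
		if err != nil {
			b.Fatal(err)
		}
		buf := make([]byte, size)
		b.SetBytes(int64(size))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			c.XORKeyStream(buf, buf) // in-place encryption; exact overlap is permitted
		}
	})
}

func main() {
	// Sizes are illustrative only.
	for _, size := range []int{128, 1024, 8192} {
		r := benchRC4(size)
		mbps := float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
		fmt.Printf("RC4 %5d bytes: %s (%.1f MB/s)\n", size, r.String(), mbps)
	}
}
```

Building this with two Go toolchains (one before and one after the assembly was removed) and running each binary several times on the same machine would give a cross-check that does not depend on benchstat.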