-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: tune appendCrossover for 386 and arm #4963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
Milestone
Comments
Benchmarks on ARM (ODROID-X): things got same or worse in all cases benchmark old ns/op new ns/op delta BenchmarkAppend 394 394 +0.00% BenchmarkAppend1Byte 48 46 -5.14% BenchmarkAppend4Bytes 55 54 -2.87% BenchmarkAppend8Bytes 55 64 +16.16% BenchmarkAppend16Bytes 58 85 +46.67% BenchmarkAppend32Bytes 57 60 +5.57% BenchmarkAppendStr1Byte 39 34 -13.89% BenchmarkAppendStr4Bytes 44 42 -4.76% BenchmarkAppendStr8Bytes 45 52 +14.91% BenchmarkAppendStr16Bytes 48 74 +51.64% BenchmarkAppendStr32Bytes 47 47 +1.27% BenchmarkAppendSpecialCase 276 282 +2.17% |
The results on 386 are not very interesting. Only AppendStr1Byte has significantly improved: benchmark old ns/op new ns/op delta BenchmarkAppend 225 226 +0.44% BenchmarkAppend1Byte 69 58 -15.42% BenchmarkAppend4Bytes 69 71 +3.75% BenchmarkAppend8Bytes 69 69 +0.29% BenchmarkAppend16Bytes 71 86 +21.12% BenchmarkAppend32Bytes 69 70 +2.17% BenchmarkAppendStr1Byte 63 37 -40.50% BenchmarkAppendStr4Bytes 63 47 -25.75% BenchmarkAppendStr8Bytes 63 52 -17.74% BenchmarkAppendStr16Bytes 64 72 +13.73% BenchmarkAppendStr32Bytes 63 65 +2.98% BenchmarkAppendSpecialCase 119 120 +0.84% |
Good point. Has that landed yet ? If so, i'll put myself down to redo the crossovers for all three archs. Labels changed: added go1.2, removed arch-arm. Owner changed to @davecheney. |
Here as an interesting result on linux/386 comparing a crossover of 16 (old) to 8 (new) 220887(~/go/src/pkg/runtime) % ~/go/misc/benchcmp {old,new}.txt benchmark old ns/op new ns/op delta BenchmarkAppend 219 218 -0.46% BenchmarkAppend1Byte 75 75 +0.00% BenchmarkAppend4Bytes 91 92 +0.11% BenchmarkAppend8Bytes 114 114 +0.00% BenchmarkAppend16Bytes 168 78 -53.51% BenchmarkAppend32Bytes 88 88 +0.00% BenchmarkAppendStr1Byte 57 57 +0.35% BenchmarkAppendStr4Bytes 72 73 +1.10% BenchmarkAppendStr8Bytes 92 113 +22.16% BenchmarkAppendStr16Bytes 141 63 -54.82% BenchmarkAppendStr32Bytes 75 76 +1.32% BenchmarkAppendSpecialCase 270 272 +0.74% |
tip vs append crossover logic removed, linux/386 benchmark old ns/op new ns/op delta BenchmarkAppend 218 218 +0.00% BenchmarkAppend1Byte 75 72 -3.44% BenchmarkAppend4Bytes 91 73 -19.80% BenchmarkAppend7Bytes 108 74 -31.30% BenchmarkAppend8Bytes 115 74 -35.48% BenchmarkAppend15Bytes 162 77 -52.22% BenchmarkAppend16Bytes 168 77 -53.99% BenchmarkAppend32Bytes 88 86 -2.16% BenchmarkAppendStr1Byte 57 57 -0.35% BenchmarkAppendStr4Bytes 72 72 -0.14% BenchmarkAppendStr8Bytes 92 92 -0.22% BenchmarkAppendStr16Bytes 141 141 +0.00% BenchmarkAppendStr32Bytes 75 75 -0.26% BenchmarkAppendSpecialCase 270 270 +0.00 |
I've been playing with this a bit more and have some results that show that removing append crossover for 386/amd64 is a net positive. linux/amd64 benchmark old ns/op new ns/op delta BenchmarkAppend 102 104 +1.96% BenchmarkAppend1Byte 10 11 +0.92% BenchmarkAppend4Bytes 15 11 -28.10% BenchmarkAppend7Bytes 17 12 -32.58% BenchmarkAppend8Bytes 18 12 -36.17% BenchmarkAppend15Bytes 24 11 -55.02% BenchmarkAppend16Bytes 25 11 -56.03% BenchmarkAppend32Bytes 11 12 +4.31% BenchmarkAppendStr1Byte 8 9 +13.99% BenchmarkAppendStr4Bytes 11 9 -17.52% BenchmarkAppendStr8Bytes 14 9 -35.70% BenchmarkAppendStr16Bytes 21 9 -55.19% BenchmarkAppendStr32Bytes 10 10 -5.66% BenchmarkAppendSpecialCase 49 52 +7.96% linux/386 benchmark old ns/op new ns/op delta BenchmarkAppend 218 218 +0.00% BenchmarkAppend1Byte 75 73 -3.31% BenchmarkAppend4Bytes 91 73 -19.91% BenchmarkAppend7Bytes 108 74 -31.20% BenchmarkAppend8Bytes 115 74 -35.39% BenchmarkAppend15Bytes 162 77 -52.22% BenchmarkAppend16Bytes 168 77 -53.93% BenchmarkAppend32Bytes 88 86 -1.93% BenchmarkAppendStr1Byte 57 59 +3.14% BenchmarkAppendStr4Bytes 72 59 -17.40% BenchmarkAppendStr8Bytes 92 60 -34.77% BenchmarkAppendStr16Bytes 141 63 -54.89% BenchmarkAppendStr32Bytes 75 74 -1.72% BenchmarkAppendSpecialCase 270 288 +6.67% There is a small regression for AppendStr1Byte and the special case case. However the same is not true for linux/arm which regresses as it lacks the powerful string move operations that intel has. It is possible that the append crossover should be reduced from 8 to 4. alarm(~/go/src/pkg/runtime) % ~/go/misc/benchcmp {old,new}.txt benchmark old ns/op new ns/op delta BenchmarkAppend 811 801 -1.23% BenchmarkAppend1Byte 100 103 +3.00% BenchmarkAppend4Bytes 117 145 +23.93% BenchmarkAppend8Bytes 139 119 -14.39% BenchmarkAppend16Bytes 130 124 -4.62% BenchmarkAppend32Bytes 125 145 +16.00% BenchmarkAppendStr1Byte 72 83 +16.39% BenchmarkAppendStr4Bytes 90 125 +38.58% BenchmarkAppendStr8Bytes 112 98 -11.88% BenchmarkAppendStr16Bytes 105 103 -1.90% BenchmarkAppendStr32Bytes 101 124 +22.77% BenchmarkAppendSpecialCase 624 622 -0.32% |
https://golang.org/cl/12440044/ Status changed to Started. |
This issue was closed by revision 8ce8adb. Status changed to Fixed. |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: