New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: register allocation unnecessarily leaves reload in loop containing conditional call #22234
Comments
This issue affects adding preemption checks to loops; the conditional call models the preemption check, which suffers the same extra load in the loop, and if the loop is tight it matters. |
On
The logic there is if there's a call in the loop, we're going to have to load the value inside the loop anyway. No point in loading it before the loop starts as well. This logic kinda falls down when the call is inside an This isn't a problem for the pointer value, because it is modified during the loop. It has a phi at the top of the loop that causes it to be allocated to a register. |
I'm pretty sure that there's code (or there was code in some CL) that computed whether a loop contained a call-free path. I could see if I could find that somewhere. |
That would be great if that code existed somewhere. I think just replacing |
Do we need to know the blocks that are in the call-free paths? I couldn't find the code, but I'm pretty sure I can reconstitute it. |
Nope. We only need the info on block entry. It would be enough to modify the semantics of the containsCall boolean. I don't think it is used anywhere else. |
Changed milestone because this was motivated by the preemptible loop changes, which are not appearing in 1.10. |
I am not working on that right now, I attempted something general and
didn't get it working.
Works for innermost loops is probably all we really need to get most of the
benefit.
…On Thu, Nov 30, 2017 at 3:28 PM, Ilya Tocar ***@***.***> wrote:
@dr2chase <https://github.com/dr2chase> are you working on containsCall
changes?
I've made a quick and dirty prototype to fix #22698
<#22698> and I'd like to avoid effort
duplication.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22234 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AB1vJ_SCSHX9S7u2ewlwKH2nlPlYUGyrks5s7w_zgaJpZM4P3STC>
.
|
Change https://golang.org/cl/84055 mentions this issue: |
I think Ilya's CL is the way forward. It is mostly there. If he doesn't have time to get to it, I am happy to look into it in a few weeks if that delay is okay. Right now I am working on debugging stuff. |
No hurry, just want to make sure that something makes it into 1.11. We've seen enough incarnations of this issue that we should fix it. |
I was planing to work on it last week, however I'm currently having problems[1] with my google account, once they are resolved I will push updated version. 1: "ilya.tocar@intel.com has already been registered under a different account accessible by: |
@andybons, can you look into Ilya's Gerrit account problems? |
I think my account was deleted and now I have new account with the same email. Now I see 2 ilya.tocar@intel.com accounts at https://go-review.googlesource.com old one with all CL that I can't login into and new one that can login into, but can't +2/change my old CLs |
On it. Filed internal bug. |
@TocarIP OK you should be able to log in now. |
@andybons Thanks, it works now! |
Currently we don't lift spill out of loop if loop contains call. However often we have code like this: for .. { if hard_case { call() } // simple case, without call } So instead of checking for any call, check for unavoidable call. For #22698 cases I see: mime/quotedprintable/Writer-6 10.9µs ± 4% 9.2µs ± 3% -15.02% (p=0.000 n=8+8) And: compress/flate/Encode/Twain/Huffman/1e4-6 99.4µs ± 6% 90.9µs ± 0% -8.57% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e5-6 760µs ± 1% 725µs ± 1% -4.56% (p=0.000 n=8+8) compress/flate/Encode/Twain/Huffman/1e6-6 7.55ms ± 0% 7.24ms ± 0% -4.07% (p=0.000 n=8+7) There are no significant changes on go1 benchmarks. But for cases with runtime arch checks, where we call generic version on old hardware, there are respectable performance gains: math/RoundToEven-6 1.43ns ± 0% 1.25ns ± 0% -12.59% (p=0.001 n=7+7) math/bits/OnesCount64-6 1.60ns ± 1% 1.42ns ± 1% -11.32% (p=0.000 n=8+8) Also on some runtime benchmarks loops have less loads and higher performance: runtime/RuneIterate/range1/ASCII-6 15.6ns ± 1% 13.9ns ± 1% -10.74% (p=0.000 n=7+8) runtime/ArrayEqual-6 3.22ns ± 0% 2.86ns ± 2% -11.06% (p=0.000 n=7+8) Fixes #22698 Updates #22234 Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a Reviewed-on: https://go-review.googlesource.com/84055 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
@TocarIP @randall77 should this still be a release blocker for 1.11? |
This is fixed by Ilya's CL of March 20. |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go1.10devel
What operating system and processor architecture are you using (
go env
)?amd64
What did you do?
Compiled various simple loops that include "if flag { call(); }" (and that is the only call in the loop)
What did you expect to see?
All spills and reloads grouped around the call and not in the common code path of the loop.
What did you see instead?
Half-right -- the reload occurs under the conditional, but it is not properly hoisted out of the loop so there is also a reload in the common code path
Sample code and compiled assembly follow:
The text was updated successfully, but these errors were encountered: