New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: loop optimization #24240
Comments
Since you have the code handy, can you provide a working example that includes a benchmark? |
i can't share that particular code...but i can and will build a test to
share
…On Sun, Mar 4, 2018 at 12:24 PM, Giovanni Bajo ***@***.***> wrote:
Since you have the code handy, can you provide a working example that
includes a benchmark?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#24240 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AHgypbiFJkzX-7c-COnX95uFPgAafJqXks5tbE1jgaJpZM4SbYI5>
.
--
Michael T. Jones
michael.jones@gmail.com
|
When n is negated, if that's the style chosen to re-write the loop, then look out for n starting off as the smallest possible value as two's complement representation is asymmetrical. |
Isn't the decrement of i followed by a jz (or jnz) another way to write this. |
Writing assembly for various architectures, a pre-decrement test against zero is the normal way I'm used to seeing, yes. But I wanted to warn against a flaw in the OP's suggestion. |
thank you for the advice.
…On Sun, Mar 4, 2018 at 2:41 PM, Ralph Corderoy ***@***.***> wrote:
Writing assembly for various architectures, a pre-decrement test against
zero is the normal way I'm used to seeing, yes. But I wanted to warn
against a flaw in the OP's suggestion.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#24240 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AHgypXq7fJVfP__QGo8lu4smKk9MLp6Fks5tbG18gaJpZM4SbYI5>
.
--
Michael T. Jones
michael.jones@gmail.com
|
Something else to ponder in implementing this... I hear some folk use a debugger; might they try and inspect the loop's variable and be confused that it's changed sign, or running in the opposite direction? Or perhaps DWARF can represent the transformation for the debugger to reverse? If other compilers have implemented this for other languages then they might give ideas. |
OTOH, stepping through C code that's been built with |
Ok, here is a simplified and extracted example. I called it "loop_test.go" https://play.golang.org/p/uRdymTPtTCi Alas, i don't see such a big difference in this case. The code is not the same, not so much register pressure. Summary: maybe this is not so important. |
I think it's still worth trying to measure gains across more tests, especially as some platforms have more register pressure than others. But I've also a niggling recollection that this has been suggested before and turned down because of debugger confusion, probably by @rsc. :-) |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?master (1.11 dev)
Does this issue reproduce with the latest release?
this is a compilation strategy proposal
What operating system and processor architecture are you using (
go env
)?amd64 / Darwin (macbook pro)
What did you do?
I have code that iterates without referencing the loop variable. Changing that from:
for i := 0; i < n; i++ {
// no access to i
}
...to...
for i := -n; i < 0; i++ {
// no access to i
}
saves me 4%-5% in my test cases.
Looking at the .S files, I see that there are two gains (as expected): one is that the loop limit needs only be addressable and referenced at the start of the loop--which frees a register over the body of the loop; and also, because the test is against zero and there are dedicated instructions for that rather than loading immediate literals or otherwise.
It seems (from afar) that there could be a rule along the lines of:
if loop variable not referenced in body
and assignment has block structure (i:=)
and type of loop variable is signed
and incr RHS does not depend on additive offset to variable (i++ ok, i*=2 not),
then transform "for v := low; v OP high; incr" into "for v := low-high; v OP 0; incr"
I don't have any comprehensive data about how common this situation may be, but it does help.
What did you expect to see?
no change because i expected the compiler to be doing this.
What did you see instead?
5% speedup
The text was updated successfully, but these errors were encountered: