New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/internal/obj: pcsp table entries wrong at CALL morestack #13346
Comments
Your gofmt.after3 look like a win on all counts. But I suspect this is something to tackle on the dev.ssa branch. |
I thought dev.ssa is primarily about changing cmd/compile? All of the fixes I tested were entirely local to cmd/internal/obj/x86/obj6.go, which doesn't seem substantially altered in the dev.ssa branch. (Also, FWIW, my fear is gofmt.after3 could result in worse instruction caching, because the morestack-call instructions will appear in the middle of functions with multiple RETs, whereas at least with gofmt.after2 they'll stay out of the way with other unlikely/cold instructions. But performance predictions/measurements aren't my strong point.) |
Right, this prologue insertion happens late in the process, the SSA backend On Fri, Nov 20, 2015 at 2:47 PM, Matthew Dempsky notifications@github.com
|
Sorry for the misinformation on dev.ssa. I think that putting the morestack instructions after a RET instruction in the middle of the function is unlikely to matter for instruction caching. |
Ah, it looks like the "CALL morestack" instructions are at the end of the function due to golang.org/cl/10367, which was motivated by improving static branch prediction, not instruction caching like I'd guessed. I can rerun the benchmarks from that CL to see if there's any difference between before, after2, and after3. |
I ran {regexp, sort, go1} x {before, after2, after3} about 100 times over night on my otherwise idle workstation. Full results at benchmarks.tar.gz.txt. (The .txt extension is just to make github happy; it's just a .tar.gz.) Benchstat summaries at https://gist.github.com/mdempsky/181b45d0bb41b3c00300. However, I'm not sure much can be meaningfully inferred from these results as there's so much noise. The error bars are almost all around ±15%, which is much higher than the ±1% @josharian saw when benchmarking golang.org/cl/10367. :-/ |
I am surprised the pcsp info is wrong. I thought we'd checked that. But we certainly need to fix it for Go 1.6. Any of your solutions sound fine, @mdempsky. |
Since it's nearing 1.6, I propose simply leaving the instruction order alone and fixing the PCSP table entries as the least intrusive / most conservative fix. I'll revisit moving the CALL instructions as an optimization for 1.7. |
CL https://golang.org/cl/18154 mentions this issue. |
On 386, amd64, arm, and arm64, stacksplit appends a "CALL morestack; JMP f(SB)" call sequence to the end of f, but the pcsp table entries at the end of f default to f's full frame size rather than 0.
I think this can cause a problem in the following situation on x86:
I briefly explored three ways to fix this and measured the file size impact on cmd/gofmt on linux/amd64:
gofmt.after1 is leaving the generated instruction sequence alone, but setting Spadj accordingly so extra pcsp table entries will be generated for the morestack calls.
gofmt.after2 is to insert the "CALL morestack; JMP f(SB)" call sequence after the last RET instruction in a function, so that it gets to share a pcsp table entry with the RET instruction. (And for functions with no RET instructions, to continue appending to the end like in gofmt.after1.)
gofmt.after3 is the same as gofmt.after2, except inserting after the first RET instruction. The additional text savings are from more opportunities to use the 2-byte near-JMP instruction encoding for "JMP f(SB)" instead of the 5-byte encoding.
The text was updated successfully, but these errors were encountered: