Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/internal/obj: pcsp table entries wrong at CALL morestack #13346

Closed
mdempsky opened this issue Nov 20, 2015 · 9 comments
Closed

cmd/internal/obj: pcsp table entries wrong at CALL morestack #13346

mdempsky opened this issue Nov 20, 2015 · 9 comments
Milestone

Comments

@mdempsky
Copy link
Member

On 386, amd64, arm, and arm64, stacksplit appends a "CALL morestack; JMP f(SB)" call sequence to the end of f, but the pcsp table entries at the end of f default to f's full frame size rather than 0.

I think this can cause a problem in the following situation on x86:

  1. The user sets a breakpoint on the "JMP f(SB)" instruction.
  2. That causes rewindmorestack to not rewind the PC, so gp.sched.pc will continue to correspond to the bad pcsp entry.
  3. Within copystack, the call to gentraceback(^uintptr(0), ^uintptr(0), 0, gp, ...) will use gp.sched.pc and gp.sched.sp to walk the stack.
  4. But because of the bad pcsp entry, it'll misinterpret the stack frames.

I briefly explored three ways to fix this and measured the file size impact on cmd/gofmt on linux/amd64:

   text    data     bss     dec     hex filename
2687573   74416  129816 2891805  2c201d gofmt.before
2688597   74416  129816 2892829  2c241d gofmt.after1
2686430   74416  129816 2890662  2c1ba6 gofmt.after2
2684162   74416  129816 2888394  2c12ca gofmt.after3

gofmt.after1 is leaving the generated instruction sequence alone, but setting Spadj accordingly so extra pcsp table entries will be generated for the morestack calls.

gofmt.after2 is to insert the "CALL morestack; JMP f(SB)" call sequence after the last RET instruction in a function, so that it gets to share a pcsp table entry with the RET instruction. (And for functions with no RET instructions, to continue appending to the end like in gofmt.after1.)

gofmt.after3 is the same as gofmt.after2, except inserting after the first RET instruction. The additional text savings are from more opportunities to use the 2-byte near-JMP instruction encoding for "JMP f(SB)" instead of the 5-byte encoding.

@ianlancetaylor
Copy link
Contributor

Your gofmt.after3 look like a win on all counts. But I suspect this is something to tackle on the dev.ssa branch.

@mdempsky
Copy link
Member Author

I thought dev.ssa is primarily about changing cmd/compile? All of the fixes I tested were entirely local to cmd/internal/obj/x86/obj6.go, which doesn't seem substantially altered in the dev.ssa branch.

(Also, FWIW, my fear is gofmt.after3 could result in worse instruction caching, because the morestack-call instructions will appear in the middle of functions with multiple RETs, whereas at least with gofmt.after2 they'll stay out of the way with other unlikely/cold instructions. But performance predictions/measurements aren't my strong point.)

@randall77
Copy link
Contributor

Right, this prologue insertion happens late in the process, the SSA backend
doesn't see it.

On Fri, Nov 20, 2015 at 2:47 PM, Matthew Dempsky notifications@github.com
wrote:

I thought dev.ssa is primarily about changing cmd/compile? All of the
fixes I tested were entirely local to cmd/internal/obj/x86/obj6.go, which
doesn't seem substantially altered in the dev.ssa branch.

(Also, FWIW, my fear is gofmt.after3 could result in worse instruction
caching, because the morestack-call instructions will appear in the middle
of functions with multiple RETs, whereas at least with gofmt.after2 they'll
stay out of the way with other unlikely/cold instructions. But performance
predictions/measurements isn't my strong point.)


Reply to this email directly or view it on GitHub
#13346 (comment).

@ianlancetaylor
Copy link
Contributor

Sorry for the misinformation on dev.ssa.

I think that putting the morestack instructions after a RET instruction in the middle of the function is unlikely to matter for instruction caching.

@mdempsky
Copy link
Member Author

Ah, it looks like the "CALL morestack" instructions are at the end of the function due to golang.org/cl/10367, which was motivated by improving static branch prediction, not instruction caching like I'd guessed.

I can rerun the benchmarks from that CL to see if there's any difference between before, after2, and after3.

@mdempsky
Copy link
Member Author

I ran {regexp, sort, go1} x {before, after2, after3} about 100 times over night on my otherwise idle workstation. Full results at benchmarks.tar.gz.txt. (The .txt extension is just to make github happy; it's just a .tar.gz.) Benchstat summaries at https://gist.github.com/mdempsky/181b45d0bb41b3c00300.

However, I'm not sure much can be meaningfully inferred from these results as there's so much noise. The error bars are almost all around ±15%, which is much higher than the ±1% @josharian saw when benchmarking golang.org/cl/10367. :-/

@rsc rsc added this to the Go1.6 milestone Dec 28, 2015
@rsc
Copy link
Contributor

rsc commented Dec 28, 2015

I am surprised the pcsp info is wrong. I thought we'd checked that. But we certainly need to fix it for Go 1.6. Any of your solutions sound fine, @mdempsky.

@mdempsky mdempsky self-assigned this Dec 30, 2015
@mdempsky
Copy link
Member Author

Since it's nearing 1.6, I propose simply leaving the instruction order alone and fixing the PCSP table entries as the least intrusive / most conservative fix. I'll revisit moving the CALL instructions as an optimization for 1.7.

@gopherbot
Copy link

CL https://golang.org/cl/18154 mentions this issue.

@golang golang locked and limited conversation to collaborators Jan 4, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants