Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: get rid of SSA call block type #15631

Closed
randall77 opened this issue May 10, 2016 · 10 comments
Closed

cmd/compile: get rid of SSA call block type #15631

randall77 opened this issue May 10, 2016 · 10 comments

Comments

@randall77
Copy link
Contributor

The SSA Call block is a vestige of when we had explicit panic edges in the call graph. Now that panic edges are implicit, we don't really need a call block - call Values take a memory and produce a memory, and that is enough to get all the memory ordering correct (even if the call panics).

@randall77 randall77 self-assigned this May 10, 2016
@randall77 randall77 added this to the Go1.8 milestone May 10, 2016
@dr2chase
Copy link
Contributor

This means one big block full of calls instead of many little CallBlocks?

Another thing Quentin just noticed (looking at giant machine-generated protobuf files) is that memory barrier ops introduce branches which generally causes sadness around phi location and generation; if we solve the rewrite-to-multiple Blocks issue, this might allow us to do general phi placement first on much larger blocks, then expand these afterwards. One minor issue is the cost of splitting variables between Blocks and reassigning them; we'd want to do this in an order that avoids reassigning to new blocks variables as we split blocks again and again at write barriers.

@randall77
Copy link
Contributor Author

This means one big block full of calls instead of many little CallBlocks?

Right.

@dr2chase
Copy link
Contributor

I did a little research on #15537, using the improved-version of sparse (that generates the cleanest code, hence slightly less repulsive to humans, and a svelte 8.8MB dump after the build phase). Crudely, after building SSA, the largest function Theproto3Description has these stats:

222053 lines
first block defines 18190 Values.
35952 Blocks (11222 Call, 11476 Plain, 336 Exit, 6996 Check)
150146 Values (6054 Phi, 36483 Copy, 23976 Copy)
highest numbered Value is 150148

After early deadcode:
183521 lines
first block defines 18190 Values.
35441 Blocks (11222 Call, 10966 Plain, 336 Exit, 6996 Check)
112638 Values (5544 Phi)

After flagalloc (note blocks added to remove critical edges):
134496 lines
28927 blocks (256 are for critical edges, 1255 Check)
76642 values (5483 Phi, I think 49384 are
highest numbered Value is 149634

Merging Call blocks with their successors should cut the number of blocks by about 1/3, and also cut the number of Copy nodes out of the phi assignment phase. This seems like a good thing.

Given how costly register allocation is in space and time, would it make sense to run a Value compaction pass just before?

@randall77
Copy link
Contributor Author

Yes, if value number compaction would help we can do that.

@josharian
Copy link
Contributor

I think value number compaction will help a lot with our trouble case. Normal ID utilization at entry to regalloc is normally 30–60%; for the trouble case, it is 0.6%, and that translates into giant allocs. I started on a number compaction optimization a couple of days ago but ran into some bumps (free lists, constant caches) and decided to see whether I could avoid generating so many values to begin with. I found a few minor optimizations (worth a couple of percent of total allocs) but nothing worth mailing during the freeze. I have one idea left to try before moving on, will know about that soon. If no one has started on it, I'm happy to finish off ID compaction. Please let me know one way or the other.

@dr2chase
Copy link
Contributor

Value number compaction looks like it might be worth doing.
These numbers are approximate (minor version skew, etc)
but we're talking about gigabytes and minutes of difference:

tip:
real 10m36.569s
user 25m52.286s
sys 4m3.696s
8278.85MB

tip+sparse:
real 4m35.200s
user 13m16.644s
sys 0m36.712s
3427.73MB

tip+sparse+tweak_for_1.8:
real 3m23.623s
user 9m40.168s
sys 0m19.299s
2434.28MB

tip+value_compact:
real 5m56.505s
user 14m46.326s
sys 2m8.673s
6190.47MB

tip+sparse+value_compact:
real 3m17.096s
user 9m16.985s
sys 0m25.464s
3092.89MB

tip+sparse+tweak_for_1.8+value_compact:
real 2m57.781s
user 8m36.943s
sys 0m14.986s
2443.76MB

Obviously, I have a value-compacter written.

It will mess with debugging in ssa.html, so we might only want to enable it when the number of values is very large, or perhaps if we can determine ahead of time that it will win big.

Note that it still helps a little even with tip+sparse+tweak_for_1.8, but that tweak leaves the values alone. The tweaked version avoids creating a whole lot of phi functions that then gets eliminated; Call merging will avoid creating a whole bunch of Copy that then gets eliminated. Long run, value compaction might not be that helpful, but in the short run it looks safe and effective.

@josharian
Copy link
Contributor

Looks like we are entirely on the same page. Will you go ahead and mail your ID compactor so that I can take a look? We should be able to cheaply decide to only compact if f.NumValues > some threshold and % utilization < some threshold, but it might always be worth doing, at least until other improvements obviate it. And yes, we will definitely want some way to disable it entirely via a flag for debugging.

Given the subtlety of sparse phi and other mucking around, it's possible we should just do ID compaction for 1.8; not sure it that gets us far enough.

@dr2chase
Copy link
Contributor

I'm going to make a CL of it as soon as it passes tests locally. I don't think the sparse phi stuff is that subtle; it's tree numbering games for the dominators and balanced-tree data structures, and it helps avoid renumbering completely. If there's places it needs more explanation, I can work on that.

@josharian
Copy link
Contributor

With regard to the original content of this issue (getting rid of SSA Call block type), SGTM. I have a handful of optimizations sitting in my local tree that will be impacted by it, so ideally this would happen early in the 1.8 cycle, so that I can rebase and remeasure.

@gopherbot
Copy link

CL https://golang.org/cl/28950 mentions this issue.

@golang golang locked and limited conversation to collaborators Sep 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants