cmd/compile: get rid of SSA call block type #15631

randall77 · 2016-05-10T16:36:31Z

The SSA Call block is a vestige of when we had explicit panic edges in the call graph. Now that panic edges are implicit, we don't really need a call block - call Values take a memory and produce a memory, and that is enough to get all the memory ordering correct (even if the call panics).

dr2chase · 2016-05-10T19:58:04Z

This means one big block full of calls instead of many little CallBlocks?

Another thing Quentin just noticed (looking at giant machine-generated protobuf files) is that memory barrier ops introduce branches which generally causes sadness around phi location and generation; if we solve the rewrite-to-multiple Blocks issue, this might allow us to do general phi placement first on much larger blocks, then expand these afterwards. One minor issue is the cost of splitting variables between Blocks and reassigning them; we'd want to do this in an order that avoids reassigning to new blocks variables as we split blocks again and again at write barriers.

randall77 · 2016-05-10T20:54:47Z

This means one big block full of calls instead of many little CallBlocks?

Right.

dr2chase · 2016-05-11T16:06:08Z

I did a little research on #15537, using the improved-version of sparse (that generates the cleanest code, hence slightly less repulsive to humans, and a svelte 8.8MB dump after the build phase). Crudely, after building SSA, the largest function Theproto3Description has these stats:

222053 lines
first block defines 18190 Values.
35952 Blocks (11222 Call, 11476 Plain, 336 Exit, 6996 Check)
150146 Values (6054 Phi, 36483 Copy, 23976 Copy)
highest numbered Value is 150148

After early deadcode:
183521 lines
first block defines 18190 Values.
35441 Blocks (11222 Call, 10966 Plain, 336 Exit, 6996 Check)
112638 Values (5544 Phi)

After flagalloc (note blocks added to remove critical edges):
134496 lines
28927 blocks (256 are for critical edges, 1255 Check)
76642 values (5483 Phi, I think 49384 are
highest numbered Value is 149634

Merging Call blocks with their successors should cut the number of blocks by about 1/3, and also cut the number of Copy nodes out of the phi assignment phase. This seems like a good thing.

Given how costly register allocation is in space and time, would it make sense to run a Value compaction pass just before?

randall77 · 2016-05-11T16:13:43Z

Yes, if value number compaction would help we can do that.

josharian · 2016-05-11T18:07:07Z

I think value number compaction will help a lot with our trouble case. Normal ID utilization at entry to regalloc is normally 30–60%; for the trouble case, it is 0.6%, and that translates into giant allocs. I started on a number compaction optimization a couple of days ago but ran into some bumps (free lists, constant caches) and decided to see whether I could avoid generating so many values to begin with. I found a few minor optimizations (worth a couple of percent of total allocs) but nothing worth mailing during the freeze. I have one idea left to try before moving on, will know about that soon. If no one has started on it, I'm happy to finish off ID compaction. Please let me know one way or the other.

dr2chase · 2016-05-11T18:21:05Z

Value number compaction looks like it might be worth doing.
These numbers are approximate (minor version skew, etc)
but we're talking about gigabytes and minutes of difference:

tip:
real 10m36.569s
user 25m52.286s
sys 4m3.696s
8278.85MB

tip+sparse:
real 4m35.200s
user 13m16.644s
sys 0m36.712s
3427.73MB

tip+sparse+tweak_for_1.8:
real 3m23.623s
user 9m40.168s
sys 0m19.299s
2434.28MB

tip+value_compact:
real 5m56.505s
user 14m46.326s
sys 2m8.673s
6190.47MB

tip+sparse+value_compact:
real 3m17.096s
user 9m16.985s
sys 0m25.464s
3092.89MB

tip+sparse+tweak_for_1.8+value_compact:
real 2m57.781s
user 8m36.943s
sys 0m14.986s
2443.76MB

Obviously, I have a value-compacter written.

It will mess with debugging in ssa.html, so we might only want to enable it when the number of values is very large, or perhaps if we can determine ahead of time that it will win big.

Note that it still helps a little even with tip+sparse+tweak_for_1.8, but that tweak leaves the values alone. The tweaked version avoids creating a whole lot of phi functions that then gets eliminated; Call merging will avoid creating a whole bunch of Copy that then gets eliminated. Long run, value compaction might not be that helpful, but in the short run it looks safe and effective.

josharian · 2016-05-11T18:30:03Z

Looks like we are entirely on the same page. Will you go ahead and mail your ID compactor so that I can take a look? We should be able to cheaply decide to only compact if f.NumValues > some threshold and % utilization < some threshold, but it might always be worth doing, at least until other improvements obviate it. And yes, we will definitely want some way to disable it entirely via a flag for debugging.

Given the subtlety of sparse phi and other mucking around, it's possible we should just do ID compaction for 1.8; not sure it that gets us far enough.

dr2chase · 2016-05-11T18:47:50Z

I'm going to make a CL of it as soon as it passes tests locally. I don't think the sparse phi stuff is that subtle; it's tree numbering games for the dominators and balanced-tree data structures, and it helps avoid renumbering completely. If there's places it needs more explanation, I can work on that.

josharian · 2016-05-11T20:57:43Z

With regard to the original content of this issue (getting rid of SSA Call block type), SGTM. I have a handful of optimizations sitting in my local tree that will be impacted by it, so ideally this would happen early in the 1.8 cycle, so that I can rebase and remeasure.

gopherbot · 2016-09-09T21:00:24Z

CL https://golang.org/cl/28950 mentions this issue.

randall77 self-assigned this May 10, 2016

randall77 added the ToolSpeed label May 10, 2016

randall77 added this to the Go1.8 milestone May 10, 2016

randall77 mentioned this issue May 10, 2016

cmd/compile: stores in infinite loops disappear #15635

Closed

dr2chase mentioned this issue May 11, 2016

cmd/compile: Renumbering Value IDs before register allocation can save time/space #15652

Closed

gopherbot closed this as completed in c345a39 Sep 12, 2016

golang locked and limited conversation to collaborators Sep 12, 2017

gopherbot added the FrozenDueToAge label Sep 12, 2017

rsc unassigned randall77 Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: get rid of SSA call block type #15631

cmd/compile: get rid of SSA call block type #15631

randall77 commented May 10, 2016

dr2chase commented May 10, 2016

randall77 commented May 10, 2016

dr2chase commented May 11, 2016

randall77 commented May 11, 2016

josharian commented May 11, 2016

dr2chase commented May 11, 2016

josharian commented May 11, 2016

dr2chase commented May 11, 2016

josharian commented May 11, 2016

gopherbot commented Sep 9, 2016

cmd/compile: get rid of SSA call block type #15631

cmd/compile: get rid of SSA call block type #15631

Comments

randall77 commented May 10, 2016

dr2chase commented May 10, 2016

randall77 commented May 10, 2016

dr2chase commented May 11, 2016

randall77 commented May 11, 2016

josharian commented May 11, 2016

dr2chase commented May 11, 2016

josharian commented May 11, 2016

dr2chase commented May 11, 2016

josharian commented May 11, 2016

gopherbot commented Sep 9, 2016