Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: rewrite rulegen to not generate plaintext Go code directly #30810

Closed
mvdan opened this issue Mar 13, 2019 · 17 comments
Closed

cmd/compile: rewrite rulegen to not generate plaintext Go code directly #30810

mvdan opened this issue Mar 13, 2019 · 17 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Mar 13, 2019

cmd/compile/internal/ssa/gen/rulegen.go is a program that generates the code for all the SSA rewrite rules.

The way it currently works, the code is almost always generated directly. This keeps the code simple, but also limits it quite a bit. For example, it's hard to tell if a variable will be used or not, so declarations like typ := &config.Types tend to be followed by _ = typ.

It also means we tend to write repetitive and verbose code. For example, these two if statements could be collapsed like if X || Y { break }:

if z1.Op != OpAMD64SHRLconst {
        break
}
if z1.AuxInt != 31 {
        break
}

@josharian tried to do this last bit with string handling in https://go-review.googlesource.com/c/go/+/167438, but that turns into spaghetti code that noone can maintain.

There are other ways we could post-process the code to simplify it, too. For example, we could move declarations closer to their uses, and we could remove unused variable declarations instead of using so many _ = foo safeguards.

We could use go/ast for this, but that's probably too complex for our needs. The generated code only uses a tiny subset of Go, and we could have nodes for higher-level concepts like "rewrite this value".

The last step of the program would be to stringify the node tree, and pass it through go/format as usual.

The purpose of this issue is first to gather feedback. If it sounds like a good idea, I'd like to begin work on this soon, hopefully before the freeze is here again. The first incremental step would probably be to rewrite rulegen to use the intermediate node tree, without changing the generated code at all.

/cc @josharian @mundaym @cherrymui @randall77 @dr2chase

@mvdan
Copy link
Member Author

mvdan commented Mar 13, 2019

I forgot to mention; we should synchronize so that this doesn't cause painful conflicts with other rulegen CLs. Ideally, I wouldn't start work until all outstanding rulegen CLs are merged, and any further rulegen work would be delayed until the rewrite is finished. I should need about a week to rewrite the program.

@josharian
Copy link
Contributor

As long as the result ends up being reasonably clear and simple (as I believe it can be), this sounds fine to me.

@randall77
Copy link
Contributor

I'm not really clear on what the goal of this is.

  1. Make the generated code more readable.
  2. Make the generated code compile faster.
  3. Make the generated code run faster.

I don't think the rewrites you suggested help much with 3, as the compiler does effectively those rewrites internally anyway.

It would probably help 2, but I don't think the compile time of the compiler is much of an issue.

I don't really see much benefit for 1 either. Not many people are reading the generated code.

Or is there something else?

@josharian
Copy link
Contributor

The motivation for me is that the giant rewrite files make my editor and git client unhappy.

I also actually care a little bit about 1 and 2, both of which I believe this could help with a bit.

@mvdan
Copy link
Member Author

mvdan commented Mar 14, 2019

For some quick numbers: we have ~300k generated lines of code for the rules. Josh's CL 167438 removes ~20k, or ~6%. I presume we could realistically shave off another 5-10% elsewhere, by removing more unnecessary lines and simplifying statements.

As another data point, I sent CL 167399 yesterday, which shaves off ~4k, or ~1.5%. Its implementation would be simpler if we had an intermediate step; the way I hack this together with the plaintext is by using two layers of buffers, which isn't great.

@katiehockman katiehockman added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Mar 19, 2019
@mvdan
Copy link
Member Author

mvdan commented Mar 22, 2019

Ideally, I wouldn't start work until all outstanding rulegen CLs are merged, and any further rulegen work would be delayed until the rewrite is finished. I should need about a week to rewrite the program.

Assuming that work on rulegen stalls during the freeze, perhaps I could take the first week of the 1.13 freeze to do this. We can get it merged shortly after, since the generated code should be unaffected - making the change safe even during the freeze. That should minimize the amount of people I'm blocking with the rewrite, I think.

@randall77
Copy link
Contributor

I think @mvdan is actually arguing for 4., that this change would help make further rulegen changes easier. I'm on board with that one; rulegen changes are currently hard.

I'm convinced, let's do it.

@mvdan mvdan self-assigned this Mar 22, 2019
@mvdan mvdan added this to the Go1.13 milestone Mar 22, 2019
@mvdan mvdan added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Mar 22, 2019
@mvdan
Copy link
Member Author

mvdan commented Mar 22, 2019

Is anyone planning rulegen changes before the freeze? I only see https://go-review.googlesource.com/c/go/+/167438, which @josharian labeled as "DO NOT SUBMIT" precisely because it's far too complex to do before the rewrite.

Perhaps we can do this before the freeze, if no further changes to rulegen are planned. Though dotGo is in just three days, so the earliest I could start is late next week.

@randall77
Copy link
Contributor

I've been playing with this one https://go-review.googlesource.com/c/go/+/135420 but I don't think I'm going to have time to make it happen for 1.13.

@josharian
Copy link
Contributor

I have all sorts of half-abandoned rulegen CLs in my tree. :) But I’d much rather finish them after this goes in. So green light from me.

@cespare
Copy link
Contributor

cespare commented Mar 22, 2019

@mvdan just to be clear (since the title of the issue and much of the description is about what you want the rulegen code to not do), is this summary correct?

  • Introduce a new AST structure that's narrowly focused on what rulegen needs
  • The AST->Go conversion handles things like not emitting unused variables
  • The bulk of the rulegen code manipulates an AST, which is hopefully simpler than the printf-ing it's doing now

@josharian
Copy link
Contributor

One possible rulegen change that I can imagine wanting to make in 1.13 is to support @mundaym's logic pass, discussed in #30645. But again, that would be easier to do after a rulegen cleanup. :)

@mvdan
Copy link
Member Author

mvdan commented Mar 23, 2019

@cespare correct. The only clarification I'd make is that the printing of the syntax tree to plain Go source code should be pretty dumb. All the extra magic should happen entirely within the syntax tree.

For example, Josh's CL mentioned earlier merges if statements. That's easy to write as a "rewrite rule" that modifies nodes. One can imagine other rewrite rules, like removing unused variables, empty switch statements, and so on So, in a way, we'd have rewrite rules for the generated code that applies Go's rewrite rules :)

@gopherbot
Copy link

Change https://golang.org/cl/177539 mentions this issue: cmd/compile: initial rulegen rewrite

@mvdan mvdan modified the milestones: Go1.13, Go1.14 Jul 6, 2019
@mvdan mvdan added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jul 6, 2019
@mvdan
Copy link
Member Author

mvdan commented Jul 6, 2019

This didn't get any reviews during the first two months of the freeze, so it's too risky to review merge with only three weeks to go. I'm moving it to early in the 1.14 cycle. It would be great to get some reviews before the tree reopens though, so we can ensure it doesn't get left for later again.

@mvdan
Copy link
Member Author

mvdan commented Aug 14, 2019

We're in good shape; the two main refactor CLs have a +2 and are pretty much ready to go. I plan on working on a few CLs on top, but I'll file separate issues for those, as this will be done.

gopherbot pushed a commit that referenced this issue Aug 27, 2019
rulegen.go produces plaintext Go code directly, which was fine for a
while. However, that's started being a bottleneck for making code
generation more complex, as we can only generate code directly one line
at a time.

Some workarounds were used, like multiple layers of buffers to generate
chunks of code, to then use strings.Contains to see whether variables
need to be defined or not. However, that's error-prone, verbose, and
difficult to work with.

A better approach is to generate an intermediate syntax tree in memory,
which we can inspect and modify easily. For example, we could run a
number of "passes" on the syntax tree before writing to disk, such as
removing unused variables, simplifying logic, or moving declarations
closer to their uses.

This is the first step in that direction, without changing any of the
generated code. We didn't use go/ast directly, as it's too complex for
our needs. In particular, we only need a few kinds of simple statements,
but we do want to support arbitrary expressions. As such, define a
simple set of statement structs, and add thin layers for printer.Fprint
and ast.Inspect.

A nice side effect of this change, besides removing some buffers and
string handling, is that we can now avoid passing so many parameters
around. And, while we add over a hundred lines of code, the tricky
pieces of code are now a bit simpler to follow.

While at it, apply some cleanups, such as replacing isVariable with
token.IsIdentifier, and consistently using log.Fatalf.

Follow-up CLs will start improving the generated code, also simplifying
the rulegen code itself. I've added some TODOs for the low-hanging fruit
that I intend to work on right after.

Updates #30810.

Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/177539
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
tomocy pushed a commit to tomocy/go that referenced this issue Sep 1, 2019
rulegen.go produces plaintext Go code directly, which was fine for a
while. However, that's started being a bottleneck for making code
generation more complex, as we can only generate code directly one line
at a time.

Some workarounds were used, like multiple layers of buffers to generate
chunks of code, to then use strings.Contains to see whether variables
need to be defined or not. However, that's error-prone, verbose, and
difficult to work with.

A better approach is to generate an intermediate syntax tree in memory,
which we can inspect and modify easily. For example, we could run a
number of "passes" on the syntax tree before writing to disk, such as
removing unused variables, simplifying logic, or moving declarations
closer to their uses.

This is the first step in that direction, without changing any of the
generated code. We didn't use go/ast directly, as it's too complex for
our needs. In particular, we only need a few kinds of simple statements,
but we do want to support arbitrary expressions. As such, define a
simple set of statement structs, and add thin layers for printer.Fprint
and ast.Inspect.

A nice side effect of this change, besides removing some buffers and
string handling, is that we can now avoid passing so many parameters
around. And, while we add over a hundred lines of code, the tricky
pieces of code are now a bit simpler to follow.

While at it, apply some cleanups, such as replacing isVariable with
token.IsIdentifier, and consistently using log.Fatalf.

Follow-up CLs will start improving the generated code, also simplifying
the rulegen code itself. I've added some TODOs for the low-hanging fruit
that I intend to work on right after.

Updates golang#30810.

Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/177539
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@mvdan
Copy link
Member Author

mvdan commented Sep 2, 2019

This is done! As said in the previous comment, I have more work planned, but that will happen as separate issues like #33644.

@mvdan mvdan closed this as completed Sep 2, 2019
t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
rulegen.go produces plaintext Go code directly, which was fine for a
while. However, that's started being a bottleneck for making code
generation more complex, as we can only generate code directly one line
at a time.

Some workarounds were used, like multiple layers of buffers to generate
chunks of code, to then use strings.Contains to see whether variables
need to be defined or not. However, that's error-prone, verbose, and
difficult to work with.

A better approach is to generate an intermediate syntax tree in memory,
which we can inspect and modify easily. For example, we could run a
number of "passes" on the syntax tree before writing to disk, such as
removing unused variables, simplifying logic, or moving declarations
closer to their uses.

This is the first step in that direction, without changing any of the
generated code. We didn't use go/ast directly, as it's too complex for
our needs. In particular, we only need a few kinds of simple statements,
but we do want to support arbitrary expressions. As such, define a
simple set of statement structs, and add thin layers for printer.Fprint
and ast.Inspect.

A nice side effect of this change, besides removing some buffers and
string handling, is that we can now avoid passing so many parameters
around. And, while we add over a hundred lines of code, the tricky
pieces of code are now a bit simpler to follow.

While at it, apply some cleanups, such as replacing isVariable with
token.IsIdentifier, and consistently using log.Fatalf.

Follow-up CLs will start improving the generated code, also simplifying
the rulegen code itself. I've added some TODOs for the low-hanging fruit
that I intend to work on right after.

Updates golang#30810.

Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/177539
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@golang golang locked and limited conversation to collaborators Sep 1, 2020
@rsc rsc unassigned mvdan Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants