-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: rewrite rulegen to not generate plaintext Go code directly #30810
Comments
I forgot to mention; we should synchronize so that this doesn't cause painful conflicts with other rulegen CLs. Ideally, I wouldn't start work until all outstanding rulegen CLs are merged, and any further rulegen work would be delayed until the rewrite is finished. I should need about a week to rewrite the program. |
As long as the result ends up being reasonably clear and simple (as I believe it can be), this sounds fine to me. |
I'm not really clear on what the goal of this is.
I don't think the rewrites you suggested help much with 3, as the compiler does effectively those rewrites internally anyway. It would probably help 2, but I don't think the compile time of the compiler is much of an issue. I don't really see much benefit for 1 either. Not many people are reading the generated code. Or is there something else? |
The motivation for me is that the giant rewrite files make my editor and git client unhappy. I also actually care a little bit about 1 and 2, both of which I believe this could help with a bit. |
For some quick numbers: we have ~300k generated lines of code for the rules. Josh's CL 167438 removes ~20k, or ~6%. I presume we could realistically shave off another 5-10% elsewhere, by removing more unnecessary lines and simplifying statements. As another data point, I sent CL 167399 yesterday, which shaves off ~4k, or ~1.5%. Its implementation would be simpler if we had an intermediate step; the way I hack this together with the plaintext is by using two layers of buffers, which isn't great. |
Assuming that work on rulegen stalls during the freeze, perhaps I could take the first week of the 1.13 freeze to do this. We can get it merged shortly after, since the generated code should be unaffected - making the change safe even during the freeze. That should minimize the amount of people I'm blocking with the rewrite, I think. |
I think @mvdan is actually arguing for 4., that this change would help make further rulegen changes easier. I'm on board with that one; rulegen changes are currently hard. I'm convinced, let's do it. |
Is anyone planning rulegen changes before the freeze? I only see https://go-review.googlesource.com/c/go/+/167438, which @josharian labeled as "DO NOT SUBMIT" precisely because it's far too complex to do before the rewrite. Perhaps we can do this before the freeze, if no further changes to rulegen are planned. Though dotGo is in just three days, so the earliest I could start is late next week. |
I've been playing with this one https://go-review.googlesource.com/c/go/+/135420 but I don't think I'm going to have time to make it happen for 1.13. |
I have all sorts of half-abandoned rulegen CLs in my tree. :) But I’d much rather finish them after this goes in. So green light from me. |
@mvdan just to be clear (since the title of the issue and much of the description is about what you want the rulegen code to not do), is this summary correct?
|
@cespare correct. The only clarification I'd make is that the printing of the syntax tree to plain Go source code should be pretty dumb. All the extra magic should happen entirely within the syntax tree. For example, Josh's CL mentioned earlier merges if statements. That's easy to write as a "rewrite rule" that modifies nodes. One can imagine other rewrite rules, like removing unused variables, empty switch statements, and so on So, in a way, we'd have rewrite rules for the generated code that applies Go's rewrite rules :) |
Change https://golang.org/cl/177539 mentions this issue: |
This didn't get any reviews during the first two months of the freeze, so it's too risky to review merge with only three weeks to go. I'm moving it to early in the 1.14 cycle. It would be great to get some reviews before the tree reopens though, so we can ensure it doesn't get left for later again. |
We're in good shape; the two main refactor CLs have a +2 and are pretty much ready to go. I plan on working on a few CLs on top, but I'll file separate issues for those, as this will be done. |
rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates #30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates golang#30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
This is done! As said in the previous comment, I have more work planned, but that will happen as separate issues like #33644. |
rulegen.go produces plaintext Go code directly, which was fine for a while. However, that's started being a bottleneck for making code generation more complex, as we can only generate code directly one line at a time. Some workarounds were used, like multiple layers of buffers to generate chunks of code, to then use strings.Contains to see whether variables need to be defined or not. However, that's error-prone, verbose, and difficult to work with. A better approach is to generate an intermediate syntax tree in memory, which we can inspect and modify easily. For example, we could run a number of "passes" on the syntax tree before writing to disk, such as removing unused variables, simplifying logic, or moving declarations closer to their uses. This is the first step in that direction, without changing any of the generated code. We didn't use go/ast directly, as it's too complex for our needs. In particular, we only need a few kinds of simple statements, but we do want to support arbitrary expressions. As such, define a simple set of statement structs, and add thin layers for printer.Fprint and ast.Inspect. A nice side effect of this change, besides removing some buffers and string handling, is that we can now avoid passing so many parameters around. And, while we add over a hundred lines of code, the tricky pieces of code are now a bit simpler to follow. While at it, apply some cleanups, such as replacing isVariable with token.IsIdentifier, and consistently using log.Fatalf. Follow-up CLs will start improving the generated code, also simplifying the rulegen code itself. I've added some TODOs for the low-hanging fruit that I intend to work on right after. Updates golang#30810. Change-Id: Ic371c192b29c85dfc4a001be7fbcbeec85facc9d Reviewed-on: https://go-review.googlesource.com/c/go/+/177539 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
cmd/compile/internal/ssa/gen/rulegen.go
is a program that generates the code for all the SSA rewrite rules.The way it currently works, the code is almost always generated directly. This keeps the code simple, but also limits it quite a bit. For example, it's hard to tell if a variable will be used or not, so declarations like
typ := &config.Types
tend to be followed by_ = typ
.It also means we tend to write repetitive and verbose code. For example, these two if statements could be collapsed like
if X || Y { break }
:@josharian tried to do this last bit with string handling in https://go-review.googlesource.com/c/go/+/167438, but that turns into spaghetti code that noone can maintain.
There are other ways we could post-process the code to simplify it, too. For example, we could move declarations closer to their uses, and we could remove unused variable declarations instead of using so many
_ = foo
safeguards.We could use
go/ast
for this, but that's probably too complex for our needs. The generated code only uses a tiny subset of Go, and we could have nodes for higher-level concepts like "rewrite this value".The last step of the program would be to stringify the node tree, and pass it through
go/format
as usual.The purpose of this issue is first to gather feedback. If it sounds like a good idea, I'd like to begin work on this soon, hopefully before the freeze is here again. The first incremental step would probably be to rewrite rulegen to use the intermediate node tree, without changing the generated code at all.
/cc @josharian @mundaym @cherrymui @randall77 @dr2chase
The text was updated successfully, but these errors were encountered: