New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/tools/go/cfg: add AST information to Blocks #53367
Comments
You may want to compare against x/tools/go/ssa as it is a close analog. It assigns token.Pos for each instruction. FWIW you can also create synthetic ast.Nodes. Also which are you proposing: extending Block with an ast.Node or a token.Pos? 2cents: Both seem quite likely to be backwards compatible. It may be reasonable to move this out of the proposal process. To understand the cost of the change having a draft available somewhere would help. This type of extension is something that the community can contribute. |
Oh cool, hadn't seen ssa... I'll need to look at that.
Okay to keep things simple, I'm proposing
Ah I just noticed the mirror here on github. I can make an MR there (just the minimal changes I hacked in). |
I checked out ssa, but I'm not sure how much I can infer about the source from it in order to report convention/style issues. For instance, will My gut feeling is |
My point about ssa is that it is likely to have solved the issue for which token.Pos to select in different situations. The implementation for |
Ah right, that makes sense. Okay, here's what I'm using locally (rougly): golang/tools#385 |
Thank you for putting together a PR for us to look at. That really helps. Skimming over the code. The core suggestion of extending Block with a Pos field looks reasonable. Removing the Pos field would be a breaking change so it is hard to back out of if it gets added (so there is some reason to be cautious). My main concern is whether this information is enough to solve the problem. A bunch of positions are the same so it may be confusing to use this field. It is unclear if Pos + comment (via String() ) suffices. @adonovan thoughts? Some comments on the implementation. The documentation for what the Pos field means and when it exists (!=NoPos) means may need clarification. Some of the position choices look like they could be improved. This can all be resolved later IMO. |
What can a user conclude from the association of a block with a position (or statement)? I can't see any consistent interpretation, so the only way to use Block.Pos would be to study the CFG builder implementation very carefully and hope that it doesn't change. That doesn't seem like a good interface. Also, I suggest you connect directly to the actual syntax tree instead of just its Pos. I think it would be reasonable for all basic blocks that come from explicit {...} syntactic blocks to link to the BlockStmt. Would that be enough to solve your problem? What is the specific problem, exactly? |
Sorry, I should have probably been more specific up front. My exact use case is I'm doing branch analysis around explicit variable initialization. So I want to report when a variable was not initialized in a certain branch where it was in others. So for example
the Of course, there won't always be a good source location, such as if the
I guess you're suggesting that implicit branches would have a nil/zero position. That makes sense to me. It would help if I could establish some rule for finding the nearest relevant syntactic element though, like "if a block has a nil location, there will always either be a single predecessor block or single successor block that has non-nil location information".
Like an AST node? There would be no issue with that I think, I didn't have any reason to chose Pos over Node except that I didn't need Node here. |
Let me see if I understand the suggestion. On |
What if we change the Block.comment field to a public enum, add a Block.Syntax field that links to the |
Change https://go.dev/cl/555255 mentions this issue: |
See the attached CL for a sketch of what I was referring to. Please try it out and let me know if this solves your problem. |
Sorry, I totally let this slip off the radar. We're still using my hacked up fork locally atm. I'll take a look, and thanks for looking into it 🙇 |
Oh yeah, that looks awesome! I haven't tried it out yet so I'll try hooking it up, but I don't imagine there'd be any issues with that approach. |
Sorry again for how much time it took me. I integrated your branch and everything works as expected! We have a decently sized codebase, so I expect it encounters most combinations of blocks somewhere or another. I don't have anything else to add, the patch 100% solves the issue from my perspective. So IIUC if we're walking the blocks from a known reachable block we'll never encounter an unreachable block (tautologically)? So everything will have a non-nill Stmt. I didn't get any NPE in my testing so I believe that's the case. |
This proposal has been added to the active column of the proposals project |
Proposed API: type Block struct {
+ Kind BlockKind
...
}
// A BlockKind identifies the purpose of a block.
// It also determines the possible types of its Stmt field.
type BlockKind uint8
const (
KindInvalid BlockKind = iota
KindUnreachable // unreachable block after BranchStmt or no-return call ExprStmt
KindBody // function body BlockStmt
KindForBody // body of ForStmt
KindForDone // block after ForStmt
KindForLoop // head of ForStmt
KindForPost // post condition of ForStmt
KindIfDone // block after IfStmt
KindIfElse // else block of IfStmt
KindIfThen // then block of IfStmt
KindLabel // labeled block of BranchStmt
KindRangeBody // body of RangeStmt
KindRangeDone // block after RangeStmt
KindRangeLoop // head of RangeStmt
KindSelectCaseBody // body of SelectStmt
KindSelectDone // block after SelectStmt
KindSelectAfterCase // block after a CommClause
KindSwitchCaseBody // body of CaseClause
KindSwitchDone // block after {Type.}SwitchStmt
KindSwitchNextCase // secondary expression of a multi-expression CaseClause
)
func (kind BlockKind) String() string |
I believe we will also eventually need a kind to represent the basic block for the exit switch of a rangefunc RangeStmt. I think it is okay to leave this out of the proposed API while rangefunc is a GOEXPERIMENT. |
There's no such thing as an exit switch in this representation, and in any case there's no way for the CFG builder to know whether a RangeStmt's operand is func. I just don't think people use go/cfg to write analyses that care about effects so precisely that they would need to consider such details; that's what go/ssa is for (and still I have never seen a go/ssa client other than the interpreter that really cares about such things). |
I am not yet convinced that we should not provide this level of accuracy in the future. I do agree that we can punt on rangefunc precision for now. And adding a new kind should be its own proposal. We can hash out rangefunc on that proposal. |
Based on the discussion above, this proposal seems like a likely accept. Proposal is in #53367 (comment). |
No change in consensus, so accepted. 🎉 Proposal is in #53367 (comment). |
(Edit: jump to #53367 (comment) for specific API)
I'd like to add some metadata to
Block
to indicate which part of the code the block is associated with.I'm doing an analysis based on the CFG which reports branches (regarding variable accesses or rather lack thereof), but sometimes critical blocks have no elements so it's impossible to report where in a function the issue was.
I think the
token.Pos
of the closest associated AST node would be good enough, but any alternative that solves the above would be great. So for instancefor.body
andfor.done
could both haves.Pos()
. Since there are synthetic blocks I don't think there's a perfect solution.The text was updated successfully, but these errors were encountered: