Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/cfg: add AST information to Blocks #53367

Closed
andrewbaxter opened this issue Jun 14, 2022 · 22 comments
Closed

x/tools/go/cfg: add AST information to Blocks #53367

andrewbaxter opened this issue Jun 14, 2022 · 22 comments

Comments

@andrewbaxter
Copy link

andrewbaxter commented Jun 14, 2022

(Edit: jump to #53367 (comment) for specific API)

I'd like to add some metadata to Block to indicate which part of the code the block is associated with.

I'm doing an analysis based on the CFG which reports branches (regarding variable accesses or rather lack thereof), but sometimes critical blocks have no elements so it's impossible to report where in a function the issue was.

I think the token.Pos of the closest associated AST node would be good enough, but any alternative that solves the above would be great. So for instance for.body and for.done could both have s.Pos(). Since there are synthetic blocks I don't think there's a perfect solution.

@gopherbot gopherbot added this to the Proposal milestone Jun 14, 2022
@timothy-king
Copy link
Contributor

You may want to compare against x/tools/go/ssa as it is a close analog. It assigns token.Pos for each instruction. FWIW you can also create synthetic ast.Nodes.

Also which are you proposing: extending Block with an ast.Node or a token.Pos?

2cents: Both seem quite likely to be backwards compatible. It may be reasonable to move this out of the proposal process. To understand the cost of the change having a draft available somewhere would help. This type of extension is something that the community can contribute.

@andrewbaxter
Copy link
Author

andrewbaxter commented Jun 15, 2022

Oh cool, hadn't seen ssa... I'll need to look at that.

Also which are you proposing: extending Block with an ast.Node or a token.Pos?

Okay to keep things simple, I'm proposing token.Pos just because that's what I did locally in my fork. I don't know if this would have other use cases that would require more data.

To understand the cost of the change having a draft available somewhere would help

Ah I just noticed the mirror here on github. I can make an MR there (just the minimal changes I hacked in).

@andrewbaxter
Copy link
Author

andrewbaxter commented Jun 15, 2022

I checked out ssa, but I'm not sure how much I can infer about the source from it in order to report convention/style issues. For instance, will var x always correspond to a Alloc with a naive build, or could that disappear or be a different node type due to internal ssa code changes?

My gut feeling is cfg is intended to be closer to the source (and it has an analysis via ctrlflow already, etc), otherwise ssa seems like it would have solved my problem.

@timothy-king
Copy link
Contributor

var x can be represented by things other than an Alloc in ssa. ssa is closer to a compiler's IR than ast. Like a compiler's IR it can optimize away variables. It is designed to enable analysis so it tried to stay close-ish to the source (after expanding implicit constructs, order of operations, basic escape analysis, assigning types, etc). If you are primarily concerned with the ast and syntax it is not a good choice.

My point about ssa is that it is likely to have solved the issue for which token.Pos to select in different situations. The implementation for cfg also is highly similar to a component in the implementation of ssa.

@ianlancetaylor ianlancetaylor changed the title proposal: x/tools/go/cfg: Add location information to Blocks proposal: x/tools/go/cfg: add location information to Blocks Jun 15, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 15, 2022
@andrewbaxter
Copy link
Author

Ah right, that makes sense.

Okay, here's what I'm using locally (rougly): golang/tools#385

@timothy-king
Copy link
Contributor

Thank you for putting together a PR for us to look at. That really helps.

Skimming over the code. The core suggestion of extending Block with a Pos field looks reasonable. Removing the Pos field would be a breaking change so it is hard to back out of if it gets added (so there is some reason to be cautious). My main concern is whether this information is enough to solve the problem. A bunch of positions are the same so it may be confusing to use this field. It is unclear if Pos + comment (via String() ) suffices.

@adonovan thoughts?

Some comments on the implementation. The documentation for what the Pos field means and when it exists (!=NoPos) means may need clarification. Some of the position choices look like they could be improved. This can all be resolved later IMO.

@adonovan
Copy link
Member

What can a user conclude from the association of a block with a position (or statement)? I can't see any consistent interpretation, so the only way to use Block.Pos would be to study the CFG builder implementation very carefully and hope that it doesn't change. That doesn't seem like a good interface.

Also, I suggest you connect directly to the actual syntax tree instead of just its Pos. I think it would be reasonable for all basic blocks that come from explicit {...} syntactic blocks to link to the BlockStmt. Would that be enough to solve your problem? What is the specific problem, exactly?

@andrewbaxter
Copy link
Author

andrewbaxter commented Jun 23, 2022

Sorry, I should have probably been more specific up front. My exact use case is I'm doing branch analysis around explicit variable initialization. So I want to report when a variable was not initialized in a certain branch where it was in others. So for example

if x {
    needsToBeInitialized = value
} else {
}

the else branch block would contain no instructions, however there's a clearly associated syntax element (the else token, or the {) and I'd like to point the user to that location if possible.

Of course, there won't always be a good source location, such as if the else was omitted here. However in that case even just pointing to if and saying "there's an issue in one of the branches of this if statement" would be great.

that come from explicit {...} syntactic blocks

I guess you're suggesting that implicit branches would have a nil/zero position. That makes sense to me.

It would help if I could establish some rule for finding the nearest relevant syntactic element though, like "if a block has a nil location, there will always either be a single predecessor block or single successor block that has non-nil location information".

Also, I suggest you connect directly to the actual syntax tree instead of just its Pos

Like an AST node? There would be no issue with that I think, I didn't have any reason to chose Pos over Node except that I didn't need Node here.

@timothy-king
Copy link
Contributor

I think it would be reasonable for all basic blocks that come from explicit {...} syntactic blocks to link to the BlockStmt.

Let me see if I understand the suggestion. On if x { if y { foo() }; bar() }; baz(), I think we would have the blocks b0 : if x { b1: if y { b2: foo() }; b3: bar() }; b4: baz(). IIU the suggestion correctly we would have the nodes b1 -> { if y { foo() }; bar() }, b2->{ foo() } and b0, b3, and b4 having no BlockStmt. Is this right? [roughly?]

@adonovan
Copy link
Member

adonovan commented Jan 10, 2024

It would help if I could establish some rule for finding the nearest relevant syntactic element though, like "if a block has a nil location, there will always either be a single predecessor block or single successor block that has non-nil location information".

What if we change the Block.comment field to a public enum, add a Block.Syntax field that links to the {Labeled,Switch,Block,If,Range,For,Branch,Select}Stmt} node associated with the block, and document the relationship at the enum? The only blocks without syntax are the unreachable ones after a jump.

@gopherbot
Copy link

Change https://go.dev/cl/555255 mentions this issue: go/cfg: record block kind and associated statement

@adonovan
Copy link
Member

See the attached CL for a sketch of what I was referring to. Please try it out and let me know if this solves your problem.

@andrewbaxter
Copy link
Author

Sorry, I totally let this slip off the radar. We're still using my hacked up fork locally atm. I'll take a look, and thanks for looking into it 🙇

@andrewbaxter
Copy link
Author

Oh yeah, that looks awesome! I haven't tried it out yet so I'll try hooking it up, but I don't imagine there'd be any issues with that approach.

@andrewbaxter
Copy link
Author

Sorry again for how much time it took me. I integrated your branch and everything works as expected! We have a decently sized codebase, so I expect it encounters most combinations of blocks somewhere or another.

I don't have anything else to add, the patch 100% solves the issue from my perspective.

So IIUC if we're walking the blocks from a known reachable block we'll never encounter an unreachable block (tautologically)? So everything will have a non-nill Stmt. I didn't get any NPE in my testing so I believe that's the case.

@rsc
Copy link
Contributor

rsc commented Feb 1, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@adonovan
Copy link
Member

adonovan commented Feb 7, 2024

Proposed API:

type Block struct {
+    Kind BlockKind
   ...
}

// A BlockKind identifies the purpose of a block.
// It also determines the possible types of its Stmt field.
type BlockKind uint8

const (
	KindInvalid BlockKind = iota

	KindUnreachable     // unreachable block after BranchStmt or no-return call ExprStmt
	KindBody            // function body BlockStmt
	KindForBody         // body of ForStmt
	KindForDone         // block after ForStmt
	KindForLoop         // head of ForStmt
	KindForPost         // post condition of ForStmt
	KindIfDone          // block after IfStmt
	KindIfElse          // else block of IfStmt
	KindIfThen          // then block of IfStmt
	KindLabel           // labeled block of BranchStmt
	KindRangeBody       // body of RangeStmt
	KindRangeDone       // block after RangeStmt
	KindRangeLoop       // head of RangeStmt
	KindSelectCaseBody  // body of SelectStmt
	KindSelectDone      // block after SelectStmt
	KindSelectAfterCase // block after a CommClause
	KindSwitchCaseBody  // body of CaseClause
	KindSwitchDone      // block after {Type.}SwitchStmt
	KindSwitchNextCase  // secondary expression of a multi-expression CaseClause
)

func (kind BlockKind) String() string

@timothy-king
Copy link
Contributor

I believe we will also eventually need a kind to represent the basic block for the exit switch of a rangefunc RangeStmt. I think it is okay to leave this out of the proposed API while rangefunc is a GOEXPERIMENT.

@adonovan
Copy link
Member

adonovan commented Feb 7, 2024

I believe we will also eventually need a kind to represent the basic block for the exit switch of a rangefunc RangeStmt. I think it is okay to leave this out of the proposed API while rangefunc is a GOEXPERIMENT.

There's no such thing as an exit switch in this representation, and in any case there's no way for the CFG builder to know whether a RangeStmt's operand is func. I just don't think people use go/cfg to write analyses that care about effects so precisely that they would need to consider such details; that's what go/ssa is for (and still I have never seen a go/ssa client other than the interpreter that really cares about such things).

@timothy-king
Copy link
Contributor

I am not yet convinced that we should not provide this level of accuracy in the future. I do agree that we can punt on rangefunc precision for now. And adding a new kind should be its own proposal. We can hash out rangefunc on that proposal.

@rsc rsc changed the title proposal: x/tools/go/cfg: add location information to Blocks proposal: x/tools/go/cfg: add AST information to Blocks Feb 8, 2024
@rsc
Copy link
Contributor

rsc commented Feb 9, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Proposal is in #53367 (comment).

@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Proposal is in #53367 (comment).

@rsc rsc changed the title proposal: x/tools/go/cfg: add AST information to Blocks x/tools/go/cfg: add AST information to Blocks Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

5 participants