x/tools/go/cfg: add AST information to Blocks #53367

andrewbaxter · 2022-06-14T11:21:17Z

(Edit: jump to #53367 (comment) for specific API)

I'd like to add some metadata to Block to indicate which part of the code the block is associated with.

I'm doing an analysis based on the CFG which reports branches (regarding variable accesses or rather lack thereof), but sometimes critical blocks have no elements so it's impossible to report where in a function the issue was.

I think the token.Pos of the closest associated AST node would be good enough, but any alternative that solves the above would be great. So for instance for.body and for.done could both have s.Pos(). Since there are synthetic blocks I don't think there's a perfect solution.

The text was updated successfully, but these errors were encountered:

timothy-king · 2022-06-14T17:23:59Z

You may want to compare against x/tools/go/ssa as it is a close analog. It assigns token.Pos for each instruction. FWIW you can also create synthetic ast.Nodes.

Also which are you proposing: extending Block with an ast.Node or a token.Pos?

2cents: Both seem quite likely to be backwards compatible. It may be reasonable to move this out of the proposal process. To understand the cost of the change having a draft available somewhere would help. This type of extension is something that the community can contribute.

andrewbaxter · 2022-06-15T07:22:41Z

Oh cool, hadn't seen ssa... I'll need to look at that.

Also which are you proposing: extending Block with an ast.Node or a token.Pos?

Okay to keep things simple, I'm proposing token.Pos just because that's what I did locally in my fork. I don't know if this would have other use cases that would require more data.

To understand the cost of the change having a draft available somewhere would help

Ah I just noticed the mirror here on github. I can make an MR there (just the minimal changes I hacked in).

andrewbaxter · 2022-06-15T10:53:43Z

I checked out ssa, but I'm not sure how much I can infer about the source from it in order to report convention/style issues. For instance, will var x always correspond to a Alloc with a naive build, or could that disappear or be a different node type due to internal ssa code changes?

My gut feeling is cfg is intended to be closer to the source (and it has an analysis via ctrlflow already, etc), otherwise ssa seems like it would have solved my problem.

timothy-king · 2022-06-15T15:47:45Z

var x can be represented by things other than an Alloc in ssa. ssa is closer to a compiler's IR than ast. Like a compiler's IR it can optimize away variables. It is designed to enable analysis so it tried to stay close-ish to the source (after expanding implicit constructs, order of operations, basic escape analysis, assigning types, etc). If you are primarily concerned with the ast and syntax it is not a good choice.

My point about ssa is that it is likely to have solved the issue for which token.Pos to select in different situations. The implementation for cfg also is highly similar to a component in the implementation of ssa.

andrewbaxter · 2022-06-16T02:58:52Z

Ah right, that makes sense.

Okay, here's what I'm using locally (rougly): golang/tools#385

timothy-king · 2022-06-22T17:48:37Z

Thank you for putting together a PR for us to look at. That really helps.

Skimming over the code. The core suggestion of extending Block with a Pos field looks reasonable. Removing the Pos field would be a breaking change so it is hard to back out of if it gets added (so there is some reason to be cautious). My main concern is whether this information is enough to solve the problem. A bunch of positions are the same so it may be confusing to use this field. It is unclear if Pos + comment (via String() ) suffices.

@adonovan thoughts?

Some comments on the implementation. The documentation for what the Pos field means and when it exists (!=NoPos) means may need clarification. Some of the position choices look like they could be improved. This can all be resolved later IMO.

adonovan · 2022-06-22T18:49:10Z

What can a user conclude from the association of a block with a position (or statement)? I can't see any consistent interpretation, so the only way to use Block.Pos would be to study the CFG builder implementation very carefully and hope that it doesn't change. That doesn't seem like a good interface.

Also, I suggest you connect directly to the actual syntax tree instead of just its Pos. I think it would be reasonable for all basic blocks that come from explicit {...} syntactic blocks to link to the BlockStmt. Would that be enough to solve your problem? What is the specific problem, exactly?

andrewbaxter · 2022-06-23T14:29:39Z

Sorry, I should have probably been more specific up front. My exact use case is I'm doing branch analysis around explicit variable initialization. So I want to report when a variable was not initialized in a certain branch where it was in others. So for example

if x {
    needsToBeInitialized = value
} else {
}

the else branch block would contain no instructions, however there's a clearly associated syntax element (the else token, or the {) and I'd like to point the user to that location if possible.

Of course, there won't always be a good source location, such as if the else was omitted here. However in that case even just pointing to if and saying "there's an issue in one of the branches of this if statement" would be great.

that come from explicit {...} syntactic blocks

I guess you're suggesting that implicit branches would have a nil/zero position. That makes sense to me.

It would help if I could establish some rule for finding the nearest relevant syntactic element though, like "if a block has a nil location, there will always either be a single predecessor block or single successor block that has non-nil location information".

Also, I suggest you connect directly to the actual syntax tree instead of just its Pos

Like an AST node? There would be no issue with that I think, I didn't have any reason to chose Pos over Node except that I didn't need Node here.

timothy-king · 2022-06-23T18:12:51Z

I think it would be reasonable for all basic blocks that come from explicit {...} syntactic blocks to link to the BlockStmt.

Let me see if I understand the suggestion. On if x { if y { foo() }; bar() }; baz(), I think we would have the blocks b0 : if x { b1: if y { b2: foo() }; b3: bar() }; b4: baz(). IIU the suggestion correctly we would have the nodes b1 -> { if y { foo() }; bar() }, b2->{ foo() } and b0, b3, and b4 having no BlockStmt. Is this right? [roughly?]

adonovan · 2024-01-10T20:07:13Z

It would help if I could establish some rule for finding the nearest relevant syntactic element though, like "if a block has a nil location, there will always either be a single predecessor block or single successor block that has non-nil location information".

What if we change the Block.comment field to a public enum, add a Block.Syntax field that links to the {Labeled,Switch,Block,If,Range,For,Branch,Select}Stmt} node associated with the block, and document the relationship at the enum? The only blocks without syntax are the unreachable ones after a jump.

gopherbot · 2024-01-10T21:27:16Z

Change https://go.dev/cl/555255 mentions this issue: go/cfg: record block kind and associated statement

adonovan · 2024-01-10T21:49:54Z

See the attached CL for a sketch of what I was referring to. Please try it out and let me know if this solves your problem.

andrewbaxter · 2024-01-11T05:15:40Z

Sorry, I totally let this slip off the radar. We're still using my hacked up fork locally atm. I'll take a look, and thanks for looking into it 🙇

andrewbaxter · 2024-01-11T05:49:36Z

Oh yeah, that looks awesome! I haven't tried it out yet so I'll try hooking it up, but I don't imagine there'd be any issues with that approach.

andrewbaxter · 2024-01-30T06:49:32Z

Sorry again for how much time it took me. I integrated your branch and everything works as expected! We have a decently sized codebase, so I expect it encounters most combinations of blocks somewhere or another.

I don't have anything else to add, the patch 100% solves the issue from my perspective.

So IIUC if we're walking the blocks from a known reachable block we'll never encounter an unreachable block (tautologically)? So everything will have a non-nill Stmt. I didn't get any NPE in my testing so I believe that's the case.

rsc · 2024-02-01T20:24:15Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

adonovan · 2024-02-07T18:29:14Z

Proposed API:

type Block struct {
+    Kind BlockKind
   ...
}

// A BlockKind identifies the purpose of a block.
// It also determines the possible types of its Stmt field.
type BlockKind uint8

const (
	KindInvalid BlockKind = iota

	KindUnreachable     // unreachable block after BranchStmt or no-return call ExprStmt
	KindBody            // function body BlockStmt
	KindForBody         // body of ForStmt
	KindForDone         // block after ForStmt
	KindForLoop         // head of ForStmt
	KindForPost         // post condition of ForStmt
	KindIfDone          // block after IfStmt
	KindIfElse          // else block of IfStmt
	KindIfThen          // then block of IfStmt
	KindLabel           // labeled block of BranchStmt
	KindRangeBody       // body of RangeStmt
	KindRangeDone       // block after RangeStmt
	KindRangeLoop       // head of RangeStmt
	KindSelectCaseBody  // body of SelectStmt
	KindSelectDone      // block after SelectStmt
	KindSelectAfterCase // block after a CommClause
	KindSwitchCaseBody  // body of CaseClause
	KindSwitchDone      // block after {Type.}SwitchStmt
	KindSwitchNextCase  // secondary expression of a multi-expression CaseClause
)

func (kind BlockKind) String() string

timothy-king · 2024-02-07T21:04:12Z

I believe we will also eventually need a kind to represent the basic block for the exit switch of a rangefunc RangeStmt. I think it is okay to leave this out of the proposed API while rangefunc is a GOEXPERIMENT.

adonovan · 2024-02-07T21:08:37Z

I believe we will also eventually need a kind to represent the basic block for the exit switch of a rangefunc RangeStmt. I think it is okay to leave this out of the proposed API while rangefunc is a GOEXPERIMENT.

There's no such thing as an exit switch in this representation, and in any case there's no way for the CFG builder to know whether a RangeStmt's operand is func. I just don't think people use go/cfg to write analyses that care about effects so precisely that they would need to consider such details; that's what go/ssa is for (and still I have never seen a go/ssa client other than the interpreter that really cares about such things).

timothy-king · 2024-02-07T21:26:25Z

I am not yet convinced that we should not provide this level of accuracy in the future. I do agree that we can punt on rangefunc precision for now. And adding a new kind should be its own proposal. We can hash out rangefunc on that proposal.

rsc · 2024-02-09T00:00:01Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Proposal is in #53367 (comment).

rsc · 2024-02-14T23:51:07Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Proposal is in #53367 (comment).

andrewbaxter added the Proposal label Jun 14, 2022

gopherbot added this to the Proposal milestone Jun 14, 2022

ianlancetaylor changed the title ~~proposal: x/tools/go/cfg: Add location information to Blocks~~ proposal: x/tools/go/cfg: add location information to Blocks Jun 15, 2022

ianlancetaylor added this to Incoming in Proposals (old) Jun 15, 2022

andrewbaxter mentioned this issue Jun 16, 2022

Add source location information to cfg blocks golang/tools#385

Open

rsc changed the title ~~proposal: x/tools/go/cfg: add location information to Blocks~~ proposal: x/tools/go/cfg: add AST information to Blocks Feb 8, 2024

rsc added the Proposal-FinalCommentPeriod label Feb 9, 2024

rsc changed the title ~~proposal: x/tools/go/cfg: add AST information to Blocks~~ x/tools/go/cfg: add AST information to Blocks Feb 14, 2024

rsc modified the milestones: Proposal, Backlog Feb 14, 2024

rsc added Proposal-Accepted and removed Proposal-FinalCommentPeriod labels Feb 14, 2024

gopherbot closed this as completed in golang/tools@fb020a0 Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x/tools/go/cfg: add AST information to Blocks #53367

x/tools/go/cfg: add AST information to Blocks #53367

andrewbaxter commented Jun 14, 2022 •

edited by adonovan

timothy-king commented Jun 14, 2022

andrewbaxter commented Jun 15, 2022 •

edited

andrewbaxter commented Jun 15, 2022 •

edited

timothy-king commented Jun 15, 2022

andrewbaxter commented Jun 16, 2022

timothy-king commented Jun 22, 2022

adonovan commented Jun 22, 2022

andrewbaxter commented Jun 23, 2022 •

edited

timothy-king commented Jun 23, 2022

adonovan commented Jan 10, 2024 •

edited

gopherbot commented Jan 10, 2024

adonovan commented Jan 10, 2024

andrewbaxter commented Jan 11, 2024

andrewbaxter commented Jan 11, 2024

andrewbaxter commented Jan 30, 2024

rsc commented Feb 1, 2024

adonovan commented Feb 7, 2024

timothy-king commented Feb 7, 2024

adonovan commented Feb 7, 2024

timothy-king commented Feb 7, 2024

rsc commented Feb 9, 2024

rsc commented Feb 14, 2024

x/tools/go/cfg: add AST information to Blocks #53367

x/tools/go/cfg: add AST information to Blocks #53367

Comments

andrewbaxter commented Jun 14, 2022 • edited by adonovan

timothy-king commented Jun 14, 2022

andrewbaxter commented Jun 15, 2022 • edited

andrewbaxter commented Jun 15, 2022 • edited

timothy-king commented Jun 15, 2022

andrewbaxter commented Jun 16, 2022

timothy-king commented Jun 22, 2022

adonovan commented Jun 22, 2022

andrewbaxter commented Jun 23, 2022 • edited

timothy-king commented Jun 23, 2022

adonovan commented Jan 10, 2024 • edited

gopherbot commented Jan 10, 2024

adonovan commented Jan 10, 2024

andrewbaxter commented Jan 11, 2024

andrewbaxter commented Jan 11, 2024

andrewbaxter commented Jan 30, 2024

rsc commented Feb 1, 2024

adonovan commented Feb 7, 2024

timothy-king commented Feb 7, 2024

adonovan commented Feb 7, 2024

timothy-king commented Feb 7, 2024

rsc commented Feb 9, 2024

rsc commented Feb 14, 2024

andrewbaxter commented Jun 14, 2022 •

edited by adonovan

andrewbaxter commented Jun 15, 2022 •

edited

andrewbaxter commented Jun 15, 2022 •

edited

andrewbaxter commented Jun 23, 2022 •

edited

adonovan commented Jan 10, 2024 •

edited