Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/parser: (*ast.File).Decls is empty after bailout #70725

Open
mateusz834 opened this issue Dec 8, 2024 · 11 comments
Open

go/parser: (*ast.File).Decls is empty after bailout #70725

mateusz834 opened this issue Dec 8, 2024 · 11 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@mateusz834
Copy link
Member

Currently when parser.ParseFile is executed without parser.AllErrors the parsing is stopped, but the (*ast.File).Decls field is not populated with the parsed (valid and invalid) declarations:

Reproducer:

func TestEarlyBailout(t *testing.T) {
	const src = `package test
func test() {
	validCode()
}

func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
func parseError() { var a }
`

	fset := token.NewFileSet()
	f, _ := ParseFile(fset, "test.go", src, SkipObjectResolution)
	if len(f.Decls) == 0 {
		t.Fatal("no decls found") // fails
	}
}

I think that we should return at least a partial AST in this case, or we should document this behavior.

CC @adonovan @griesemer

@gabyhelp
Copy link

gabyhelp commented Dec 8, 2024

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@mateusz834
Copy link
Member Author

Also same thing happens with all these fields:

go/src/go/parser/parser.go

Lines 2902 to 2911 in c8fb6ae

f := &ast.File{
Doc: doc,
Package: pos,
Name: ident,
Decls: decls,
// File{Start,End} are set by the defer in the caller.
Imports: p.imports,
Comments: p.comments,
GoVersion: p.goVersion,
}

@mateusz834 mateusz834 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 8, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/634537 mentions this issue: go/parser: fill decls after bailout

@griesemer
Copy link
Contributor

griesemer commented Dec 10, 2024

The documentation for ParseFile already says that the returned AST is partial in case of an error, so arguably an empty Decls list is a "partial" list. But it may be unsatisfactory.

The core of the problem is that (all?) parse functions collect the ASTs of their respective sub-ASTs in local variables, and then produce the top-level AST node at the end. If a bailout happens, all the intermediate work is lost. In case of the File node, the intermediate work is the entire file contents. But that's only part of the problem: if the final return in a parse function is not executed, the result is nil, even if the respective top-level AST node was populated. To fix that one would need to name the result parameters.

Perhaps we should allocate the top-level nodes first and populate them as we go along (that's what the syntax package does), and use their name as a result parameter. That would address this (and possibly other cases).

cc @findleyr for input

@griesemer griesemer added this to the Backlog milestone Dec 10, 2024
@griesemer griesemer added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 10, 2024
@adonovan
Copy link
Member

adonovan commented Dec 10, 2024

I have always understood the default behavior to mean "abort parsing at the first error", in which case File.Decls must obviously be incomplete. By contrast, AllErrors means keep going until the end of the file, at which point File.Decls will have one entry per import and package-level func/var/const/type.

I would be happy to document this behavior, but I don't yet see any reason to change it.

@mateusz834
Copy link
Member Author

mateusz834 commented Dec 10, 2024

I have always understood the default behavior to mean "abort parsing at the first error", in which case File.Decls must obviously be incomplete.

Not so obviously, actually it is the 10th error that causes a bailout, also the errors might not be in the first declaration, but in the second, in that case i would expect to have the first decl returned.

It's hard for me to find a case where this would matter in practice — why someone would try to process a *ast.file with errors without specifying AllErrors. The only case I can think of is people being confused about why the *ast.File ends up essentially empty in such situations (like in my case), and why Package/Imports field is not set, while in PackageClauseOnly or ImportsOnly modes, it would have been set.

I would be happy to document this behavior, but I don't yet see any reason to change it.

That is fine, but also looking at the current docs, nothing really mentions that AllErrors causes the parsing to stop, it only says "report all errors". There is no correlation in the docs between not setting the AllErrors and the returned *ast.File.

Perhaps we should allocate the top-level nodes first and populate them as we go along (that's what the syntax package does), and use their name as a result parameter. That would address this (and possibly other cases).

Oh, this approach sound nice (but not sure whether the complexity is worth it), it might also fix: #70731

@adonovan
Copy link
Member

Not so obviously, actually it is the 10th error that causes a bailout, also the errors might not be in the first declaration, but in the second, in that case i would expect to have the first decl returned.

Quite right; I misspoke, but my point was that it will give up.

It's hard for me to find a case where this would matter in practice — why someone would try to process a *ast.File with errors without specifying AllErrors.

For many clients (e.g. a compiler), there are only two interesting outcomes: complete success, or failure with informative (but not overwhelming) error messages. For such clients, the default behavior makes sense.

The only case I can think of is people being confused about why the *ast.File ends up essentially empty in such situations (like in my case), and why Package/Imports field is not set, while in PackageClauseOnly or ImportsOnly modes, it would have been set.

FWIW, those two modes are for quickly extracting project structure information from a file while doing the least amount of parsing work.

I would be happy to document this behavior, but I don't yet see any reason to change it.

That is fine, but also looking at the current docs, nothing really mentions that AllErrors causes the parsing to stop, it only says "report all errors". There is no correlation in the docs between not setting the AllErrors and the returned *ast.File.

Yes, many of our doc comments are a little too brief. The temptation to keep inline comments short enough to stop them from wrapping the line can be a disincentive to recording things that really ought to be recorded.

@mateusz834
Copy link
Member Author

For many clients (e.g. a compiler), there are only two interesting outcomes: complete success, or failure with informative (but not overwhelming) error messages. For such clients, the default behavior makes sense.

Right, right, but it those cases (i assume) we don't process further the *ast.File, we display the error directly to the user (and ignore the *ast.File).

@adonovan
Copy link
Member

why someone would try to process a *ast.File with errors without specifying AllErrors

I agree, that does seem like an odd thing to do.

@findleyr
Copy link
Member

findleyr commented Dec 10, 2024

Returning 10 errors seems very much like overfitting to an old command-line use case (probably gofmt).

I agree with @adonovan, and think the fix here is documentation: if you care about the resulting tree in the presence of errors, you should be using AllErrors.

Doing more work when AllErrors is not set seems like a regression: presumably it is better for most callers to fail fast (as is currently done).

Perhaps we should allocate the top-level nodes first and populate them as we go along (that's what the syntax package does), and use their name as a result parameter. That would address this (and possibly other cases).

I don't think we should do this. There are already fewer invariants for invalid parse trees, which can be a challenge when developing tools. If we return "whatever was set" when we bail out, then even those invariants go out the window.

@griesemer
Copy link
Contributor

FWIW: The Go compiler reports 10 errors and then stops. Hence the 10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

6 participants