go/parser: allow continuing parsing after package only / imports only #24273

josharian · 2018-03-06T19:16:35Z

I sometimes find myself wanting to parse only the imports (or only the package) from a file and then, if it is interesting, continue parsing the rest of the file.

go/parser doesn't support this right now. Instead you have to choose a non-ideal workaround:

read the entire file up front: memory inefficient)
seek to the beginning: only works with an io.Seeker, but e.g. go/build only guarantees an io.ReadCloser
use io.TeeReader and bytes.Buffer and io.MultiReader to cache the read bytes and then replay them: inefficient, fiddly
use io.TeeReader and io.Pipe to simultaneously start multiple parsers: wasteful in the early exit case

It'd be nice if instead we could just start the parser in package-only or imports-only mode and then ask the parser to keep going.

If we decide not to fix this for go/parser, it might still be worth addressing in the new parser's API.

cc @griesemer @mvdan

mvdan · 2018-03-06T19:31:55Z

This is an interesting idea. What happens at the moment with other cases like this one, such as when go build has to figure out whether files are excluded or not from a build depending on their build tags?

It'd be nice if instead we could just start the parser in package-only or imports-only mode and then ask the parser to keep going.

Have you given thought to a lower-level API? For example, besides the func that parses an entire file, funcs that parse a package clause, import declarations, and a top-level declaration. That could have other exit-early uses, for example if you were only interested in a const declared near the top of a file. And it could potentially mean less knobs like ImportsOnly.

josharian · 2018-03-06T19:39:14Z

Yeah, a "step by step" API would work nicely. First step package clause, then each subsequent step is one top level declaration. (Note that it might take several such steps to get through all imports.)

One downside is that the object resolution that go/parser does couldn't be completed for file-scoped and package-scoped objects until everything is parsed, which might be confusing.

josharian · 2018-03-06T19:39:51Z

Also would have to think about how this interacts with parsing and association of comments.

mvdan · 2018-03-06T19:45:51Z

Also would have to think about how this interacts with parsing and association of comments.

I was thinking in the happy land where comments are all attached to nodes and never floating around :) With that in mind, it's a similar case to object resolution. You aren't guaranteed to have all the comments in their right places until the entire file has been parsed (for example, for trailing comments right after a declaration).

It would also be neat if the func parsing a file was written in terms of those lower-level funcs. Then, if one needed some custom logic, they would just have to copy the ten-line func body and change it.

mvdan · 2018-03-06T19:50:44Z

Going back to go/parser vs syntax - I personally think the lower-level API is a bit too invasive to include in the existing go/parser. It would require introducing a type and a bunch of methods. And the new signatures would likely not match ParseFile much at all.

But I don't see an easier way to fit this into the existing API. Another parser mode can't fix it, as there's still no way to tell the parser to stop or continue. And you'd always have to deal with the free-floating comments, which at least would make the API more verbose.

mvdan · 2018-03-06T19:59:53Z

Here's a somewhat related idea - it would be nice if the parser also allowed parsing other kinds of nodes alone. At the moment, only an entire file or a single expression are possible. I would find it very useful if it also allowed parsing of a single declaration (as discussed above), a single type expression, and a single statement, among others.

At the moment, the only workaround is to use templating to form a full file, which is what a package of mine does. However, that results in lots of boilerplate code, confusing errors if something goes wrong, and position information that is very wrong. That's what gofmt -r does too, I think. See: https://github.com/mvdan/gogrep/blob/fb1920eae0dfe92b189205768dc459e0e5ad21d8/expr_parse.go#L86-L219

josharian · 2018-03-06T20:07:06Z

I believe that github.com/josharian/impl does some of this as well.

There is https://golang.org/pkg/go/parser/#ParseExpr and https://golang.org/pkg/go/parser/#ParseExprFrom.

I can imagine that decl parsing API might be one building block of an incremental parser, though.

griesemer · 2018-03-07T05:48:32Z

@josharian The go/parser has has Mode argument, and one of the modes is ImportsOnly. Is that what you are looking for?

josharian · 2018-03-07T06:22:27Z

No. I want to parse ImportsOnly and then, based on the results, decide whether or not to continue parsing the rest of the file.

griesemer · 2018-03-07T16:23:12Z

@josharian It may be cheap enough to use ImportsOnly first, and based on the result parse the entire file again. Another option could be to provide some kind of callback (but that would change the API).

griesemer · 2018-03-07T21:17:09Z

An "incremental" parser makes the API more complex and requires more consistency checks; it's also harder to guarantee the result. One point of the new syntax package is that it should have a simpler API (excluding the AST nodes for a moment). I was envisioning that a single Parse function should be able to determine what it parses and do the right thing (package, declaration, statement, expression). The result will tell what it parsed.

Regarding the option to parse up to a certain point and continue on demand: I'd do this with a callback where the callback has a chance to inspect the current AST and (probably) the next token. Either way, it has to be essentially zero cost when not needed.

But unless you have severe time constraints, I think there's a work-around right now as I suggested in #24273 (comment) .

ianlancetaylor · 2018-03-12T20:37:33Z

ImportsOnly ought to be fast enough, and if not we should make it faster. It doesn't seem necessary to add more API here.

Decision: We won't do this.

josharian · 2018-03-12T20:55:39Z

I'm not going to appeal this--it seems like a reasonable decision--but just to set the record straight, the issue is not really around how fast/slow ImportsOnly is. It's about the awkwardness of managing the input, if you don't want to read the entire file into memory. See the list of workarounds in the original bug report.

griesemer · 2018-03-12T21:40:40Z

@josharian Understood. The parser (the scanner, really) requires the file to be in memory after all (at the bottom it doesn't use a reader, it uses a []byte). There's no easy way to change that so that not all of the file is loaded into memory if not needed; besides doing the work outside somehow. Which is another reason we don't want to change this API which is about 10 years old at this point.

The new syntax package uses an io.Reader all the way down and thus only reads what is needed (i.e., buffered).

josharian added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Mar 6, 2018

josharian added this to the Unplanned milestone Mar 6, 2018

josharian mentioned this issue Mar 6, 2018

x/tools/cmd/guru: consider avoiding reading entire file in globalReferrersPkgLevel #24274

Open

ianlancetaylor closed this as completed Mar 12, 2018

golang locked and limited conversation to collaborators Mar 12, 2019

gopherbot added the FrozenDueToAge label Mar 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

go/parser: allow continuing parsing after package only / imports only #24273

go/parser: allow continuing parsing after package only / imports only #24273

josharian commented Mar 6, 2018

mvdan commented Mar 6, 2018

josharian commented Mar 6, 2018

josharian commented Mar 6, 2018

mvdan commented Mar 6, 2018

mvdan commented Mar 6, 2018

mvdan commented Mar 6, 2018

josharian commented Mar 6, 2018

griesemer commented Mar 7, 2018

josharian commented Mar 7, 2018

griesemer commented Mar 7, 2018

griesemer commented Mar 7, 2018

ianlancetaylor commented Mar 12, 2018

josharian commented Mar 12, 2018

griesemer commented Mar 12, 2018

go/parser: allow continuing parsing after package only / imports only #24273

go/parser: allow continuing parsing after package only / imports only #24273

Comments

josharian commented Mar 6, 2018

mvdan commented Mar 6, 2018

josharian commented Mar 6, 2018

josharian commented Mar 6, 2018

mvdan commented Mar 6, 2018

mvdan commented Mar 6, 2018

mvdan commented Mar 6, 2018

josharian commented Mar 6, 2018

griesemer commented Mar 7, 2018

josharian commented Mar 7, 2018

griesemer commented Mar 7, 2018

griesemer commented Mar 7, 2018

ianlancetaylor commented Mar 12, 2018

josharian commented Mar 12, 2018

griesemer commented Mar 12, 2018