

x/tools/go/loader: improve and stabilize API #14120

Closed
alandonovan opened this issue Jan 27, 2016 · 21 comments


@alandonovan
Contributor

alandonovan commented Jan 27, 2016

We should simplify and stabilize the go/loader package's API. Before that happens, there are a number of features to add and bugs to fix:

Features:

  1. support ... patterns just like "go build" and "go list" do. (Ideally the logic would be shared. Russ suggested fork/execing "go list", but I don't like this approach: it uses the actual file system instead of the abstraction defined by build.Context.)

  2. Allow packages specified on the command line to use cgo. Currently only imported packages are subject to cgo preprocessing.

  3. Support pkg-config.

API simplifications:

  1. The public fields of loader.Program are sometimes annoying to use. Study existing clients and come up with something more convenient.

  2. Update the go/types tutorial at https://github.com/golang/example/tree/master/gotypes.

Once this is done, we should use go/loader in tools such as golint, govet and stringer.

cc: @griesemer

@rsc rsc added this to the Unreleased milestone Jan 27, 2016
tbg added a commit to tbg/cockroach that referenced this issue Oct 14, 2016
All packages need to be installed before we can run (some of) the checks and
code generators reliably. More precisely, anything that uses x/tools/go/loader
is fragile (this includes stringer, vet and others).

The blocking issue is golang/go#14120; see
golang/go#10249 for some more concrete discussion on
`stringer` and golang/go#16086 for `vet`.
@josharian
Contributor

Alan, any chance of revisiting this early in 1.9? I'd be happy to be a sounding board.

I just found that even golang.org/x/tools/cmd/gotype doesn't use go/loader, and it pains me to invoke slow, heavy 'go install' just to get some code typechecked.

@alandonovan
Contributor Author

alandonovan commented Dec 19, 2016

Hi Josh, thanks for the offer. I would like to finish off the loader but none of these changes are on my plate for Google project work in the near future so I can't promise any schedule.

Do you have a sense of how much churn loader clients outside x/tools can tolerate as we finalize the API? Since Google vendors everything we're insulated from it, but I got a panel question at GopherCon that was effectively "when will you stop breaking things?". Perhaps we need a mailing list of interested parties.

@josharian
Contributor

I think that churn during a dev cycle is expected. Once it gets frozen, then the Go 1 compatibility promise kicks in.

This issue could be the mailing list, for a first pass. Off the top of my head, the non-Google people that come to mind who work a lot with the go/* packages are: @dominikh @fatih @valyala

@myitcv
Member

myitcv commented Apr 24, 2018

Coming back to this in light of the 1.10 build cache (and vgo), I still think there is a use case for loading from source, more specifically partially loading from source.

The use case is code that does not (yet) fully compile. In particular I have code generators in mind, but the same could apply to tools like @mdempsky's go/types-based gocode etc where partial results are desirable.

My thinking is that x/tools/go/loader could be changed such that either:

  • x/tools/go/loader uses go/importer if a package is present in the build cache (determined by the changes that will land as part of the https://go-review.googlesource.com/c/go/+/107776 sequence), i.e. is current and compiles, and falls back to using source code loading if not
  • x/tools/go/loader remains as is but uses the supplied Config.TypeChecker.Importer if it implements types.ImporterFrom (with that implementation doing the build cache check), falling back to the current behaviour if nil, nil is returned (or some such defined behaviour)

Thoughts?

@dominikh
Member

Loading from export data and loading from source give drastically different results. Export data doesn't provide enough information about function bodies, unexported objects, associations with the AST, and so on. People who use go/loader usually depend on it being a source importer, and changing that would break a lot of existing tools such as errcheck, staticcheck and many more. That renders your first option untenable.

@alandonovan
Contributor Author

alandonovan commented Apr 24, 2018 via email

@myitcv
Member

myitcv commented Apr 24, 2018

Thanks @alandonovan, great to hear this is being worked on.

Much more efficient is to load source code only for the
requested packages, and to use export data (compiler-generated .a files)
for their dependencies.

This was exactly the thinking behind my second option. Sounds good.

The new x/tools/go/loader API, which we plan to call x/tools/go/packages

Is the plan still to have @rsc's go list -deps work-chain and preliminary module support land in Go 1.11 and have your changes build on top of that?

The metadata query function will shell out to an external
query tool appropriate to the current workspace.

This might not be the appropriate place, but I'm going to ask the question in any case and we can move discussion elsewhere as required.

I use godef extensively, as do many others. With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file. I guess this point may also be true for other build systems or go.mod equivalents.

I make this point because I assume external tooling is going to have to supply this context to x/tools/go/packages and the like in order for resolution to properly function.

I might have totally missed something obvious here...

In the (unlikely) event I haven't, none of this is, I suspect, news to you, but I'm guessing/hoping what you share in a couple of weeks will cover this point?

Thanks

@dominikh
Member

@alandonovan I hope you can consider the following points for the design of the go/packages API:

For staticcheck I've been working on my own go/loader and my own export data format.

Much like your plans, my loader uses export data if available and up to date, and source code otherwise. My export data, however, is different from that of any existing Go toolchain in that it stores a full representation of the AST and all type information, so that the end result is identical to parsing from source. I would hope that your new API would allow me to use my own export data instead of being forced to use that produced by the build system (go, vgo, bazel, …)

I've been working on my entirely own loader because the go/loader API lacks certain associations that I need in my tools. Specifically, I need to map between all the different domains of go/token, go/ast, go/build, go/types and go/ssa. I need to be able to map from go/types.Package to loader.Package, from loader.Package to build.Package, from go/ast.File to loader.Package and even from token.File to ast.File.

go/loader doesn't maintain some of these mappings (for example it discards go/build.Packages), and relies on expensive operations like PathEnclosingInterval for others.

@josharian
Contributor

Excited to hear this is being worked on. I look forward to reviewing the API, for little things like being able to (optionally?) discover the size of a file prior to reading it.

@alandonovan
Contributor Author

alandonovan commented Apr 25, 2018

Paul:

Is the plan still to have @rsc's go list -deps work-chain and preliminary module support land in Go 1.11 and have your changes build on top of that?

Exactly. The -deps flag was added primarily so that we could obtain package metadata in a single invocation. (Currently it requires no fewer than three.)

With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file.

Well observed. Future analysis tools will require the working directory as an input, as the name of the file or directory on which to operate, even if absolute, will no longer be sufficient.

Dominik:

My export data, however, is different from that of any existing Go toolchain in that it stores a full representation of the AST and all type information, so that the end result is identical to parsing from source. I would hope that your new API would allow me to use my own export data instead of being forced to use that produced by the build system (go, vgo, bazel, …)

Interesting. We already have serial formats for ASTs (Go syntax) and for types (exportdata). Is it significantly more efficient to reconstruct typed ASTs by deserializing a third, combined format, rather than simply re-parsing and re-typechecking? Or is your goal to avoid the dichotomy between packages loaded from export data (which don't have typed syntax trees) and those produced by the type checker, which do?

I need to map between all the different domains... go/loader doesn't maintain some of these mappings and relies on expensive operations ... for others.

Yes, the plethora of different entities that are 1:1 with, say, a package, can be confusing, but I'm surprised that performance (not complexity or mere inconvenience) of the operations for converting from one representation to another was a significant problem. Can you point me at any examples where these conversions were so burdensome that they motivated your current work?

Josh:

I look forward to reviewing the API, for little things like being able to (optionally?) discover the size of a file prior to reading it.

Thanks. That requirement was not on my radar.

@dominikh
Member

Interesting. We already have serial formats for ASTs (Go syntax) and for types (exportdata). Is it significantly more efficient to reconstruct typed ASTs by deserializing a third, combined format, rather than simply re-parsing and re-typechecking? Or is your goal to avoid the dichotomy between packages loaded from export data (which don't have typed syntax trees) and those produced by the type checker, which do?

For my tools, I always need the full AST (including comments), as well as the full type information (not just the export data), to compute various information (such as "is this function pure?", "is this function deprecated?" and soon taint analysis). Existing export data doesn't record this information, and even if it did, I continuously need new information computed, which makes it simpler to serialize the full type information + AST than to keep adding more and more aggregate information.

To draw a comparison, consider guru's peers query. Imagine this: one package, which is loaded from source, passes a channel to a function in another package, which is loaded from export data. How would you implement the peers query so that it can point out individual channel operations in the function?

Yes, the plethora of different entities that are 1:1 with, say, a package, can be confusing, but I'm surprised that performance (not complexity or mere inconvenience) of the operations for converting from one representation to another was a significant problem. Can you point me at any examples where these conversions were so burdensome that they motivated your current work?

Convenience is certainly one factor. With go/loader, staticcheck computes a lot of the mappings based on the results of go/loader, which is certainly doable. Some of that, however, involves duplicating work, such as going back to disk to find and partially parse go/build packages.

@mvdan
Member

mvdan commented Apr 27, 2018

I'd like to add to the point about custom export formats. I believe that what Dominik wants is to be able to cache extra data on top of the existing export data that gc produces. That may be the detailed syntax tree, or it might be mappings between go/ast and go/types, or it might be other static analysis computations that a linter would like to cache to speed up consecutive runs (such as "is this func pure?").

Would it not be possible to add this to the interface, as a way to attach extra information to each package's cached data? I imagine said extra data could be invalidated with the existing data such as type information. The extra interface could be similar to context.Context, mapping interface{} keys to interface{} values.

I'm not sure if that would remove the need to support arbitrary export formats, but at least it seems closer to what most Go tool authors would need. It also seems like it would be easier to do, rather than designing an entirely separate export format.

@dominikh
Member

@mvdan Maintaining the go/ast <-> go/types mapping is one important aspect of my work, yes. The other, however, is retaining all information about unexported types.

@myitcv
Member

myitcv commented Apr 27, 2018

@alandonovan

With the arrival of vgo I think editors are going to have to develop some concept of context for source code navigation. At the moment they can use godef context-free. With vgo this ceases to be the case; import resolution happens in the context of a go.mod file.

Well observed. Future analysis tools will require the working directory as an input, as the name of the file or directory on which to operate, even if absolute, will no longer be sufficient.

I pondered for a bit why you said "require the working directory" (emphasis my own) instead of go.mod file as input here. My guess would be that your thinking is that one can find the go.mod file from the working directory?

But I'm now struggling to see how the go/types.ImporterFrom interface holds up here; does that now not need to be extended?

type ImporterFrom interface {
    Importer
    ImportFrom(path, dir string, mode ImportMode) (*Package, error)
    ImportFrom2(path, dir, ctxtDir string, mode ImportMode) (*Package, error)
}

The suitably-poorly-named ImportFrom2 having three parameters:

  • path - the (possibly relative) import path of the target package (because we're stuck with relative import paths until Go 2)
  • dir - the directory containing the package file that contains an import of path
  • ctxtDir - the "working directory" (to use your terminology) of the current module (i.e. a means of resolving the current go.mod)

All of this might well be pre-empting what you're going to share... in which case please just say and I'll hold my tongue for now!

@jimmyfrasche
Member

@alandonovan that sounds very much like what I've always wanted ❤️.

However, a recurring pain point for me with the current system has been build tags. I doubt a new system will be able to (or even should) cover all of these concerns but I'll lay them out and hope that at least some will be covered, here or elsewhere:

The current tags used are all in build.Context, but not necessarily all in BuildTags so serializing them to pass to a -tags flag when invoking an external command has to be done by hand. It would be nice if there were an easy way to get all the build tags used to load a program as a []string.

Conversely, it would be nice to take a []string of tags and the analog of build.Context in the new system and return a version with the appropriate fields set while inheriting everything else appropriately. (Similar in purpose to buildutil.TagsFlag but it sets stuff like GOARCH so that the context can be easily introspected without having to know what all the tags mean).

It would also be nice to know, file by file, which build tags from the Context, if any, triggered that file's inclusion. (I want this primarily to list files in a godoc-type interface, annotated with why they were included.)

@rogpeppe
Contributor

rogpeppe commented May 1, 2018

I like the general idea, but I'm concerned that almost none of my projects that currently use go/build will be able to use this new package, because almost all of them require access to source code. For example, one extracts doc comments; another prints file names of dependencies.

I'm also concerned about tags - some tools require a coherent source tree that can build; others care more about seeing all source files regardless of tag combination. In the latter group, I see most tools that do package dependency analysis - I generally want to know about all possible dependencies, not just the ones for one particular combination of build tags.

FWIW, the heuristic I used in my godeps (dependency locking) tool for tags is to treat any build tag expression as true for known OS and architecture tags even when negated, and fall back to the default tag matching for other tags. (code here)

@alandonovan
Contributor Author

alandonovan commented Jun 5, 2018

Sorry for radio silence; this project is inching along.

Please take a look at the API outlined in https://go-review.googlesource.com/c/tools/+/116359. Feedback welcome.

dominikh: you should find the new loader's WholeProgram mode gives you a complete source program, just as the current loader does today.

But you hint at an interesting problem, that of building modular analyses that pass information derived from inspecting a lower-level package to the analysis of a higher-level one. I've been working on an "Analysis API" for package-at-a-time analyses that allows vet-like checkers to be expressed independent of the driver tool (such as vet, 'go test', 'go build', web-based CI or code review tools, etc). I hope it will enable us to write, share, and re-use checkers across tools more easily. Currently I'm working on a way for each checker to get at facts deduced during checking of lower-level packages such as "function p.F delegates to printf" or "function p.F never returns". Checkers may depend on other checkers both within a unit and across units. Facts must be serializable so they may be passed between units, allowing "separate analysis" analogous to separate compilation. I don't plan to specify the binary encoding; each fact can choose its own. I would love your comments on it (will share link soon).

myitcv: the ImporterFrom interface doesn't have a "working directory" parameter but it's trivial to implement it using (in effect) a closure over the loader's working directory (os.Getwd by default).

jimmyfrasche, rogpeppe: re: build tags, I have not thought through the implications of build tags yet, but it's on the list. I don't expect we'll provide a mechanism to explain which tags were significant, or which files were excluded from compilation based on tags (or other aspects of configuration), as it appears infeasible to implement on all build systems.

A recurring challenge has been the range of tools that use go/build, which defies easy generalization. A tool that runs the type-checker must necessarily follow the compiler's definition of a package---a concrete list of files that make up one compilation unit under a specific configuration---whereas a tool that, say, enumerates the source files of a project needs to generalize across all possible configurations. The new go/packages API biases towards the former, not the latter. (This is problematic even for type-based tools: a refactoring tool must modify all the files of a package, not just the ones for the specific configuration that was analyzed by the type checker.)

@myitcv
Member

myitcv commented Jun 6, 2018

the ImporterFrom interface doesn't have a "working directory" parameter but it's trivial to implement it using (in effect) a closure over the loader's working directory (os.Getwd by default).

Thanks. Given the details you've shared in the CL, I think this becomes a moot question in any case.

@myitcv
Member

myitcv commented Jun 7, 2018

To add two related questions here:

go generate

Currently the interface go generate presents to the generators it runs is (taken from go help generate):

  • The following environment variables:
$GOARCH
        The execution architecture (arm, amd64, etc.)
$GOOS
        The execution operating system (linux, windows, etc.)
$GOFILE
        The base name of the file.
$GOLINE
        The line number of the directive in the source file.
$GOPACKAGE
        The name of the package of the file containing the directive.
$DOLLAR
        A dollar sign.
  • A working directory that corresponds to the go list .Dir of the package being go generate-d

So I think we're missing the working directory of go generate itself as part of this interface (possibly another environment variable, GOGENERATEDIR?). A generator needs this value to correctly use a Loader.

Editors

At least as far as I've understood it, editors are going to need to retain some sort of state relating to the current vgo module (or at least the directory containing the vgo module; the module itself is probably more intuitive to the user). The reason is that this directory will then be required for all calls to external tools that use the Loader: gocode, goimports etc.

Is it worth fleshing this out as we go? That way we can communicate something to editor folks/plugin writers etc as soon as possible? Because it seems the tool changes by themselves (to gocode, goimports etc) won’t be sufficient.

Thoughts?

@kardianos
Contributor

@alandonovan

The new go/packages API biases towards the former, not the latter. (This is problematic even for type-based tools: a refactoring tool must modify all the files of a package, not just the ones for the specific configuration that was analyzed by the type checker.)

I was hoping for go/packages to be a generalized loader that didn't need to pay attention to specific GOOS and GOARCH or other tags, but I discovered that this will be impossible with the current go/packages API. I also discovered that there is a field called "Flags" through which tags can be passed to the underlying tools. (The issue that I filed was #26697.)

This seems like a mistake to me (on both counts): it will force packages being ported over to it, such as guru and gorename, to ignore files that are guarded by OS/ARCH/other tags, which is not what I want the tools to do at all, ever.

Perhaps for some underlying tools it is hard to discover a maximal graph (this might be the case with bazel?). Then maybe they could return E_NOT_SUPPORTED, and tools could fall back to "build mode", setting a specific GOOS/GOARCH/tags.

Anyway, I have a specific need that requires the more general mode, and I could work on enabling this if that were desired. If not, I'll just solve my case in my own repo. Thanks.

@mvdan
Member

mvdan commented Mar 21, 2019

I think we should close this issue, as discussion on go/packages has been happening elsewhere for almost a year. go/loader will be deprecated any day now, so I don't think the original issue is worth keeping open. /cc @matloob @ianthehat

@golang golang locked and limited conversation to collaborators Mar 20, 2020