Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/packages: Consider giving consumers control over type-checking behaviour #66603

Open
fbbdev opened this issue Mar 30, 2024 · 11 comments
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@fbbdev
Copy link

fbbdev commented Mar 30, 2024

The doc comment for the field packages.Config.ParseFile says:

An application may supply a custom implementation of ParseFile
to change the effective file contents or the behavior of the parser,
or to modify the syntax tree. For example, selectively eliminating
unwanted function bodies can significantly accelerate type checking.

However, doing so (even when preserving semantic correctness) easily results in a bunch of errors being emitted about unused imports and other global identifiers.

Since the type-checker has an option types.Config.IgnoreFuncBodies that controls whether function bodies are checked, it would be nice if consumers of the package loader were able to configure that either per package (ideally) or at least globally.
At the moment, the flag is only used internally for dependencies when NeedDeps is not set.

Another nice addition would be the ability to have distinct but overlapping calls to packages.Load (with the same config) share the same type-checking Objects (see #61418, #61412 for related, but different issues) and maybe even the same package graph. Such a feature would allow incremental, on-demand refinement of package information while still keeping the ability to reason about types globally, and avoiding duplicated work.

@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Mar 30, 2024
@gopherbot gopherbot added this to the Unreleased milestone Mar 30, 2024
@adonovan
Copy link
Member

However, doing so (even when preserving semantic correctness) easily results in a bunch of errors being emitted about unused imports and other global identifiers.

That's true, and it's a little fiddly, but it's not an insurmountable problem. For example, you can discard "unused import" errors in files with discarded function bodies, and if they are the only errors, you can consider the package well typed.

Since the type-checker has an option types.Config.IgnoreFuncBodies that controls whether function bodies are checked, it would be nice if consumers of the package loader were able to configure that either per package (ideally) or at least globally.

That seems reasonable, but I wonder exactly what problem you are trying to solve, because there may be a solution using the existing API. For example, if you're discarding all function bodies in the package, do you need typed syntax at all? It might be that export data contains sufficient information.

Another nice addition would be the ability to have distinct but overlapping calls to packages.Load (with the same config) share the same type-checking Objects (see #61418, #61412 for related, but different issues) and maybe even the same package graph. Such a feature would allow incremental, on-demand refinement of package information while still keeping the ability to reason about types globally, and avoiding duplicated work.

We wanted something like this this from the outset, but never managed to implement it, and I think it may be infeasible to do compatibly at this stage. The main challenge is that later calls to Load may observe a different file-system state from earlier ones, but there is no way to discard just the invalid packages in a type graph; you have to invalidate everything above them too. So some symbols will have the same types.Objects as before, but others (including many that haven't changed at all) will have new types.Objects. Mixing and matching them is extremely prone to errors.

@dmitshur dmitshur added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. FeatureRequest labels Apr 2, 2024
@fbbdev
Copy link
Author

fbbdev commented Apr 29, 2024

Thank you for your prompt answer and sorry for the delay. I needed to check some details before answering but I never got the time to until recently.

That's true, and it's a little fiddly, but it's not an insurmountable problem. For example, you can discard "unused import" errors in files with discarded function bodies, and if they are the only errors, you can consider the package well typed.

This is an interesting workaround, but it's still much more work (and code complexity) than just setting an option. By the way, I noticed recently that go/types even has an option for discarding "unused package" errors...

That seems reasonable, but I wonder exactly what problem you are trying to solve, because there may be a solution using the existing API. For example, if you're discarding all function bodies in the package, do you need typed syntax at all? It might be that export data contains sufficient information.

I am (re)writing a binding generator for a RPC system. The requirements are as follows:

  • first, I have to scan a set of root packages, locate all instantiations of a certain generic function and extract type args;
  • then I have to collect method sets and parameter/result types for said type args and generate equivalent TypeScript signatures including doc comments.

For the first step, I need full typed syntax including function bodies, but just for the root packages.

For the second step, selected types and methods might live in dependencies of the root packages, but export data suffices except for doc comments.

The approach I had in mind initially was to type-check the whole dependency graph, but include function bodies only for the roots. However, the current API does not make this very easy.

Now I came up with an approach that might actually be better: I load and fully typecheck the root set of packages, without dependencies.

If I discover methods or types that belong to a dependency, I load that package in NeedSyntax mode only and match declarations to type-checker objects.

The only issues I'm still battling with are:

  1. I would like to know when a method parameter is typed by an alias, but it seems the type checker resolves all aliases;
  2. I would like to retrieve the declaring syntax for struct fields.

In order to do this manually I need to resolve identifiers in file scopes, but export data has package scopes only.

Maybe a feature that could help here is an automated way to match export data to declaring syntax (returning an error in case of failures). I can surely implement this myself, but IMO it's not a sensible thing to keep in my codebase.

The main challenge is that later calls to Load may observe a different file-system state from earlier ones.

I was concerned about this too. However, after giving it some thought I concluded that the concern is unwarranted.

A single call to packages.Load (or even the Go build tool, for that matter) is already going to incur undefined behaviour if file system state changes concurrently, as it can't possibly access the fs atomically in the general case, nor detect all possible inconsistencies.

Tools that need to be robust w.r.t. file system changes (e.g. gopls) must implement their own monitoring mechanisms.

Non-interactive tools can just warn the user not to touch the filesystem concurrently. At most — assuming they manage to detect inconsistencies — they should just throw an error and stop.

Therefore, I believe this is not actually an issue.

Is my reasoning sound here? If not, could you please help me see how?

@adonovan
Copy link
Member

I think your best bet is to use one of these three approaches:

  • use LoadAllSyntax to load, parse, and type-check the initial packages and all their dependencies; this produces more information than you need, but will give you all the type information and all of the comments.
  • use LoadSyntax to load, parse, and type-check only the initial packages, loading dependencies from export data; then parse the files of the dependencies in order to extract the comments you need.
  • use LoadAllSyntax, but set a custom ParseFile to destroy function bodies of dependencies, and filter out "unused import" errors from the types.Config.Error handler.

The first approach is simpler but slower. The second and third are faster because they avoid complete typed syntax for dependencies. (I don't have a good intuition about which is faster: perhaps the third one since it doesn't involve the compiler?) I don't think any of them should be hard to implement though.

@fbbdev
Copy link
Author

fbbdev commented Apr 29, 2024

So you're suggesting I should invoke the type checker myself; then I would be able to control its configuration. I think I'll experiment some more with alternatives, then I may either go for that or convince people to relax requirements – we'll see. Thank you anyways.

@adonovan
Copy link
Member

So you're suggesting I should invoke the type checker myself

No, that's not necessary. packages.Config.ParseFile provides the hook to destroy function bodies, and packages.Package.TypeErrors contains the list of errors from the type checker via the types.Config.Error handler. You can filter that list after the fact without thinking about types.Config at all.

@timothy-king
Copy link
Contributor

What does "all instantiations of a certain generic function" mean in your context? Do you need only instantiations from non-parameterized code? Or type instantiations from within instantiations the bodies of generic function?

@fbbdev
Copy link
Author

fbbdev commented Apr 29, 2024

So you're suggesting I should invoke the type checker myself

No, that's not necessary.

Sorry, I misinterpreted LoadAllSyntax as if it didn't perform type-checking.

packages.Config.ParseFile provides the hook to destroy function bodies

In the beginning I attempted using packages.Config.ParseFile to destroy function bodies just for dependencies, as the docs suggest. It's problematic, though, because there's no easy way to determine which package a file belongs to. The ParseFile hook does not receive enough information for that purpose.

So I resorted to invoking packages.Load twice: first with NeedCompiledGoFiles, without NeedDeps, to get a list of file paths for the root packages; then in LoadAllSyntax mode, using the list I had obtained before to filter files in the ParseFile hook.

However, I was concerned that this approach might not be robust enough when symbolic links or the overlay build flag are involved. This is partly what pushed me to look for alternatives and eventually open this issue.

And again, this approach drives up code complexity a lot, has greater fragility, and it is a bit absurd considering that the type checker knows perfectly well how to ignore function bodies, and packages.Load has all the information needed to discriminate dependencies...

If that is the only alternative, it makes much more sense to me to invoke the type-checker manually.

@fbbdev
Copy link
Author

fbbdev commented Apr 29, 2024

What does "all instantiations of a certain generic function" mean in your context? Do you need only instantiations from non-parameterized code? Or type instantiations from within instantiations the bodies of generic function?

Ideally, I'd need instantiations from within instantiations of other generic functions too.

As long as those too occur within the root set of packages.

@timothy-king
Copy link
Contributor

Ideally, I'd need instantiations from within instantiations of other generic functions too.

Compilers and x/tools/go/ssa + ssa.InstantiateGenerics are the only tools I am aware of that do this. This is a hard problem. You will need the ast bodies of all of the generic functions that may be reachable wand instantiate what those call. I would start by using x/tools/go/ssa or reading its source code. (You could also relax the goal.)

As long as those too occur within the root set of packages.

What does within the root set of packages mean? There are two parts. The site of the instantiation and the function being instantiated. So are you hoping for the site of the instantiation to be in the root package? The instantiated function is in the root package? Both? (FWIW when the set of root packages is all of the packages, this turns into ssa.InstantiateGenerics.)

Also do you know that LoadAllSyntax cannot work for you for some reason? It may be the path of least resistance and save you time. (Of course x/tools/go/packages could always be better too...)

@adonovan
Copy link
Member

Sorry, I misinterpreted LoadAllSyntax as if it didn't perform type-checking.

You're not the only one. The mode bits are both poorly named and hard to use. (See #63517, #56677, #56633, and my tirade at #48226 (comment).)

I'd need instantiations from within instantiations of other generic functions too.

It sounds like you're attempting to enumerate the complete set of instantiations required by a particular program, which is inherently a whole-program analysis, meaning that it must start from one or more main functions (in main packages). It can't entirely discard function bodies of dependencies, because they are needed in order to record that (for example) if you instantiate function f[T] with T=string, then f needs Set[T] with T=string, which in turn needs Map[K]V with K=string and V=bool, and so on. This is quite a difficult computation, and as @timothy-king said, golang.org/x/tools/go/ssa with mode InstantiateGenerics is the simplest way to compute this set. See for example the golang.org/x/tools/cmd/deadcode command.

The ParseFile hook does not receive enough information for that purpose [inferring the package].

Yeah, that is unfortunate.

@fbbdev
Copy link
Author

fbbdev commented May 2, 2024

Thank you both for your answers.

Also do you know that LoadAllSyntax cannot work for you for some reason? It may be the path of least resistance and save you time. (Of course x/tools/go/packages could always be better too...)

My search for a lighter approach was driven by two considerations:

  • this binding generator is invoked on a hot reload path, so the faster it is, the better (but this should not come at the cost of excessive code complexity);
  • I know for certain that there is gonna be a very large package among the dependencies which would drive up type checking time, but is irrelevant to the static analysis phase; also I don't need to process any package from the stdlib.

I've run some experiments and here's what I found out:

  1. my current approach that typechecks as few packages as possible and loads dependencies in syntax only mode as needed is the fastest (total processing time is ~0.5s±0.1 for each test case in my current suite, with most time spent in packages.Load); however, code complexity is high and I've had quite some problems with missed corner cases;
  2. loading all packages in syntax-only mode and running the typechecker with a custom configuration increases total processing time by ~0.15s±0.05; code complexity is much lower;
  3. using LoadAllSyntax increases total processing time by ~1.1s±0.1, minimises code complexity and should be all in all more future-proof.

At this point I am seriously considering a switch to LoadAllSyntax, as the project's maintainers favour simplicity and robustness over speed, and I do as well.

However, the difference in performance is significant and I still would like very much to see an API for type-checker configuration added to go/packages. A minimal approach could be the addition of global options for IgnoreFuncBodies and DisableUnusedImportCheck; a richer approach could be e.g. the addition of a post-load, pre-check hook that gets (i) a partial packages.Package; (ii) a boolean indicating whether the package matches an input pattern or has been loaded as a dependency. The hook may then (a) alter parsed syntax; (b) decide whether the package should be type-checked or loaded from export data; (c) configure IgnoreFuncBodies and DisableUnusedImportCheck for that package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

6 participants
@timothy-king @dmitshur @fbbdev @adonovan @gopherbot and others