Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: x/tools/go/analysis: change singlechecker/multichecker to allow customization of package loading #53215

Open
hyangah opened this issue Jun 3, 2022 · 18 comments
Assignees
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) Proposal Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@hyangah
Copy link
Contributor

hyangah commented Jun 3, 2022

Background

The existing singlechecker.Main and multichecker.Main are simple:
they take package patterns string args (e.g. ./...), loaded them with a hard-coded go/packages.Config, and applied analyzers on the loaded packages. They also hand flags common across many analyzers, and handle error parsing, etc.

They are convenient, but not flexible enough to support analyzers that need special setup, more involved go/packages.Config (e.g. excluding tests, loading modules, applying overlays). An example analyzer I want to write:

  • It needs to initialize its state based on the module information loadable with go/packages.NeedModules load mode. Current APIs don't use NeedModules nor expose the loaded packages.

  • The initialization needs to access the loaded packages - e.g. this loads vulnerability info from a remote DB, and the loaded go/packages.Package and their modules are needed to retrieve only relevant info.

  • Since singlechecker.Main and multichecker.Main handle flag parsing, attempting to parse flags or command arguments before them will clobber their flag parsing logic.

Potential workaround is either to load the packages twice or to write the analyzer in a way that the very first call of its Run triggers the remaining initialization work. , which isn't ideal.

I propose we introduce a new API that provides hooks and potential customization options. With the change users can

  • Use a different load mode, configuration, or even filter/fake the packages.
  • Inject an extra initialization logic that requires loaded packages before running analysis.

Option 1: A new Run that takes a Config parameter:

package singlechecker

type Config = checker.Config

func Run(cfg Config, analyzers ...*analysis.Analyzer) { ... }

func Main(analyzers ...*analysis.Analyzer) {
  Run(Config{}, analyzers...)
}
package checker // golang.org/x/tools/go/analysis/internal/checker

type Config struct {
   // Load is called once at the start of Run to load packages specified by patterns.
   Load func(mode packages.LoadMode, patterns []string) ([]*packages.Package, error)
}

Option 2: A package-level global variable for stubbing:

We can consider a package-level global Load or Config variable that affects the existing Main's functionality.

package multichecker

// Load is a load function used by Main when loading packages to analyze.
// This is exported so users can inject their custom package load logic.
var Load = func(mode packages.LoadMode, patterns []string) ([]*packages.Package, error) { /* default load */ }

// Main uses Loader to load packages. 
func Main(analyzers... *analysis.Analyzer) { ... }

Since singlechecker and multichecker are used as the 'main' of an analysis tool and already made use of many process-wide global variables, I expect the introduction of a global, exported variable is not too bad.

cc @golang/tools-team @adonovan @timothy-king

@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Jun 3, 2022
@gopherbot gopherbot added this to the Unreleased milestone Jun 3, 2022
@timothy-king
Copy link
Contributor

It needs to initialize its state based on the module information loadable with go/packages.NeedModules load mode.

We can forward information from *packages.Modules to the *Pass when a flag is set on the Analyzer (like RunDespiteErrors). This would still be per *Package, but it would be more information. Would that suffice? If this would help, that should probably be its own proposal. (I vaguely recall this being discussed before. I did not find it.)

expose the loaded packages

I don't think I know what this means. Can you be more specific about the API?

The initialization needs to access the loaded packages - e.g. this loads vulnerability info from a remote DB, and the loaded go/packages.Package and their modules are needed to retrieve only relevant info.

It is not yet clear to me whether this will fit into the {single,multichecker} framework yet. Config will need to make sense within unitchecker mode and split across processes. Not clear if this is compatible with your goals or not yet.

Also the checkers cache on *Pass, which are conceptually (*Analyzer, *Package) pairs. For a remote db, it will need to be possible to invalidate a *Pass based on an external source of data while in unitchecker mode.

We can potentially expose a different part of internal/analysis too. Examples include a different Main function in another package or a more direct Run function that does not support unitchecker mode (or uses it differently). I am not sure I understand the requirements well enough to know if this would be compatible with your goals.

Option 2: A package-level global variable for stubbing:

2cents: I am not seeing an advantage of doing Option 2 over Option 1 yet. One could just pass Load as parameter.

@gopherbot
Copy link

Change https://go.dev/cl/410254 mentions this issue: go/analysis: add Config, Run in single/multichecker

@hyangah
Copy link
Contributor Author

hyangah commented Jun 3, 2022

We can forward information from *packages.Modules to the *Pass when a flag is set on the Analyzer (like RunDespiteErrors). This would still be per *Package, but it would be more information. Would that suffice? If this would help, that should probably be its own proposal. (I vaguely recall this being discussed before. I did not find it.)

The kind of initialization I want to do should run outside the analysis - for example, accessing the network to load extra info such as vulndb queries. We don't want to do that in the middle of some analyzer step is running in arbitrary time. We want the network access to run in predictable manner.

https://go-review.googlesource.com/c/tools/+/410125/1 demonstrates a simple example analyzer that controls package loading and injects custom initialization logic using a proposed api (option 1)

At this moment, unitchecker isn't my focus, but I expect that can be also extended in a similar style (each distributed process runs a kind of initialization)

I am not seeing an advantage of doing Option 2 over Option 1 yet. One could just pass Load as parameter.

Smaller API surface - Instead of two ways (Run and Main), there is only one way to start analysis. I expect most analyzers will not need to switch to the new API.

@adonovan
Copy link
Member

adonovan commented Jun 3, 2022

I'm not convinced that either of these options is actually the best solution to the specific problem at hand (that we discussed in person), which is that you have an analysis (similar to https://go-review.googlesource.com/c/tools/+/408715) you wish to apply to all the packages of a program in topological order, but your analysis also needs global information about the program (a set of vulndb entries) that forces you to call packages.Load before doing the analysis. I think this indicates that the analysis simply isn't an instance of the kind of modular analysis that the go/analysis framework was designed to support. It would be relatively straightforward to modify the analysis to perform its own parallel depth-first traversal over the go/packages.Package graph. It wouldn't need to use Facts because it could assume a single address space. (Indeed, I developed the sketch in the CL above using go/packages directly and then converted it to go/analysis once it was working. It wasn't a difficult change and it would be easy to undo.)

An alternative approach would be to design an interface to the vulndb that allows each go/analysis Pass to query the database piecemeal as it encounters the need. This would of course make more calls to the database (one per package, or perhaps as few as one per package on the critical path) but they could be cached in the local file system to avoid repeat calls from developer machines.

@hyangah
Copy link
Contributor Author

hyangah commented Jun 3, 2022

It wouldn't need to use Facts because it could assume a single address space. (Indeed, I developed the sketch in the CL above using go/packages directly and then converted it to go/analysis once it was working. It wasn't a difficult change and it would be easy to undo.)

Module approach in single address space is still beneficial because it allows incremental updates. For example, when a user is editing a package, we don't need to recompute the whole program again but can use the facts from previous analysis for unchanged packages.

We can invent our own wheel to allow such incremental analysis but writing this in a custom way would make it harder to be adopted by existing linters or gopls.

An alternative approach would be to design an interface to the vulndb that allows each go/analysis Pass to query the database piecemeal as it encounters the need.

This is what I want to avoid. In gopls, for example, diagnostics and analysis run can occur at arbitrary time while users edits a file. I expect the vulndb query and local copy update to be completely decoupled from the actual analysis (either updated periodically, or only when go.mod change is detected)

@hyangah
Copy link
Contributor Author

hyangah commented Jun 3, 2022

@adonovan and I had a chat and we decided to explore a different approach
where the analyzer takes the input from local source specified with analyzer-specific
flags, and initializes within the Run if necessary.

https://go-review.googlesource.com/c/tools/+/410126

is an attempt to demonstrate this approach.

I think we can optimize the local source format further.

@hyangah
Copy link
Contributor Author

hyangah commented Jun 6, 2022

Analyzer-specific flags that specify input files or state to sideload (e.g. --vulns-file in https://go-review.googlesource.com/c/tools/+/410126) allows analyzers to load state lazily as their run is invoked first time.

Still this doesn't solve the original problem -

I agree that it's not ideal to expose the dependency on go/packages from singlechecker/multichecker's APIs, but given that singlechecker/multichecker didn't work without go/packages API for a while, I wonder if it's time to embrace go/packages in the singlechecker/multichecker (note: this issue is about singlechecker/multichecker APIs, not go/analysis package).

@adonovan
Copy link
Member

adonovan commented Jun 6, 2022

I think it would be reasonable to expose an API for running analyzers on a whole program loaded using go/packages, that allows the client to parse flags, load the program, and analyze it under its control, and to process the results without necessarily printing them. The underlying code would be shared with singlechecker and multichecker. Some care is required to ensure that the program is loaded using a configuration that is acceptable to the analysis that follows. And the Analyzer / Pass API should never depend on go/packages.

It would also be ok to expose go/packages configuration options through the single/multichecker CLI, such as the -test flag you added today. But single/multichecker are really not intended to be used as anything other than the main function of a standalone program.

@timothy-king
Copy link
Contributor

+1 to everything Alan said (except calling it "whole program"). I think it is reasonable to expose what is effectively a lower level API for Run() from a different package than {single,multi}checker. As an example a check.Run(, []analysis.Analyzers, []*packages.Package) could produce a ([]Facts, Result, []Diagnostics) for each provided top level package. Essentially generate the actions+passes needed from preload packages, execute those actions, and then expose the results of the directly requested passes. There are a number of details to debate for such an API so this not a real suggestion yet. But it would solve the problem I am seeing in https://go-review.googlesource.com/c/tools/+/410125/1 and https://go-review.googlesource.com/c/tools/+/410126. What this would likely need to give up is providing a modular mode that is compatible with go vet -vettool=....

@timothy-king
Copy link
Contributor

FWIW a hack you could do shove this into a modular analyzer is to split into 2 processes that:

  1. fetch the remote db, determine if an update happened compared to the last run (likely with manually managed files), and generate a flag value --timestamp representing the last update, and
  2. run a {single,multi}checker with the externally computed --timestamp flag.

This should force the modular analysis machinery to discard all of the previous results. The resulting analyzer would be tricky to run from command line but this may not be an issue for you. This does not solve custom loading.

@hyangah
Copy link
Contributor Author

hyangah commented Jun 6, 2022

I think exposing part of checker API addresses many issues that were closed without actually being addressed

#30231
#30219
#31007
#31897
#50265
#53063
...

We've seen many lint-like tools came up with their own solutions and copying of checker package internals (which I think is totally fine, but would be nice if we can help people avoiding such copying when possible.)

Re: #53215 (comment) That is what https://go-review.googlesource.com/c/tools/+/410126 is already doing - except --timestamp is an implementation detail specific to the type of analyzers and I think it's out of scope.

@ianlancetaylor
Copy link
Contributor

What is the current proposal? Thanks.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 8, 2022
@rsc
Copy link
Contributor

rsc commented Jun 8, 2022

I suspect both of these should be deprecated in favor of a better package.
I couldn't use them the last time I tried, and forking them was a pain due to use of internal packages.

@gopherbot
Copy link

Change https://go.dev/cl/411907 mentions this issue: go/analysis/checker: a go/packages-based driver library

@rsc rsc changed the title x/tools/go/analysis: change singlechecker/multichecker to allow customization of package loading proposal: x/tools/go/analysis: change singlechecker/multichecker to allow customization of package loading Jun 22, 2022
SilverRainZ pushed a commit to SilverRainZ/tools that referenced this issue Aug 24, 2022
The {single,multi}checker packages provide the main function
for a complete application, as a black box. Many users want
the ability to customize the analyzer behavior with additional
logic, as described in the attached issues.

This change creates a new package, go/analysis/checker, that
exposes an Analyze pure function---one that avoids global flags,
os.Exit, logging, profiling, and other side effects---that
runs a set of analyzers on a set of packages loaded (by the
client) using go/packages, and presents the graph of results
in a form that allows postprocessing.

This is just a sketch. API feedback welcome.

DO NOT SUBMIT

Updates golang/go#30231
Updates golang/go#30219
Updates golang/go#31007
Updates golang/go#31897
Updates golang/go#50265
Updates golang/go#53215
Updates golang/go#53336

Change-Id: I745d319a587dca506564a4624b52a7f1eb5f4751
@adonovan
Copy link
Member

A sketch of a new API of components for building go/packages-based analysis drivers can be found in https://go.dev/cl/411907.

This CL contains an Analyze function that takes an extensible Input struct, initially listing packages and analyzers, and returns an extensible Output struct, initially containing the roots of the action graph. An action is a step of work done by the checker, identified by the pair (analyzer, package). The rest of the action graph can be found by chasing pointers, analogous to go/packages. Each action exposes its analyzer, package, error, result, diagnostics, and facts.

The API in that CL is still not sufficient to implement all of multichecker. We still need functions for:

(It is possible to test most analyzers today with analysistest, but it too is a monolith that could be broken into reusable chunks of functionality--to load packages, run the checkers, verify the expectations, verify the suggested fixes--while also allowing ad hoc test logic to access all of the result data structures that are currently encapsulated.)

I have not sketched the API for these four features yet. My initial instinct was to publish a set of functions similar in form to the existing ones, but @hyangah raised the possibility that several of these operations might usefully be considered instances of a general post-processing interface, perhaps similar to the one in golangci-lint, which is used for a range of tasks including:

filtering diagnostics:

  • skip diagnostics in generated files
  • adjust cgo file names and skip some fishy cgo diagnostics
  • skip diagnostics with //nolint annotations
  • limit the number of diagnostics from a given analyzer or file
  • deduplicate diagnostics
  • discard diagnostics of low severity
  • skip diagnostics in certain files or directories

transforming diagnostics:

  • add a prefix to each file name
  • retrofit quotation marks around %s portions of specific diagnostics that should have been %q
  • remove unnecessary prefix from each file name
  • sort
  • attach text of source line to each diagnostic

and side effects:

  • apply quick fixes

It's not obvious to me that we need the full generality of another extensible plug-in mechanism, but clearly the question warrants more thought.

At this stage, the main questions to decide are:

  1. is the initial API sketched in the CL acceptable?
  2. should we proceed with it, or first produce a complete proposal for the other features listed above?

@zpavlinovic
Copy link
Contributor

FWIW, I recently stumbled on what I believe is a good use case of this: in a server environment, run vet checkers and store the results in a database. I believe this could be accomplished using this driver: use packages.Load to load the packages, run vet analyzers on it, and in the post-processing step save the results to the db.

@adonovan
Copy link
Member

adonovan commented Sep 29, 2022

That does seem like a totally reasonable program to want to write, but the way I would write it today without the checker API is by fork+execing the multichecker executable and saving its JSON output. The overheads of process startup compared to everything else are negligible, and there are isolation benefits to using multiple processes.

@timothy-king
Copy link
Contributor

but the way I would write it today without the checker API is by fork+execing the multichecker executable and saving its JSON output.

This is a fair point. But it does impose some complexity on users. Most likely in the form of bundling an extra executable. Not insurmountable, but comparatively inconvenient.

SilverRainZ pushed a commit to SilverRainZ/tools that referenced this issue Apr 10, 2023
The {single,multi}checker packages provide the main function
for a complete application, as a black box. Many users want
the ability to customize the analyzer behavior with additional
logic, as described in the attached issues.

This change creates a new package, go/analysis/checker, that
exposes an Analyze pure function---one that avoids global flags,
os.Exit, logging, profiling, and other side effects---that
runs a set of analyzers on a set of packages loaded (by the
client) using go/packages, and presents the graph of results
in a form that allows postprocessing.

This is just a sketch. API feedback welcome.

DO NOT SUBMIT

Updates golang/go#30231
Updates golang/go#30219
Updates golang/go#31007
Updates golang/go#31897
Updates golang/go#50265
Updates golang/go#53215
Updates golang/go#53336

Change-Id: I745d319a587dca506564a4624b52a7f1eb5f4751
SilverRainZ pushed a commit to SilverRainZ/tools that referenced this issue Apr 11, 2023
The {single,multi}checker packages provide the main function
for a complete application, as a black box. Many users want
the ability to customize the analyzer behavior with additional
logic, as described in the attached issues.

This change creates a new package, go/analysis/checker, that
exposes an Analyze pure function---one that avoids global flags,
os.Exit, logging, profiling, and other side effects---that
runs a set of analyzers on a set of packages loaded (by the
client) using go/packages, and presents the graph of results
in a form that allows postprocessing.

This is just a sketch. API feedback welcome.

DO NOT SUBMIT

Updates golang/go#30231
Updates golang/go#30219
Updates golang/go#31007
Updates golang/go#31897
Updates golang/go#50265
Updates golang/go#53215
Updates golang/go#53336

Change-Id: I745d319a587dca506564a4624b52a7f1eb5f4751
@adonovan adonovan added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Apr 22, 2023
@adonovan adonovan self-assigned this Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) Proposal Tools This label describes issues relating to any tools in the x/tools repository.
Projects
Status: Incoming
Development

No branches or pull requests

7 participants