Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/analysis/checker: an importable analysis driver based on go/packages #61324

Open
adonovan opened this issue Jul 12, 2023 · 36 comments

Comments

@adonovan
Copy link
Member

adonovan commented Jul 12, 2023

We propose to create a new package, golang.org/x/tools/go/analysis/checker, to provide a function, Analyze, that applies a set of analyzers to a set of packages and returns a data structure providing access to the complete results of the operation.

Background: the go/analysis framework defines an interface for static checkers that inspect a parsed, type-checked Go package and report mistakes or suggest fixes. The interface allows checkers (e.g. printf, nilness) to be written independent of the driver, and drivers (e.g. go vet) to be written independent of the checkers. The x/tools repo provides two drivers. The first, unitchecker, used by go vet, performs "separate analysis", of one package at a time, using serialized types and facts for dependencies. The second, internal/checker, uses go/packages to load packages and all their dependencies from source. Currently it has two public interfaces, singlechecker and multichecker, that differ only trivially: a singlechecker has only one analyzer, whereas a multichecker contains many and has a slightly more complicated command-line interface. However, both of these public interfaces consist only of a Main function that does the entire job and then calls os.Exit.

Users reported that this was not a very useful building block, and that they wanted to incorporate analysis into larger applications, or add some extra steps after package loading but before analysis, or some extra postprocessing of the analysis results. See:

We propose to create a new package with the following public API.

package checker // golang.org/x/tools/go/analysis/checker

// Analyze runs the specified analyzers on the initial packages.
//
// For each analyzer that uses facts, Analyze also runs it on all the
// dependencies of the initial packages. (In this case the program
// must have been loaded using the packages.LoadAllSyntax flag.)
//
// On success, it returns a Result whose Roots holds one item per (a,
// p) in the cross-product of analyzers and pkgs.
func Analyze(analyzers []*analysis.Analyzer, pkgs []*packages.Package, opts *Options) (*Graph, error)

type Options struct {
	// Verbose     bool // -v: log each step  // NOTE: removed from earlier draft.
	Sequential  bool // -p: disable parallelism
	SanityCheck bool // -s: perform additional sanity checks
	ShowFacts   bool // -f: log each exported fact

	/* more fields in future */
}

// Graph holds the results of a batch of analysis, including
// information about the requested actions (analyzers applied to
// packages) plus any dependent actions that it was necessary to
// compute.
type Graph struct { // NOTE: was Result in earlier draft
	// Roots contains the roots of the action graph.
	// Each node (a, p) in the action graph represents the
	// application of one analyzer a to one package p. (A node
	// thus corresponds to one analysis.Pass instance.)
	// The root actions are the product Input.Packages ×
	// Input.Analyzers.
	//
	// Each element of Action.Deps represents an edge in the
	// action graph: a dependency from one action to another.
	// An edge of the form (a, p) -> (a, p2) indicates that the
	// analysis of package p requires information ("facts") from
	// the same analyzer applied to one of p's dependencies, p2.
	// An edge of the form (a, p) -> (a2, p) indicates that the
	// analysis of package p requires information ("results")
	// from a different analyzer applied to the same package.
	// Such edges are sometimes termed "vertical" and "horizontal",
	// respectively.
	Roots []*Action

	/* more fields in future */
}

// An Action represents one unit of analysis work by the driver: the
// application of one analysis to one package. It provides the inputs
// to and records the outputs of a single analysis.Pass.
//
// Actions form a DAG, both within a package (as different analyzers
// are applied, either in sequence or parallel), and across packages
// (as dependencies are analyzed).
type Action struct {
	Analyzer    *analysis.Analyzer
	Package         *packages.Package // NOTE: was Pkg in earlier draft.
	IsRoot      bool // whether this is a root node of the graph
	Deps        []*Action
	Result      interface{} // computed result of Analyzer.run, if any (and if IsRoot)
	Err         error       // error result of Analyzer.run
	Diagnostics []analysis.Diagnostic
	Duration    time.Duration // execution time of this step

	/* more fields in future */
}
func (act *Action) AllObjectFacts() []analysis.ObjectFact
func (act *Action) AllPackageFacts() []analysis.PackageFact
func (act *Action) ObjectFact(obj types.Object, ptr analysis.Fact) bool
func (act *Action) PackageFact(pkg *types.Package, ptr analysis.Fact) bool

// -- utilities (all pure functions) --

func (*Graph) Visit(f func(*Action) error) error) 
// NOTE: was ForEach in earlier draft:
// func ForEach(roots []*Action, f func(*Action) error) error 

func (*Graph) JSONDiagnostics(w.io.Writer) 
func (*Graph) TextDiagnostics(w.io.Writer, contextLines int) 
// NOTE: in earlier draft, these were:
// func PrintDiagnostics(out io.Writer, roots []*Action, context int) (exitcode int)
// func PrintJSON(out io.Writer, roots []*Action)

// NOTE: removed from earlier draft
// func PrintTiming(out io.Writer, roots []*Action)

// NOTE: Removed from earlier draft; will be dealt with later:
// type FixFilterFunc = func(*Action, analysis.Diagnostic, analysis.SuggestedFix) bool
// func ApplyFixes(actions []*Action, filter FixFilterFunc) error

The primary entry point is Analyze; it accepts a set of packages and analyzers and runs a unit of analysis (an "Action") for each element of their cross product, plus all necessary horizontal and vertical dependencies. The result is exposed as a graph, only the roots of which are provided in the Result struct; this approach is similar to the one used in go/packages.Load.

To reduce churn and skew, the above API code is just the declarations but not all of the comments. You can see the complete API and implementation in https://go.dev/cl/411907.

Questions:

  • should the package name be something other than checker, so as not to suggest commonality with unitchecker? "srcchecker", perhaps? (If we could start over, I would use "driver" for {single,multi,unit}checker and this package, and use "checker" for analysis.Analyzer.)
  • see also the [rough, evolving] analysistest proposal proposal: x/tools/go/analysis/analysistest: improved, module-aware API #61336. Does this proposal have implications for analysistest?
@gopherbot
Copy link

Change https://go.dev/cl/411907 mentions this issue: go/analysis/checker: a go/packages-based driver library

@lfolger
Copy link
Contributor

lfolger commented Jul 14, 2023

I like the proposal and have two questions about this.

  1. Do you have an opinion how error handling works in with this package, especially in the light of panics?
    Imagine you are running multiple analyzers on multiple packages . If one of these analyzers panics during the analysis does the Analyze recover the panic?
    If not, it would prevent you from getting any result for any of the analyzers on any package.

  2. Is is possible to make the facts part of the output/input?
    This would allow users to shard the execution of many packages into batches and distribute them between workers (potentially in different processes on different machines).

@adonovan
Copy link
Member Author

adonovan commented Jul 14, 2023

  1. Do you have an opinion how error handling works in with this package, especially in the light of panics?
    Imagine you are running multiple analyzers on multiple packages . If one of these analyzers panics during the analysis does the Analyze recover the panic?

The go/analysis/checker driver uses defer + recover around Analyzer.Run, so if the analyzer is single-threaded (as ~all are), it will catch the panic and treat it as a failure of that Action. Like compilation, all downstream actions will not be executed, but independent actions will continue as usual. All this information can be gleaned from the action graph returned by checker.Analyze.

  1. Is is possible to make the facts part of the output/input?

The Fact data structures are part of the output: see the All{Object,Package}Facts and {Object,Package}Fact methods. (Their encoding isn't, but the point of internal/checker is that it loads, parses, type-checks and analyzes everything from source, so the encoding of types and facts don't come into play.)

This would allow users to shard the execution of many packages into batches and distribute them between workers (potentially in different processes on different machines).

If you want to distribute units of analysis, you should be using unitchecker, or something like it (it's not a lot of code), as it assumes that intermediate results between units are serialized. Perhaps I should add an Example in go/analysis (but not checker) that shows how to construct a system analogous to go vet + unitchecker: in other words, the parent task constructs the graph of units of work and the child task(s) do the actual work, using serialized intermediaries for types and facts.

@lfolger
Copy link
Contributor

lfolger commented Jul 14, 2023

I see. I think I missed this part:

For each analyzer that uses facts, Analyze also runs it on all the dependencies of the initial packages.

Thanks for clarifying (I should learn to read comment 😄 )

@jba
Copy link
Contributor

jba commented Jul 14, 2023

I think checker is a fine name.

Since this is a non-main package, I don't think it should log anything. Consider instead providing that functionality as callbacks. For instance, replace Verbose with OnStep func(s string) (or maybe something richer than a string).

@adonovan
Copy link
Member Author

Since this is a non-main package, I don't think it should log anything.

Good point. On closer inspection, it only logs one message, a constant, at the very beginning. We can simply get rid of Verbose. Thanks for pointing this out.

@Antonboom
Copy link

Antonboom commented Jul 29, 2023

@adonovan Hello. Thank u for proposal!

My question is still the same – how I can reuse helpful flags from single/multichecker and append to them my custom flags?

@adonovan
Copy link
Member Author

adonovan commented Aug 7, 2023

how I can reuse helpful flags from single/multichecker and append to them my custom flags?

This proposal doesn't include a "convenience" API for flags, but it does allow you to control all the relevant pieces of the driver and the analyzers from the outside:

  • The various verbosity flags (-debug) can be be set through fields of checker.Options;
  • Profiling (e.g. -cpuprofile) can be enabled by making the usual calls to the runtime;
  • Tests (-test) can be loaded by setting packages.Config.Tests;
  • Fixes (-fix) can be applied by calling checker.ApplyFixes.
  • Flags for each analyzer can be registered using some variation on :
	for _, a := range analyzers {
		a.Flags.VisitAll(func(f *flag.Flag) {
			flag.Var(f.Value, a.Name + "." + f.Name, f.Usage) // --analyzer.flag=...
		})
	}

	flag.Parse()

We can always add more convenience functions later as clear needs arise, but the primary goal here is to make the tricky logic of the driver available in a form other than a complete application Main function.

@Antonboom
Copy link

Antonboom commented Aug 25, 2023

Thank u, I see.

checker.ApplyFixes and checker.Print* is a great steps to solving the problem I'm talking about.

At the moment we are hostages of multi/singlechecker.Main.

My linter help example:

Flags:
  -V    print version and exit
  -all
        no effect (deprecated)
  -c int
        display offending line with this many lines of context (default -1)
  -checked-types value
        coma separated list (default chan,func,iface,map,ptr)
  -cpuprofile string
        write CPU profile to this file
  -debug string
        debug flags, any subset of "fpstv"
  -fix
        apply all suggested fixes
  -flags
        print analyzer flags in JSON
  -json
        emit JSON output
  -memprofile string
        write memory profile to this file
  -source
        no effect (deprecated)
  -tags string
        no effect (deprecated)
  -test
        indicates whether test files should be analyzed, too (default true)
  -trace string
        write trace log to this file
  -v    no effect (deprecated)

Issues:

  1. You have no possibility to tune flag set. E.g. if linter works only with tests then -test doesn't make sence.
  1. You have no possibility remove a lot of deprecated flags needed only for govet backward compatibility.

  2. It's not obvious what flags are "service" (e.g. -trace, -cpuprofile) and what are related to linter functionality (e.g. -checked-types).

  3. If you want to make custom flag set and don't use singlechecker.Main but you are interested in useful flags, you should write (or copy-paste) on yourself own:

  • profiling code
  • tracing code
  • fixing code
  • printing code (version, flags, diagnostic – plain and json)

@adonovan
Copy link
Member Author

Thanks @Antonboom, I think you are saying that the proposal addresses the four issues you listed and that you are in favor of accepting it. I hope we can get it approved soon. (Please do correct me if I have misunderstood or if you think we should change the proposed API in some way.)

@Antonboom
Copy link

Please do correct me if I have misunderstood

No, no. I'm waiting this change a lot.

I attached this example just with the hope that maybe it will push you to new thoughts.

For example:

  1. Existing checker.Run contains some logic around package loading. But in this proposal it is a headache of package user?
  2. Profiling/tracing code should be copypasted from checker.Run? Maybe it's ok...
  3. Similar for print flags and version functionality.
  4. etc?

@adonovan
Copy link
Member Author

I attached this example just with the hope that maybe it will push you to new thoughts.

Great, thanks. Yes, in all three cases the new API makes these problems the client responsibility:

For example:

  1. Existing checker.Run contains some logic around package loading. But in this proposal it is a headache of package user?

The amount of code in the load function is very small, and client applications are likely going to want fine-grained control over it, so I'd rather not introduce a helper function just yet. (The proposed API does state its requirements for the Mode flag, but that's all it needs to say.)

  1. Profiling/tracing code should be copypasted from checker.Run? Maybe it's ok...

This logic is unrelated to analysis; it's there because singlechecker.Main is a main function, and any significant application has logic like this in its main function.

  1. Similar for print flags and version functionality.

We may want to add some helper functions for flags, but it's surprisingly fiddly to come up with a good API for it, so I thought I would leave it out for now and see what needs arise. The important thing is that clients are unstuck, even if they have to copy a little code.

@rsc
Copy link
Contributor

rsc commented Aug 30, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@findleyr
Copy link
Contributor

findleyr commented Sep 8, 2023

I am in favor of this proposal. However, I think we should consider having this package operate on an fs.FS, rather than the os files system itself. It seems possible that certain applications (e.g. gopls, or a tool like it) might want to operate on a file state other than the file system. Do you think this would be a reasonable use of this new package?

I suppose we can add FS support later as an Options.FS field, but wanted to pose the question. For example, the ApplyFixes helper is necessarily tightly coupled to the OS. Perhaps we should elide that helper for now, or give it a more extensible API?

In general, I'm not sure about adding all the helpers in this package. They seem more related to a particular command-line interface than they do to the code of the analysis algorithm. But I don't feel strongly.

@adonovan
Copy link
Member Author

adonovan commented Sep 8, 2023

we should consider having this package operate on an fs.FS, rather than the os files system itself.
I suppose we can add FS support later as an Options.FS field

I assume you are primarily referring to ApplyFixes. It does seem reasonable to parameterize it over the file system in some way. An FS would be suitable for reading, but we still need a means to write the results. (There's no writable equivalent of FS, and proposal #45757 was declined.) I'll have a think.

As for Analyze, the client is expected to have already obtained the packages.Packages, so it's too late to affect how files are read. That said, some analyzers do call os.OpenFile and I recently filed a proposal #62292 to add an OpenFile operation to analysis.Pass, in which case this interface would want an FS field so that reads done within the analyzer are consistent with whatever overlays were passed to go/packages.

In general, I'm not sure about adding all the helpers in this package. They seem more related to a particular command-line interface than they do to the code of the analysis algorithm.

The three Print functions aren't fundamental. They could be left for a later proposal, but I've included them to make it easy to build something like unitchecker without having to copy tons of code. The PrintDiagnostics function could also be usefully parameterized over an FS since it prints lines of context which it must read from somewhere.

@findleyr
Copy link
Contributor

findleyr commented Sep 8, 2023

As for Analyze, the client is expected to have already obtained the packages.Packages, so it's too late to affect how files are read.

FWIW go/packages accepts overlays -- that's how gopls uses it.

@adonovan
Copy link
Member Author

adonovan commented Sep 8, 2023

go/packages accepts overlays -- that's how gopls uses it.

Understood, but my point is that the client does the packages.Load call before the call to Analyze, so Analyze has no need for the file system (at least until #62292 is accepted).

@findleyr
Copy link
Contributor

findleyr commented Sep 8, 2023

Understood, but my point is that the client does the packages.Load call before the call to Analyze, so Analyze has no need for the file system (at least until #62292 is accepted).

Ack, sorry I misinterpreted your statement as saying that go/packages would have necessarily read from the OS. In that case, yes it does appear that this is only relevant to ApplyFixes. Should we accept a ReadFile/WriteFile interface?

@adonovan
Copy link
Member Author

adonovan commented Sep 8, 2023

Should we accept a ReadFile/WriteFile interface?

Yes, those two would be almost sufficient, but the implementation of ApplyFixes currently uses the robustio.FileID type (as a map key) to make it safe in the presence of file-system level aliasing: symbolic links, hard links, case insensitivity, path uncleanness. The algorithm could be expressed in terms of os.SameFile but only at the expense of a quadratic number of calls. So in addition to {Read,Write}File we'd need a FileID(filename) (any, error) method that returns an opaque comparable value whose equivalence relation corresponds to "same inode". Seems like a can of worms.

@adonovan
Copy link
Member Author

adonovan commented Sep 9, 2023

The quadratic term can be reduced to a trivial coefficient by using the content as the primary distinguishing key, calling SameFile only when the contents match, which should be rare for Go source files. The FS interfaces (even StatFS) don't seem to have an abstraction of os.SameFile, so we would need to users to provide it. Or we could say that, if users provide a custom FS, then file-system level aliasing becomes their problem.

@adonovan
Copy link
Member Author

After discussion with Russ, I have removed a number of functions and simplified some of the names and signatures. See revised first note in this issue, or updated CL.

@adonovan
Copy link
Member Author

Understood, but my point is that the client does the packages.Load call before the call to Analyze, so Analyze has no need for the file system (at least until #62292 is accepted).

Ack, sorry I misinterpreted your statement as saying that go/packages would have necessarily read from the OS. In that case, yes it does appear that this is only relevant to ApplyFixes. Should we accept a ReadFile/WriteFile interface?

ApplyFixes has been dropped from the proposal for now, in part because of the confusion of abstraction levels: the input is provided as a data structure produced from files read by the client calling packages.Load, but the ApplyFixes operation interacts with the file system directly. We should probably consider all these overlay and FS-related issues holistically in #62292.

@rsc
Copy link
Contributor

rsc commented Oct 27, 2023

Dropping the comments to make the API easier to skim, I think we are down to:

package checker // golang.org/x/tools/go/analysis/checker

func Analyze(analyzers []*analysis.Analyzer, pkgs []*packages.Package, opts *Options) (*Graph, error)

type Options struct {
	Sequential  bool // -p: disable parallelism
	SanityCheck bool // -s: perform additional sanity checks
	ShowFacts   bool // -f: log each exported fact
}

type Graph struct {
	Roots []*Action
}

func (*Graph) Visit(f func(*Action) error) error) 
func (*Graph) JSONDiagnostics(w.io.Writer) 
func (*Graph) TextDiagnostics(w.io.Writer, contextLines int) 

type Action struct {
	Analyzer    *analysis.Analyzer
	Package         *packages.Package // NOTE: was Pkg in earlier draft.
	IsRoot      bool // whether this is a root node of the graph
	Deps        []*Action
	Result      interface{} // computed result of Analyzer.run, if any (and if IsRoot)
	Err         error       // error result of Analyzer.run
	Diagnostics []analysis.Diagnostic
	Duration    time.Duration // execution time of this step
}

func (a *Action) AllObjectFacts() []analysis.ObjectFact
func (a *Action) AllPackageFacts() []analysis.PackageFact
func (a *Action) ObjectFact(obj types.Object, ptr analysis.Fact) bool
func (a *Action) PackageFact(pkg *types.Package, ptr analysis.Fact) bool

The JSONDiagnostics and TextDiagnostics names seem wrong to me. I had suggested PrintJSON and PrintText, which make the action clearer and sort next to each other.

For Options, I am still not sure about mentioning the flags. When we spoke last week I thought I understood you to say that unitchecker or multichecker define those flags, but I can't find any evidence of that in the x/tools repo:

% git grep -n 'flag.*"[psf]"'
cmd/godex/godex.go:20:	source  = flag.String("s", "", "only consider packages from src, where src is one of the supported compilers")
cmd/goyacc/yacc.go:167:	flag.StringVar(&prefix, "p", "yy", "name prefix to use in generated code")
cmd/stress/stress.go:35:	flagP       = flag.Int("p", runtime.NumCPU(), "run `N` processes in parallel")
% git grep -n 'Bool.*"[psf]"'
% 

@adonovan
Copy link
Member Author

adonovan commented Oct 27, 2023

The JSONDiagnostics and TextDiagnostics names seem wrong to me. I had suggested PrintJSON and PrintText, which make the action clearer and sort next to each other.

I can change the names easily enough. The fact that they don't call fmt.Print, but write to a writer, made me think Print wasn't quite the right verb, but I don't feel strongly.

More importantly, they write to an io.Writer but don't return an error. Should I make them return an error, or return their result as a []byte? I am inclined toward the former.

For Options, I am still not sure about mentioning the flags. When we spoke last week I thought I understood you to say that unitchecker or multichecker define those flags, but I can't find any evidence of that in the x/tools repo:

I misspoke, and misremembered when I wrote the doc comments "-p: ..." etc above: they are not regular flags, but single-letter arguments to a single -debug=[fpstv]* flag.

@rsc
Copy link
Contributor

rsc commented Oct 27, 2023

Returning an error is fine. Better not to assume one giant []byte.

@cespare
Copy link
Contributor

cespare commented Oct 27, 2023

@dominikh's feedback may be useful here.

@jba
Copy link
Contributor

jba commented Oct 27, 2023

Here is what the standard library has to say about writing to an io.Writer. I've removed names like Encode and Execute that are clearly wrong.

Write is the most common prefix, though I think To also works.

bufio.Reader.WriteTo(w io.Writer) (n int64, err error)
bytes.Buffer.WriteTo(w io.Writer) (n int64, err error)
bytes.Reader.WriteTo(w io.Writer) (n int64, err error)
strings.Reader.WriteTo(w io.Writer) (n int64, err error)
index/suffixarray.Index.Write(w io.Writer) error
go/types.Scope.WriteTo(w io.Writer, n int, recurse bool)
net.Buffers.WriteTo(w io.Writer) (n int64, err error)
net/http.Request.Write(w io.Writer) error
net/http.Response.Write(w io.Writer) error
net/http.Header.Write(w io.Writer) error
runtime/pprof.Profile.WriteTo(w io.Writer, debug int) error
go/scanner.PrintError(w io.Writer, err error)
go/doc.ToHTML(w io.Writer, text string, words map[string]string)
go/doc.ToText(w io.Writer, text string, prefix, codePrefix string, width int)
encoding/binary.Write(w io.Writer, order ByteOrder, data any) error
runtime/coverage.WriteMeta(w io.Writer) error
runtime/coverage.WriteCounters(w io.Writer) error
runtime/pprof.WriteHeapProfile(w io.Writer) error

@rsc
Copy link
Contributor

rsc commented Nov 1, 2023

scanner.PrintError seems like sufficient justification for PrintJSON and PrintText.

@adonovan
Copy link
Member Author

adonovan commented Nov 1, 2023

Ok, Print it is.

package checker // golang.org/x/tools/go/analysis/checker

func Analyze(analyzers []*analysis.Analyzer, pkgs []*packages.Package, opts *Options) (*Graph, error)

type Options struct {
	Sequential  bool // disable parallelism
	SanityCheck bool // perform additional sanity checks
	ShowFacts   bool // log each exported fact
}

type Graph struct {
	Roots []*Action
}

func (*Graph) Visit(f func(*Action) error) error) 
func (*Graph) PrintJSON(w.io.Writer) error
func (*Graph) PrintText(w.io.Writer, contextLines int)  error

type Action struct {
	Analyzer    *analysis.Analyzer
	Package         *packages.Package // NOTE: was Pkg in earlier draft.
	IsRoot      bool // whether this is a root node of the graph
	Deps        []*Action
	Result      interface{} // computed result of Analyzer.run, if any (and if IsRoot)
	Err         error       // error result of Analyzer.run
	Diagnostics []analysis.Diagnostic
	Duration    time.Duration // execution time of this step
}

func (a *Action) AllObjectFacts() []analysis.ObjectFact
func (a *Action) AllPackageFacts() []analysis.PackageFact
func (a *Action) ObjectFact(obj types.Object, ptr analysis.Fact) bool
func (a *Action) PackageFact(pkg *types.Package, ptr analysis.Fact) bool

@rsc
Copy link
Contributor

rsc commented Nov 2, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

package checker // golang.org/x/tools/go/analysis/checker

func Analyze(analyzers []*analysis.Analyzer, pkgs []*packages.Package, opts *Options) (*Graph, error)

type Options struct {
	Sequential  bool // disable parallelism
	SanityCheck bool // perform additional sanity checks
	ShowFacts   bool // log each exported fact
}

type Graph struct {
	Roots []*Action
}

func (*Graph) Visit(f func(*Action) error) error) 
func (*Graph) PrintJSON(w.io.Writer) error
func (*Graph) PrintText(w.io.Writer, contextLines int)  error

type Action struct {
	Analyzer    *analysis.Analyzer
	Package     *packages.Package
	IsRoot      bool // whether this is a root node of the graph
	Deps        []*Action
	Result      interface{} // computed result of Analyzer.run, if any (and if IsRoot)
	Err         error       // error result of Analyzer.run
	Diagnostics []analysis.Diagnostic
	Duration    time.Duration // execution time of this step
}

func (a *Action) AllObjectFacts() []analysis.ObjectFact
func (a *Action) AllPackageFacts() []analysis.PackageFact
func (a *Action) ObjectFact(obj types.Object, ptr analysis.Fact) bool
func (a *Action) PackageFact(pkg *types.Package, ptr analysis.Fact) bool

@dominikh
Copy link
Member

dominikh commented Nov 3, 2023

I'm slightly concerned about the proposed Analyze function. It requires having loaded all packages ahead of time, including their type information and ASTs. If checking a large amount of packages, this amounts to a lot of flat memory usage.

In Staticcheck's analysis runner, we instead construct the package graph from packages loaded with

packages.NeedName |
	packages.NeedImports |
	packages.NeedDeps |
	packages.NeedExportFile |
	packages.NeedFiles |
	packages.NeedCompiledGoFiles |
	packages.NeedTypesSizes |
	packages.NeedModule

and then compute ASTs and type information on demand when a package is ready to be analyzed, i.e. all of its dependencies have been analyzed. Once we're done, we discard the AST and type information. This results in overall much lower memory usage.

One downside of our approach is that we load type information for dependencies from export data repeatedly, leading to more garbage and more CPU time spent on parsing export data and GC.

(We do the same discarding and reloading for facts, but that part probably contributes much less to overall memory usage.)

@dominikh
Copy link
Member

dominikh commented Nov 3, 2023

Also, are there any plans to add callbacks to support caching? That is, avoid recomputing facts and diagnostics for packages that haven't changed.

@adonovan
Copy link
Member Author

adonovan commented Nov 6, 2023

Hi @dominikh.

I'm slightly concerned about the proposed Analyze function. It requires having loaded all packages ahead of time, including their type information and ASTs. If checking a large amount of packages, this amounts to a lot of flat memory usage.

You're right that it isn't the most asymptotically efficient solution. It would be more efficient to have Analyze produce (and then evict) type-annotated syntax as it goes, and for {single,multi}checker to take advantage of this optimization too. I suppose nothing stops us from retrofitting this optimization into the API proposed here: if syntax and/or types are missing, then Analyze could parse and type-check as it goes, using a parallel structure to avoid mutating the go/packages graph.

But the motivation for this new API is for users that want to load and analyze a large program, re-using the tricky bits of {single,multi}checker.Main while inserting their own logic in between package.Load and Analyze and between Analyze and Exit, which the existing API makes impossible. So I expect that for them, the cost of holding everything in memory at once is quite acceptable.

As always, asymptotically efficient separate analysis with persistent memoization of subproblems is best done using the unitchecker approach (go vet -vettool=myexecutable), which works like a build system. This makes the cost of repeat analysis proportional to what has changed, though, as you point out, in the "clean build" scenario it does mean that export data is decoded repeatedly, just as in a clean go build. But the unitchecker approach is no good for tools that want to see the whole program before analysis, or all the results after analysis.

@timothy-king
Copy link
Contributor

Result interface{} // computed result of Analyzer.run, if any (and if IsRoot)

Are we sure we want to keep the "and if IsRoot" part? Keeping it does encourage compatibility with unitchecker and facts, but it will make the Actions distinct from how the execution happened.

If we have analyzers A and B where B uses the results of A and a package P and we request the roots [B]x[P], we will have the Actions [(A, P), (B, P)]. The current proposal requires that (A,P).Result == nil despite this potentially not being nil during the computation of (B,P).

Why is Result special in this way and not say Diagnostics?

@adonovan
Copy link
Member Author

adonovan commented Nov 7, 2023

If we have analyzers A and B where B uses the results of A and a package P and we request the roots [B]x[P], we will have the Actions [(A, P), (B, P)]. The current proposal requires that (A,P).Result == nil despite this potentially not being nil during the computation of (B,P).

Correct.

Why is Result special in this way and not say Diagnostics?

Because Results are potentially very large data structures, such as the SSA representation of all the function bodies, and we want to evict them from memory. By contrast, Diagnostics are typically small and few in number.

@rsc
Copy link
Contributor

rsc commented Nov 10, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

package checker // golang.org/x/tools/go/analysis/checker

func Analyze(analyzers []*analysis.Analyzer, pkgs []*packages.Package, opts *Options) (*Graph, error)

type Options struct {
	Sequential  bool // disable parallelism
	SanityCheck bool // perform additional sanity checks
	ShowFacts   bool // log each exported fact
}

type Graph struct {
	Roots []*Action
}

func (*Graph) Visit(f func(*Action) error) error) 
func (*Graph) PrintJSON(w.io.Writer) error
func (*Graph) PrintText(w.io.Writer, contextLines int)  error

type Action struct {
	Analyzer    *analysis.Analyzer
	Package     *packages.Package
	IsRoot      bool // whether this is a root node of the graph
	Deps        []*Action
	Result      interface{} // computed result of Analyzer.run, if any (and if IsRoot)
	Err         error       // error result of Analyzer.run
	Diagnostics []analysis.Diagnostic
	Duration    time.Duration // execution time of this step
}

func (a *Action) AllObjectFacts() []analysis.ObjectFact
func (a *Action) AllPackageFacts() []analysis.PackageFact
func (a *Action) ObjectFact(obj types.Object, ptr analysis.Fact) bool
func (a *Action) PackageFact(pkg *types.Package, ptr analysis.Fact) bool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

10 participants