Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/analysis: AllObjectFacts is inconsistent between running Analyzers as standalone binaries vs. via unitchecker #47725

Open
timothy-king opened this issue Aug 16, 2021 · 7 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@timothy-king
Copy link
Contributor

The ObjectFacts available to an analysis.Analyzer currently depends on how the Analyzer is run. When a {single,multi}checker is run as a standalone binary, it can see different ObjectFacts from AllObjectFacts vs. when it is run incrementally (unitchecker) such as go vet, go vet -vetool=... and the nogo bazel rule.

High level documentation on Facts.

Object facts can be attached during an analysis Pass to any types.Object within the *types.Package for that Pass. This includes variables within functions, functions that are not accessible from a *types.Package (func "_" and "init" funcs), types that are local to functions, etc. Relevant documentation:

	// ExportObjectFact associates a fact of type *T with the obj,
	// replacing any previous fact of that type.
	//
	// ExportObjectFact panics if it is called after the pass is
	// complete, or if obj does not belong to the package being analyzed.
	// ExportObjectFact is not concurrency-safe.
	ExportObjectFact func(obj types.Object, fact Fact)

This allows the Pass to reason about Facts being placed on type.Objects within the same package and types.Objects from different packages that are accessible from the current package.

AllObjectFacts returns all of the object facts for the current Analyzer including from transitive dependencies.

	// AllObjectFacts returns a new slice containing all object facts of the analysis's FactTypes
	// in unspecified order.
	// WARNING: This is an experimental API and may change in the future.
	AllObjectFacts func() []ObjectFact

{single,multi}checker are intended to be easy to use wrappers that support both running as standalone binaries (backed by the library go/analysis/internal/checker) and incrementally (backed by go/analysis/unitchecker). unitchecker supports incremental analysis by serializing Facts for each package. Facts serialization is implemented on top of objectpath. objectpath does not support types.Objects what are not accessible from the *types.Package. There are also places in go/analysis/internal that filter facts: facts.go and checker.go. These two filtering methods are slightly inconsistent.

The upshot of the above is:

  1. We can get different facts, results , diagnostics, etc. based on whether the Analyzer is run incrementally or not if the analyzer uses AllObjectFacts. (It is not clear using AllObjectFacts is the only such condition.)
  2. AllObjectFacts's documentation is likely misleading Analyzer designers. It is not "all object facts". If we want to support all object facts, the serialization that objectpath does needs additional information. Alternatively the documentation can be updated to reflect a more narrow contract.

CL showing how to produce different results with incremental and standalone execution: https://go-review.googlesource.com/c/tools/+/342554

Extra Notes:

  1. There is also a distinction for a standalone checker whether packages p and q which do not transitively import each other can see ObjectFacts from the other via AllObjectFacts. This is architecturally infeasible within incremental builds. It is not clear if go/analysis/internal/checker correctly enforces this.
  2. The "within the same Package" distinction for ExportObjectFact can be subtle as it does not include things like functions coming from embedding an interface from another package. But this is not the focus of this issue.
@timothy-king timothy-king added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Aug 16, 2021
@timothy-king timothy-king added this to the Unreleased milestone Aug 16, 2021
@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Aug 16, 2021
@scott-cotton
Copy link

scott-cotton commented Aug 16, 2021

@timothy-king -- Thanks very much for the very clear explanation and reproducer.

I am working incremental pointer analysis @go-air (very much WIP...) and had many questions about the facts analysis interface -- I found it confusing and my current thinking is that for the purpose of that project it is easiest to work around.

(I have opened a tracking issue there)

I think that object path is far too limiting to impose on tools/go/analysis users (in addition to the points your bring up, for example, what about expressions across packages)? I also think that the existence of tools/go/analysis-library imposed cross-package object path filtering logic (embedded types and what not) is itself a problem, though having object path is better than nothing to aid in serialisation.

Anyway, it would indeed be nice if analysers would produce the same results standalone vs as a go vet tool.

Are there thoughts about simply letting Analyzer authors do their own cross-package filtering?

@dominikh does this effect static check?

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2021
@timothy-king
Copy link
Contributor Author

Are there thoughts about simply letting Analyzer authors do their own cross-package filtering?

@wsc0 I find it really helps to think of x/tools/go/analysis as a compiler framework that happens to target static analysis. Especially when thinking about facts. A pass (Analyzer, Pkg) takes in as input facts from package Pkg imports, parsed source+types+no go files, etc. for Pkg, results for passes (A', Pkg) when Analyzers depends directly on A', and outputs facts, a result, an error value, and a sequence of diagnostics. This allows for a schedule of passes that examines each package incrementally exactly once by always serializing all of the facts about a package and letting results flow through memory between analyzers. This is basically isomorphic to the compiler but replacing object files with facts. This enables simple integration into the go cmd, e.g. go vet, and other build systems like bazel can shadow the compilation graph to get an incremental x/tools/go/analysis graph. That is all a very long wind up to the following question.

If one does not impose the same *types.Package restriction on ExportObjectFact, how do you handle diamond dependencies and collisions on facts? Suppose we had the import graph

  a -> b -> d
  a -> c -> d
  a -> d

where b and c both produce an ObjectFact on a types.Object within d. (Remember c and b are processed independently [likely in different subprocesses].) What does a see when it imports the object fact from d?

Recalling a bit about from compiler+linkers here are some obvious candidate answers if we allow this:

  1. Reject the input as an error. No collisions are allowed.
  2. Select an input arbitrarily.
  3. Demand that collisions produced the same value. Error otherwise.
  4. Have precedence rule for selecting an input.
  5. Have a user specified fold function and have an order over which it is applied.
  6. Have a user specified join function and require that it is commutative. (The last two may require idempotent as well?)

None of these are entirely satisfactory.

(It is possible I have wildly misunderstood your question so please correct me if I have.)

@timothy-king
Copy link
Contributor Author

Another way of looking at ObjectFacts is "why do PackageFacts not suffice"? I think the answer is that serialization and deserialization over AST elements is hard so the framework steps in and does this under the hood. Consider emulating an object fact type "NoReturn" over functions using a package fact type map[*types.Func]NoReturn to see why this might be challenging. This is the problem objectpath is trying to solve.

@scott-cotton
Copy link

scott-cotton commented Aug 17, 2021

@timothy-king

Another way of looking at ObjectFacts is "why do PackageFacts not suffice"? I think the answer is that serialization and deserialization over AST elements is hard so the framework steps in and does this under the hood.

I think the problem of serialisation of everything go/{types,ast,token,...} is "the" root problem in the sense that solving it gives the most productive long term solution. Working with a complex framework with subtle "the framework steps in" solutions to diamond collisions (ie objectpath even as it stands today) which lack basic expressivity like the ability to refer to arbitrary expressions is also a hard problem for many analysis use cases.

Consider emulating an object fact type "NoReturn" over functions using a package fact type map[*types.Func]NoReturn to see why this might be challenging. This is the problem objectpath is trying to solve.

If I understand correctly, one could in theory use a serialisable per-package unique id in place of *types.Func for the map key if we solve "the" root problem above. Otherwise, in this case I guess a disciplined use of token.Pos for the map key may suffice to use a package fact.

But these are bigger picture questions. The more immediate question in this issue:

What does a see when it imports the object fact from d?

IE the coherence of object path facts. Supposing that objectpath + object facts are used (which indeed they are, but I think also they are only suitable only to a certain class of analysers and are complicated), I guess each analyzer could decide how to resolve the collision, taking into consideration its own needs. As the object facts are private to the analyzer, this seems ok to me. Analysers which depend on it would use results (which perhaps confusingly are not facts, but are distinct, letting the analyser using facts transform them into results.)

In terms of a patch, that would take the form of removing all the objectfact filtering, and then fixing any resulting problems in the existing analysers. Some could just throw out indirect info, for example. But the semantics what a fact then means are quite loose or arbitrary.

Perhaps an additional possible solution kind (to the 6 above @timothy-king brought up) would be to define that object fact on d as defined by b is distinct from object fact on d as defined by c, which is distinct from object fact on d as defined by a. Then the analyzer just has to do the right thing according to what functionality it provides. It's like context sensitivity but across packages instead of call graph, and each analyser could abstract that context sensitivity in order to reduce overhead as needed (or not).

This idea could be explored whether or not it is in the form "the framework steps in" or on the shoulders of each analyser.

Personally, though I'm more interested in solving the serialisation problems, at least in the form of an example.

But I agree the problem of coherence of object facts is deep and so solutions or solution directions at this point seem unsatisfactory. Can we add a "thinking" label?

@prattmic
Copy link
Member

This is slightly off-topic, but I'll note that if the "only" problem here were that we can't fully serialize types.Object, etc, then https://pkg.go.dev/gvisor.dev/gvisor/pkg/state may be of use. This package is capable of serializing most complex Go object graphs, and is in fact used by gVisor to serialize almost the entire application state.

Looking at types.Object, I see little that couldn't be automatically serialized out-of-the-box [1]. That said, it looks like you'd ultimately end of serializing the entire package object graph, meaning that the final analysis step could end up holding the entire parsed program in memory, which to some extent seems to defeat the purpose of the segmented analyzers.

[1] go/types.lazyObject.resolve is the only specific thing I noticed. This package can't handle function pointers (but that can usually be replaced with an interface value).

@scott-cotton
Copy link

@prattmic thanks for the reference. gvisor state looks nice. Regarding serialising the whole object graph and segmented analysers, I think that for many purposes serialisation + the ability to refer to unique ids of objects/expressions without needing to load in the whole graph is more along the lines of what I've seen before and am looking for, based on the idea of making approximate summaries( of packages). So to analyse A which imports B and C, and C imports D, you only would need {A, summary-of(B), summary-of(C)} in memory to analyse A; you would not need D and you would only use the "summary-of" the imports.

For example, https://drops.dagstuhl.de/opus/volltexte/2021/14045/pdf/LIPIcs-ECOOP-2021-2.pdf

Those summaries, if persisted as in the paper, and if communicated in separate processes like in go vet, tend to need to make reference to things which are currently in memory like types and ast nodes and declarations and ...

@adonovan
Copy link
Member

I am working incremental pointer analysis https://github.com/go-air and had many questions about the facts analysis interface -- I found it confusing and my go-air/pal#6 (comment) is that for the purpose of that project it is easiest to work around.

Most pointer analyses are expressed as two phases: (1) turn the source into a system of constraints; (2) solve the constraints. The first is essentially a compiler pass over the statements of the program, and the second is usually something like computing graph reachability. The first algorithm is linear, the second is cubic. So I wonder whether it's necessary to optimize the first phase at all.

Also, pointer analyses are usually whole-program analyses that start from main. They are analogous to the deadcode removal step in a linker. By contrast, the go/analysis framework is more like a compiler that analyzes each package in topological order, but at no point gets to see all the derived information ("object code" in the analogy) of the program.

Working with a complex framework with subtle "the framework steps in" solutions to diamond collisions (ie objectpath even as it stands today) which lack basic expressivity like the ability to refer to arbitrary expressions is also a hard problem for many analysis use cases.

I'm not denying the problems of objectpath and the rigidity of the framework, but it seems to me that the go/analysis framework is the wrong foundation for a pointer analysis. You might find go/packages is a better starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

6 participants