-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/tools/go/analysis: AllObjectFacts is inconsistent between running Analyzers as standalone binaries vs. via unitchecker #47725
Comments
@timothy-king -- Thanks very much for the very clear explanation and reproducer. I am working incremental pointer analysis @go-air (very much WIP...) and had many questions about the facts analysis interface -- I found it confusing and my current thinking is that for the purpose of that project it is easiest to work around. (I have opened a tracking issue there) I think that object path is far too limiting to impose on tools/go/analysis users (in addition to the points your bring up, for example, what about expressions across packages)? I also think that the existence of tools/go/analysis-library imposed cross-package object path filtering logic (embedded types and what not) is itself a problem, though having object path is better than nothing to aid in serialisation. Anyway, it would indeed be nice if analysers would produce the same results standalone vs as a go vet tool. Are there thoughts about simply letting Analyzer authors do their own cross-package filtering? @dominikh does this effect static check? |
@wsc0 I find it really helps to think of x/tools/go/analysis as a compiler framework that happens to target static analysis. Especially when thinking about facts. A pass (Analyzer, Pkg) takes in as input facts from package Pkg imports, parsed source+types+no go files, etc. for Pkg, results for passes (A', Pkg) when Analyzers depends directly on A', and outputs facts, a result, an error value, and a sequence of diagnostics. This allows for a schedule of passes that examines each package incrementally exactly once by always serializing all of the facts about a package and letting results flow through memory between analyzers. This is basically isomorphic to the compiler but replacing object files with facts. This enables simple integration into the go cmd, e.g. If one does not impose the same *types.Package restriction on ExportObjectFact, how do you handle diamond dependencies and collisions on facts? Suppose we had the import graph
where b and c both produce an ObjectFact on a Recalling a bit about from compiler+linkers here are some obvious candidate answers if we allow this:
None of these are entirely satisfactory. (It is possible I have wildly misunderstood your question so please correct me if I have.) |
Another way of looking at ObjectFacts is "why do PackageFacts not suffice"? I think the answer is that serialization and deserialization over AST elements is hard so the framework steps in and does this under the hood. Consider emulating an object fact type "NoReturn" over functions using a package fact type |
I think the problem of serialisation of everything go/{types,ast,token,...} is "the" root problem in the sense that solving it gives the most productive long term solution. Working with a complex framework with subtle "the framework steps in" solutions to diamond collisions (ie objectpath even as it stands today) which lack basic expressivity like the ability to refer to arbitrary expressions is also a hard problem for many analysis use cases.
If I understand correctly, one could in theory use a serialisable per-package unique id in place of *types.Func for the map key if we solve "the" root problem above. Otherwise, in this case I guess a disciplined use of token.Pos for the map key may suffice to use a package fact. But these are bigger picture questions. The more immediate question in this issue:
IE the coherence of object path facts. Supposing that objectpath + object facts are used (which indeed they are, but I think also they are only suitable only to a certain class of analysers and are complicated), I guess each analyzer could decide how to resolve the collision, taking into consideration its own needs. As the object facts are private to the analyzer, this seems ok to me. Analysers which depend on it would use results (which perhaps confusingly are not facts, but are distinct, letting the analyser using facts transform them into results.) In terms of a patch, that would take the form of removing all the objectfact filtering, and then fixing any resulting problems in the existing analysers. Some could just throw out indirect info, for example. But the semantics what a fact then means are quite loose or arbitrary. Perhaps an additional possible solution kind (to the 6 above @timothy-king brought up) would be to define that object fact on d as defined by b is distinct from object fact on d as defined by c, which is distinct from object fact on d as defined by a. Then the analyzer just has to do the right thing according to what functionality it provides. It's like context sensitivity but across packages instead of call graph, and each analyser could abstract that context sensitivity in order to reduce overhead as needed (or not). This idea could be explored whether or not it is in the form "the framework steps in" or on the shoulders of each analyser. Personally, though I'm more interested in solving the serialisation problems, at least in the form of an example. But I agree the problem of coherence of object facts is deep and so solutions or solution directions at this point seem unsatisfactory. Can we add a "thinking" label? |
This is slightly off-topic, but I'll note that if the "only" problem here were that we can't fully serialize types.Object, etc, then https://pkg.go.dev/gvisor.dev/gvisor/pkg/state may be of use. This package is capable of serializing most complex Go object graphs, and is in fact used by gVisor to serialize almost the entire application state. Looking at types.Object, I see little that couldn't be automatically serialized out-of-the-box [1]. That said, it looks like you'd ultimately end of serializing the entire package object graph, meaning that the final analysis step could end up holding the entire parsed program in memory, which to some extent seems to defeat the purpose of the segmented analyzers. [1] go/types.lazyObject.resolve is the only specific thing I noticed. This package can't handle function pointers (but that can usually be replaced with an interface value). |
@prattmic thanks for the reference. gvisor state looks nice. Regarding serialising the whole object graph and segmented analysers, I think that for many purposes serialisation + the ability to refer to unique ids of objects/expressions without needing to load in the whole graph is more along the lines of what I've seen before and am looking for, based on the idea of making approximate summaries( of packages). So to analyse A which imports B and C, and C imports D, you only would need {A, summary-of(B), summary-of(C)} in memory to analyse A; you would not need D and you would only use the "summary-of" the imports. For example, https://drops.dagstuhl.de/opus/volltexte/2021/14045/pdf/LIPIcs-ECOOP-2021-2.pdf Those summaries, if persisted as in the paper, and if communicated in separate processes like in go vet, tend to need to make reference to things which are currently in memory like types and ast nodes and declarations and ... |
Most pointer analyses are expressed as two phases: (1) turn the source into a system of constraints; (2) solve the constraints. The first is essentially a compiler pass over the statements of the program, and the second is usually something like computing graph reachability. The first algorithm is linear, the second is cubic. So I wonder whether it's necessary to optimize the first phase at all. Also, pointer analyses are usually whole-program analyses that start from main. They are analogous to the deadcode removal step in a linker. By contrast, the go/analysis framework is more like a compiler that analyzes each package in topological order, but at no point gets to see all the derived information ("object code" in the analogy) of the program.
I'm not denying the problems of objectpath and the rigidity of the framework, but it seems to me that the go/analysis framework is the wrong foundation for a pointer analysis. You might find go/packages is a better starting point. |
The ObjectFacts available to an analysis.Analyzer currently depends on how the Analyzer is run. When a {single,multi}checker is run as a standalone binary, it can see different ObjectFacts from AllObjectFacts vs. when it is run incrementally (unitchecker) such as
go vet
,go vet -vetool=...
and the nogo bazel rule.High level documentation on Facts.
Object facts can be attached during an analysis
Pass
to anytypes.Object
within the*types.Package
for thatPass
. This includes variables within functions, functions that are not accessible from a *types.Package (func "_" and "init" funcs), types that are local to functions, etc. Relevant documentation:This allows the Pass to reason about Facts being placed on type.Objects within the same package and types.Objects from different packages that are accessible from the current package.
AllObjectFacts returns all of the object facts for the current Analyzer including from transitive dependencies.
{single,multi}checker are intended to be easy to use wrappers that support both running as standalone binaries (backed by the library go/analysis/internal/checker) and incrementally (backed by go/analysis/unitchecker). unitchecker supports incremental analysis by serializing Facts for each package. Facts serialization is implemented on top of objectpath. objectpath does not support types.Objects what are not accessible from the *types.Package. There are also places in go/analysis/internal that filter facts: facts.go and checker.go. These two filtering methods are slightly inconsistent.
The upshot of the above is:
AllObjectFacts
's documentation is likely misleading Analyzer designers. It is not "all object facts". If we want to support all object facts, the serialization that objectpath does needs additional information. Alternatively the documentation can be updated to reflect a more narrow contract.CL showing how to produce different results with incremental and standalone execution: https://go-review.googlesource.com/c/tools/+/342554
Extra Notes:
p
andq
which do not transitively import each other can see ObjectFacts from the other via AllObjectFacts. This is architecturally infeasible within incremental builds. It is not clear if go/analysis/internal/checker correctly enforces this.The text was updated successfully, but these errors were encountered: