Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go: create modanalysis package #44753

Open
carnott-snap opened this issue Mar 3, 2021 · 11 comments
Open

x/tools/go: create modanalysis package #44753

carnott-snap opened this issue Mar 3, 2021 · 11 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@carnott-snap
Copy link

background

An analysis framework was built out for the go language, to make writing linters (and metalinters easier). go.mod files are another custom dsl, where you may want to enforce standards. Thus it seems reasonable to make an analysis package for modules.

One of the big potential benefits is an official "get me the main go.mod" function, since currently guessing or slurping go env -json are the best bad options. Furthermore, third party go.mod linters have already started popping up: gomodguard.

alternatives

The exact location is not terribly important, it could also be golang.org/x/tools/go/analysis/modanalysis or golang.org/x/mod/analysis, but golang.org/x/tools/go/analysis/modanalysis seems reasonable.

Instead of adding a whole new package and interface, it may be easier to add a ModPass struct based on the go.mod syntax, a new ModRun func(*ModRun) (interface{}, error) struct field, and the plumbing to read in the main go.mod. This seems like a choice for someone more familiar with the current package.

@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Mar 3, 2021
@gopherbot gopherbot added this to the Unreleased milestone Mar 3, 2021
@bcmills
Copy link
Contributor

bcmills commented Mar 3, 2021

CC @jayconrod @matloob @ianthehat

@carnott-snap
Copy link
Author

carnott-snap commented Mar 3, 2021

actually, it looks like somebody has shoehorned this into the existing analysis.Analyzer interface: modfile.Analyzer.

this seems like a hack, but props to the author, should we bless/upstream/document this implementation or is it worth extending the existing interface to better support this use case?

EDIT: this implementation looses out on the analysis.Pass.Report interface and requires one simply return an error, but does centralise the modfile.Parse call.

cc @tenntenn

@jayconrod
Copy link
Contributor

I don't think it makes sense to reuse the interfaces in golang.org/x/tools/go/analysis. That's really geared toward packages, and modules are very different.

From a brief look at the description of gomodguard, it looks like it's protecting whether any "blocked" modules are being used. What are other things this framework needs to do? Would it need to be composable? Could you be more specific about the API you'd like to see?

At the moment, we recommend go list (with -f or -json) as the main interface for loading module and package metadata. That will probably continue to be true unless we can really finalize the internal cmd/go package APIs and pull those packages out into x/mod.

@carnott-snap
Copy link
Author

carnott-snap commented Mar 3, 2021

Agreed, but I am not terribly familiar with this interface and wanted to imply that I do not care about the shape of the final interface. Also, I list gomodguard as prior art, but this started because I wanted to write a noreplace linter. I have two high level asks of different complexity.

One is "find the go.mod for a given directory". We would care about the parsed modfile.File data, so it need not be a path string. Also, most linters use the analysis.Pass interface and (iiuc) this cannot not cross module boundaries, so you could simplify this down to "give me modfile.File for the main module".

The second is the modanalysis.Pass interface. I would love to have as rich and full featured an interface as analysis.Pass, but I agree there is less need for complex linters, as go.mod is not turing complete. Spitballing, I could see linters that: enforce copyright blocks, require comment when pinning to something not @latest, disallow pseudo-releases, disallow exclude/replace/local replace clauses, require @latest, disallow retracted modules (this may be enforced by the compiler?), limit dependencies or versions (like gomodguard), is the module tidy. I think that modfile.File is enough parsed data to create a modanalysis.Diagnostic, so something like should be fine:

package modanalysis

import "golang.org/x/tools/go/analysis"

type Analyser struct {
        Name string
        Doc string
        Flags flag.FlagSet
        Flags flag.FlagSet
        Run func(*Pass) (interface{}, error)
        RunDespiteErrors bool
        Requires []*Analyzer
        ResultType reflect.Type
        FactTypes []analysis.Fact
}

func (a Analyser) String() string { return a.Name }

type Pass struct {
        Analyser *Analyser
        File modfile.File
        Report func(analysis.Diagnostic)
        ResultOf map[*Analyzer]interface{}
        Facts func() []analysis.Fact
}

I reused generic data structures from analysis where possible, though this could create cycles. After fleshing this out, is really seems like new analysis.ModPass struct (like above, though I dislike the amount of duplication with Pass, but you are right the scopes are totally different) and analysis.Analyzer.ModRun would be sufficient, and probably less work? especially for analysis.Analyzer authors that want to reuse existing singlechecker.Main functionality.

@dmitshur dmitshur added FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 5, 2021
@timothy-king timothy-king added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Mar 11, 2021
@timothy-king
Copy link
Contributor

How do you envision modanalysis.Analyzer.Requires and modanalysis.Analyzer.FactTypes being used? Are there clear use cases?

Also, most linters use the analysis.Pass interface and (iiuc) this cannot not cross module boundaries, so you could simplify this down to "give me modfile.File for the main module".

Correct. The analysis.Pass interface is conceptually a single (package, analyzer) pair.

FWIW I have implemented the equivalent of both the ModRun func(*ModRun) (interface{}, error) struct field extension of analysis and the modanalysis.Pass but for whole program analysis. In that context, each action is a ([]Packages, Analysis) pair. Both approaches have pros and cons.

I found the main disadvantage of the ModRun approach is that it complicates the relationships of what depends on what. Can an analyzer have a Run and a ModRun? Does the ModRun have access to facts? Can an analyzer have a requires on an analyzer with Run and ModRun and does this trigger the ModRun as a dependency? There are a decent number of cases to work through. It is possible to do this, but it does complicate things. If you don't need to be able to express the action:(modfile, analyzer) <--depends on-- (package, analyzer), it is not clear why this should the same concept as an analysis.Analyzer.

And I found the big disadvantage of the modanalysis.Analyzer path is code duplication. Also this is no longer compatible with {single,multi,unit}checker.Main for good or ill. To some extent that is kinda reasonable, as the "root" actions seems like they should be given as a list of *modanalysis.Analyzer and a list of modules (instead of a []*analysis.Analyzer and a []*packages.Package).

One kinda hacky option is that the analysis.Pass interface can be extended with module information from the loaded *packages.Package if/when available. Would the info in packages.Module suffice for the checkers you are writing? ( https://pkg.go.dev/golang.org/x/tools@v0.1.0/go/packages#Module ) Thinking out load: this is easy to forward for singlechecker and multichecker. Not sure about unitchecker, which would need to be supported for this to be viable. That makes the information available, but would be repeated in each package within the module, which is a bit wasteful and would need to be deduplicated to not annoy the user. One also needs to have at least 1 package in the module. (maybe not a big deal?)

@carnott-snap
Copy link
Author

carnott-snap commented Mar 11, 2021

How do you envision modanalysis.Analyzer.Requires and modanalysis.Analyzer.FactTypes being used? Are there clear use cases?

Anything calculated by multiple modanalysis.Analyzer, could be broken out into a parent, and passed via modanalysis.Analyzer.Facts. One example that comes to mind is calculating the list of versions for a given dependency. As I understand it this is not provided by modfile.File, and a linters could want restrict redacted versions, meaning some lookup is required. Other possible linters include:

  • no exclude
  • no replace
  • no local replace
  • no pseudo versions
  • no require of redacted versions
  • required versions must be @latest
  • required versions must be @master
  • required versions must be @upgrade (latest major)
  • no multiple single line elements, only blocks
  • (redact) clauses need comments
  • clauses must be in some order (e.g. first go, then require, etc.)
  • catch spelling errors in comments and keywords
  • comparing the go directive to the local toolchain version

I found the main disadvantage of the ModRun approach is that it complicates the relationships of what depends on what.

None of my lints need package and module information, so I think creating a strong separation is possible, which should simplify the implementation. We should just convince ourselves that there will never be a use case before enforcing that invariant. But any way you slice it, you will be left with some duplication.

Can an analyzer have a Run and a ModRun?

The spelling linter hits this mark, though it could also be implemented as two.

Does the ModRun have access to facts?

The above version example makes a case for module linters needing to pass around facts.

Can an analyzer have a requires on an analyzer with Run and ModRun and does this trigger the ModRun as a dependency?

Probably, though currently I have been thinking of the two as separate pipelines.

There are a decent number of cases to work through. It is possible to do this, but it does complicate things. If you don't need to be able to express the action: (modfile, analyzer) <--depends on-- (package, analyzer), it is not clear why this should the same concept as an analysis.Analyzer.

Yeah, that was the main reason I titled the issue as I did. The main reason to consider it at all is to play into the existing ecosystem: xxxchecker.Main is helpful and go vet/golangci-lint require it.

And I found the big disadvantage of the modanalysis.Analyzer path is code duplication. Also this is no longer compatible with {single,multi,unit}checker.Main for good or ill. To some extent that is kinda reasonable, as the "root" actions seems like they should be given as a list of *modanalysis.Analyzer and a list of modules (instead of a []*analysis.Analyzer and a []*packages.Package).

I am not tied to any specific approach; it is just that this use case seems important, and I picked one that seemed the least controversial. Personally, I think that the best option is to make a v2 interface that acts at the module level, but has iteration logic for interacting at the package level too. This would be fundamentally incompatible with with how we think about analysis.Analyzers today, but should be pretty future proofed. That being said, it may simply be too costly to move and compatibility is more important.

One kinda hacky option is that the analysis.Pass interface can be extended with module information from the loaded *packages.Package if/when available.

Yeah, I thought of this too, and we could have docs saying wrap any module specific code in a sync.Once, but while this is minimum viable, it is also brittle and confusing.

Would the info in packages.Module suffice for the checkers you are writing?

Technically yes, but only because packages.Module.GoMod will let you call modfile.Parse. Though you could make this a base analysis.Analyzer that everything else depends on (explicitly).

@timothy-king
Copy link
Contributor

Anything calculated by multiple modanalysis.Analyzer, could be broken out into a parent, and passed via modanalysis.Analyzer.Facts. One example that comes to mind is calculating the list of versions for a given dependency.

I do not understand this example yet. How this would be put into Result, Facts, and eventually turned into a diagnostic? One specific checker would really help me.

Yeah, I thought of this too, and we could have docs saying wrap any module specific code in a sync.Once,

unitchecker is usually 1 process per package so there would still be some duplication of work. Deduplication in an outer layer seems more reliable.

More thinking out loud. I don't know where one would store module Facts for the equivalent of a unitchecker mode for modfiles. Usually facts are stored on the with the export data along with the package graph, and is cached. Is a natural place to store export data on the module graph?

but while this is minimum viable, it is also brittle and confusing.

Fair enough. But it does maintain the (package, analyzer) action graph. This makes integrating into the existing ecosystem quite a bit easier.

Technically yes, but only because packages.Module.GoMod will let you call modfile.Parse. Though you could make this a base analysis.Analyzer that everything else depends on (explicitly).

Alright so if including the *modfile.File is sufficient, I don't see a reason why the framework would not do that instead. This could be accomplished by adding a field to Analyzer RequiresModules bool and a field Mod *modfile.File to Pass that indicates if the field will be populated.

@bcmills
Copy link
Contributor

bcmills commented Mar 11, 2021

Could you give some more detail about the concrete use-cases you have in mind?

golang.org/x/tools/go/analysis provides a framework for propagating facts from transitive declarations. I'm not sure what kind of facts you would need to propagate for modules, though: as far as I can tell, it suffices to know (1) the contents of the main module's go.mod file, and (2) the selected version of each module in the build list. Neither of those propagates in a useful way.

I guess a transitive analysis framework could be useful for computing, say, cut-points to remove a requirement on some version, but I'm not sure how those would be expressed as facts — nor what kinds of facts would be useful, other than “minimum versions transitively imposed by this module”.

@carnott-snap
Copy link
Author

carnott-snap commented Mar 11, 2021

I do not understand this example yet. How this would be put into Result, Facts, and eventually turned into a diagnostic? One specific checker would really help me.

Could you give some more detail about the concrete use-cases you have in mind?

no exclude or replace

These directives are useful in context, but tend to be a bit of an escape hatch and SHOULD NOT be in released modules. Disallow any entries in go.mod:

// suggest line removal
replace golang.org/x/mod v0.4.2 => golang.org/x/mod v0.4.1

// suggest line removal
exclude golang.org/x/mod v0.4.2

no local replace

Much like the previous example, local replace statements are anything but portable, so disallow them:

 // suggest line removal
replace golang.org/x/mod v0.4.2 => /path/to/golang.org/x/mod

no pseudo versions

When a valid tag has been cut for a repository, pseudo versions should not be required:

require(
        // suggest v0.4.2
        golang.org/x/mod v0.4.3-0.20210310185834-19d50cac98aa

        //ok
        golang.org/x/exp v0.0.0-20210220032938-85be41e4509f
)

no pre-release versions

Only stable releases should be required:

// suggest v1.4.0 if released, otherwise last released tag v0.3.5
require github.com/golang/protobuf v1.4.0-rc.4

no require of redacted versions

// suggest next smallest tag, say v1.5.3, otherwise @latest
require k8s.io/client-go v1.5.2

required versions must be latest patch

Consuming bugfixes is good:

// suggest v0.4.2
require golang.org/x/mod v0.4.1

required versions must be @latest (latest minor and patch)

Staying up to date is good:

// suggest v0.4.2
require golang.org/x/mod v0.3.0

required versions must be @upgrade (latest major)

Some repos move fast and we would like to fix breaking changes as quickly as possible:

// suggest v0.4.2
require golang.org/x/mod v0.3.0

required versions must be @master

Maybe you want to maintain GOPATH behaviour:

require (
        // suggest v0.4.3-0.20210310185834-19d50cac98aa
        golang.org/x/mod v0.4.2

        // suggest v0.0.0-20210220032938-85be41e4509f
        golang.org/x/exp v0.0.0-20210212053707-62dc52270d37
)

no multiple single line elements, only blocks

IMO, reads better:

// suggest
/* require (
        golang.org/x/mod v0.4.2
        golang.org/x/exp v0.0.0-20210212053707-62dc52270d37
) */
require golang.org/x/mod v0.4.2

require golang.org/x/exp v0.0.0-20210212053707-62dc52270d37

(redact) clauses need comments

Useful to document why you are pulling a release:

// suggest adding comment
redact v1.2.3

spaces between blocked lines with comments

For a big list, it is hard determine if the comment applies above or below:

// suggest
/* require(
        // message one
        golang.org/x/mod v0.4.3-0.20210310185834-19d50cac98aa

        // message two
        golang.org/x/exp v0.0.0-20210220032938-85be41e4509f
)*/
require(
        // message one
        golang.org/x/mod v0.4.3-0.20210310185834-19d50cac98aa
        // message two
        golang.org/x/exp v0.0.0-20210220032938-85be41e4509f
)

clauses must be in some order (e.g. first go, then require, etc.)

Old toolchains did not always add go clause, thus it can end up at the end of the file, plus people do not have a consistent ordering for require, replace, redact, and exclude:

redact v1.2.3

// move above redact clause
go 1.16

catch spelling errors in comments and keywords

Because unknown keywords are ignored, comments are unstructured, and english is hard, typos may introduce bugs or reading issues:

// suggest redact v1.2.3
redatc v1.2.3

// suggest performance
require golang.org/x/mod v0.4.1 // preformance bug introduced

comparing the go directive to the local toolchain version

// suggest 1.16 when running with go1.16
go 1.15

@timothy-king
Copy link
Contributor

@carnott-snap Thank you for these examples. Those are pretty compelling for supporting linting the modfiles in some form.

If we went the modanalysis.Analyzer route, I am not seeing a checker that would benefit from passing information between passes via either (modfile, analyzer_1)->(modfile, analyzer_2) or (modfile_1, analyzer)->(modfile_2, analyzer). Those would presumably be Analyzer.Result and Analyzer.FactTypes respectively. Maybe I am still misunderstanding the proposal?

@carnott-snap
Copy link
Author

Anything that works with external versions suggestions can really benefit from Analyzer.Result, as you may have to resolve "which versions of this dependency are retracted" or "what is the latest version of this dependency", and if a parent generated those results once multiple Analyzers could depend on that data. However, since modules only have a single modfile, I agree that Analyzer.FactTypes seems moot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

6 participants