cmd/go: module proxy protocol, extending the information endpoint #41902

Closed
0x2b3bfa0 opened this issue Oct 10, 2020 · 10 comments
Labels: FeatureRequest FrozenDueToAge NeedsInvestigation
Milestone: Backlog

Comments

0x2b3bfa0 commented Oct 10, 2020

Module proxy protocol: extending the information endpoint

Given that proxy.golang.org serves as a sort of central registry for modules, it would be great if it could provide additional information in a universally readable format through the information endpoint:

GET $GOPROXY/<module>/@v/<version>.info returns JSON-formatted metadata about that version of the given module.

These fields may or may not be part of the base proxy specification, but it would be nice to have them at least in the official implementation.

Proposed additional response fields:

  • Documentation: a short sentence describing the module's purpose, extracted the same way go list -json does.

  • Licenses: the same license strings shown on pkg.go.dev, or [] for unknown licenses; ideally using the same github.com/google/licensecheck-to-SPDX conversion that pkg.go.dev uses.

  • Dependencies: a list with the same fields as the require directives in the go.mod file, with the // indirect comment as an additional field.

  • Hashes: for both the package and every requirement, as per the go.sum format.

  • Additional fields: every field as produced by go mod edit -json.
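For the Documentation field, a minimal sketch of the extraction, assuming a Go source file of the package is at hand: it reads the package doc comment with go/parser and reduces it to its first sentence with go/doc's Synopsis. As far as I know, go/build fills the Doc field reported by go list -json in a similar way. The helper name packageSynopsis is hypothetical.

```go
package main

import (
	"fmt"
	"go/doc"
	"go/parser"
	"go/token"
)

// packageSynopsis (hypothetical helper) returns the first sentence of the
// package doc comment in src, or "" if the file has no package doc comment.
func packageSynopsis(filename string, src []byte) (string, error) {
	fset := token.NewFileSet()
	// PackageClauseOnly stops after the package clause; ParseComments keeps
	// the doc comment attached to it in f.Doc.
	f, err := parser.ParseFile(fset, filename, src, parser.PackageClauseOnly|parser.ParseComments)
	if err != nil {
		return "", err
	}
	if f.Doc == nil {
		return "", nil
	}
	var p doc.Package
	return p.Synopsis(f.Doc.Text()), nil
}

func main() {
	src := []byte("// Package driver provides common benchmarking logic shared between benchmarks.\npackage driver\n")
	syn, err := packageSynopsis("driver.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(syn)
}
```

Note that this works per package, not per module: a module may contain many packages and need not have one at its root.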

What did you do?

$ curl https://proxy.golang.org/golang.org/x/benchmarks/@v/v0.0.0-20191128100916-8f6035fd2e05.info

What did you expect to see?

{
    "Module": "golang.org/x/benchmarks",
    "Version": "v0.0.0-20191128100916-8f6035fd2e05",
    "Time": "2019-11-28T10:09:16Z",
    "Go": "1.13",
    "Packages": [
        {
            "ImportPath": "golang.org/x/benchmarks/build",
            "Doc": "Build is a benchmark that examines compiler and linker performance",
            "Licenses": ["BSD-3-Clause"]
        },
        {
            "ImportPath": "golang.org/x/benchmarks/driver",
            "Doc": "Package driver provides common benchmarking logic shared between benchmarks",
            "Licenses": ["BSD-3-Clause"]
        },
        {
            "ImportPath": "golang.org/x/benchmarks/garbage",
            "Doc": "Garbage is a benchmark that stresses garbage collector",
            "Licenses": ["BSD-3-Clause"]
        },
        {
            "ImportPath": "golang.org/x/benchmarks/http",
            "Doc": "HTTP is a benchmark that examines client/server http performance",
            "Licenses": ["BSD-3-Clause"]
        },
        {
            "ImportPath": "golang.org/x/benchmarks/json",
            "Doc": "JSON benchmark marshals and unmarshals ~2MB json string with a tree-like object hierarchy, in 4*GOMAXPROCS goroutines",
            "Licenses": ["BSD-3-Clause"]
        }
    ],
    "Require": [
        {
            "Path": "golang.org/x/sys",
            "Version": "v0.0.0-20190312061237-fead79001313",
            "Sum": {
                "GoMod": "h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=",
                "Module": "h1:eh4x/QwiUD8mqV3DqaO2syciENiawp92l44B9vVCLAk="
            },
            "Indirect": false
        }
    ],
    "Replace": null,
    "Exclude": null,
    "Sum": {
        "GoMod": "h1:K/nWeB/DgmmOIDotXYHRsXa8OeuO9KAJf8AT3/wwq7o=",
        "Module": "h1:3NzKrPel479N37l438cSECz8erwULlLLQdr1d2+Hj4s="
    },
    "Licenses": ["BSD-3-Clause"]
}
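For tools that would consume such a response, the expected JSON above maps onto Go types like the following. This is a sketch of the proposed, hypothetical schema only: proxy.golang.org does not serve these fields, the type names and the decodeExtendedInfo helper are illustrative, and Replace/Exclude are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Hypothetical types for the extended .info response proposed in this issue.

// Sum holds go.sum-style "h1:" hashes.
type Sum struct {
	GoMod  string // hash of the go.mod file only
	Module string // hash of the full module zip
}

// Package describes one package inside the module.
type Package struct {
	ImportPath string
	Doc        string
	Licenses   []string
}

// Require mirrors a go.mod require directive plus its hashes.
type Require struct {
	Path     string
	Version  string
	Sum      Sum
	Indirect bool
}

// ExtendedInfo is the whole proposed response.
type ExtendedInfo struct {
	Module   string
	Version  string
	Time     time.Time
	Go       string
	Packages []Package
	Require  []Require
	Sum      Sum
	Licenses []string
}

// decodeExtendedInfo parses a response body; unknown fields are ignored.
func decodeExtendedInfo(data []byte) (ExtendedInfo, error) {
	var info ExtendedInfo
	err := json.Unmarshal(data, &info)
	return info, err
}

func main() {
	sample := []byte(`{"Module":"golang.org/x/benchmarks","Go":"1.13","Time":"2019-11-28T10:09:16Z",
		"Require":[{"Path":"golang.org/x/sys","Version":"v0.0.0-20190312061237-fead79001313"}],
		"Licenses":["BSD-3-Clause"]}`)
	info, err := decodeExtendedInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Module, len(info.Require), info.Licenses)
}
```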

What did you see instead?

{
    "Version": "v0.0.0-20191128100916-8f6035fd2e05",
    "Time": "2019-11-28T10:09:16Z"
}

Omitted fields:

I've omitted the following fields because they don't seem to be useful for this proposal:

  • What operating system and processor architecture are you using?
  • Does this issue reproduce with the latest release?
  • What version of Go are you using?
heschi (Contributor) commented Oct 10, 2020

I think we'd need a strong reason to consider adding more information. Also, many of the things you've asked for, like import path and documentation, are more about packages than modules. There's no guarantee that there is a package at the root of a module. Even licenses are something that pkg.go.dev shows per-package rather than per-module.

As for requires, downloading and parsing the go.mod shouldn't be that much of a burden? And that gives you the module name, which is essentially the import path you asked for.

@katiehockman @hyangah
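The go.mod download suggested above is a single GET of $GOPROXY/<module>/@v/<version>.mod. A minimal sketch, assuming the module path and version contain no uppercase letters (otherwise the GOPROXY protocol requires them to be case-encoded first); goModURL and fetchGoMod are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// goModURL builds $GOPROXY/<module>/@v/<version>.mod.
// Assumption: module and version need no case-encoding.
func goModURL(proxy, module, version string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + module + "/@v/" + version + ".mod"
}

// fetchGoMod downloads the go.mod file for module@version from proxy.
func fetchGoMod(client *http.Client, proxy, module, version string) ([]byte, error) {
	resp, err := client.Get(goModURL(proxy, module, version))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s@%s: unexpected status %s", module, version, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	data, err := fetchGoMod(http.DefaultClient, "https://proxy.golang.org",
		"golang.org/x/benchmarks", "v0.0.0-20191128100916-8f6035fd2e05")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Print(string(data))
}
```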

0x2b3bfa0 (Author) commented Oct 10, 2020

> I think we'd need a strong reason to consider adding more information. Also, many of the things you've asked for, like import path and documentation, are more about packages than modules. There's no guarantee that there is a package at the root of a module. Even licenses are something that pkg.go.dev shows per-package rather than per-module.

In my opinion, that strong reason might be offering consistent metadata for modules and packages in a universally readable format, allowing easier interoperation with other package managers and analysis tools. Maybe not to the level of crates.io, but at least with enough information to build dependency graphs.

I've updated the proposal in order to remove references to packages and talk only about modules as, effectively, there were some misconceptions in the original issue.

> As for requires, downloading and parsing the go.mod shouldn't be that much of a burden? And that gives you the module name, which is essentially the import path you asked for.

That's my last resort, though manually parsing sui generis formats from another programming language might not be the cleanest solution.

0x2b3bfa0 (Author) commented Oct 12, 2020

@heschik, I've solved my use case with a mix of regular expressions and proper selectors for extracting licenses and documentation from pkg.go.dev. 😅 Nevertheless, I've updated my proposal again to reflect a realistic package and module structure, including useful fields for external dependency managers.

For a bit of background, this proposal originated during a tentative refactoring of the Guix go-build-system. Though it's a niche use case, this proposal might be useful for many other tools that deal with Go packages and modules from other languages.

Feel free to close the issue if you deem it unproductive, though it would be no surprise to find a new use case for this functionality in the near future.

Thank you for your valuable advice!

hyangah (Contributor) commented Oct 12, 2020

The main purpose of the proxy protocol is to serve the go command, and we want to keep it minimal.

In my opinion, the proposed endpoint belongs in package/module discovery services such as pkg.go.dev.
#36785 is a relevant issue.
cc @julieqiu

0x2b3bfa0 (Author) commented Oct 12, 2020

💯 @hyangah, certainly, there is some overlap between the official module proxy and the official package/module discovery service, and #36785 is really apropos. Thank you!

If pkg.go.dev implements the proposed enhancement along with a dedicated code endpoint like .zip on the module proxy, the overlap would be complete, but we would have a solid registry/discovery service with the ability to serve a copy of both code and metadata for every known package in an easily readable format.

Do you think that proxying/caching useful metadata along with the code would be a good idea?

dmitshur changed the title from "Module proxy protocol: extending the information endpoint" to "cmd/go: module proxy protocol, extending the information endpoint" Oct 13, 2020
dmitshur added the NeedsInvestigation and FeatureRequest labels Oct 13, 2020
dmitshur added this to the Backlog milestone Oct 13, 2020
dmitshur (Contributor) commented

CC @bcmills, @jayconrod, @matloob per owners.

jayconrod (Contributor) commented

The main reason .info exists is so the go command has a way to resolve revision names like devbranch and 01234abcde to actual versions. When you run a command like go get example.com/mod@devbranch, it hits the .info endpoint to get back a canonical version, which it can then use with .mod and .zip.
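A sketch of how a client might build such a .info request for a revision query like devbranch. The GOPROXY protocol case-encodes module paths and versions by replacing each uppercase letter with '!' followed by its lowercase form; this sketch assumes the query is encoded the same way. caseEncode and infoURL are hypothetical helpers, and queries containing '/' or other characters that need URL escaping are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// caseEncode replaces each uppercase ASCII letter with '!' plus its lowercase
// form. Input already containing '!' is rejected because the result would be
// ambiguous; non-ASCII input is rejected for simplicity.
func caseEncode(s string) (string, error) {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == '!' || r >= 0x80:
			return "", fmt.Errorf("cannot case-encode %q", s)
		case 'A' <= r && r <= 'Z':
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		default:
			b.WriteRune(r)
		}
	}
	return b.String(), nil
}

// infoURL builds $GOPROXY/<module>/@v/<query>.info for a version query such
// as a branch name or commit hash.
func infoURL(proxy, module, query string) (string, error) {
	m, err := caseEncode(module)
	if err != nil {
		return "", err
	}
	q, err := caseEncode(query)
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(proxy, "/") + "/" + m + "/@v/" + q + ".info", nil
}

func main() {
	u, err := infoURL("https://proxy.golang.org", "github.com/BurntSushi/toml", "devbranch")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // https://proxy.golang.org/github.com/!burnt!sushi/toml/@v/devbranch.info
}
```

The Version field of the decoded response is then the canonical version to use for the .mod and .zip requests.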

One reason we're hesitant to extend the .info endpoint is that its content is not authenticated by go.sum or the checksum database. Proxies aren't trusted, and they're allowed to change what they serve for .info. So we especially don't want to serve anything redundant with .mod or .zip since that would entice users to discard a useful security guarantee.
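To make the go.sum side concrete: the h1: hash recorded for a go.mod file can be recomputed locally, so a tampered .mod download is detectable, whereas nothing comparable covers .info. The sketch below follows my reading of golang.org/x/mod/sumdb/dirhash.Hash1 applied to a single file named go.mod (SHA-256 of the file, one summary line "<hex>  go.mod", SHA-256 of that line, base64). Treat that as an assumption and prefer the dirhash package in real code; goModHash and verifyGoMod are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// goModHash computes an "h1:" hash for go.mod content, assuming the
// dirhash.Hash1 scheme over a single file named "go.mod".
func goModHash(data []byte) string {
	fileSum := sha256.Sum256(data)
	summary := sha256.New()
	fmt.Fprintf(summary, "%x  %s\n", fileSum[:], "go.mod")
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

// verifyGoMod compares downloaded go.mod content with the hash taken from a
// go.sum line of the form "<module> <version>/go.mod h1:...".
func verifyGoMod(data []byte, want string) error {
	if got := goModHash(data); got != want {
		return fmt.Errorf("go.mod hash mismatch: got %s, want %s", got, want)
	}
	return nil
}

func main() {
	// Illustrative content only; its hash does not match any real go.sum line.
	data := []byte("module golang.org/x/benchmarks\n\ngo 1.13\n")
	h := goModHash(data)
	fmt.Println(h)
	fmt.Println(verifyGoMod(data, h)) // <nil>
}
```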

About the specific fields you asked for:

  • Documentation - there's no module-wide standard for documentation. pkg.go.dev shows README files from module zips. I think a few different extensions are recognized.
  • Licenses - the module zip may include a file named LICENSE, though I think pkg.go.dev recognizes files with a few other extensions (.txt, and so on). We especially don't want to add anything in the proxy protocol automatically; that may put us and other implementers of the proxy protocol in the position of providing legal advice.
  • Dependencies - already in go.mod. Use golang.org/x/mod/modfile to parse or use the grammar from the reference documentation.
  • Hashes - use sum.golang.org instead. See Checksum database for protocol details.
  • Additional fields - parse go.mod instead.
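For Dependencies, a deliberately minimal require-directive reader shows how small the relevant grammar subset is; golang.org/x/mod/modfile remains the robust choice. It assumes well-formatted files, ignores quoted paths and every other directive, and treats a trailing "// indirect" comment as the indirect marker. Requirement and parseRequires are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Requirement is one entry of a go.mod require directive.
type Requirement struct {
	Path     string
	Version  string
	Indirect bool // line carries a "// indirect" comment
}

// parseRequires extracts single-line and parenthesized require directives.
func parseRequires(gomod string) ([]Requirement, error) {
	var reqs []Requirement
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for lineNo := 1; sc.Scan(); lineNo++ {
		line, comment := sc.Text(), ""
		if i := strings.Index(line, "//"); i >= 0 {
			line, comment = line[:i], strings.TrimSpace(line[i+2:])
		}
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			continue
		case inBlock && fields[0] == ")":
			inBlock = false
			continue
		case !inBlock && fields[0] == "require":
			if len(fields) == 2 && fields[1] == "(" {
				inBlock = true
				continue
			}
			fields = fields[1:] // single-line form: require path version
		case !inBlock:
			continue // module, go, replace, exclude, ...
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("line %d: malformed require: %q", lineNo, sc.Text())
		}
		indirect := comment == "indirect" || strings.HasPrefix(comment, "indirect;")
		reqs = append(reqs, Requirement{Path: fields[0], Version: fields[1], Indirect: indirect})
	}
	return reqs, sc.Err()
}

func main() {
	src := "module golang.org/x/benchmarks\n\ngo 1.13\n\nrequire golang.org/x/sys v0.0.0-20190312061237-fead79001313\n"
	reqs, err := parseRequires(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", reqs)
}
```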

0x2b3bfa0 (Author) commented Oct 17, 2020

> The main reason .info exists is so the go command has a way to resolve revision names like devbranch and 01234abcde to actual versions. When you run a command like go get example.com/mod@devbranch, it hits the .info endpoint to get back a canonical version, which it can then use with .mod and .zip.

That's a nice feature. In fact, I'm using it in order to resolve commit hashes and tags and get long versions for some modules.

> One reason we're hesitant to extend the .info endpoint is that its content is not authenticated by go.sum or the checksum database. Proxies aren't trusted, and they're allowed to change what they serve for .info. So we especially don't want to serve anything redundant with .mod or .zip since that would entice users to discard a useful security guarantee.

That's a very interesting point: we can't ignore the importance of having integrity information for every module and its metadata, though relying on custom formats like go.sum and go.mod forces non-Go programs to parse them, in most situations without the official parser.

Though it would be feasible to verify every package on the server and sign the JSON responses with JWS or similar methods, that would add server-side overhead and require a trusted proxy, directory or discovery service.

The only long-term solution I see for this issue would probably involve replacing both go.mod and go.sum with a JSON-based format, and that doesn't look like a good idea, given the huge number of packages that would need to change and the loss of human readability compared to the current format.

> • Documentation - there's no module-wide standard for documentation. pkg.go.dev shows README files from module zips. I think a few different extensions are recognized.

I'm getting it from pkg.go.dev, but web scraping might not be the most elegant solution.

> • Licenses - the module zip may include a file named LICENSE, though I think pkg.go.dev recognizes files with a few other extensions (.txt, and so on). We especially don't want to add anything in the proxy protocol automatically; that may put us and other implementers of the proxy protocol in the position of providing legal advice.

That's a really delicate issue, and the path of least friction would probably be serving this information through pkg.go.dev by exposing a public API, since that site already serves extracted license names in a nearly-SPDX format.

The issue is that golang.org/x/mod/modfile doesn't play nicely outside of the Go ecosystem, and reimplementations of custom formats might not be a good idea in this case.

I'll use it for now, thanks!

jayconrod (Contributor) commented

Closing this issue in favor of #36785, which seems like the best way to move forward in the long term.

I don't think implementing go.mod and go.sum parsers in other languages should be a huge barrier though. Both formats were chosen to be easily readable and editable by both humans and programs.
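As an illustration of that point, each go.sum line is just three space-separated fields: module path, version (with a /go.mod suffix when the hash covers only the go.mod file), and hash. A minimal parser sketch; SumLine and parseGoSum are hypothetical names, and the sample lines reuse the golang.org/x/sys hashes from the expected-output example earlier in the thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// SumLine is one go.sum entry: "<module> <version>[/go.mod] <hash>".
type SumLine struct {
	Path    string
	Version string
	GoMod   bool   // true when the hash covers only the go.mod file
	Hash    string // for example "h1:..."
}

// parseGoSum splits go.sum content into entries, skipping blank lines.
func parseGoSum(data string) ([]SumLine, error) {
	var out []SumLine
	sc := bufio.NewScanner(strings.NewReader(data))
	for n := 1; sc.Scan(); n++ {
		f := strings.Fields(sc.Text())
		if len(f) == 0 {
			continue
		}
		if len(f) != 3 {
			return nil, fmt.Errorf("go.sum line %d: want 3 fields, got %d", n, len(f))
		}
		goMod := strings.HasSuffix(f[1], "/go.mod")
		version := strings.TrimSuffix(f[1], "/go.mod")
		out = append(out, SumLine{Path: f[0], Version: version, GoMod: goMod, Hash: f[2]})
	}
	return out, sc.Err()
}

func main() {
	sum := `golang.org/x/sys v0.0.0-20190312061237-fead79001313 h1:eh4x/QwiUD8mqV3DqaO2syciENiawp92l44B9vVCLAk=
golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
`
	lines, err := parseGoSum(sum)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Printf("%s %s go.mod-only=%t %s\n", l.Path, l.Version, l.GoMod, l.Hash)
	}
}
```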

0x2b3bfa0 (Author) commented

Completely agree, @jayconrod. Thanks!

golang locked and limited conversation to collaborators Oct 19, 2021