proposal: cmd/go: add .proxy endpoint to the module proxy spec #35400

Open
katiehockman opened this issue Nov 6, 2019 · 21 comments

@katiehockman
Contributor

Users would benefit from more transparency around whether or not a specific module version is being temporarily cached in a proxy or whether it is being permanently mirrored. There are a number of reasons why a proxy may choose not to mirror something forever: licensing is one notable example.

The proposal is to add an optional endpoint to the proxy spec (i.e. go help goproxy), which proxies could implement if they choose to, and which would provide this information. For example:
https://proxy.golang.org/golang.org/x/text/@v/v0.3.2/mirrored
would return "true" or "false" as plaintext.
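
A rough sketch of what a client for this endpoint might look like, assuming the URL shape and the plaintext "true"/"false" body from the example above. The function names are made up for illustration, and the stand-in test server only exists so the sketch runs offline; no proxy is known to serve this endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// mirroredURL builds the URL for the proposed endpoint, following the
// shape of the example: <proxy>/<module>/@v/<version>/mirrored.
// It assumes modPath has no uppercase letters; paths that do would
// need the proxy protocol's case escaping first.
func mirroredURL(proxy, modPath, version string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + modPath + "/@v/" + version + "/mirrored"
}

// parseMirrored interprets the plaintext body. Anything other than
// "true" or "false" (ignoring surrounding whitespace) is an error.
func parseMirrored(body string) (bool, error) {
	switch strings.TrimSpace(body) {
	case "true":
		return true, nil
	case "false":
		return false, nil
	}
	return false, fmt.Errorf("unexpected response %q", body)
}

// fetchMirrored queries the endpoint. A non-200 status is reported as
// an error, since the endpoint is optional and may not be implemented.
func fetchMirrored(client *http.Client, url string) (bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("%s: %s", url, resp.Status)
	}
	// The body should be a single word, so cap how much is read.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 64))
	if err != nil {
		return false, err
	}
	return parseMirrored(string(body))
}

func main() {
	// Stand-in server that answers "true" for every request.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "true\n")
	}))
	defer srv.Close()

	u := mirroredURL(srv.URL, "golang.org/x/text", "v0.3.2")
	ok, err := fetchMirrored(srv.Client(), u)
	fmt.Println(ok, err)
}
```

Treating a missing endpoint as an error rather than as "false" keeps "this proxy does not say" distinct from "this proxy says it is not mirrored".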

This is something we could pair with a utility in x/mod which would accept a go.sum file and indicate which versions aren't being permanently mirrored by any of the proxies listed in GOPROXY. That might help you decide to use a different version of the module, vendor that dependency, or encourage you to file an issue against the module if you see that a suitable license is missing, for example.

/cc @jayconrod @bcmills @heschik @hyangah @rsc

@bradfitz
Contributor

bradfitz commented Nov 6, 2019

The Expires headers could/should(?) say the same thing.

e.g. Expires 1 week vs Expires 100 years.

@katiehockman
Contributor Author

In many cases there is a difference between when the response is considered "stale" (i.e. Expires) and when the underlying server may no longer be able to serve a zip. The common case is a proxy behind a CDN: the response cached by the CDN may go stale after a few hours, but the proxy server intends to keep serving that zip for much longer.

@ankushchadha

Having module versions cached temporarily means non-deterministic builds for the users.

IMO availability is one of the fundamental requirements of a public goproxy, and this is true at least for gocenter.io.

Also, what does it mean for a user who also uses a local goproxy that further points to a public goproxy? Should they selectively clean up the local goproxy's cache always given this new endpoint? Maybe let users decide what they want to consume based on the metadata provided by the goproxy.

@arschles

arschles commented Nov 6, 2019

+1 to the above. Except for extenuating circumstances like DMCA takedowns, etc., why would a module not be stored (purposefully not using the term "cache" because it implies expiration) forever on proxy.golang.org?

If there were modules that proxies/mirrors might not or did not store, then as @ankushchadha said, builds become nondeterministic. One of the major apparent benefits of proxy.golang.org right now is that it enables deterministic builds.

Edit: the proxy enables deterministic builds

@bcmills
Contributor

bcmills commented Nov 6, 2019

@arschles, a module might not be stored if the proxy maintainer is not confident that the module's license permits it to be stored.

Builds in that case do not become “nondeterministic”: they may either succeed or fail, depending on whether the needed modules are available (locally or from any configured remote source), but if they succeed they will produce the same result as any other successful build.

@arschles

arschles commented Nov 7, 2019

@bcmills understood, I agree that this feature may be useful for on-prem proxies.

I'm talking about this endpoint in the context of public, hosted proxies. It introduces the possibility that a host may cache modules, and if you get a false back, that module@version could expire at any time. As the developer of an application who relies heavily on proxy.golang.org (we use it to build github.com/gomods/athens), I would prefer that everything returns true at all times (exceptions being made for special cases)

@hyangah
Contributor

hyangah commented Nov 7, 2019

I agree we need a way to tell users what proxy.golang.org will do about a specific module version. But I am not 100% sure about whether this endpoint belongs to the proxy protocol - at this moment, it seems too specific to proxy.golang.org.

It will sound more convincing if there are proxies other than proxy.golang.org that would utilize this new endpoint in a meaningful way.

The endpoint doesn't make much sense for enterprise and private proxies.

gocenter.io is trying to mirror everything once it decides to serve a module version. Most other public proxies I've seen haven't made any official commitment about their data retention policy. Can other public proxy owners chime in?

@oiooj
Member

oiooj commented Nov 13, 2019

goproxy.io is here, but as @bcmills said, we are not confident that the module's license permits it to be stored, and space is always limited.

@katiehockman
Contributor Author

@arschles

> As the developer of an application who relies heavily on proxy.golang.org (we use it to build github.com/gomods/athens), I would prefer that everything returns true at all times (exceptions being made for special cases)

Since we can't do this (re: licensing), the best alternative would be to inform users if there is a genuine risk that their dependency will disappear if it's removed from the origin server. You mentioned that this endpoint would say "that module@version could expire at any time", so one alternative might be to give a timestamp for how much longer this cached copy will live, instead of true/false?

@oiooj

> we are not confident that the module's license permits it to be stored, and space is always limited.

Can you clarify? Are you saying that goproxy.io also doesn't mirror things forever, depending on the size of the module and the license? If that's the case, then users of your service may also benefit from this kind of transparency.


Thanks for everyone's comments. As @hyangah said, it's going to be difficult to justify this if it's not something other proxies would benefit from, and if that's the case, this may just be something that proxy.golang.org should do itself if users are asking for it.

@hyangah
Contributor

hyangah commented Nov 13, 2019

@oiooj @katiehockman If gocenter.io wants to preserve rights to evict some of the module versions to alleviate storage usage pressure in the future, I expect gocenter.io to return 'false' for the proposed /mirrored endpoint for all module versions. Then I don't think this endpoint is very useful for its users either.

@bcmills
Contributor

bcmills commented Nov 13, 2019

@hyangah, to the contrary! If some tool uses /mirrored to, say, recommend whether or not users should mirror their dependencies locally (or vendor them), then a /mirrored endpoint that always returns false could still be a useful input to such a tool.

@hyangah
Contributor

hyangah commented Nov 13, 2019

@bcmills Shouldn't the user of the proxy already know about the promise of the public proxy they are using? As far as I know, proxy.golang.org is the only one that may have different answers for modules/versions.

BTW if we are talking about the users who want to distribute the source code of binaries/libraries and control the dependencies, they don't know what proxy "their users" will depend on to build their source code. In this case, will they still need to vendor, or instruct their users to always use specific proxies they verified all their dependencies are mirrored in?

@rsc
Contributor

rsc commented Nov 13, 2019

What's the reason for using /mirrored instead of a field in the info file?

@bcmills
Contributor

bcmills commented Nov 13, 2019

> As far as I know, proxy.golang.org is the only one that may have different answers for modules/versions.

There is no intrinsic reason why that must be the case, and having an endpoint would make it easier for users to detect if (say) the proxy that they are using changes its policy to provide longer-term mirroring.

> if we are talking about the users who want to distribute the source code of binaries/libraries […], will they still need to vendor, or instruct their users to always use specific proxies they verified all their dependencies are [mirrored] in?

Their users (transitively) can use the same endpoint to decide what to do.

@bcmills
Contributor

bcmills commented Nov 13, 2019

> What's the reason for using /mirrored instead of a field in the info file?

Proxies may reasonably re-serve .info files from other proxies. What presumably matters to users is the policy of the “last hop” proxy, not any intermediaries.

@katiehockman
Contributor Author

> What's the reason for using /mirrored instead of a field in the info file?

We've treated the .info file as an immutable object that is provided by the go command and given to clients unchanged. If we start adding custom fields to the .info file which differ between proxies, then I'm not sure what the consequences of this could be. For example, it could mean that a proxy like Athens which chooses to proxy our .info endpoints to their clients will be serving answers that are specific to our server instead of theirs.

I've also always viewed this file as "metadata about the module version" which is proxy independent, rather than "metadata about the module version as it relates to the proxy you got it from" which could change.

@jayconrod @heschik and I were discussing this the other day.

@rsc
Contributor

rsc commented Nov 13, 2019

The existing convention in these URLs is to disambiguate based on the file extension not a new path element, so it would be v0.3.2.mirrored not v0.3.2/mirrored.

Beyond that, though, I wonder if maybe there will be a need to send more than a single bit at some point (thus my question about .info). If we don't extend .info then it seems like we should instead define a new JSON-formatted .proxy file for proxy-specific information about the given module. It could start with just one field (Expires?) and add more as needed.
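
As a sketch of the two points above (extension-based URLs and a JSON .proxy file), something like the following could work. The proxyInfo type, the assumption that Expires is an RFC 3339 timestamp, and the sample body are all hypothetical, since nothing in this thread defines the file's schema. The escape helper applies the proxy protocol's case encoding (uppercase letter becomes "!" plus lowercase) without the validation that golang.org/x/mod/module performs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// escape applies the proxy protocol's case encoding: each uppercase
// ASCII letter becomes '!' followed by its lowercase form.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// versionFileURL builds <proxy>/<module>/@v/<version>.<ext>, matching the
// existing convention of .info, .mod, and .zip.
func versionFileURL(proxy, modPath, version, ext string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escape(modPath) + "/@v/" + escape(version) + "." + ext
}

// proxyInfo is a hypothetical shape for the .proxy file, starting with
// the single field suggested above.
type proxyInfo struct {
	Expires time.Time `json:"Expires"`
}

// parseProxyInfo decodes a .proxy body. Unknown fields are ignored, so
// fields added later would not break this client.
func parseProxyInfo(data []byte) (proxyInfo, error) {
	var pi proxyInfo
	err := json.Unmarshal(data, &pi)
	return pi, err
}

func main() {
	fmt.Println(versionFileURL("https://proxy.golang.org", "github.com/BurntSushi/toml", "v0.3.1", "proxy"))

	// Made-up response body; no proxy is known to serve this file.
	pi, err := parseProxyInfo([]byte(`{"Expires": "2020-06-01T00:00:00Z", "Future": 1}`))
	if err != nil {
		panic(err)
	}
	if pi.Expires.IsZero() {
		fmt.Println("no expiration reported")
	} else {
		fmt.Println("cached copy expires", pi.Expires.Format(time.RFC3339))
	}
}
```

Because encoding/json skips fields the struct does not declare, a client written against a one-field file keeps working if more fields are added as needed.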

@katiehockman
Contributor Author

Agreed that there is very likely to be a time when more than one bit should be provided. Perhaps the date of expiration, or the detected licenses, for example.

I like the idea of a generalized .proxy file similar to the .info file. I'm not sure if this belongs in the proxy protocol at this point though, especially if no other proxies will want to use this. But in general it sounds like a good approach to start with even if just proxy.golang.org serves it.

@rsc added this to Incoming in Proposals (old) on Nov 27, 2019
@rsc
Contributor

rsc commented Nov 27, 2019

It sounds like there is general agreement to add a .proxy file with JSON.
The benefit of defining what it contains is that then cmd/go can potentially present that.

@katiehockman, would you rather:

  1. Put this proposal on hold and have proxy.golang.org start serving this file to gain some experience, or
  2. Use this proposal to define the .proxy file and get to acceptance before serving from proxy.golang.org?

Your call. Thanks.

@rsc changed the title from "proposal: cmd/go: add optional /mirrored endpoint to the module proxy spec" to "proposal: cmd/go: add .proxy endpoint to the module proxy spec" on Nov 27, 2019
@katiehockman
Contributor Author

Thanks. Let's go with option 1 for now.

I'll go ahead and work on exposing a .proxy endpoint for proxy.golang.org that can share some extra metadata about cache expiration. We can learn from this, and if it ends up making sense to establish a more formal behavior for .proxy in the future, then we can reassess.

@rsc moved this from Incoming to Active in Proposals (old) on Dec 4, 2019
@rsc
Contributor

rsc commented Dec 4, 2019

Putting this on hold. Katie, feel free to remove the hold label when you are ready for more discussion.

@rsc moved this from Active to Hold in Proposals (old) on Dec 4, 2019