index.golang.org: exclude unavailable module versions from default results #59320

Closed
hyangah opened this issue Mar 29, 2023 · 1 comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. proxy.golang.org

Comments

@hyangah
Contributor

hyangah commented Mar 29, 2023

Background

index.golang.org serves the index of module versions stored in proxy.golang.org and sum.golang.org. Users can access the data through the following API (doc: https://proxy.golang.org):

  https://index.golang.org/index?since=<RFC3339Timestamp>

The API returns up to 1000 {module, version, timestamp} tuples added since the specified timestamp.

This index is the data source of pkg.go.dev, and some Go community members rely on it to retrieve data from the proxy and the checksum db for ecosystem-wide data analysis or mirroring.

The index reports all module versions that proxy.golang.org and sum.golang.org have ever served. As of today (March 2023), proxy.golang.org has observed and served ~16.7 million unique module versions.

Problems

The current API is designed primarily for users who want to use it as a live feed of newly added module versions. For example, pkg.go.dev periodically polls the API endpoint to learn about newly added module versions.

This API is not ideal if users want the list of module versions available for redistribution from proxy.golang.org:

  • The result includes all module versions that proxy.golang.org has ever observed. Some may no longer exist in proxy.golang.org's storage or in the source repositories. For example, module versions without recognized licenses may be evicted from the proxy's storage after not being requested for some time.

  • Users of the index.golang.org API most likely send follow-up queries to proxy.golang.org to download the actual contents of the module versions. If these users attempt to download evicted module versions, the proxy has to refetch them from the origin servers and serves them only if the checksums match what it observed before. Refetching increases response time and adds extra load to proxy.golang.org and the source hosting servers. Users could set the Disable-Module-Fetch HTTP header to avoid the refetch, but few know about it.
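For readers unfamiliar with the header mentioned above, here is a minimal Go sketch that builds a request carrying it. The header value "true", the /@v/<version>.info endpoint layout, and the '!'-based case encoding of module paths are assumptions drawn from general knowledge of the module proxy protocol, not from this issue. escapePath and newInfoRequest are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// escapePath applies the proxy's case encoding: each ASCII uppercase
// letter is replaced by '!' followed by its lowercase form.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// newInfoRequest builds a GET request for a module version's .info file
// and asks the proxy not to refetch the module from its origin.
func newInfoRequest(modPath, version string) (*http.Request, error) {
	u := "https://proxy.golang.org/" + escapePath(modPath) + "/@v/" + escapePath(version) + ".info"
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	// Assumed header value; see the proxy's own documentation.
	req.Header.Set("Disable-Module-Fetch", "true")
	return req, nil
}

func main() {
	req, err := newInfoRequest("github.com/BurntSushi/toml", "v1.2.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Disable-Module-Fetch:", req.Header.Get("Disable-Module-Fetch"))
	// Passing req to http.DefaultClient.Do would send it to the proxy.
}
```

With this header set, a request for an evicted module version would presumably fail rather than trigger a refetch, so callers should be ready to handle a non-200 response.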

Proposal

  • Change https://index.golang.org/index to return only module versions that proxy.golang.org can serve without refetching, that is, module versions that have recognized licenses or whose cached copies are still available in the proxy's storage at API request time.
  • Users who need the current behavior must explicitly specify the include=all parameter.

Impact on users

We expect that users who watch the index as a feed won't be affected. Users who are interested only in module versions that have recognized licenses or are actually used by Go users won't be affected either.

Alternatives Considered

Change proxy.golang.org's default not to refetch

We considered the option to disable module fetch by default and enable module fetch explicitly only for the requests sent by the go command. Given the main goal of this proposal was to help users avoid accidental refetches, this could be the ideal solution to the problem. Unfortunately, HTTP requests coming from the go command are not distinguishable from other HTTP requests constructed using net/http. (https://go.dev/issues/35699)

Change index.golang.org's default to return only redistributable module versions

This could help us make the change without introducing the extra column to track the lifetime of the cached copies. However, proxy.golang.org detects licenses heuristically, and it may result in dropping some popular module versions from the index. This limitation would lead users to resort to using the include=all param and bypass this work.

Publish curated module version lists

Many users access index.golang.org to retrieve data for their Go ecosystem analysis, but they often find that only a small fraction of the 16.6 million module versions are relevant to their needs. This is because many of these versions no longer exist, are not valid Go modules (e.g. https://go.dev/issues/31866), or are no longer actively used by the Go community. To address this, we could consider compiling a curated list of module versions periodically and making it available through a different API endpoint.

This would help users who need the real Go modules and reduce the chance of accidental refetching. It would also make proxy.golang.org more transparent about the module versions it has in its long-term storage. However, this would require new feature work, and we cannot prioritize non-trivial new features at this time.

@hyangah hyangah added this to the proxy.golang.org/later milestone Mar 29, 2023
@hyangah hyangah changed the title from "Exclude unavailable module versions from index.golang.org result" to "index.golang.org: exclude unavailable module versions from default results" Mar 29, 2023
@hyangah hyangah removed the Proposal label Mar 29, 2023
@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 3, 2023
@hyangah hyangah self-assigned this May 11, 2023
@hyangah
Contributor Author

hyangah commented Jun 20, 2023

This is complete. index.golang.org provides updated documentation including the new include parameter.
