Background
index.golang.org serves the index of module versions stored in proxy.golang.org and sum.golang.org. Users can access the data through its API (documentation: https://proxy.golang.org).
The API returns up to 1000 {module, version, timestamp} tuples added since the specified timestamp.
This index is the data source of pkg.go.dev, and some Go community members rely on it to retrieve data from the proxy and the checksum db for ecosystem-wide data analysis or mirroring.
The index reports all module versions that proxy.golang.org and sum.golang.org have ever served. As of today (March 2023), proxy.golang.org has observed and served ~16.7 million unique module versions.
Problems
The current API is designed primarily for users who want to use it as a live feed of newly added module versions. For example, pkg.go.dev periodically polls the API endpoint to learn about newly added module versions.
This API is not ideal if users want the list of module versions available for redistribution from proxy.golang.org:
The result includes all module versions that proxy.golang.org has ever observed. Some may no longer exist in proxy.golang.org's storage or in the source repositories. For example, module versions without recognized licenses may be evicted from the proxy's storage after not being requested for some time.
Users of the index.golang.org API most likely send follow-up queries to proxy.golang.org to download the actual contents of the module versions. If these users attempt to download evicted module versions, the proxy has to refetch them from the origin servers and serves them only if the checksums match what it observed before. Refetching increases response time and adds extra load to proxy.golang.org and to the source hosting servers. Users could set the Disable-Module-Fetch HTTP header to avoid refetching, but few know about it.
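For readers unfamiliar with the header mentioned above, the following is a sketch of a download request that opts out of refetching. It assumes the `Disable-Module-Fetch: true` form described in the proxy's FAQ and the standard `/@v/<version>.zip` proxy path; `newZipRequest` and `escape` are hypothetical helper names, and the module shown is only an example.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

const proxyBase = "https://proxy.golang.org"

// escape applies the module proxy case encoding: every uppercase
// ASCII letter is replaced by '!' followed by its lowercase form.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// newZipRequest builds a GET request for a module zip and asks the
// proxy to answer only from its cache instead of refetching.
func newZipRequest(modPath, version string) (*http.Request, error) {
	u := fmt.Sprintf("%s/%s/@v/%s.zip", proxyBase, escape(modPath), escape(version))
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Disable-Module-Fetch", "true")
	return req, nil
}

func main() {
	req, err := newZipRequest("github.com/BurntSushi/toml", "v1.2.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("Disable-Module-Fetch:", req.Header.Get("Disable-Module-Fetch"))
}
```

The request is only constructed here, not sent. A bulk downloader would pass it to an `http.Client` and treat an error status as "not cached" rather than retrying without the header.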
Proposal
Change https://index.golang.org/index to return only module versions that proxy.golang.org can serve without refetching. That is, module versions that have recognized licenses, or whose cached copies are still available in the proxy's storage at the time of the API request.
Users who need the current behavior must explicitly specify the include=all parameter.
Impact on users
We expect that users who watch the index as a feed won't be affected. Users who are interested only in module versions that have recognized licenses or are actually used by Go users won't be affected either.
Alternatives Considered
Change proxy.golang.org's default not to refetch
We considered disabling module fetch by default and enabling it explicitly only for requests sent by the go command. Since the main goal of this proposal is to help users avoid accidental refetches, this would be the ideal solution to the problem. Unfortunately, HTTP requests coming from the go command are not distinguishable from other HTTP requests constructed using net/http (https://go.dev/issues/35699).
Change index.golang.org's default to return only redistributable module versions
This could help us make the change without introducing an extra column to track the lifetime of the cached copies. However, proxy.golang.org detects licenses heuristically, so this approach may drop some popular module versions from the index. That limitation would push users to resort to the include=all parameter and bypass this work.
Publish curated module version lists
Many users access index.golang.org to retrieve data for their Go ecosystem analysis, but they often find that only a small fraction of the ~16.7 million module versions are relevant to their needs. This is because many of these versions no longer exist, are not valid Go modules (e.g. https://go.dev/issues/31866), or are no longer actively used by the Go community. To address this, we could consider compiling a curated list of module versions periodically and making it available through a different API endpoint.
This would help users who need the real Go modules and reduce the chance of accidental refetching. It would also make proxy.golang.org more transparent about the module versions it has in its long-term storage. However, this would require new feature work, and we cannot prioritize non-trivial new features at this time.
hyangah changed the title from "Exclude unavailable module versions from index.golang.org result" to "index.golang.org: exclude unavailable module versions from default results" on Mar 29, 2023.