Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/pkgsite: devtools/cmd/seeddb -keep_going defers termination until all modules have had a fetch attempt #48587

Closed
sfllaw opened this issue Sep 23, 2021 · 3 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. pkgsite

Comments

@sfllaw
Copy link
Contributor

sfllaw commented Sep 23, 2021

Current behaviour

When devtools/cmd/seeddb encounters any errors at all, it immediately aborts:

username@host:pkgsite$ go run ./devtools/cmd/seeddb -seed <(echo github.com/quantcast/g2@all)
...
2021/09/23 11:11:35 Info (map[fetch:github.com/quantcast/g2@v0.0.1]): Updated module version state for github.com/quantcast/g2@v0.0.1: code=491, num_packages=0, err=fetchAndInsertModule("github.com/quantcast/g2", "v0.0.1"): FetchModule("github.com/quantcast/g2", "v0.0.1"): module path=github.com/quantcast/g2, go.mod path=github.com/ssmccoy/g2: alternative module; timings: db.UpdateModuleVersionState=0.214s, fetch.FetchModule=0.362s, worker.deleteModule=0.462s, worker.updatedVersionMap=0.132s
2021/09/23 11:11:35 Critical: fetchFunc(ctx, f, "github.com/quantcast/g2", "v0.0.1"): FetchAndUpdateState("github.com/quantcast/g2", "v0.0.1", ""): fetchAndInsertModule("github.com/quantcast/g2", "v0.0.1"): FetchModule("github.com/quantcast/g2", "v0.0.1"): module path=github.com/quantcast/g2, go.mod path=github.com/ssmccoy/g2: alternative module

This is because the run function uses an errgroup.Group, for early termination:

https://cs.opensource.google/go/x/pkgsite/+/fc5196d5:devtools/cmd/seeddb/main.go;l=91-114

	r := results{}
	g := new(errgroup.Group)
	f := &worker.Fetcher{
		ProxyClient:  proxyClient,
		SourceClient: sourceClient,
		DB:           postgres.New(db),
	}
	for path, vers := range versionsByPath {
		path := path
		vers := vers
		// Process versions of the same module sequentially, to avoid DB contention.
		g.Go(func() error {
			for _, v := range vers {
				if err := fetch(ctx, db, f, path, v, &r); err != nil {
					return err
				}
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	log.Infof(ctx, "Successfully fetched all modules: %v", time.Since(start))

Since historical versions of github.com/quantcast/g2 are broken, you cannot seed with “github.com/quantcast/g2@all”. In addition, if there is any module that is broken in the seed listing, it must be excluded and the seeddb command must be run again. In our codebase, this seems quite common due to early confusion around the “module” directive in go.mod files.

Desired behaviour

When seeding with a list of modules, an operator probably wants to fetch as many valid modules as possible. Understandably, this is the opposite behavior from what a developer wants.

I suggest a -keep_going flag, inspired by GNU make’s --keep-going, which would collect errors until the program has tried to fetch every version:

	r := results{}
	var (
		mu     sync.Mutex
		errors []error
	)
	g := new(errgroup.Group)
	f := &worker.Fetcher{
		ProxyClient:  proxyClient,
		SourceClient: sourceClient,
		DB:           postgres.New(db),
	}
	for path, vers := range versionsByPath {
		path := path
		vers := vers
		// Process versions of the same module sequentially, to avoid DB contention.
		g.Go(func() error {
			for _, v := range vers {
				if err := fetch(ctx, db, f, path, v, &r); err != nil {
					if *keep_going {
						mu.Lock()
						errors = append(errors, err)
						mu.Unlock()
						continue
					}
					return err
				}
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	if len(errors) != 0 {
		return multierror.New(errors)
	}
	log.Infof(ctx, "Successfully fetched all modules: %v", time.Since(start))

Workaround

Users may manually specify valid versions to seed, but that requires them to know which versions are valid. For our example repository, our seed.txt contains:

github.com/quantcast/g2@v0.6.3
github.com/quantcast/g2@v0.6.4
github.com/quantcast/g2@v0.6.5
github.com/quantcast/g2@v0.6.6
github.com/quantcast/g2@v0.6.7
@gopherbot gopherbot added this to the Unreleased milestone Sep 23, 2021
@jamalc jamalc added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 23, 2021
@jamalc jamalc modified the milestones: Unreleased, pkgsite/unplanned Sep 23, 2021
@jamalc jamalc assigned jamalc and julieqiu and unassigned jamalc Sep 23, 2021
@julieqiu julieqiu assigned jba and unassigned julieqiu Sep 23, 2021
@gopherbot
Copy link

Change https://golang.org/cl/352489 mentions this issue: devtools/cmd/seeddb: add -keep_going

@jba
Copy link
Contributor

jba commented Sep 27, 2021

Thanks for the clear report and the fix.

gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 27, 2021
Add a flag that continues fetching modules even if there is an error.

For golang/go#48587

Change-Id: I9df1070ee9bb0e206ddd569228239bde5bcb05cd
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/352489
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
@sfllaw
Copy link
Contributor Author

sfllaw commented Sep 29, 2021

:shipit: 🎉

@sfllaw sfllaw closed this as completed Nov 5, 2021
@rsc rsc unassigned jba Jun 23, 2022
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. pkgsite
Projects
None yet
Development

No branches or pull requests

5 participants