Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/pkgsite: SQL error on insert #43899

Closed
jba opened this issue Jan 25, 2021 · 7 comments
Closed

x/pkgsite: SQL error on insert #43899

jba opened this issue Jan 25, 2021 · 7 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. pkgsite

Comments

@jba
Copy link
Contributor

jba commented Jan 25, 2021

"DB.InsertModule(ctx, Module("git.halogenos.org/xdevs23/blueprint", "v0.11.2")): saveModule(ctx, tx, Module("git.halogenos.org/xdevs23/blueprint", "v0.11.2")): Transact(Default): txFunc(tx): insertUnits(ctx, tx, "git.halogenos.org/xdevs23/blueprint", "v0.11.2"): insertUnits: DB.BulkInsertReturning(ctx, "units", [path_id module_id v1path_id name license_types license_paths redistributable], [105 values], "ON CONFLICT (path_id, module_id) DO UPDATE SET path_id=excluded.path_id, module_id=excluded.module_id, v1path_id=excluded.v1path_id, name=excluded.name, license_types=excluded.license_types, license_paths=excluded.license_paths, redistributable=excluded.redistributable", [id path_id], scanFunc): ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time (SQLSTATE 21000)"

at 2021-01-24T20:48:17.318677958Z

@jba jba added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. pkgsite labels Jan 25, 2021
@jba jba added this to the pkgsite/unplanned milestone Jan 25, 2021
@jba jba self-assigned this Jan 25, 2021
@seankhliao seankhliao changed the title pkgsite: SQL error on insert x/pkgsite: SQL error on insert Jan 25, 2021
@jba
Copy link
Contributor Author

jba commented Jan 25, 2021

I wasn't able to duplicate this running locally against the dev DB.

Nothing to do right now; leaving open in case it happens again.

@jba
Copy link
Contributor Author

jba commented Jan 29, 2021

Saw it again for src.elv.sh@v0.7.0. Could not repro.

@gopherbot
Copy link

Change https://golang.org/cl/287793 mentions this issue: internal/postgres: add logging to debug conflict duplicates

gopherbot pushed a commit to golang/pkgsite that referenced this issue Jan 29, 2021
Add logging to the insertUnits function to see if we can understand
why we occasionally see "ON CONFLICT DO UPDATE command cannot affect
row a second time". According to
https://pganalyze.com/docs/log-insights/app-errors/U126,
it happens because we have duplicate conflict columns.

For golang/go#43899

Change-Id: Idba3018708c3f5ee451e574b2c747929a5e2c30d
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/287793
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/290269 mentions this issue: internal/postgres: run module insert transaction at RepeatableRead

gopherbot pushed a commit to golang/pkgsite that referenced this issue Feb 9, 2021
Increase the isolation level of the transaction that inserts a module
to a level which does not allow non-repeatable reads.

This fixes a bug in which insertPaths, which purports to return a map
of all paths relevant to the module, fails to include some paths.

To understand the bug, it is necessary to understand exactly how
insertPaths works. To avoid a large and slow bulk insert, it first
reads all the paths relevant to the module, then inserts only those
that are missing. The first part is done with a SELECT, and the second
with an INSERT ... ON CONFLICT DO NOTHING ... RETURNING. The paths it
returns combine the ones it read in the SELECT with the new ones that
are inserted. The INSERT statement only returns data for rows it
inserted, not rows it skipped due to a conflict.

The key point to notice here is that insertPaths performs two reads of
the database. The first is, of course, the SELECT. The second occurs
inside the INSERT statement: the database must read the table for
potentially conflicting rows.

At the default isolation level, a transaction can experience
non-repeatable reads. As defined in
https://www.postgresql.org/docs/11/transaction-iso.html, that is when
"a transaction re-reads data it has previously read and finds that
data has been modified by another transaction (that committed since
the initial read)."

To see how a non-repeatable read can happen in insertPaths, imagine
that two different versions of the same module are being inserted at
the same time. They will probably have many of the same paths;
certainly they will share the module path itself. Also imagine that
this is the first time we've seen this module.

The insertPaths call for one version reads the paths and does not
find any of them. It prepares its insert statement.

Meanwhile, the transaction for the other version runs all the way to
completion, committing all the module's paths to the database.

Now the insert statement for the first version runs. It finds that
many (perhaps all) of its paths are already present, because the other
transaction committed them and this second read is not obligated to
repeat the results of the first read. Each path that is present
results in a conflict, and conflicts do not result in a returned
row. These paths were not in the initial SELECT, and they are not
returned by the INSERT, so they do not appear in the result.

By running the transaction at a level that disallows non-repeatable
reads, we can be sure that the second read sees exactly what the first
read saw. Every path will appear in either the SELECT or the rows
returned from the INSERT.

I was able to confirm that this worked with high probability by
repeatedly running a test that inserted multiple versions of the same
module. It failed most of the time at default isolation and never
failed at RepeatableRead. The test will be in a forthcoming CL.

Although this fixes the problem with insertPaths, there may
still be a second problem that results in duplicate-row problem
described in the issue below. So not marking it fixed just yet.

For golang/go#43899

Change-Id: Ia3f0f89da1f2327d9f3d16d1826c5376ee6e5b47
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/290269
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/291609 mentions this issue: internal/database: catch missing retriable errors

gopherbot pushed a commit to golang/pkgsite that referenced this issue Feb 12, 2021
We've noticed that we aren't retrying errors that have the right
serialization code. The likely reason is that the error type is not
what we expect.  Get information about the type so we can recognize it
in the code.

For golang/go#43899

Change-Id: I72aceb12002eecbf580654b77c68d845759e8182
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/291609
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
@jba
Copy link
Contributor Author

jba commented Mar 17, 2021

I think this may have been due to bad logging: we were logging serialization-failure errors at a low level, without realizing that they would be retried higher up. https://golang.org/cl/302190 fixed that.

@jba jba closed this as completed Mar 17, 2021
@golang golang locked and limited conversation to collaborators Mar 17, 2022
@rsc rsc unassigned jba Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. pkgsite
Projects
None yet
Development

No branches or pull requests

2 participants