database/sql: sporadic RST when CPU throttled to 0 #45102

Open
sethvargo opened this issue Mar 18, 2021 · 3 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@sethvargo
Contributor

What version of Go are you using (go version)?

1.16.2

Does this issue reproduce with the latest release?

We've since mitigated the issue by moving away from using database/sql, but it was still reproducible in 1.16.1.

What operating system and processor architecture are you using (go env)?

A single static binary packaged into a scratch container.

What did you do?

Despite configuring the maximum idle connections and maximum connection lifetime, we continued to encounter edge cases where the database/sql package would use a connection that had exceeded the max connection lifetime. Since the upstream database had already closed the connection, we'd get an RST. It could happen at any code path and any query, so adding retry logic proved difficult.

The database/sql package isn't very observable, so we don't have hard evidence to provide. However, this behavior occurred most commonly when our orchestration platform scaled the container's CPU to 0. The container was still running (and time was still passing), but it was unable to do any "work". Then, when the orchestrator increased the CPU again, a race occurred. Sometimes the connection would correctly be marked as unusable because its maximum lifetime had elapsed, but sometimes it would be returned from the pool as "healthy" even though it had exceeded the maximum lifetime. We attempted to trace the package to understand how this could happen, but we were unsuccessful (@jeremyfaller).
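
To make the expected behavior concrete, here is a minimal sketch of a lifetime check evaluated at checkout time against the current clock, so the result does not depend on any background timer having fired while the CPU was throttled. The names `pooledConn`, `expired`, and `simulateThrottle` are hypothetical and exist only for illustration; this is not database/sql's implementation, and it does not explain where the observed race comes from.

```go
package main

import "time"

// pooledConn is a hypothetical stand-in for a pooled connection.
// It is not a database/sql type.
type pooledConn struct {
	createdAt time.Time
}

// expired reports whether c has outlived maxLifetime as of now.
// A non-positive maxLifetime is treated as "no limit".
func (c pooledConn) expired(maxLifetime time.Duration, now time.Time) bool {
	if maxLifetime <= 0 {
		return false
	}
	return now.Sub(c.createdAt) >= maxLifetime
}

// simulateThrottle checks one connection twice: shortly after creation, and
// again after a long gap during which the process did no work. The timestamps
// and the 5-minute lifetime are made-up values.
func simulateThrottle() (before, after bool) {
	start := time.Date(2021, time.March, 18, 0, 0, 0, 0, time.UTC)
	c := pooledConn{createdAt: start}
	maxLifetime := 5 * time.Minute

	before = c.expired(maxLifetime, start.Add(1*time.Minute))
	after = c.expired(maxLifetime, start.Add(30*time.Minute))
	return before, after
}
```

With a check of this shape, the connection should be rejected on the first checkout after the CPU is restored, however long the process was paused.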

We moved away from using database/sql to using pgx (and pgxpool) directly. That library has various hooks into connections that allow us to customize observability and, most importantly, verify the connection is healthy immediately before use. Since switching libraries, we haven't experienced any RSTs.
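
The "verify immediately before use" pattern can be sketched roughly as follows. This is a generic, single-goroutine illustration of the idea only: `pool`, `conn`, `beforeAcquire`, `dial`, and `acquireSkipsStale` are hypothetical names, not the pgx or pgxpool API, and the `healthy` flag stands in for a real liveness probe such as a ping.

```go
package main

import "errors"

// conn is a hypothetical connection. healthy stands in for the result of a
// real liveness probe.
type conn struct {
	healthy bool
}

// pool is a deliberately simplified idle pool. It is not safe for
// concurrent use.
type pool struct {
	idle          []*conn
	beforeAcquire func(*conn) bool      // returns false to discard the connection
	dial          func() (*conn, error) // opens a fresh connection
}

// acquire hands out an idle connection only if it passes beforeAcquire.
// Connections that fail the check are dropped; if none remain, it dials.
func (p *pool) acquire() (*conn, error) {
	for len(p.idle) > 0 {
		last := len(p.idle) - 1
		c := p.idle[last]
		p.idle = p.idle[:last]
		if p.beforeAcquire == nil || p.beforeAcquire(c) {
			return c, nil
		}
		// c failed the check: discard it and try the next idle connection.
	}
	if p.dial == nil {
		return nil, errors.New("pool: no idle connections and no dial function")
	}
	return p.dial()
}

// acquireSkipsStale builds a pool holding one stale connection and reports
// whether acquire discarded it and dialed exactly one replacement.
func acquireSkipsStale() bool {
	stale := &conn{healthy: false}
	dialed := 0
	p := &pool{
		idle:          []*conn{stale},
		beforeAcquire: func(c *conn) bool { return c.healthy },
		dial: func() (*conn, error) {
			dialed++
			return &conn{healthy: true}, nil
		},
	}
	c, err := p.acquire()
	return err == nil && c != stale && c.healthy && dialed == 1
}
```

The point of the design is that the health decision is made on the caller's path at acquisition time, so a connection that went bad while the process was idle is never handed out.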

@cherrymui added the NeedsInvestigation label Mar 18, 2021
@cherrymui added this to the Backlog milestone Mar 18, 2021
@cherrymui
Member

cc @bradfitz @kardianos @kevinburke

@ulikunitz
Contributor

Apparently you are connecting to a PostgreSQL database. What database/sql driver are you using? Have you used the pgx database/sql driver?

@sethvargo
Contributor Author

We moved away from using database/sql to using pgx (and pgxpool) directly, and the problem does not exist in those libraries.
