database/sql: sporadic RST when CPU throttled to 0 #45102
Labels
NeedsInvestigation
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?
We've since mitigated the issue by moving away from using database/sql, but it was still reproducible in Go 1.16.1.

What operating system and processor architecture are you using (go env)?

Single static binary packaged into a scratch container.

What did you do?
Despite configuring the maximum idle connections and maximum connection lifetime, we continued to encounter edge cases where the database/sql package would use a connection that had exceeded the maximum connection lifetime. Since the upstream database had already closed the connection, we'd get an RST. It could happen on any code path and any query, so adding retry logic proved difficult.

The database/sql package isn't very observable, so we don't have hard evidence to provide. However, this behavior occurred most commonly when our orchestration platform scaled the CPU on the container to 0. The container was still running (and time was still passing), but it was unable to do any "work". Then, when the orchestrator increased the CPU, a race occurred. Sometimes the connection would correctly be marked as unusable because its maximum lifetime had elapsed, but sometimes it would be returned from the pool as "healthy" even though it had exceeded the maximum lifetime. We attempted to trace the package to understand how this could happen, but we were unsuccessful (@jeremyfaller).

We moved away from database/sql to using pgx (and pgxpool) directly. That library has various hooks into connections that allow us to customize observability and, most importantly, verify that the connection is healthy immediately before use. Since switching libraries, we haven't experienced any RSTs.