Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: piController can produce NaNs if the error term grows in an unbounded manner #51061

Closed
mknyszek opened this issue Feb 8, 2022 · 1 comment
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Feb 8, 2022

If something goes horribly wrong with the assumptions surrounding the PI controller in the runtime, its internal error state might accumulate in an unbounded manner. In practice this means unexpected Inf and NaN values.

This is more likely to happen in the scavenger, because the assumption there is a proportional response in adjustments to sleep time affecting measured CPU usage, and our measurements of time here have some significant caveats. Namely, we use nanotime for everything and make a pretty naive assumption about what CPU time looks like (which mostly holds). These assumptions break down in, for example, over-subscribed systems (even if they're only transiently over-subscribed).

I think we should just handle this case and fall back to a conservative setting. I also think that we should try to come back to using the controller after some time, because it really does pay to be more aggressive.

The GC pacer also uses this controller, but for a somewhat different purpose. AFAICT this is not an issue there because the controlled value is the output of the controller, so the input and output are always very directly correlated. Also, I was unable to break the controller with the same parameters under a fuzzer.

We've seen this happen very occasionally in internal Google services and lead to a runtime crash trying to set a timer with a garbage value because we try to convert a NaN to an int.

@mknyszek mknyszek added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 8, 2022
@mknyszek mknyszek added this to the Go1.18 milestone Feb 8, 2022
@gopherbot
Copy link

Change https://go.dev/cl/383954 mentions this issue: runtime: catch and avoid piController error overflow

@golang golang locked and limited conversation to collaborators Feb 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

2 participants