Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: osRelax reduces precision of short sleeps #20937

Closed
johnrs opened this issue Jul 7, 2017 · 16 comments
Closed

runtime: osRelax reduces precision of short sleeps #20937

johnrs opened this issue Jul 7, 2017 · 16 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Milestone

Comments

@johnrs
Copy link

johnrs commented Jul 7, 2017

What version of Go are you using (go version)?

What operating system and processor architecture are you using (go env)?

go version go1.9beta2 windows/amd64 on a Windows 7 PC

What did you do?

I ran this test program: https://play.golang.org/p/H6z0X-aAYw

What did you expect to see?

All of the results to be 1000us, as they are under v1.8.3.

What did you see instead?

time.Now() monotonic resolution:  15600us    <== "A"
time.Now() monotonic resolution:   1000us
time.Now() monotonic resolution:   1000us
time.Now() monotonic resolution:   1000us
time.Now() monotonic resolution:   1000us

time.Sleep(3ms)
time.Now() monotonic resolution: 15600us <== "B"
time.Now() monotonic resolution: 15600us <== "B"
time.Now() monotonic resolution: 15600us <== "B"
time.Now() monotonic resolution: 1000us
time.Now() monotonic resolution: 1000us
time.Sleep(3ms)
time.Now() monotonic resolution: 15600us <== "B"
time.Now() monotonic resolution: 15600us <== "B"
time.Now() monotonic resolution: 1000us
time.Now() monotonic resolution: 1000us
time.Now() monotonic resolution: 1000us

Notes:
The exact times may vary based on the speed of your Windows clock.
The results vary a bit for different runs.
Except for the 2 time.Sleep's the program does not block.

Discussion

From Change 38403: "... it only reduces the timer resolution when the Go process is entirely idle. When the process is idle, nothing needs a high resolution [1ms] timer." Well, a time.Sleep(1ms) does!

I very much like the intent, but I see some cases based on monotonic time (ex: time.Sleep, time.After) where it doesn't seem to work correctly. And it will break some code. Indeed it breaks one of my programs which gathers stats every 1ms. Looking at the results above:

"A") When the program starts the clock should be 1ms, not 15.6ms.

"B") Considering time.Sleep() to be idle time means that you can't reliably sleep for time.Durations of less than 15.6ms, even if you have been running continuously for over 30ms (it takes a while for the clock speed switch to take effect).

How to fix this? I have a few thoughts.

"A": Put the startup back to 1ms.

"B": Let the user decide. I envision 3 possible modes (via a new function in runtime):
Always enabled, regardless of the time.Duration (as it is now)
Auto: Enabled for time.Durations > 50ms, where the extra 15ms isn't a problem (default)
Always disabled, regardless of the time.Duration

Or, just use the middle mode and don't consider any time duration of less than 50ms (my testing shows this to be a safe value, but this needs to be validated) to be "idle" when deciding to switch. I think that this would probably solve all of the problems I've been seeing, but leave the change functional for all other cases.

Note: I don't think that telling the use to use the Windows TimeBeginPeriod and TimeEndPeriod calls is wise. It might interfere with Go's internal switching. The nesting could get really messed up, especially if the user decided he liked a different clock rate entirely, like 5ms.

Microsoft: "You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls. An application can make multiple timeBeginPeriod calls as long as each call is matched with a call to timeEndPeriod."

Of course the user could always spin up a goroutine which was simply an endless loop. That effectively locks the clock at 1ms, but it costs a CPU and the power to run it constantly! I found it handy while doing my testing, however, as it clearly makes the problem disappear, thus confirming that this change is the cause.

Thanks.

@bradfitz bradfitz added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 7, 2017
@bradfitz bradfitz added this to the Go1.9 milestone Jul 7, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Jul 7, 2017

@aclements, could you take a look?

@alexbrainman
Copy link
Member

How to fix this? I have a few thoughts.

"A": Put the startup back to 1ms.

CL 38403 pretty much fixed #8687. If we revert CL 38403, then we are back to draining people's laptop batteries (as described in #8687). I prefer to have time.Sleep(1ms) sleep for 15ms, than empty battery.

"B": Let the user decide. I envision 3 possible modes (via a new function in runtime):

"B": Let the user decide. I envision 3 possible modes (via a new function in runtime):
Always enabled, regardless of the time.Duration (as it is now)
Auto: Enabled for time.Durations > 50ms, where the extra 15ms isn't a problem (default)
Always disabled, regardless of the time.Duration

I think you really only need last option. And that option can be achieved by calling TimeBeginPeriod from your program. No?

Alex

@johnrs
Copy link
Author

johnrs commented Jul 7, 2017

No. As I described above it's probably unsafe, especially if used with a different requested speed (like 5ms).
Read the Microsoft quote again. Has anyone studied this and concluded otherwise?

I don't think that either extreme (always fast or always slow) is satisfactory. I love that Go would be able to automatically switch, thus saving the user the trouble of having to put switching calls all over the place. So I'm very much in favor of keeping CL 38403, but changing it to avoid the problem cases I described. With the 50ms threshold you don't lose any speed and you get most of the power savings (all, if you avoid coding short sleeps).

So why not have both? Chrome switches dynamically on my laptop even though it's AC powered. Keeping my CPUs cooler makes then run faster when they do run (the CPU speed is temperature sensitive). And saving power, even AC power, if there is no speed downside makes good sense. It will save a noticeable amount of money in a datacenter, I would think.

@aclements
Copy link
Member

It's really disappointing that the OS can't just figure this out. :(

I don't want to expose any new APIs. The fact that the OS foists this on user space is terrible enough; I don't want to further foist it on Go's users. I like your "auto" suggestion.

How about I simply make it wait for 50 ms of idleness before relaxing the interrupt rate? That's easy to implement (which is important this late in the release cycle) and insensitive to exactly what's causing it to wake up from being idle.

@johnrs
Copy link
Author

johnrs commented Jul 7, 2017

Sounds good to me!

And for the user who wants to disable switching entirely - a goroutine with a tight loop that does a time.Sleep(10ms) should do the trick. :)

@aclements aclements changed the title runtime: Problems due to Change 38403 runtime: osRelax reduces precision of short sleeps Jul 7, 2017
@aclements aclements added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 7, 2017
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 7, 2017
@gopherbot
Copy link

CL https://golang.org/cl/47832 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/47833 mentions this issue.

gopherbot pushed a commit that referenced this issue Jul 7, 2017
Currently, sysmon waits 60 ms during idle before relaxing. This is
primarily to avoid reducing the precision of short-duration timers. Of
course, if there are no short-duration timers, this wastes 60 ms
running the timer at high resolution.

Improve this by instead inspecting the time until the next timer fires
and relaxing the timer resolution immediately if the next timer won't
fire for a while.

Updates #20937.

Change-Id: If4ad0a565b65a9b3e8c4cdc2eff1486968c79f24
Reviewed-on: https://go-review.googlesource.com/47833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@johnrs
Copy link
Author

johnrs commented Jul 26, 2017

@aclements
I tested with the new Go 1.9rc1. The main problem is solved, but I'm still seeing a problem at startup. A program appears to start in the 15ms mode instead of the 1ms mode.

Here's what I see: https://play.golang.org/p/Bl2SCvbArH

time.Sleep(0ms)
time.Now() monotonic resolution: 15600us <== Bad
time.Sleep(1ms)
time.Now() monotonic resolution: 1000us <== Good
time.Sleep(10ms)
time.Now() monotonic resolution: 1000us <== Good
time.Sleep(55ms)
time.Now() monotonic resolution: 1000us <== Good
time.Sleep(65ms)
time.Now() monotonic resolution: 15600us <== Good
time.Sleep(100ms)
time.Now() monotonic resolution: 15600us <== Good
time.Sleep(1000ms)
time.Now() monotonic resolution: 15600us <== Good

Workaround: If you start main with 2 "time.Sleep(1)" warmup calls, this will cause the switch to 1ms mode and allow it to complete before continuing. I'm not suggesting that we require speed-sensitive users to do this, however.

Side Note: I'm running Chrome on Windows 7. Normally Chrome rests at 15ms, switching to 1ms occassionaly when it has work to do. For some reason, however, when I display the #20937 page Chrome stays at 1ms - even though there is nothing changing on the page. If I switch to another tab, Chrome goes back to 15ms. This does not happen on #20938, so it must be due to some content on the page. Strange!

Shouldn't the details of the auto-switching behavior be mentioned in the runtime or time package's docs?

I'd also like to revisit a possible problem I mentioned in my initial post. What if a user wants to lock the timing at 1ms, 15ms, or he just wants to control when it switches? Although unusual I admit, I can envision situations where all three of these modes would be desired: Speed regardless of power, power regardless of speed, custom tailored to give best speed only where needed (as per the user).

I don't think that telling them to use the Windows TimeBeginPeriod and TimeEndPeriod calls is wise. It might interfere with Go's auto-switching. The nesting could get really messed up, especially if the user decided he liked a different clock rate entirely, like 5ms.

Microsoft: "You must match each call to timeBeginPeriod with a call to timeEndPeriod, specifying the same minimum resolution in both calls. An application can make multiple timeBeginPeriod calls as long as each call is matched with a call to timeEndPeriod."

So I'm thinking that there either needs to be a way to disable Go's auto-switching (thus letting them use the TimeBegin/EndPeriod calls safely) or a way to tell Go which mode to use: 1ms, 15ms, or auto.

I prefer the second method because: 1) It's easier for the user (no Windows call to deal with). 2) It can be expanded to allow any legal time (like 5ms). 3) Always running at 15ms has previously been shown to sometimes cause a scheduler problem, if memory serves. This option means that when needed, Go can shift to 1ms mode then switch back to what the user wants.

Joke: Another goroutine which is an endless loop will keep you in the 1ms mode, but now you can use a time.Sleep(20ms) for the loop, thus reducing the overhead to almost zip. :)

Whatever solution, I believe that it should be in the standard library.

@aclements
Copy link
Member

A program appears to start in the 15ms mode instead of the 1ms mode.

Hmm. This certainly isn't the intended behavior, and I'm not sure what could cause that. We set the timer resolution to 1ms really early in runtime initialization, and it doesn't look like this program should even be able to enter global idle mode before the 1ms sleep. Is it convenient for you to hack the runtime to add a println(relax) to func osRelax in runtime/os_windows.go and see when that happens?

Shouldn't the details of the auto-switching behavior be mentioned in the runtime or time package's docs?

In my opinion, it's completely ridiculous that the Windows kernel doesn't take care of this transparently already (and making it a global process property? good grief). I really, really don't want to make this even more peoples' problem if it's at all possible to make it transparent.

What if a user wants to lock the timing at 1ms, 15ms, or he just wants to control when it switches?

The runtime gets grumpy when the timer resolution isn't good, so at best we could let users request a higher resolution to stay at when idle. But, not being a Windows developer, I don't understand why someone would want to do this (assuming we can fix whatever bug you ran into a program start up). Can you explain?

I don't think that telling them to use the Windows TimeBeginPeriod and TimeEndPeriod calls is wise. It might interfere with Go's auto-switching. ... So I'm thinking that there either needs to be a way to disable Go's auto-switching (thus letting them use the TimeBegin/EndPeriod calls safely) or a way to tell Go which mode to use: 1ms, 15ms, or auto.

Something I might be okay with would be to expose TimeBeginPeriod and TimeEndPeriod from the syscall package. Those could interact with the runtime to do the right thing (some things in syscall already do this). That would give developers the API they expect without breaking the runtime.

@ianlancetaylor
Copy link
Contributor

I think that any discussion of user control over the time period should be a separate issue directed at 1.10.

@aclements
Copy link
Member

Good point @ianlancetaylor. @johnrs, mind opening a new issue?

@johnrs
Copy link
Author

johnrs commented Jul 26, 2017

Is it convenient for you to hack the runtime to add a println(relax) to func osRelax in runtime/os_windows.go and see when that happens?

I should be able to give it a try later today.

I really, really don't want to make this even more peoples' problem if it's at all possible to make it transparent.

I understand, and I think that the auto-switching is great for most people. But there will be some who want to optimize for lowest power and others who want to optimize for greatest speed (no loss of compute time due to >60ms sleep extended by 15ms). At the moment I'm not so worried about the "power" guys because as it is now it's much better than it was. But the "speed" guys might notice the timing jitter.

Consider a very simple program which just samples something every 100ms. 15ms of jitter in that measurement time affects the quality (accuracy) of the data collected for each sample, even though the grand total will still be correct. That's why I'm keen on documenting it so they know how to avoid the problem in current and/or future code. They can use multiple shorter sleeps for example. Or even the extra goroutine with the 10ms sleep loop trick.

And perhaps this is enough - just documenting it - and it isn't worth giving them any direct control over the switching. But the auto-switching is changing behavior, so I think they deserve to know.

But, not being a Windows developer, I don't understand why someone would want to do this (assuming we can fix whatever bug you ran into a program start up). Can you explain?

The goal isn't Windows specific, just the problem. Someone running off a battery and not doing any heavy computing will probably be most interested in saving every bit of power he can. The auto-switch does that for him pretty well - if he knows to avoid <60ms sleeps. 3rd part libraries have to be checked, too.

On the other hand, someone doing something time sensitive (as per my example above) might not ever want to run at anything but the fastest clock rate.

Something I might be okay with would be to expose TimeBeginPeriod and TimeEndPeriod from the syscall package. Those could interact with the runtime to do the right thing (some things in syscall already do this). That would give developers the API they expect without breaking the runtime.

This would avoid the nesting problem? If so, interesting!

@johnrs
Copy link
Author

johnrs commented Jul 26, 2017

Good point @ianlancetaylor. @johnrs, mind opening a new issue?

Glad to. I'll wait till the dust settles on this first.

@johnrs
Copy link
Author

johnrs commented Jul 27, 2017

FYI: Legal clock intervals (in ms) on my PC: 0.5, 1.0, 1.25, 2.5, 5.0, 10.0, 15.6. At startup, monotonic tick is 15ms, but wall clock tick is 1ms. The same 15ms monotonic tick is also the case after a long (>60ms) sleep.

Is it convenient for you to hack the runtime to add a println(relax) to func osRelax in runtime/os_windows.go and see when that happens?

Done. Here is what I see:

false
time.sleep(1ms)  ==>  sleep = 15600us    mono = 15600us  <==  Slow
time.sleep(1ms)  ==>  sleep =  1000us    mono =  1000us  <==  OK
time.sleep(1ms)  ==>  sleep =  1000us    mono =  1000us  <==  OK
time.sleep(70ms) ==>
true
false
    [...]             sleep = 78401us    mono = 15600us  <==  OK
time.sleep(1ms)  ==>  sleep = 15600us    mono = 15600us  <==  Slow
time.sleep(1ms)  ==>  sleep =  1000us    mono =  1000us  <==  OK
time.sleep(1ms)  ==>  sleep =  1000us    mono =  1000us  <==  OK

It seems that the monotonic tick takes about 15-30ms to switch from slow to fast. If this is so, I can't think of a solution. It's just something that we'll have to live with.

Note: This does underscore the need by some for a method to lock the mono tick at 1ms. I'll carry this forward to the upcoming user control issue.

Meanwhile, it also underscores the need to document the behavior and present the workaround. A goroutine with time.sleep(50ms) in an infinite loop does work. The Windows TimeBeginPeriod would be better - but ONLY if it is called with 1ms, else the nesting problem and Microsoft didn't describe the consequences (in what I read). So exposing the runtime functions (hard coded for 1ms) sounds good.

@aclements
Copy link
Member

aclements commented Jul 27, 2017

Something I might be okay with would be to expose TimeBeginPeriod and TimeEndPeriod from the syscall package. Those could interact with the runtime to do the right thing (some things in syscall already do this). That would give developers the API they expect without breaking the runtime.

This would avoid the nesting problem? If so, interesting!

I'm simply thinking that we would emulate the effect of having two different processes changing the timer resolution, where one "process" is the runtime and the other "process" is user code calling syscall.TimeBeginPeriod. Internally, we can combine the two and feed the result to the actual syscall.

It seems that the monotonic tick takes about 15-30ms to switch from slow to fast. If this is so, I can't think of a solution. It's just something that we'll have to live with.

That's unfortunate, but at least explains what's going on.

Note: This does underscore the need by some for a method to lock the mono tick at 1ms. I'll carry this forward to the upcoming user control issue.

Thanks.

@johnrs
Copy link
Author

johnrs commented Jul 27, 2017

I'm simply thinking that we would emulate the effect of having two different processes changing the timer resolution, where one "process" is the runtime and the other "process" is user code calling syscall.TimeBeginPeriod. Internally, we can combine the two and feed the result to the actual syscall.

If they are combined I don't see how this prevent the nesting problem in general. But for those (most?) who would just want to set it to "fast" or "slow" at the start of the program it sounds good.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Windows
Projects
None yet
Development

No branches or pull requests

6 participants