testing: benchmark is using low resolution time on Windows #31160

egonelbre · 2019-03-30T09:17:26Z

time.Now has a large resolution on Windows, so the benchmark results end up with a timing error.

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

f:\Go\src\sandbox\benchmark>go version
go version devel +2bd767b102 Thu Mar 28 09:36:43 2019 +0000 windows/amd64

f:\Go\src\sandbox\benchmark>go run main.go
                             50%       90%       99%
time.Now    Z=9999935    1000000   1005100   1014600
QPC         Z=0              100       200       200

Which roughly says that my getting measurements at 1000000ns or 1ms granularity.

Fixing nanotime would be problematic (as seen in #8687), however benchmarks could use QueryPerformanceCounter instead.

The text was updated successfully, but these errors were encountered:

egonelbre · 2019-03-30T09:22:26Z

I could make an easy patch like I did in https://github.com/rakyll/hey/blob/master/requester/now_windows.go#L24. However since runtime already contains definitions for QPC was wondering whether it can be organized better.

egonelbre · 2019-03-30T09:27:19Z

As a more practical example:

func Benchmark(b *testing.B) {
	for i := 0; i < b.N; i++ {
		fmt.Sprintf("hello world")
	}
}

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 100000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32              100000                80.4 ns/op
PASS
ok      sandbox/benchmark       0.285s

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32             1000000                67.0 ns/op
PASS
ok      sandbox/benchmark       0.342s

networkimprov · 2019-03-31T01:13:25Z

Related: #29714

bcmills · 2019-04-12T15:53:43Z

CC @mpvl @josharian

egonelbre · 2019-04-17T18:22:24Z

Ping.

I could make a PR and discuss a specific fix, if that's requires less effort from core maintainers.

josharian · 2019-04-17T18:40:14Z

I don't have much to say about this. A higher resolution benchmarking timer certainly seems desirable. What are the downsides?

cc @alexbrainman for windows

egonelbre · 2019-04-17T18:56:59Z

Some duplicated code for windows and handling for other platforms to be compatible with that.

Specifically the qpc part

go/src/runtime/os_windows.go

Line 381 in 9b6e9f0

    
           // useQPCTime controls whether time.now and nanotime use QueryPerformanceCounter.

. Unless there's a nice way to share that code with testing package.

josharian · 2019-04-17T20:51:26Z

We could use //go:linkname to get access to it. Or an internal package, perhaps.

egonelbre · 2019-04-17T22:42:21Z

After poking it a bit it seems using nanotimeQPC at the moment works only for Wine, since the implementation converts a division into multiplication that doesn't work in the general case.

Usually the way to convert QPC value is result = qpcCounter * 1e9 / qpcFrequency. Currently the implementation uses qpcMultiplier = 1e9 / qpcFrequency and then result = qpcCounter * qpcMultiplier, which only works correctly when frequency is a power of 10.

https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L447

Ideas so far:

add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs
use division inside nanotimeQPC to avoid the error and accept the cost of division
find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.

alexbrainman · 2019-04-19T02:00:22Z

As a demonstration here's code that measures QueryPerformanceCounter vs. time.Now: https://play.golang.org/p/FN4AtJ9b51m.

I agree. I also see QueryPerformanceCounter performers better than time.Now on my Windows 10 computer. Each qpcDeltas[i] is about 600-700 (ns), while most timeDeltas[i] are 0.

Fixing nanotime would be problematic (as seen in #8687)

I don't see why we cannot change nanotime to use QueryPerformanceCounter. Remind me, please. I just changed runtime

$ git diff
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
index d3e84fe..8f1373c 100644
--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -251,10 +251,10 @@ func loadOptionalSyscalls() {
                throw("WSAGetOverlappedResult not found")
        }

-       if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
+       //if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
                // running on Wine
                initWine(k32)
-       }
+       //}
 }

 //go:nosplit
$

and it seems to be working fine.

As we vary benchtime it shows quite a significant timing error:

f:\Go\src\sandbox\benchmark>go test -bench . -benchtime 1000x .
goos: windows
goarch: amd64
pkg: sandbox/benchmark
Benchmark-32                1000                 0.00 ns/op
PASS
ok      sandbox/benchmark       0.261s

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Ideas so far:

add QPC funcs into internal/syscall/windows and use them rather than try to modify runtime to fit the needs

use division inside nanotimeQPC to avoid the error and accept the cost of division

find a faster way to calculate division by qpcFrequency, if there exists one

1 seems the smallest and easiest change. 2 requires adjusting things inside runtime... also, I'm not sure exactly how big the cost of additional division would be in practice. 3 seems slightly overkill, and I don't have a really good idea at the moment.

Based on these thoughts, my instincts tell me to go with 1.

I would (if I had free time) try 2 or 3 first. Maybe division is not expensive enough? Can we do division in every 100's nanotime call to adjust for the drift?

Making runtime.nanotime and time.Now more precise might be beneficial in other areas (not just for benchmarks).

I am also worried about creating too many functions that read time. They all use different approach to collect time. How do you know the time they return is all in sync? And different implementations might diverge.

Alex

egonelbre · 2019-04-19T07:49:19Z

As far as I recall QPC is unreliable for measuring long periods of times.

Best reference mentioning issues I was able to find https://stackoverflow.com/a/5297163/192220. More in depth info on timing issues https://github.com/chromium/chromium/blob/08dbb44e81454b6d67c3b6f4989e7e58e88f4b0b/base/time/time_win.cc#L14. MSDN also only suggests it for measuring short periods of time https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps.

I would not use -benchtime 1000x for this test on Windows. Just let your benchmark run for long enough (default of 1 second is good), and you should be good.

Yeah, I wouldn't use that value either. This is to demonstrate the make the issue more obvious.

alexbrainman · 2019-06-10T02:35:11Z

As far as I recall QPC is unreliable for measuring long periods of times.

Best reference mentioning issues I was able to find https://stackoverflow.com/a/5297163/192220. More in depth info on timing issues https://github.com/chromium/chromium/blob/08dbb44e81454b6d67c3b6f4989e7e58e88f4b0b/base/time/time_win.cc#L14. MSDN also only suggests it for measuring short periods of time https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps.

None of this looks authoritative enough for me. In fact, Microsoft suggests to use QPC for high resolution timers pretty much unconditionally

https://docs.microsoft.com/en-us/windows/desktop/winmsg/about-timers

https://docs.microsoft.com/en-us/windows/desktop/SysInfo/acquiring-high-resolution-time-stamps

Also, given #31528, we will, probably, change nanotime implementation anyway.

Maybe it is good time to address your issue and #31528 together? In particular, see my suggestion about QueryUnbiasedInterruptTimePrecise in #31528 (comment) ? In particular, if QueryUnbiasedInterruptTimePrecise is too slow / expensive to call from nanotime, maybe we could call it only from time.now, and call QueryUnbiasedInterruptTime from nanotime? Would that make Windows benchmarks resolution higher (I did not look at benchmarking code) ?

Thank you/

Alex

egonelbre · 2019-06-10T05:57:47Z

@alexbrainman Yes, changing nanotime would fix this issue. The fix would definitely be better overall, but as I said, it is somewhat scary to me. Then again, maybe the unreliability situation has improved significantly.

alexbrainman · 2019-06-10T06:28:49Z

Then again, maybe the unreliability situation has improved significantly

My feeling exactly. Things change. Do not forget QueryUnbiasedInterruptTimePrecise is only available on Windows 10. But I will take that.

Anyway, we have to come up with a solution for #31528, so lets wait for that before deciding what to do here.

Alex

zx2c4 · 2019-08-25T16:04:18Z

Please don't use QueryUnbiasedInterruptTimePrecise, per #31528 (comment)

egonelbre · 2020-04-07T14:58:33Z

The #31528 was closed with a different solution.

I updated testing to use QueryPerformanceCounter in https://go-review.googlesource.com/c/go/+/227499, but I'm not sure whether this is the best way to go about it. Other ideas how to make benchmarking less noisy on quick runs would helpful.

gopherbot · 2020-04-07T15:00:16Z

Change https://golang.org/cl/227499 mentions this issue: testing: use QueryPerformanceCounter for benchmarks

alexbrainman · 2020-04-12T08:55:47Z

I updated testing to use QueryPerformanceCounter in https://go-review.googlesource.com/c/go/+/227499, but I'm not sure whether this is the best way to go about it. Other ideas how to make benchmarking less noisy on quick runs would helpful.

I spent some time investigating this. And, I agree, if we use QueryPerformanceCounter, we could improve our time.Duration precision from 1 millisecond to around few microsecond. Same with either QueryInterruptTimePrecise or QueryUnbiasedInterruptTimePrecise. But using syscall.Syscall to implement this code (similar to your CL 227499) is 10 times slower than current time.Now implementation. This is what I see

BenchmarkNow
BenchmarkNow-4                                  161259765                7.30 ns/op
BenchmarkQPC
BenchmarkQPC-4                                  11327251               106 ns/op
BenchmarkUnbiasedInterruptTime
BenchmarkUnbiasedInterruptTime-4                13963765                86.8 ns/op
BenchmarkUnbiasedInterruptPreciseTime
BenchmarkUnbiasedInterruptPreciseTime-4         11112705               107 ns/op

I don't think it is acceptable for time.Now. Maybe it is acceptable for testing package benchmark code. I am not sure.

Unrelated to this. Austin tried replacing our current time.Now implementation with asm version of QueryUnbiasedInterruptTime (see https://go-review.googlesource.com/c/go/+/211307 ) - and it is fast enough replacement. Maybe we could do something similar for *Precise version - if only for time.Now implementation, not runtime.nanotime implementation.

Alex

egonelbre · 2020-04-13T10:30:35Z

Based on https://github.com/golang/go/wiki/MinimumRequirements#windows, the minimum required is Windows 2008R2. QueryInterruptTimePrecise is available starting from Windows 10 and Windows Server 2016 (https://docs.microsoft.com/en-us/windows/win32/api/realtimeapiset/nf-realtimeapiset-queryinterrupttimeprecise). The same goes for QueryInterruptTime.

With regards to using QueryInterruptTimePrecise as a time.Now source, I'm not sure what the effects will be.

I also agree that the performance of QPC (and similar) with syscall is not suitable for time.Now. With regards to benchmarking code, the performance is not ideal, however it is a significant improvement over a 1ms error in measurement.

gopherbot · 2024-01-22T08:43:42Z

Change https://go.dev/cl/557315 mentions this issue: testing: use QueryPerformanceCounter on Windows

alexbrainman · 2024-04-27T07:54:08Z

It appears the fix for this issue in CL 557315 broke all arm systems. I created #67077 to keep track of the breakage.

Let me know if I should revert the CL.

Thank you.

Alex

egonelbre · 2024-04-27T08:16:53Z

I'm fine with reverting the CL to avoid blocking other work, but I don't think that's the underlying issue. If there's no immediate fix, then we can revert to get things building again. Anyways, let's continue the discussion in #67077.

alexbrainman · 2024-04-27T08:30:56Z

I'm fine with reverting the CL to avoid blocking other work, but I don't think that's the underlying issue. If there's no immediate fix, then we can revert to get things building again.

I agree. Let's wait for others before we create a revert. We cannot submit the revert anyways without Googlers.

Anyways, let's continue the discussion in #67077.

SGTM.

Alex

gopherbot · 2024-04-27T15:10:31Z

Change https://go.dev/cl/582135 mentions this issue: testing: use time.Since in highPrecisionTime

time.Since has optimizations for measuring monotonic time. For #31160. Change-Id: I0529b9f69b9f008e3414b8e386b6faa64af4a008 Reviewed-on: https://go-review.googlesource.com/c/go/+/582135 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>

egonelbre changed the title ~~testing: benchmark is using time.Now on Windows for measuring time.~~ testing: benchmark is using low resolution time on Windows for measuring time. Mar 30, 2019

egonelbre changed the title ~~testing: benchmark is using low resolution time on Windows for measuring time.~~ testing: benchmark is using low resolution time on Windows Mar 30, 2019

bcmills added NeedsInvestigation Performance Testing labels Apr 12, 2019

bcmills added this to the Unplanned milestone Apr 12, 2019

alexbrainman mentioned this issue Sep 20, 2020

runtime: windows system clock resolution issue #8687

Closed

dmitshur added NeedsFix FixPending and removed NeedsInvestigation labels Apr 24, 2024

dmitshur modified the milestones: Unplanned, Go1.23 Apr 24, 2024

egonelbre mentioned this issue Apr 26, 2024

time, runtime: use QueryPerformanceCounter in time.Now on Windows to improve time resolution #67066

Open

gopherbot closed this as completed in 1c47049 Apr 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing: benchmark is using low resolution time on Windows #31160

testing: benchmark is using low resolution time on Windows #31160

egonelbre commented Mar 30, 2019 •

edited

Loading

egonelbre commented Mar 30, 2019

egonelbre commented Mar 30, 2019 •

edited

Loading

networkimprov commented Mar 31, 2019

bcmills commented Apr 12, 2019

egonelbre commented Apr 17, 2019

josharian commented Apr 17, 2019

egonelbre commented Apr 17, 2019

josharian commented Apr 17, 2019

egonelbre commented Apr 17, 2019

alexbrainman commented Apr 19, 2019

egonelbre commented Apr 19, 2019

alexbrainman commented Jun 10, 2019

egonelbre commented Jun 10, 2019

alexbrainman commented Jun 10, 2019

zx2c4 commented Aug 25, 2019

egonelbre commented Apr 7, 2020

gopherbot commented Apr 7, 2020

alexbrainman commented Apr 12, 2020

egonelbre commented Apr 13, 2020

gopherbot commented Jan 22, 2024

alexbrainman commented Apr 27, 2024

egonelbre commented Apr 27, 2024

alexbrainman commented Apr 27, 2024

gopherbot commented Apr 27, 2024

testing: benchmark is using low resolution time on Windows #31160

testing: benchmark is using low resolution time on Windows #31160

Comments

egonelbre commented Mar 30, 2019 • edited Loading

egonelbre commented Mar 30, 2019

egonelbre commented Mar 30, 2019 • edited Loading

networkimprov commented Mar 31, 2019

bcmills commented Apr 12, 2019

egonelbre commented Apr 17, 2019

josharian commented Apr 17, 2019

egonelbre commented Apr 17, 2019

josharian commented Apr 17, 2019

egonelbre commented Apr 17, 2019

alexbrainman commented Apr 19, 2019

egonelbre commented Apr 19, 2019

alexbrainman commented Jun 10, 2019

egonelbre commented Jun 10, 2019

alexbrainman commented Jun 10, 2019

zx2c4 commented Aug 25, 2019

egonelbre commented Apr 7, 2020

gopherbot commented Apr 7, 2020

alexbrainman commented Apr 12, 2020

egonelbre commented Apr 13, 2020

gopherbot commented Jan 22, 2024

alexbrainman commented Apr 27, 2024

egonelbre commented Apr 27, 2024

alexbrainman commented Apr 27, 2024

gopherbot commented Apr 27, 2024

egonelbre commented Mar 30, 2019 •

edited

Loading

egonelbre commented Mar 30, 2019 •

edited

Loading