Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: Now is "slow" #57749

Closed
pierrre opened this issue Jan 11, 2023 · 16 comments
Closed

time: Now is "slow" #57749

pierrre opened this issue Jan 11, 2023 · 16 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@pierrre
Copy link

pierrre commented Jan 11, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20rc2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/pierre/.cache/go-build"
GOENV="/home/pierre/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/pierre/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/pierre/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/pierre/.gimme/versions/go1.20rc2.src"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/pierre/.gimme/versions/go1.20rc2.src/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20rc2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1508083160=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Disclaimer: yes I know, I'm trying to do something "silly" (benchmarking time.Now()).
But it has a real impact in production for me.

Context: I'm trying to measure the execution time of a function that is "fast" (below 1 microsecond).

I ran a benchmark on my function (with testing.B), so I know it's fast (below 1 microsecond).
But I want to measure its execution time while it's running in production (and report it in my metrics).

So I did something simple:

start := time.Now()
// call my function
dur := time.Since(start)
// report in my metrics

Then I noticed that the measured duration was much higher than the time measured in my benchmark (around 5 microseconds, vs 1 microsecond in my benchmark).

So I did a benchmark that calls time.Now():

func BenchmarkTimeNow(b *testing.B) {
	var res time.Time
	for i := 0; i < b.N; i++ {
		res = time.Now()
	}
	benchRes = res
}
➜  timebench go test -v -bench=.
goos: linux
goarch: amd64
pkg: timebench
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkTimeNow
BenchmarkTimeNow-16       543583              2249 ns/op
PASS
ok      timebench       1.249s

(Yes I closed all other running apps)

I'm very surprised, because this time (2249ns) is very high.
If I remember correctly, on my other computer it was 2 orders of magnitude lower (30ns)

I did a profiling of my benchmark.
All the CPU time is spent in time.now() (yes, lower case now)

My question: how can I measure the execution time of a function, if calling time.Now() is slower than calling my function ?

Feel free to close this issue if you think it's not a problem.

What did you expect to see?

Calling time.Now() should take approximately 30ns.

This is the benchmark result on my other computer:

➜  timebench go test -v -bench=.
goos: linux
goarch: amd64
pkg: timebench
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkTimeNow
BenchmarkTimeNow-12     35953132                31.34 ns/op
PASS
ok      timebench       1.164s
➜  timebench go version
go version go1.20rc2 linux/amd64

What did you see instead?

Calling time.Now() takes 2249ns, which is slower than the execution time of the function I'm trying to measure.

@ianlancetaylor
Copy link
Contributor

For what it's worth we already have a benchmark for time.Now in the time package. it's called BenchmarkNow.

When I run go test time -test.run=xxx -test.bench=BenchmarkNow on my amd64-linux laptop I get

BenchmarkNow-16             	44539380	        25.24 ns/op

You got similar results on your "other computer."

So tell us more about the computer on which time.Now is slow.

@pierrre
Copy link
Author

pierrre commented Jan 11, 2023

Here is the result:

➜  ~ go test time -test.run=xxx -test.bench=BenchmarkNow
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16             	  528529	      2261 ns/op
BenchmarkNowUnixNano-16     	  532738	      2254 ns/op
BenchmarkNowUnixMilli-16    	  531685	      2253 ns/op
BenchmarkNowUnixMicro-16    	  522962	      2286 ns/op
PASS
ok  	time	7.798s

Both computers run a similar OS: ArchLinux, kernel updated today.

@ianlancetaylor
Copy link
Contributor

There must be some relevant difference between them.

Try building a trivial Go program that calls time.Now and run it on both systems under strace -f. See if there are any differences.

@bradfitz
Copy link
Contributor

@pierrre, are you in a VM? What type?

@ianlancetaylor ianlancetaylor added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 12, 2023
@pierrre
Copy link
Author

pierrre commented Jan 12, 2023

Hello, I will try to run strace, but I don't have access to the other computer today, so I will come back later.

@bradfitz no I'm not using a VM
Both computers are normal laptops with ArchLinux installed, same kernel version.
The "slow" one is a Lenovo Thinkpad (T14S Gen 1, less than 1 year old).
The "fast" one is a Dell XPS 15 (few years old).

Yes my production environment is running in a VM, but I didn't talk about it in this issue.

@pierrre
Copy link
Author

pierrre commented Jan 12, 2023

FYI on my "slow" computer, strace shows

[pid 45410] clock_gettime(CLOCK_MONOTONIC, {tv_sec=12677, tv_nsec=359642702}) = 0
[pid 45410] clock_gettime(CLOCK_REALTIME, {tv_sec=1673525811, tv_nsec=135457247}) = 0

repeated thousands of times with almost no interruption.

I will try on my "fast" computer later (I don't have access to it right now)

@ianlancetaylor
Copy link
Contributor

That looks like the "slow" computer is somehow missing the VDSO that contains a non-syscall implementation of clock_gettime. I'm not sure offhand what could cause that. The VDSO is normally supplied by the kernel. See man vdso for background.

@prattmic
Copy link
Member

You could try searching dmesg for messages containing "VDSO" or "TSC" (or "time", if those yield nothing). My guess would be that the kernel is deciding that the TSC is not available/reliable, thus forcing the VDSO to fall back to system calls.

Given that you see the system calls with strace, that means that VDSO is indeed falling back to system calls (or, less likely, the VDSO doesn't exist at all, but I don't think modern Linux ever omits it entirely). This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.

@prattmic
Copy link
Member

You can look at:

/sys/devices/system/clocksource/clocksource0/available_clocksource
/sys/devices/system/clocksource/clocksource0/current_clocksource

To see the available and current clock source. The clock source needs to be tsc for the VDSO to work without fallback (or pvclock/hvclock, but those are only relevant to VM guests).

@prattmic
Copy link
Member

It could be related to one of https://bugzilla.kernel.org/show_bug.cgi?id=202525 or https://bugzilla.kernel.org/show_bug.cgi?id=216146 (both of which should result in dmesg warnings).

@pierrre
Copy link
Author

pierrre commented Jan 13, 2023

Here is the result of strace on my "fast" computer:

[pid  5814] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=5812, si_uid=1000} ---
[pid  5813] <... tgkill resumed>)       = 0
[pid  5814] rt_sigreturn({mask=[]} <unfinished ...>
[pid  5813] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  5814] <... rt_sigreturn resumed>) = 387849827
[pid  5813] <... nanosleep resumed>NULL) = 0
[pid  5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid  5813] nanosleep({tv_sec=0, tv_nsec=10000000}, NULL) = 0
[pid  5813] epoll_pwait(3, [], 128, 0, NULL, 0) = 0
[pid  5813] getpid()                    = 5812
[pid  5813] tgkill(5812, 5814, SIGURG <unfinished ...>

repeated thousands of times

@pierrre
Copy link
Author

pierrre commented Jan 13, 2023

I asked several coworkers at my company to run the same benchmark on various computer models:

5 of them have an Intel CPU, and the benchmark result is "fast": 30ns

1 has exactly the same model/CPU as I have: ThinkPad 14S Gen 1, AMD Ryzen 7 PRO 4750U
His Linux distribution is Fedora 37

$ uname -a
Linux fedora 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

And he can reproduce the "slow" benchmark result: ~2000ns !

$ go version && go test time -test.run=xxx -test.bench=BenchmarkNow
go version go1.19.2 linux/amd64
goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkNow-16             	  540297	      2172 ns/op
BenchmarkNowUnixNano-16     	  540292	      2220 ns/op
BenchmarkNowUnixMilli-16    	  547329	      2206 ns/op
BenchmarkNowUnixMicro-16    	  557311	      2191 ns/op
PASS
ok  	time	8.648s

@pierrre
Copy link
Author

pierrre commented Jan 13, 2023

About the clock source:

On my "slow" computer:

➜  ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm 
➜  ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

On my "fast" computer:

➜  ~ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm 
➜  ~ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

As far as I know, I didn't configure anything related to the clock source when I installed these 2 computers

@pierrre
Copy link
Author

pierrre commented Jan 13, 2023

About dmesg:

On my "slow" computer:

➜  ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 1696.689 MHz processor
[    0.036105] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[    0.096170] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[    0.112863] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1874efa0393, max_idle_ns: 440795291471 ns
[    0.232866] TSC synchronization [CPU#0 -> CPU#1]:
[    0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[    0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.240266] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[    0.302925] clocksource: Switched to clocksource hpet
[    0.309641] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[   17.853505] SVM: TSC scaling supported

On my "fast" computer:

➜  ~ sudo dmesg | grep -iE "(vdso|tsc|clocksource)"     
[    0.000000] tsc: Detected 2200.000 MHz processor
[    0.000000] tsc: Detected 2199.996 MHz TSC
[    0.041814] TSC deadline timer available
[    0.041877] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
[    0.123401] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb62f12e8c, max_idle_ns: 440795238402 ns
[    0.151562] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[    0.531587] clocksource: Switched to clocksource tsc-early
[    0.559199] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.587084] tsc: Refined TSC clocksource calibration: 2207.998 MHz
[    1.587089] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fd3b6211b4, max_idle_ns: 440795230313 ns
[    1.587143] clocksource: Switched to clocksource tsc

I don't really know how to interpret this

@pierrre
Copy link
Author

pierrre commented Jan 13, 2023

This is the source of the slowness (it is not a Go issue), the remaining question is why it is falling back.

OK I understand, feel free to close this issue if there is nothing we can do on Go side.
Thank you for your help 👍

@prattmic
Copy link
Member

[    0.232866] Measured 3406960412 cycles TSC warp between CPUs, turning off TSC clock.
[    0.232866] tsc: Marking TSC unstable due to check_tsc_sync_source failed

This is the immediate problem. The kernel has decided that the TSC is not reliable and disabled its use.

There is nothing we can do in Go. I suggest reaching out on one of the kernel bugs. https://bugzilla.kernel.org/show_bug.cgi?id=202525 has the same kernel warnings, though a slightly different CPU model.

@prattmic prattmic closed this as not planned Won't fix, can't repro, duplicate, stale Jan 13, 2023
@golang golang locked and limited conversation to collaborators Jan 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants