runtime: use CLOCK_MONOTONIC_FAST on FreeBSD? #22942

Closed
bradfitz opened this issue Nov 30, 2017 · 7 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. OS-FreeBSD

@bradfitz
Contributor

bradfitz commented Nov 30, 2017

sys_freebsd_amd64.s says:

TEXT runtime·nanotime(SB), NOSPLIT, $32
        MOVL    $232, AX
        // We can use CLOCK_MONOTONIC_FAST here when we drop
        // support for FreeBSD 8-STABLE.
        MOVQ    $4, DI          // CLOCK_MONOTONIC
        LEAQ    8(SP), SI
        SYSCALL
        MOVQ    8(SP), AX       // sec
        MOVQ    16(SP), DX      // nsec

        // sec is in AX, nsec in DX
        // return nsec in AX
        IMULQ   $1000000000, AX
        ADDQ    DX, AX
        MOVQ    AX, ret+0(FP)
        RET

We now require FreeBSD 10.3+.

Switch to CLOCK_MONOTONIC_FAST?

I don't know what that is.
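
For readers less familiar with Go assembly: the routine above issues syscall 232 (clock_gettime) with clock ID 4 and then folds the returned seconds and nanoseconds into a single nanosecond count. A minimal Go sketch of that final step might look like the following; the `timespec` type and `toNanos` name are illustrative only and are not runtime identifiers.

```go
package main

import "fmt"

// timespec mirrors the two 64-bit words the syscall writes at 8(SP) and 16(SP).
// The type name is hypothetical; it is not taken from the runtime.
type timespec struct {
	sec  int64
	nsec int64
}

// toNanos performs the IMULQ/ADDQ step from the assembly: sec*1e9 + nsec.
func toNanos(ts timespec) int64 {
	return ts.sec*1_000_000_000 + ts.nsec
}

func main() {
	fmt.Println(toNanos(timespec{sec: 2, nsec: 500})) // prints 2000000500
}
```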

@bradfitz bradfitz added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. OS-FreeBSD labels Nov 30, 2017
@bradfitz bradfitz added this to the Go1.11 milestone Nov 30, 2017
@paulzhol
Member

paulzhol commented Dec 1, 2017

CLOCK_MONOTONIC_FAST will use getnanouptime, while CLOCK_MONOTONIC will use nanouptime
https://github.com/freebsd/freebsd/blob/release/11.1.0/sys/kern/kern_time.c#L345.

Functions with the "get" prefix returns a less precise result
much faster than the functions without "get" prefix and should
be used where a precision of 1/hz seconds is acceptable or where
performance is priority. (NB: "precision", not "resolution" !)

according to https://github.com/freebsd/freebsd/blob/master/sys/sys/time.h#L450

In a nutshell, nanouptime will also read a TSC/HPET/ACPI timecounter (as configured by the system) and use its value in addition to the pre-calculated "timehand" value available to getnanouptime.

I don't have any numbers but we're paying the syscall cost, so maybe we should let it do the full work?
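
The difference between the two paths can be sketched roughly as below. This is a simplified model, not the kernel's code: the `timehand` struct, its field names, and the plain nanoseconds-per-count multiplier are assumptions made for illustration (the kernel works with fixed-point binary time).

```go
package main

import "fmt"

// timehand is a simplified, hypothetical stand-in for the kernel's
// pre-calculated snapshot taken at the last clock tick.
type timehand struct {
	offsetNanos   int64  // uptime at the snapshot, in ns
	offsetCount   uint32 // hardware counter value at the snapshot
	mask          uint32 // valid bits of the hardware counter
	nanosPerCount int64  // ns per counter increment (assumed integer for simplicity)
}

// fastUptime models the "get" variant: return the snapshot, never touch hardware.
func fastUptime(th timehand) int64 {
	return th.offsetNanos
}

// preciseUptime models the non-"get" variant: read the hardware counter
// and add the time elapsed since the snapshot.
func preciseUptime(th timehand, readCounter func() uint32) int64 {
	delta := (readCounter() - th.offsetCount) & th.mask
	return th.offsetNanos + int64(delta)*th.nanosPerCount
}

func main() {
	th := timehand{offsetNanos: 5_000_000, offsetCount: 1000, mask: 0xffffffff, nanosPerCount: 70}
	hw := func() uint32 { return 1010 } // pretend hardware read, 10 counts after the snapshot
	fmt.Println(fastUptime(th), preciseUptime(th, hw)) // prints 5000000 5000700
}
```

The expensive part on slow hardware is the `readCounter` call, which is exactly what the fast variant skips.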

@domodwyer

domodwyer commented Jan 29, 2018

Hi all,

Just wanted to give some numbers to help this discussion along. We're running:

  • FreeBSD 11.0-RELEASE-p1 #0 r306420
  • kern.eventtimer.timer: HPET
  • kern.hz: 1000
  • go version go1.9.3 freebsd/amd64

We have a frontend HTTP component that handles large amounts of incoming traffic that is then placed into various backends (DB, queues, etc). While profiling CPU stalls we found that 16% was attributed to hpet_get_timecount():

  PMC: [RESOURCE_STALLS.ANY] Samples: 278326 (100.0%) , 73 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
 16.0 kernel     hpet_get_timecount   binuptime:7.8 nanouptime:4.1 nanotime:1.6

After patching to use CLOCK_MONOTONIC_FAST it halved:

  PMC: [RESOURCE_STALLS.ANY] Samples: 1128771 (100.0%) , 5067 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
 8.4 kernel     hpet_get_timecount   binuptime:7.1 nanotime:1.0

Even though we're not CPU bound, the above change made a pretty decent improvement to the 99th-percentile latency, largely because the packages used to communicate with the backends call time.Now() while holding locks, either directly or indirectly via context.WithDeadline() and others.
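
To illustrate why a slow clock read hurts tail latency here, the sketch below contrasts reading the clock inside a critical section with reading it before taking the lock. The `queue` type and its methods are hypothetical and are not taken from any of the packages mentioned.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// queue is a hypothetical client-side structure guarded by a mutex.
type queue struct {
	mu       sync.Mutex
	lastSend time.Time
	pending  int
}

// enqueueInsideLock reads the clock inside the critical section, so the cost
// of time.Now() is added to the time other goroutines wait for the lock.
func (q *queue) enqueueInsideLock() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.lastSend = time.Now()
	q.pending++
}

// enqueueOutsideLock reads the clock first and holds the lock only for the update.
func (q *queue) enqueueOutsideLock() {
	now := time.Now()
	q.mu.Lock()
	defer q.mu.Unlock()
	q.lastSend = now
	q.pending++
}

func main() {
	q := &queue{}
	q.enqueueInsideLock()
	q.enqueueOutsideLock()
	fmt.Println(q.pending) // prints 2
}
```

Hoisting the read is only an option when the caller controls the code; with third-party packages, a cheaper clock source is the practical fix, which is the point being made above. Note also that with the second form, timestamps recorded by concurrent goroutines may not be stored in strictly increasing order.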

When running a Go benchmark calling time.Now():

benchmark              old ns/op     new ns/op     delta
BenchmarkTimeNow-8     1513          916           -39.46%
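
The benchmark source was not posted. A minimal sketch of what such a measurement could look like, assuming nothing beyond the standard `testing` and `time` packages, is:

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sink keeps the compiler from optimizing the call away.
var sink time.Time

// benchmarkTimeNow calls time.Now() b.N times.
func benchmarkTimeNow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = time.Now()
	}
}

// nsPerOp runs the benchmark outside of "go test" and returns the ns/op figure.
func nsPerOp() int64 {
	return testing.Benchmark(benchmarkTimeNow).NsPerOp()
}

func main() {
	fmt.Printf("time.Now: %d ns/op\n", nsPerOp())
}
```

The same loop can live in a `_test.go` file as `BenchmarkTimeNow` and be run with `go test -bench=.`; results will vary with the timecounter hardware in use.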

I understand that the reduced precision is an important consideration, but given that the default kern.hz is 1000, this works out to losing sub-millisecond precision. I'm not sure how many people expect more precision than that, but it isn't an issue for us. We're quite happy running a patched Go binary if this change doesn't make it into master; I just thought it would be helpful to share!

Dom
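
The sub-millisecond figure above follows directly from the 1/hz precision quoted earlier in the thread. As a small worked example (the `tickPeriod` helper is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// tickPeriod returns the 1/hz interval for a given kern.hz value.
func tickPeriod(hz int) time.Duration {
	return time.Second / time.Duration(hz)
}

func main() {
	for _, hz := range []int{100, 1000} {
		// A _FAST reading can lag the true time by up to one tick period.
		fmt.Printf("kern.hz=%d -> precision of about %v\n", hz, tickPeriod(hz))
	}
}
```

With kern.hz set to 1000 the period is 1ms; a system configured with kern.hz=100 would see 10ms instead.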

@paulzhol
Member

paulzhol commented Feb 9, 2018

@domodwyer I'm working on https://golang.org/cl/93156 as an alternative. The initial version only supports kern.timecounter.hardware=TSC-low, though.

I wanted to note that the libc implementation does not differentiate between _FAST and _PRECISE (the default if _FAST is not used). Both get the last available timehand provided by the kernel, and then proceed to read the timecounter to get the delta.

I asked about this on EFnet #bsdcode; the answer I got was:

jilles: __vdso_clock_gettime always uses the TSC which is quite fast to begin with
jilles: _FAST was originally mainly created to avoid slow hardware like the i8254

I'm assuming HPET is similarly considered fast, when used with mmap to read the counter.

@domodwyer

Hi @paulzhol

Using vdso and skipping the syscall entirely sounds like a great idea - if you'd like us to run a comparison when you're ready just let us know. I would expect similar gains with HPET as the source as you say, though I think we could probably use TSC-low in the Go components without any problem.

Dom

@gopherbot

Change https://golang.org/cl/108095 mentions this issue: runtime: FreeBSD fast clock_gettime HPET timecounter support

@paulzhol
Member

@domodwyer the TSC code is in master, so you can give it a try.
https://golang.org/cl/108095 is for HPET timecounter support. It is not nearly as fast as the TSC version, but it is still around 20% fewer ns/op than the syscall path on my AMD FX-8300.

If you can use TSC you really should, but it depends on the hardware/hypervisor providing:

kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
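
A quick way to check these two values from a Go program is to parse the text that sysctl prints. The sketch below works on a captured string rather than calling sysctl itself, and the helper names are hypothetical; it makes no claim about how the runtime performs this check.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSysctl turns "name: value" lines, as printed by sysctl, into a map.
func parseSysctl(out string) map[string]string {
	vals := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip blank or malformed lines
		}
		vals[strings.TrimSpace(name)] = strings.TrimSpace(value)
	}
	return vals
}

// tscLooksUsable reports whether both flags quoted above are set to 1.
func tscLooksUsable(vals map[string]string) bool {
	return vals["kern.timecounter.smp_tsc"] == "1" &&
		vals["kern.timecounter.invariant_tsc"] == "1"
}

func main() {
	out := "kern.timecounter.smp_tsc: 1\nkern.timecounter.invariant_tsc: 1\n"
	fmt.Println(tscLooksUsable(parseSysctl(out))) // prints true
}
```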

@domodwyer

Hi @paulzhol

The patch makes a substantial difference! My environment is now running FreeBSD 11.1-RELEASE #0 r321309 but the relevant sysctls are the same.

Below are comparisons between tags/go1.10.1 and 58c231f running a simple time.Now() benchmark:

name     old time/op  new time/op  delta
Time-40   469ns ± 0%    99ns ± 1%  -79.01%  (p=0.000 n=9+10)

That is an impressive difference, thanks very much for the hard work!

Dom

@golang golang locked and limited conversation to collaborators Apr 26, 2019