runtime: use vDSO to accelerate time.Now on linux/386 #22190

Closed
somersf opened this issue Oct 9, 2017 · 9 comments

@somersf
Contributor

somersf commented Oct 9, 2017

Last updated: 2017-10-09

Abstract

Use the Linux vDSO __vdso_clock_gettime function (if available) to accelerate calls to time.Now() on linux/386.

Background

The Linux kernel can provide a "fast path" for some heavily used system calls that can be satisfied more efficiently in user space. The vDSO (virtual dynamic shared object) is an ELF-formatted shared object that the kernel maps into the process address space; its address is usually passed through an auxv entry (AT_SYSINFO_EHDR) at process startup. Several clock and time-related functions are among those it exports. When the vDSO is not present, normal syscalls must be used.
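To make the auxv step concrete, here is a minimal user-space sketch that scans an auxiliary vector for AT_SYSINFO_EHDR. It is an illustration only: the Go runtime receives the vector on the initial stack rather than reading /proc/self/auxv, and the helper name findAuxv is hypothetical. It assumes a little-endian x86 target.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"strconv"
)

const (
	atNull        = 0  // AT_NULL: terminates the vector
	atSysinfoEhdr = 33 // AT_SYSINFO_EHDR: address of the vDSO ELF header
)

// findAuxv (hypothetical helper) scans a little-endian auxiliary vector made
// of (tag, value) pairs, each wordSize bytes wide: 4 on linux/386, 8 on
// linux/amd64. It returns the value stored for tag, if present.
func findAuxv(auxv []byte, wordSize int, tag uint64) (uint64, bool) {
	read := func(b []byte) uint64 {
		if wordSize == 4 {
			return uint64(binary.LittleEndian.Uint32(b))
		}
		return binary.LittleEndian.Uint64(b)
	}
	for off := 0; off+2*wordSize <= len(auxv); off += 2 * wordSize {
		t := read(auxv[off:])
		if t == atNull {
			break
		}
		if t == tag {
			return read(auxv[off+wordSize:]), true
		}
	}
	return 0, false
}

func main() {
	data, err := os.ReadFile("/proc/self/auxv")
	if err != nil {
		fmt.Println("auxv not readable:", err)
		return
	}
	// strconv.IntSize is 32 on 386 and 64 on amd64, matching the auxv word size there.
	addr, ok := findAuxv(data, strconv.IntSize/8, atSysinfoEhdr)
	if !ok {
		fmt.Println("no vDSO advertised; plain syscalls would be used")
		return
	}
	fmt.Printf("vDSO ELF header at %#x\n", addr)
}
```

If the tag is absent, the caller simply keeps using the syscall path, which is the behaviour described above.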

This mechanism is already in use on linux/amd64 to accelerate time.Now().

Proposal

The proposal is to use the same approach as used on linux/amd64 to locate the relevant vDSO function, and use it if available, to accelerate time.Now() on linux/386.

The proposal is to only accelerate the clock functions required to implement time.Now().
No other calls will be affected.

Rationale

There is a significant performance difference between a syscall to obtain a clock, and a corresponding vDSO-based call.

A prototype implementation found that the vDSO path is 5x to 10x faster than the syscall equivalent, depending on processor, virtualization etc.

For applications that timestamp heavily (for example, metrics and telemetry), faster timestamping can noticeably improve overall performance.

As of Go 1.9, time.Now() on linux/386 requires two syscalls, which has doubled the call cost compared with previous versions. Adding vDSO support would more than make up for this.

Compatibility

If the vDSO-accelerated function is not found at runtime, then the existing syscall implementation will automatically be used as fallback.

The change will be limited to the time functions provided internally in runtime, and used by time.Now(), so that other calls will not be affected.
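The fallback behaviour can be sketched as a nil check on a function value resolved at startup. This is a simplified model, not the runtime's code (the real change lives in assembly in sys_linux_386.s); vdsoClock and syscallClock are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"time"
)

// clockFn returns wall-clock seconds and nanoseconds.
type clockFn func() (sec int64, nsec int32)

// vdsoClock is hypothetical: it stands in for the address of
// __vdso_clock_gettime resolved at startup, and is nil if lookup failed.
var vdsoClock clockFn

// syscallClock stands in for the existing syscall-based implementation.
func syscallClock() (int64, int32) {
	t := time.Now()
	return t.Unix(), int32(t.Nanosecond())
}

// walltime prefers the vDSO function and falls back to the syscall path.
func walltime() (int64, int32) {
	if vdsoClock != nil {
		return vdsoClock()
	}
	return syscallClock()
}

func main() {
	sec, nsec := walltime() // vdsoClock is nil here, so the fallback runs
	fmt.Println("fallback path:", sec, nsec)

	vdsoClock = func() (int64, int32) { return 42, 7 } // pretend the lookup succeeded
	sec, nsec = walltime()
	fmt.Println("fast path:", sec, nsec)
}
```

Because the check happens on every call, callers such as time.Now() never need to know which path was taken.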

Implementation

Adapt the code currently in src/runtime/vdso_linux_amd64.go so that it can also be used for ELF32 on linux/386. The initial implementation will be based on a code copy-and-edit so that only linux/386 is affected by the change.

Adapt the runtime.walltime() and runtime.nanotime() functions (in src/runtime/sys_linux_386.s) to check for and use __vdso_clock_gettime if it was found during startup, or fall back to the existing syscall if not.

Refactor the vDSO ELF symbol lookup code to eliminate duplication between linux/386 and linux/amd64. The ELF structure definitions and required symbols differ between 32-bit and 64-bit, but the lookup code is the same.
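As a rough illustration of why the lookup code can be shared, the sketch below decodes the 32-bit and 64-bit ELF symbol entry layouts into one neutral struct and runs the same search over either. All names here are hypothetical, and the linear scan is a simplification for illustration; it is not the lookup code in src/runtime.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sym is an architecture-neutral view of one ELF symbol table entry.
type sym struct {
	Name  uint32 // offset of the name in the string table
	Info  uint8
	Other uint8
	Shndx uint16
	Value uint64
	Size  uint64
}

// decodeSym32 decodes a 16-byte little-endian Elf32_Sym:
// name, value, size, info, other, shndx.
func decodeSym32(b []byte) sym {
	return sym{
		Name:  binary.LittleEndian.Uint32(b[0:]),
		Value: uint64(binary.LittleEndian.Uint32(b[4:])),
		Size:  uint64(binary.LittleEndian.Uint32(b[8:])),
		Info:  b[12],
		Other: b[13],
		Shndx: binary.LittleEndian.Uint16(b[14:]),
	}
}

// decodeSym64 decodes a 24-byte little-endian Elf64_Sym:
// name, info, other, shndx, value, size.
func decodeSym64(b []byte) sym {
	return sym{
		Name:  binary.LittleEndian.Uint32(b[0:]),
		Info:  b[4],
		Other: b[5],
		Shndx: binary.LittleEndian.Uint16(b[6:]),
		Value: binary.LittleEndian.Uint64(b[8:]),
		Size:  binary.LittleEndian.Uint64(b[16:]),
	}
}

// cstring returns the NUL-terminated string starting at off in strtab.
func cstring(strtab []byte, off uint32) string {
	if int(off) >= len(strtab) {
		return ""
	}
	end := int(off)
	for end < len(strtab) && strtab[end] != 0 {
		end++
	}
	return string(strtab[off:end])
}

// lookup is the shared part: only entSize and decode differ per architecture.
func lookup(symtab, strtab []byte, entSize int, decode func([]byte) sym, name string) (sym, bool) {
	for off := 0; off+entSize <= len(symtab); off += entSize {
		s := decode(symtab[off : off+entSize])
		if cstring(strtab, s.Name) == name {
			return s, true
		}
	}
	return sym{}, false
}

func main() {
	strtab := []byte("\x00__vdso_clock_gettime\x00")
	// One made-up Elf32_Sym entry: name offset 1, value 0x1000, size 0x20.
	symtab32 := []byte{1, 0, 0, 0, 0x00, 0x10, 0, 0, 0x20, 0, 0, 0, 0x12, 0, 1, 0}
	s, ok := lookup(symtab32, strtab, 16, decodeSym32, "__vdso_clock_gettime")
	fmt.Printf("found=%v value=%#x\n", ok, s.Value)
}
```

Passing the entry size and decoder as parameters is one way to keep a single lookup routine; build-tagged type definitions per architecture would be another.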

Open issues

Number of changesets

I propose implementing this with two changesets:

  • support linux/386 by duplicating code from linux/amd64, so that 386 support can be reviewed/added without disturbing code for other platforms.
  • refactor code affecting both linux/amd64 and linux/386 to eliminate code duplication.

Is this OK, or should a single changeset be used?

Tests

There don't appear to be any explicit tests for linux/amd64 that verify the fallback path can be called. I'll include a basic test for this covering linux/386 and linux/amd64, though I am unsure whether it is necessary or, alternatively, whether the test should be enhanced further.

@gopherbot gopherbot added this to the Proposal milestone Oct 9, 2017
@gopherbot

Change https://golang.org/cl/69390 mentions this issue: runtime: use vDSO on linux/386 to improve time.Now performance

@gopherbot

Change https://golang.org/cl/69391 mentions this issue: runtime: refactor vdso_linux_* to share common code

@mdempsky
Member

mdempsky commented Oct 9, 2017

This seems obviously desirable to me.

Some concrete microbenchmark numbers showing that time.Now() is actually faster on linux/386 would be good though.

@ianlancetaylor
Contributor

Yes, I'm going to drop this out of the proposal process and make it an ordinary issue.

Thanks for tackling this.

@ianlancetaylor ianlancetaylor changed the title from "Proposal: Use vDSO to accelerate time.Now() on linux/386" to "runtime: use vDSO to accelerate time.Now on linux/386" on Oct 10, 2017
@ianlancetaylor ianlancetaylor added the NeedsFix label (the path to resolution is known, but the work has not been done) and removed the Proposal label on Oct 10, 2017
@ianlancetaylor ianlancetaylor changed the milestone from Proposal to Unplanned on Oct 10, 2017
@somersf
Contributor Author

somersf commented Oct 10, 2017

I have a VM-based benchmark to hand. I will try to follow up with more comprehensive results tomorrow.

Ubuntu 32-bit running under VMware on macOS, with the proposed patches applied to Go.

$ uname -a
Linux enki-ubuntu32 4.10.0-35-generic #39-Ubuntu SMP Wed Sep 13 07:45:58 UTC 2017 i686 i686 i686 GNU/Linux

$ cat /proc/cpuinfo | grep "model name"
model name	: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

$ GOOS=linux GOARCH=386 go test -v -bench=. 
=== RUN   TestClockVDSOAndFallbackPaths
--- PASS: TestClockVDSOAndFallbackPaths (0.00s)
goos: linux
goarch: 386
BenchmarkClockVDSOAndFallbackPaths/vDSO         	20000000	        56.8 ns/op
BenchmarkClockVDSOAndFallbackPaths/Fallback     	 3000000	       545 ns/op
PASS
ok  	_/home/frank/local-copy/vtest	3.421s

@somersf
Contributor Author

somersf commented Oct 12, 2017

As promised, here are some indicative benchmark results comparing the Go 1.9.1 release with a version that includes the proposed vDSO changes. The benchmark code is very simple:

package benchmark

import (
        "testing"
        "time"
)

func BenchmarkTimeNow(b *testing.B) {
        for i := 0; i < b.N; i++ {
                time.Now()
        }
}

Binaries are pre-built with their respective compiler versions, and each test is run on a single CPU with:

$ options="-test.cpu=1 -test.count=20 -test.bench=."
$ ./timetest.191 $options > results.191
$ ./timetest.vdso $options > results.vdso
$ benchstat results.191 results.vdso

Intel Celeron

This shows the performance increase on i686, and that amd64 performance on the same hardware is unchanged.

$ cat /proc/cpuinfo | grep "model name"
model name  : Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz
(4x)

Kernel: linux/i686, GOOS=linux GOARCH=386 (6.89x)

name     old time/op  new time/op  delta
TimeNow   978ns ± 0%   142ns ± 0%  -85.48%  (p=0.000 n=16+20)

Kernel: linux/x86_64, GOOS=linux GOARCH=amd64 (1x)

name     old time/op  new time/op  delta
TimeNow   125ns ± 0%   125ns ± 0%   ~     (all equal)

Virtualized

The performance increase in a virtualized environment is more dramatic, presumably due to the overheads of virtualizing the syscall used in go 1.9.1 for GOARCH=386.

VMWare

Host: darwin/amd64

$ cat /proc/cpuinfo | grep "model name"
model name	: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

Guest Kernel: linux/i686, GOOS=linux GOARCH=386 (9.36x)

name     old time/op  new time/op  delta
TimeNow   524ns ± 1%    56ns ± 2%  -89.25%  (p=0.000 n=20+19)

Guest Kernel: linux/x86_64, GOOS=linux GOARCH=amd64 (1x)

name     old time/op  new time/op  delta
TimeNow  38.0ns ± 1%  38.0ns ± 1%   ~     (p=0.677 n=19+20)

Docker

Host Kernel: linux/x86_64

$  cat /proc/cpuinfo | grep "model name"
model name	: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
(32x)

Guest Kernel: linux/i686, GOOS=linux GOARCH=386 (10.23x)

name     old time/op  new time/op  delta
TimeNow   614ns ± 2%    60ns ± 1%  -90.24%  (p=0.000 n=19+17)
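The speedup factors quoted in parentheses above are the ratio of old to new time/op. A small sketch of that arithmetic follows, using the rounded values printed by benchstat; because benchstat computes its delta column from unrounded means, a delta recomputed this way can differ from the reported one in the last digits. The function names are hypothetical.

```go
package main

import "fmt"

// speedup returns how many times faster the new timing is than the old one.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

// deltaPercent returns the relative change in time/op as a percentage.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// Rounded time/op values copied from the benchstat tables in this comment.
	rows := []struct {
		name         string
		oldNs, newNs float64
	}{
		{"Celeron J1900", 978, 142},
		{"VMware guest", 524, 56},
		{"Docker", 614, 60},
	}
	for _, r := range rows {
		fmt.Printf("%-14s %.2fx faster, delta %+.2f%%\n",
			r.name, speedup(r.oldNs, r.newNs), deltaPercent(r.oldNs, r.newNs))
	}
}
```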

@tsuna
Contributor

tsuna commented Oct 18, 2017

This is gonna be a long shot: what are the odds we can get this before Go 1.10, i.e. in a possible 1.9.2? 😅

@dominikh
Member

@tsuna None. Point releases are for security fixes and important bug fixes only.

@tsuna
Contributor

tsuna commented Oct 19, 2017

Alright alright, I figured that would be the answer but at least I asked 😅

Thanks for getting this in though, we have some linux/i386 environments where this change is more than welcome.

@golang golang locked and limited conversation to collaborators Oct 19, 2018