runtime: consider CNTVCT_EL0 to implement cputicks on ARM64 #67937

Open
rhysh opened this issue Jun 11, 2024 · 3 comments
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Comments

@rhysh
Contributor

rhysh commented Jun 11, 2024

To get the current time in nanoseconds, the runtime implements runtime.nanotime as a call to the OS (the vdso on GOOS=linux, libc on GOOS=darwin).

Profiling and tracing benefit from efficient access to a clock, but don't require a particular scale or offset. (We're in the habit of rescaling it after the fact, by comparing against another clock.) On GOARCH=amd64, the runtime implements runtime.cputicks as RDTSCP (or RDTSC plus memory fences). It's a bit faster to only read the timer than to read the timer and then do the scaling math.
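To illustrate the "rescale after the fact" idea, here is a minimal sketch, not the runtime's actual code. It assumes we captured two calibration points, each pairing a raw tick reading with a nanosecond reading taken at about the same moment. The `sample` type, the `ticksToNanos` helper, and the 24 MHz counter frequency are all hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// sample pairs a raw tick reading with a nanosecond clock reading
// taken at (approximately) the same moment. Hypothetical type.
type sample struct {
	ticks int64
	nanos int64
}

// ticksToNanos maps a raw tick value onto the nanosecond clock by
// linear interpolation between two calibration samples. It assumes
// the tick source runs at a constant frequency between them.
func ticksToNanos(start, end sample, ticks int64) int64 {
	dt := end.ticks - start.ticks
	if dt == 0 {
		// No tick progress between samples: no scale can be derived.
		return start.nanos
	}
	dn := end.nanos - start.nanos
	// Multiply before dividing to limit floating-point rounding error.
	offset := float64(ticks-start.ticks) * float64(dn) / float64(dt)
	return start.nanos + int64(offset)
}

func main() {
	// Assumed 24 MHz counter: 24,000,000 ticks elapse per second.
	start := sample{ticks: 1_000, nanos: 5_000}
	end := sample{ticks: 1_000 + 24_000_000, nanos: 5_000 + 1_000_000_000}

	// A tick reading halfway between the two calibration points.
	t := int64(1_000 + 12_000_000)
	fmt.Println("nanoseconds since start:", ticksToNanos(start, end, t)-start.nanos)
}
```

The point of the sketch is that the hot path only has to record the raw tick value; the division and multiplication can be deferred until the profile or trace is processed.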

But on GOARCH=arm64, for GOOS=linux, darwin, and several others, we implement runtime.cputicks as a call to runtime.nanotime. It looks like AArch64's CNTVCT_EL0 register [1] might have what we need for cputicks (monotonic, also monotonic across cores, static frequency). The Linux vdso [2] appears to use it (plus ISB), or the self-synchronizing version CNTVCTSS_EL0.

There's also CNTVCT for GOARCH=arm [3].

Initial benchmarking on an Apple M1 (darwin) and on Raspberry Pi models 5 and 3B (linux) shows that reading CNTVCT_EL0 is a bit faster than calling nanotime (14 vs 24ns, 28 vs 43ns, and 45 vs 126ns on those three platforms). I don't know how much of a difference this will make in complete applications, but cheaper clocks mean less worry when adding profiling/tracing points.

CC @golang/runtime @golang/arm

[1] https://developer.arm.com/documentation/ddi0595/2021-03/AArch64-Registers/CNTVCT-EL0--Counter-timer-Virtual-Count-register

[2] https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/vdso/gettimeofday.h#L69

[3] https://developer.arm.com/documentation/ddi0601/2024-03/AArch32-Registers/CNTVCT--Counter-timer-Virtual-Count-register

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 11, 2024
@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 12, 2024
@prattmic prattmic added this to the Backlog milestone Jun 12, 2024
@mauri870
Member

I wonder if https://www.felixcloutier.com/x86/rdtsc can be used on x86 as well.

@rhysh
Contributor Author

rhysh commented Jun 12, 2024

Yes, GOARCH=amd64 and 386 use it (or RDTSCP) already! I don't know about other architectures, but if we use a more direct/efficient cputicks implementation on arm/arm64 then we'll have covered all of the first-class ports.

https://github.com/golang/go/blob/go1.22.4/src/runtime/asm_amd64.s#L1174
https://github.com/golang/go/blob/go1.22.4/src/runtime/asm_386.s#L870

@mauri870
Member

Thanks, I was unaware we were already using it! I think covering all the first-class ports would be great.
