runtime: consider CNTVCT_EL0 to implement cputicks on ARM64 #67937
Labels: arch-arm64, compiler/runtime, NeedsInvestigation, Performance
To get the current time in nanoseconds, the runtime implements `runtime.nanotime` as a call to the OS (the vdso on GOOS=linux, libc on GOOS=darwin).

Profiling and tracing benefit from efficient access to a clock, but don't require a particular scale or offset. (We're in the habit of rescaling it after the fact, by comparing against another clock.) On GOARCH=amd64, the runtime implements `runtime.cputicks` as `RDTSCP` (or `RDTSC` plus memory fences). It's a bit faster to only read the timer than to read the timer and then do scaling math.

But on GOARCH=arm64, for GOOS=linux, darwin, and several others, we implement `runtime.cputicks` as a call to `runtime.nanotime`. It looks like AArch64's `CNTVCT_EL0` register [1] might have what we need for `cputicks` (monotonic, also monotonic across cores, static frequency). The Linux vdso [2] appears to use it (plus `ISB`), or the self-synchronizing version `CNTVCTSS_EL0`.

There's also `CNTVCT` for GOARCH=arm [3].

Initial benchmarking on an Apple M1 (darwin) and on Raspberry Pi models 5 and 3B (linux) shows that reading `CNTVCT_EL0` is a bit faster than calling `nanotime` (14 vs 24 ns, 28 vs 43 ns, and 45 vs 126 ns on those three platforms, respectively). I don't know how much of a difference this will make in complete applications, but cheaper clocks mean less worry when adding profiling/tracing points.

CC @golang/runtime @golang/arm
[1] https://developer.arm.com/documentation/ddi0595/2021-03/AArch64-Registers/CNTVCT-EL0--Counter-timer-Virtual-Count-register
[2] https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/vdso/gettimeofday.h#L69
[3] https://developer.arm.com/documentation/ddi0601/2024-03/AArch32-Registers/CNTVCT--Counter-timer-Virtual-Count-register
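
For readers unfamiliar with the "rescale after the fact" habit mentioned above, here is a minimal sketch of the idea in portable Go. It is not the runtime's code: the `sample` and `scale` types, the `calibrate` and `toNanos` names, and the 24 MHz counter frequency are all hypothetical and chosen only for illustration. The sketch assumes a counter with a static frequency, which is the property the issue hopes `CNTVCT_EL0` provides.

```go
package main

import (
	"fmt"
	"math"
)

// sample pairs a raw tick-counter reading with a reference-clock reading
// taken at roughly the same moment. (Hypothetical type for illustration.)
type sample struct {
	ticks uint64 // raw counter value; arbitrary scale and offset
	nanos int64  // reference clock reading, in nanoseconds
}

// scale holds what is needed to map raw ticks onto the reference clock.
type scale struct {
	start        sample
	nanosPerTick float64
}

// calibrate derives the tick-to-nanosecond ratio from two paired samples.
// It assumes the counter frequency is constant between them and that
// end.ticks > start.ticks.
func calibrate(start, end sample) scale {
	dTicks := float64(end.ticks - start.ticks)
	dNanos := float64(end.nanos - start.nanos)
	return scale{start: start, nanosPerTick: dNanos / dTicks}
}

// toNanos rescales a raw tick value (at or after the start sample) to
// reference-clock nanoseconds.
func (s scale) toNanos(ticks uint64) int64 {
	elapsed := float64(ticks-s.start.ticks) * s.nanosPerTick
	return s.start.nanos + int64(math.Round(elapsed))
}

func main() {
	// Assumed example: a 24 MHz counter observed over one second of
	// reference time, with arbitrary non-zero offsets on both clocks.
	start := sample{ticks: 1_000, nanos: 5_000}
	end := sample{ticks: 1_000 + 24_000_000, nanos: 5_000 + 1_000_000_000}
	s := calibrate(start, end)

	// An event stamped with only a raw tick value during the hot path
	// is converted to nanoseconds later, off the hot path.
	event := uint64(1_000 + 12_000_000)
	fmt.Println("event at", s.toNanos(event), "ns on the reference clock")
}
```

The point of the split is that the hot path only records the raw counter value; the division and multiplication happen once, later, when the trace or profile is processed.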