Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: add GOARM64=v8.1 and so on #60905

Closed
aktau opened this issue Jun 20, 2023 · 26 comments
Closed

all: add GOARM64=v8.1 and so on #60905

aktau opened this issue Jun 20, 2023 · 26 comments
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Milestone

Comments

@aktau
Copy link
Contributor

aktau commented Jun 20, 2023

Similar to: #45453.

Each ARM sub-architecture iteration adds some new instructions, some of which may be useful for Go. For example, ARMv8.2 introduced:

  • v8.2: FEAT_LSE, Large System Extensions (though it looks like there have been changes to this extension in later iterations)

These instructions are already used after a capability check (thanks @prattmic for pointing this out). These capability check jumps predict perfectly, but the blocks aren't tiny either (godbolt):

image

It might be useful to define some new possible GOARM values akin to GOAMD to elide these checks, potentially increasing performance and reducing binary sizes by some tiny amount. I have not done any performance measurements, but received hearsay from colleagues that using these instructions was a significant win for them (but I did not ask whether they used a capability check or whether they were comparing with always using the less capable version).

Other potential candidate:

  • v8.1: FEAT_CRC32, Changes to CRC32 instructions (made mandatory)
  • v8.8: FEAT_MOPS, Standardization of memory operations, slides). runtime.memmove is >1% of cycles when looking at data available here. Using FEAT_MOPS unconditionally could also have good icache effects.
  • v8.8: FEAT_HBC, Hinted conditional branches (perhaps indented branches or anything that can be proven to contain a non-nil error via dataflow could be hinted "unlikely").
@aktau aktau added the Proposal label Jun 20, 2023
@prattmic prattmic added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 20, 2023
@aktau aktau changed the title all: add GOARM selectors for ARMv8.2 all: add GOARM selectors for ARMv8.1 (et al.) Jun 20, 2023
@ianlancetaylor ianlancetaylor changed the title all: add GOARM selectors for ARMv8.1 (et al.) proposal: all: add GOARM selectors for ARMv8.1 (et al.) Jun 20, 2023
@ianlancetaylor
Copy link
Contributor

The currently valid values for GOARM are 5, 6, and 7. Are you suggesting that we add 8 and 8.1 as valid values?

@gopherbot gopherbot added this to the Proposal milestone Jun 20, 2023
@Jorropo
Copy link
Member

Jorropo commented Jun 21, 2023

It would be surprising to have GOARM used both by GOARCH=arm and GOARCH=arm64 imo should be a GOARM64.

@ianlancetaylor
Copy link
Contributor

Good point, thanks, it should be GOARM64.

That may mean that we don't need to be compatible with GOARM, and could use v8, v8.1, ....

@rsc rsc changed the title proposal: all: add GOARM selectors for ARMv8.1 (et al.) proposal: all: add GOARM64=v8.1 and so on Jun 21, 2023
@rsc
Copy link
Contributor

rsc commented Jun 21, 2023

The leading v seems like it emphasizes that this is the ARMv8.1 v numbers not any other kind of ARM numbering (like ARM11 which implements ARMv6). It also makes sure that nothing tries to strconv.Atoi("8") and then get messed up by strconv.Atoi("8.1"). So I think we should keep the leading v.

@rsc
Copy link
Contributor

rsc commented Jun 21, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@scweng
Copy link

scweng commented Jun 22, 2023

I would like to sound extra support for retaining v and emphasize that it's not a number -- a number is not enough to represent Arm capabilities.

Currently Arm defines Armv8.0~Armv8.9 and Armv9.0~Armv9.3, but Armv9.0 is not the next version of Armv8.9, it actually branched off Armv8.5. Parsing the version into number will create an illusion of linear progression.

Further, each of these versions are actually "extensions." Each extension defines a number of "features," e.g. FEAT_LSE, with some being mandatory and some optional. Some, like LSE, are optional up to a version (8.0) and then become mandatory later (8.1).

Because of this, both gcc and clang have complicated ways to precisely specify what Arm version to target: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#index-march-2, https://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

In this system, the compiler cannot assume the target machine has LSE instructions if we only compile with -march=armv8-a and thus needs to emit capability tests. But -march=armv8-a+lse means targeting a machine that implements all the mandatory features defined in the base v8 ISA, but also assume the machine has LSE, which would be enough for the compiler to skip the capability test and use instructions like LDADD directly.

My suggestion is we also implement similar notations in GOARM64 so that GOARM64=v8.1-a and GOARM64=v8-a+lse both suffice in using LSE instructions directly.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2023

Is there a list of all the ARM64 versions somewhere, so we know what to accept?

As far as adding other modifiers, the other architectures have set the precedent of using comma-separated lists, so it would be GOARM64=v8.1,lse not v8.1+lse.

@prattmic
Copy link
Member

I believe the canonical source is the ARM Architecture Reference Manual. Chapter A2 describes the versions. It is not at all consise; a summary table would be nice, but it does describe the requirements if you dive into the details. e.g., "FEAT_SB, Speculation Barrier ... This feature is OPTIONAL in Armv8.0 implementations and mandatory in Armv8.5 implementations."

The full list of features is very long. I'm personally a bit skeptical about having GOARM64 accept the full set when we'll likely only adjust compilation based on a few of them.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2023

Sorry, I wasn't asking for the full set of features, only for the full set of version numbers, which I hope is much smaller.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2023

We need the full set so we know which (older) build tags to enable when we have GOARM64=v8.3 for example.

@prattmic
Copy link
Member

Those are easier! From the same chapter, I believe the full set of versions is v8.0 through v8.9 and v9 [1] through v9.3.

[1] The document uses the name v8.0 for the first v8 version, but v9 (not v9.0) for the first v9 version. I'm not sure if that is an intentional difference or not.

@prattmic
Copy link
Member

prattmic commented Jun 28, 2023

It is also worth noting that v8 and v9 overlap.

  • v9 implements v8.5
  • v9.1 implements v8.6
  • v9.2 implements v8.7
  • v9.3 implements v8.8

(I don't know why there is no v9.4 for v8.9...) Edit: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022 mentions a v9.4 to accompany v8.9, but it seems to be missing from the manual.

i.e., the v9 build should not include the v8.6 build tag.

@rsc
Copy link
Contributor

rsc commented Jul 5, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jul 12, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: all: add GOARM64=v8.1 and so on all: add GOARM64=v8.1 and so on Jul 12, 2023
@rsc rsc modified the milestones: Proposal, Backlog Jul 12, 2023
@erifan
Copy link

erifan commented Jul 13, 2023

[1] The document uses the name v8.0 for the first v8 version, but v9 (not v9.0) for the first v9 version. I'm not sure if that is an intentional difference or not.

I think V9.0 is the first version of V9, and V9 is the major version number. There is also V9.0 in the document, non-strictly we can say that V9 refers to V9.0, but V9.0 is more accurate. https://developer.arm.com/documentation/102378/0201/What-do-Armv8-x-A-and-Armv9-x-mean-?lang=en

@erifan
Copy link

erifan commented Jul 13, 2023

(I don't know why there is no v9.4 for v8.9...) Edit: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022 mentions a v9.4 to accompany v8.9, but it seems to be missing from the manual.

From here https://developer.arm.com/documentation/ka005398/1-0/?lang=en v9.4 should be compatible with v8.9.

@zhangfannie
Copy link
Contributor

[1] The document uses the name v8.0 for the first v8 version, but v9 (not v9.0) for the first v9 version. I'm not sure if that is an intentional difference or not.

The v9.0 is still the first version of v9. This document https://developer.arm.com/documentation/102378/0201/Processor-implementation?lang=en mentions that Armv9.0-A is the base specification and original release.

@mknyszek
Copy link
Contributor

Given that this is accepted, in triage I think we've determined that we just need someone to work on it. If you plan to work on it, please reply here and I'll remove the help wanted label. Thanks!

@mauri870
Copy link
Member

mauri870 commented Aug 8, 2023

I'm thinking about some implementation details for this, would be good to clarify if this makes sense from what I gathered from the comments here.

Sounds like we have two main toggles, the version number (vX.X) and the feature set (e.g FEAT_LSE).

These two can be mixed and matched in a couple ways:

  • Versions might have a certain feature out of the box (eg v8.1 ships with LSE)
  • Others might have optional features (eg v8.1 has optional FEAT_SB)
  • Some versions implement protocols from another version (eg v9.0 inherits the set of features from v8.5, its own plus any optional features)

So we need to take proper care in order to validate that this mix of pairs is enforced properly.

Syntax would be GOARM64=vX.X,lse,sb, version followed by a comma separated list of optional features.

From what I understood the idea is to reduce the amount of instructions being put on the final binary by selectively filtering certain cpu capabilities. Regarding the actual implementation, should we follow the same approach as amd64 flags, for example avx2, where there is a proper guard at runtime to prevent certain instructions from being used if the cpu does not support them?

eg

#ifndef hasArm64LSE
	CMPB	internalcpu·arm64+const_offsetArm64HasLSE(SB), $1
	JEQ     optimized_atomic
	JMP	generic_atomic
#else
	JMP	optimized_atomic
#endif

Also the SSA should be aware of these flags and versions in order to emit instructions based on the selected features?

@andreybokhanko
Copy link
Contributor

I have first version of a patch adding support for GOARM64 + unconditional atomics for ARM8.1/LSE ready. Unless someone else is also happen to work on this at the moment and ready to publish his/her changes (@mauri870 ?), let me get stub on this and publish my patch early next week (I still need to comb it before public code review; also, I will split the patch to two: one just adding GOARM64 and another that actually uses GOARM64 for atomics). @mknyszek , please remove "help wanted" label.

For reference, "go test -bench" expectedly shows huge gains for runtime/internal/atomic package:

goos: linux
goarch: arm64
pkg: runtime/internal/atomic
                 │    old.txt    │                 new.txt                 │
                 │    sec/op     │    sec/op      vs base                  │
AtomicLoad64-96    0.4814n ± ∞ ¹   0.7701n ± ∞ ¹         ~ (p=1.000 n=1) ²
AtomicStore64-96    3.851n ± ∞ ¹    3.851n ± ∞ ¹         ~ (p=1.000 n=1) ³
AtomicLoad-96      0.7698n ± ∞ ¹   0.4812n ± ∞ ¹         ~ (p=1.000 n=1) ²
AtomicStore-96      3.851n ± ∞ ¹    3.850n ± ∞ ¹         ~ (p=1.000 n=1) ²
And8-96            10.010n ± ∞ ¹    7.314n ± ∞ ¹         ~ (p=1.000 n=1) ²
And-96             10.010n ± ∞ ¹    7.313n ± ∞ ¹         ~ (p=1.000 n=1) ²
And8Parallel-96     47.28n ± ∞ ¹   308.70n ± ∞ ¹         ~ (p=1.000 n=1) ²
AndParallel-96      44.21n ± ∞ ¹   246.80n ± ∞ ¹         ~ (p=1.000 n=1) ²
Or8-96             10.010n ± ∞ ¹    7.315n ± ∞ ¹         ~ (p=1.000 n=1) ²
Or-96              10.010n ± ∞ ¹    7.319n ± ∞ ¹         ~ (p=1.000 n=1) ²
Or8Parallel-96      42.97n ± ∞ ¹   247.20n ± ∞ ¹         ~ (p=1.000 n=1) ²
OrParallel-96       43.32n ± ∞ ¹   248.20n ± ∞ ¹         ~ (p=1.000 n=1) ²
Xadd-96             42.79n ± ∞ ¹   243.20n ± ∞ ¹         ~ (p=1.000 n=1) ²
Xadd64-96           42.45n ± ∞ ¹   242.40n ± ∞ ¹         ~ (p=1.000 n=1) ²
Cas-96              85.03n ± ∞ ¹    93.75n ± ∞ ¹         ~ (p=1.000 n=1) ²
Cas64-96            84.99n ± ∞ ¹    91.80n ± ∞ ¹         ~ (p=1.000 n=1) ²
Xchg-96             42.45n ± ∞ ¹   242.50n ± ∞ ¹         ~ (p=1.000 n=1) ²
Xchg64-96           42.38n ± ∞ ¹   243.30n ± ∞ ¹         ~ (p=1.000 n=1) ²
geomean             16.06n          33.04n        +105.69%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
³ all samples are equal

but more importantly, this patch shows repeatable +4.6% performance gain for a real-world application (etcd) on my ARM64 server (Kunpeng920 chip).

@mauri870
Copy link
Member

Thanks for your efforts, @andreybokhanko! I've only begun collecting some notes and haven't started working on an implementation. I'll be happy to review yours!

@gopherbot
Copy link

Change https://go.dev/cl/559555 mentions this issue: cmd/dist,internal: add ARM64 environment variable

@andreybokhanko
Copy link
Contributor

https://go-review.googlesource.com/c/go/+/559555 submitted.

@rsc @mauri870 Please kindly review.

@andreybokhanko
Copy link
Contributor

andreybokhanko commented Feb 8, 2024

Change https://go.dev/cl/559555 mentions this issue: cmd/dist,internal: add ARM64 environment variable

Note that current code review discussion leans towards dropping support for ",lse" and ",crypto", as nobody gave an example of real ARM hardware that implements LSE and not fully supports 8.1 -- as well as real ARM hardware that implements crypto and not fully supports 8.2.

gopherbot pushed a commit that referenced this issue Mar 6, 2024
Adds GOARM64 environment variable with accepted range of values "v8.{0-9}",
"v9.{0-5}" and optional ",lse" and ",crypto" suffixes.

Right now it doesn't affect anything, but can be used in the future to
selectively target specific versions of different ARM64 hardware.

For #60905

Change-Id: I6d530041b6931aa884e34f719f8ec41b1cb03ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/559555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Shu-Chun Weng <scw@google.com>
Reviewed-by: Fannie Zhang <Fannie.Zhang@arm.com>
@andreybokhanko
Copy link
Contributor

A fix is landed to mainline (https://go-review.googlesource.com/c/go/+/559555), so this can be closed.

@ALTree
Copy link
Member

ALTree commented Mar 6, 2024

Assuming this issue only tracked the addition of the GOARM64 variable, it looks like it can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests