internal/cpu: new package to expose processor capabilities #15403
Comments
I'm happy to work on adding this (particularly the I'm interested in what thoughts people have for the API. The three options that occur to me are:
I'm not sure if we want to prefix with the CPU type to avoid naming conflicts. If so then perhaps It might also be nice if the function (or bools) could be inlined and constant folded. In that case I suspect we'd need to limit the number of features to be checked to one per call. This could be useful in scenarios where a feature is optional on say Another possibility is that we do this in the BTW @minux when you say:
Do you mean you want the API to be processor dependent? |
I mean the constants will be cpu dependent. There is no constant folding
though, this package is to check cpu capability at runtime, so I think an
API that looks like this is fine:
cpu.Has(feats cpu.Feature...)
E.g.:
cpu.Has(cpu.FloatingPoint, cpu.AES, ...)
Yeah, we need to figure out a way to name and categorize the CPU features.
Should each arch be in a separate package (e.g. internal/CPU/arm?) Or
prefix each feature with arch name? Do we consolidate common features
across arches?
|
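A minimal sketch of what the variadic `cpu.Has` shape suggested above could look like. The `Feature` type, the constant names, and the `detected` variable are assumptions made for illustration; none of them come from an actual Go package.

```go
package main

import "fmt"

// Feature is a hypothetical bit-flag type; the names below are illustrative only.
type Feature uint64

const (
	FloatingPoint Feature = 1 << iota
	AES
	SHA1
)

// detected stands in for the set of features found at startup.
// In a real package it would be filled in from the OS or the hardware.
var detected = FloatingPoint | AES

// Has reports whether every listed feature is present.
// With no arguments it returns true.
func Has(feats ...Feature) bool {
	for _, f := range feats {
		if detected&f == 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(Has(FloatingPoint, AES)) // true: both bits are set in detected
	fmt.Println(Has(AES, SHA1))          // false: SHA1 is not set in detected
}
```

Because the check loops over a slice at run time, a call like this is unlikely to be constant folded, which matches the point made in the comment.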
I think prefixing them with the arch name would be good. That seems to be the convention used for many other constants in golang (opcodes, relocations, etc.). That way there would be no confusion in case some cpu features are close but don't mean exactly the same thing on different architectures. |
One idea is to use the same naming conventions that we have in glibc, for consistency (i.e. glibc/sysdeps/powerpc/dl-procinfo.c:_dl_powerpc_cap_flags). This way, the names match the ones shown when LD_SHOW_AUXV=1: AT_HWCAP: vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32 Querying for the capabilities should take into account whether an arch has a HWCAP2 or not, because there is no way to know (a priori) if a capability bit is in HWCAP or HWCAP2. My suggestion here is to use a concatenated 64-bit HWCAP+HWCAP2 mask, so we can easily map the bits to the capabilities. That's what we did in glibc/gcc to implement __builtin_cpu_is() / __builtin_cpu_supports() for Power. |
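The concatenated mask idea could be sketched roughly as follows. The bit positions and the names `capAltivec`, `capVSX`, and `capArch300` are placeholders chosen for illustration, not values taken from kernel or glibc headers; the only point is that HWCAP occupies the low 32 bits and HWCAP2 the high 32 bits, so one lookup covers both words.

```go
package main

import "fmt"

// Placeholder feature bits, for illustration only.
// Bits 0-31 map to HWCAP, bits 32-63 map to HWCAP2.
const (
	capAltivec uint64 = 1 << 0  // hypothetical HWCAP bit
	capVSX     uint64 = 1 << 1  // hypothetical HWCAP bit
	capArch300 uint64 = 1 << 32 // hypothetical HWCAP2 bit
)

// combine packs HWCAP into the low word and HWCAP2 into the high word.
func combine(hwcap, hwcap2 uint32) uint64 {
	return uint64(hwcap2)<<32 | uint64(hwcap)
}

// supports reports whether all bits of feature are set in mask.
func supports(mask, feature uint64) bool {
	return mask&feature == feature
}

func main() {
	mask := combine(0x3, 0x1)
	fmt.Println(supports(mask, capVSX))     // true
	fmt.Println(supports(mask, capArch300)) // true
}
```

With this layout the caller does not need to know which auxv entry a capability came from.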
@minux are you already working on this? Thanks. |
No, I'm not. Please go ahead if you want to. Thanks.
|
@mundaym I think that starting with a runtime/internal/sys approach is a good idea for now. My idea was to create something that would work similarly to __builtin_cpu_supports() in gcc. What do you think? |
I had a play with this last weekend. I ended up getting a bit lost in circular dependencies... You might want to just add (perhaps hackily) what you need directly into the Along those lines it would be easy to add the variables The constants representing features could go in the |
@mundaym OK. We'll need these checks later, when we start to add the new ISA 3.0 (POWER9) instructions and write runtime optimizations using those. For now, I was thinking about something trivial, like: ceseo@ca09310 then evolve from that. |
CL https://golang.org/cl/31149 mentions this issue. |
This is a more robust method for obtaining the availability of vx. Since this variable may be checked frequently I've also now padded it so that it will be in its own cache line. I've kept the other check (in hash/crc32) the same for now until I can figure out the best way to update it. Updates #15403. Change-Id: I74eed651afc6f6a9c5fa3b88fa6a2b0c9ecf5875 Reviewed-on: https://go-review.googlesource.com/31149 Reviewed-by: Austin Clements <austin@google.com>
This implements a check that can be done at runtime for the ISA level and hardware capability. It follows the same implementation as in s390x. Updates golang#15403 Fixes golang#16643
CL https://golang.org/cl/32330 mentions this issue. |
…CAP2 This implements a check that can be done at runtime for the ISA level and hardware capability. It follows the same implementation as in s390x. These checks will be important as we enable new instructions and write go asm implementations using those. Updates #15403 Fixes #16643 Change-Id: Idfee374a3ffd7cf13a7d8cf0a6c83d247d3bee16 Reviewed-on: https://go-review.googlesource.com/32330 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
For an internal package, it seems fine to experiment through the usual code review process. No approval needed here. There's runtime/internal/sys, for example. If some of that gets promoted to plain internal/sys or some other name, that seems OK. |
@minux @laboger @ceseo If not i would like to start on a CL for a small internal/cpu package and then have separate CLs to clean up the std lib uses for cpu feature detection as listed in #19739. I would just start by providing: FLAG would be named after the names Linux uses but in upper case. which results in: cpu.AMD64.AVX2 and cpu.AMD64.SSE4_1 In further iterations we can add HWCAP bit vectors and if needed more complex query functions or change the init such that the runtime queries the information first and we copy from there. |
@martisch no, it's not in my list. I already have a functional workaround for ppc64x. Feel free to start working on your CL. I can help later by adding any code to make it work for ppc64x. |
@martisch, sounds reasonable, if LOUD. |
After exploring the implementation details and replacing the uses of feature flags in the std lib - i changed the approach for the first version to: This allows us to use the bools directly in assembler and go code. Different architectures can implement additional flags. When we find a name that conflicts between architectures they can share it as i guess for the foreseeable future go will only ever run on one architecture at the same time within a single go runtime. I plan to have the first version ready for review next week after i finalized the initialization code and tested it on a few different cpus. |
Implements detection of x86 cpu features that are used in the go standard library. Changes all standard library packages to use the new cpu package instead of using runtime internal variables to check x86 cpu features. Updates: #15403 Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9 Reviewed-on: https://go-review.googlesource.com/41476 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
Basic support for feature flag detection has been added and the std lib usage of cpuid for x86 has been unified. The runtime x86 cpuid detection has been consolidated a bit too and is separate from internal/cpu. There are some more cleanups i would like to add in go1.10, e.g. padding of runtime cpu flags and moving the rest of cpuflags_amd64.go into rt0.go. I also plan to work on extending internal/cpu in go1.10 by allowing cpu features to be masked in internal/cpu so we can test different code paths in the std lib #12805. |
I have a question: what's the "correct" way of implementing this for architectures that do not have a cpuid instruction? For example, on ppc64x, we currently rely on HWCAP/HWCAP2 bits for that. |
@ceseo, whatever the hardware provides, I guess. If needed (if the instruction is expensive) we can call it once on process startup and cache the results in a global variable. |
@randall77 Power doesn't have anything in the hardware for identifying capabilities/ISA level. That's why we use the HWCAP/HWCAP2 bits exposed by the kernel. In glibc, for instance, I had to modify the TLS ABI and write the HWCAP/HWCAP2 bits inside the TCB, so that __builtin_cpu_is/__builtin_cpu_supports in gcc can read the information from somewhere. In Go, we currently initialize a struct in runtime/os_linux_ppc64x.go and read from there in runtime to identify capabilities. However, that's heavily tied to the initialization procedures in runtime/os_linux.go. Do you have any suggestions for adapting what we currently have so that it can reside inside the new internal/cpu package? |
For internal/ you can just use linkname to expose otherwise unexported parts of its api. For instance, you could have the runtime call into internal/cpu and pass it the auxv array so it can initialize itself. Initialization order is tricky but that's always the case with early initialization like this. |
I would favor we keep as much isolation of code between runtime and internal/cpu as possible. Since runtime and internal/cpu go versions will stay in sync i think we can expose (not officially export) the HWCAP/HWCAP2 bits the runtime uses and then read those from a internal/cpu (assembler) function. This would keep any code of internal/cpu out of the runtime and if we ever need to read that information elsewhere e.g. for something similar like the popcnt intrinsic for x86 we already have that info readable. There might be downsides to that approach vs using linkname i am missing. If we want to stay with Go code linkname seems the way to go. |
You don't need assembly, you just need to define in internal/cpu:
and then in runtime:
and then when hwcap is processed in runtime today, just set cpu_hwcap too. Then internal/cpu can read the hwcap variable. Alternately, if runtime can import internal/cpu (I don't particularly see why not), then internal/cpu can export a SetHwcap function and runtime can just call it, no linkname magic required. |
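The setter alternative could look roughly like the single-file sketch below. It is only an illustration: the two-argument `SetHwcap` signature, the `processAuxv` helper, and the package-level variables are assumptions, and in a real tree the setter would live in internal/cpu while the auxv walk would live in runtime. The tag values 16 and 26 are the Linux `AT_HWCAP` and `AT_HWCAP2` auxv tags.

```go
package main

import "fmt"

// Linux auxv tags.
const (
	atNull   = 0  // AT_NULL: end of the vector
	atHWCAP  = 16 // AT_HWCAP
	atHWCAP2 = 26 // AT_HWCAP2
)

// hwcap and hwcap2 stand in for package-level state in internal/cpu.
var hwcap, hwcap2 uint

// SetHwcap is a hypothetical setter that the runtime would call once
// during early startup.
func SetHwcap(h, h2 uint) {
	hwcap, hwcap2 = h, h2
}

// processAuxv stands in for the runtime's auxv walk. The vector is a
// flat list of tag/value pairs terminated by AT_NULL.
func processAuxv(auxv []uint) {
	var h, h2 uint
	for i := 0; i+1 < len(auxv); i += 2 {
		tag, val := auxv[i], auxv[i+1]
		if tag == atNull {
			break
		}
		switch tag {
		case atHWCAP:
			h = val
		case atHWCAP2:
			h2 = val
		}
	}
	SetHwcap(h, h2)
}

func main() {
	// Made-up vector: one unrelated entry, then HWCAP, HWCAP2, AT_NULL.
	processAuxv([]uint{6, 4096, atHWCAP, 0x80, atHWCAP2, 0x1, atNull, 0})
	fmt.Printf("hwcap=%#x hwcap2=%#x\n", hwcap, hwcap2)
}
```

The linkname variant would replace the function call with a variable shared between the two packages; it is not shown here because it needs more than one package to compile.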
I think generally importing internal/cpu in runtime won't work currently as it will result in an import cycle. |
I don't see any imports in internal/cpu at all. It should be fine for low-level runtime to import it. I would wait for Go 1.10 though. |
I checked with
similar to what cmd/dist/mkdeps.bash is using in runtime/internal: and ./make.bash and cmd/dist/mkdeps.bash give "can't load package: import cycle not allowed" errors for internal/cpu. Could be it is only the tooling that needs to be adjusted for the import to work without cycle errors. |
The go tool knows that every package outside runtime, other than unsafe, depends on runtime. See |
@ianlancetaylor I see. So, do you think using linknames is a more reasonable approach in this case? |
@ceseo I think it would be fine to adjust the go tool to permit the internal/cpu package to not depend on the runtime package, assuming of course that internal/cpu never needs to import runtime. It's an internal package, so while doing this would permit core Go developers to make a horrible error it should never affect any users of Go. |
What's the correct way of re-generating cmd/dist/deps.go? Just call mkdeps.bash directly? |
Yes. |
Change https://golang.org/cl/53830 mentions this issue: |
This change replaces the current runtime capabilities check for ppc64x with the new internal/cpu package. It also adds support for the new POWER9 ISA and capabilities. Updates #15403 Change-Id: I5b64a79e782f8da3603e5529600434f602986292 Reviewed-on: https://go-review.googlesource.com/53830 Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Since many interested parties are already in this thread and it is still open i reply here (if wanted i can create a new proposal). I started prototyping disabling cpu features via internal/cpu e.g. for testing. https://golang.org/cl/91737 We already have GODEBUG and GOGC so i would propose GOCPU to disable cpu features so that whatever Go package implements cpu feature detection is the only consumer of GOCPU and there are no overlaps with GODEBUG. An example to run a test with AVX and SSE41 on amd64 disabled could look like: I would propose lower case feature names as it seems more readable and is in line with GODEBUG options. The special key "all" can be used to set features to the minimal set of features required by the current go implementation. So GOCPU=all=0 can be used going forward to run and test go programs with minimal cpu feature requirements. This way some builders with all=0 could be Another feature for internal/cpu that i think would be useful also for testing of internal/cpu itself is to support a variable in GODEBUG e.g. cpudetail=1 that prints the detected and disabled features. Could also be in GOCPU but it seems cleaner to me to use GOCPU only for cpu features. Functions in the runtime and e.g. support_popcnt checks emitted by the compiler won't be covered currently but unifying/merging runtime and internal/cpu into one overall detection functionality in go would solve that. (happy to work on this too for go1.11) |
I don't understand the goal. Is it just for regression testing? Would it be documented and exposed to the users? |
It would be documented and exposed like GODEBUG. One use case is testing (#12805), another I see is benchmarking different implementations that require different cpu capabilities against each other without the need for code changes. It could also be used to force running the same code paths on two different machines with different cpu capabilities for better debugging or reproducibility of errors. |
Change https://golang.org/cl/91737 mentions this issue: |
Needs the go compiler to be built with GOEXPERIMENT=debugcpu to be active. The GODEBUGCPU environment variable can be used to disable usage of specific processor features in the Go standard library. This is useful for testing and benchmarking different code paths that are guarded by internal/cpu variable checks. Use of processor features can not be enabled through GODEBUGCPU. To disable usage of AVX and SSE41 cpu features on GOARCH amd64 use: GODEBUGCPU=avx=0,sse41=0 The special "all" option can be used to disable all options: GODEBUGCPU=all=0 Updates #12805 Updates #15403 Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab Reviewed-on: https://go-review.googlesource.com/91737 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
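As a rough illustration of the option format described in that commit message, the sketch below parses a string such as `avx=0,sse41=0` and clears the matching flags. It is not the code from the CL: the `applyDebugOptions` name and the map-based feature table are assumptions, and "all" is simplified here to clear every entry in the table. It uses `strings.Cut`, so it assumes Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"strings"
)

// applyDebugOptions clears entries in features according to a
// comma-separated list of name=0 options. Features can only be
// disabled; any value other than "0" is ignored.
func applyDebugOptions(env string, features map[string]bool) {
	for _, field := range strings.Split(env, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok || value != "0" {
			continue // malformed field, or an attempt to enable a feature
		}
		if key == "all" {
			for name := range features {
				features[name] = false
			}
			continue
		}
		if _, known := features[key]; known {
			features[key] = false
		}
	}
}

func main() {
	// Hypothetical detection result, keyed by lower-case feature name.
	features := map[string]bool{"avx": true, "avx2": true, "sse41": true}
	applyDebugOptions("avx=0,sse41=0", features)
	fmt.Println(features["avx"], features["avx2"], features["sse41"]) // false true false
}
```

Ignoring `name=1` reflects the statement in the commit message that features cannot be enabled through the variable, only disabled.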
Closing this issue as there is internal/cpu and x/sys/cpu now. |
We have a lot of packages in the std that use specialized instructions
that are not always present. Most of the time, each package detects
the required features by itself. This is fine but leads to code
duplication. And as Bryan Chan mentioned on https://golang.org/cl/22201,
even if the processor provides a way to detect certain optional features,
it's still better to use AT_HWCAP from Linux auxv because that also
takes kernel support into account.
Only the runtime can access auxv, so it makes sense for the runtime
to query the processor capabilities and provide that to the packages.
I propose that we add an internal package
internal/cpu
that exposes capability flags for the current processor so that each std package
could query it directly instead of having a runtime detection routine
that duplicates the work.
Another benefit is that some processors, like ARM, don't provide
a way to do runtime capability detection, so we have to rely on the
kernel to provide this information. Different kernels provide different
mechanisms for this (sysctl for BSD and auxv for Linux), so providing
a package that abstracts those OS-dependent differences away is also
beneficial.
We might promote the package to runtime/cpu if deemed fit, but that's
out of the scope for this proposal.
The package could be modeled after Linux's AT_HWCAP bits,
and it will be processor dependent.