internal/cpu: new package to expose processor capabilities #15403

Closed
minux opened this issue Apr 21, 2016 · 47 comments

@minux
Member

minux commented Apr 21, 2016

We have a lot of packages in the std that use specialized instructions
that are not always present. Most of the time, each package detects
the required features by itself. This is fine but leads to code
duplication. And as Bryan Chan mentioned on https://golang.org/cl/22201,
even if the processor provides a way to detect certain optional features,
it's still better to use AT_HWCAP from Linux auxv because that also
takes kernel support into account.

Only the runtime can access auxv, so it makes sense for the runtime
to query the processor capabilities and provide that to the packages.

I propose that we add an internal package internal/cpu that exposes
capability flags for the current processor so that each std package
can query it directly instead of having a runtime detection routine
that duplicates the work.

Another benefit is that some processors, like ARM, don't provide
a way to do runtime capability detection, so we have to rely on the
kernel to provide this information. Different kernels provide different
mechanisms for this (sysctl for BSD and auxv for Linux), so providing
a package that abstracts those OS-dependent features away is also
beneficial.

We might promote the package to runtime/cpu if deemed fit, but that's
out of the scope for this proposal.

The package could be modeled after Linux's AT_HWCAP bits,
and it will be processor dependent.

@mundaym
Member

mundaym commented Aug 11, 2016

I'm happy to work on adding this (particularly the s390x bit).

I'm interested in what thoughts people have for the API. The three options that occur to me are:

  1. Function taking strings: cpu.Has("ssse3", "popcnt")
  2. Function taking constants: cpu.Has(cpu.SSSE3, cpu.POPCNT)
  3. Raw bools: cpu.SSSE3 && cpu.POPCNT

I'm not sure if we want to prefix with the CPU type to avoid naming conflicts. If so then perhaps s/SSSE3/AMD64.SSSE3/.

It might also be nice if the function (or bools) could be inlined and constant folded. In that case I suspect we'd need to limit the number of features to be checked to one per call. This could be useful in scenarios where a feature is optional on say i386/ppc64, but mandatory on newer versions of the architecture such as amd64/ppc64le.
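
As a rough illustration of the three API shapes listed above, here is a minimal sketch. The names `Feature`, `HasNamed`, `Has`, `detected` and the hard-coded feature set are assumptions made up for this example; they are not an existing API.

```go
package main

import "fmt"

// Feature is a hypothetical bit-flag type used by option 2.
type Feature uint32

const (
	SSSE3 Feature = 1 << iota
	POPCNT
)

// detected stands in for whatever the real detection code would report.
// The value is hard-coded for illustration only.
var detected = SSSE3 | POPCNT

// names maps the string spellings used by option 1 to feature bits.
var names = map[string]Feature{"ssse3": SSSE3, "popcnt": POPCNT}

// HasNamed is option 1: it reports whether all named features are present.
// Unknown names are treated as absent.
func HasNamed(features ...string) bool {
	for _, n := range features {
		f, ok := names[n]
		if !ok || detected&f == 0 {
			return false
		}
	}
	return true
}

// Has is option 2: it reports whether all given feature constants are present.
func Has(features ...Feature) bool {
	for _, f := range features {
		if detected&f == 0 {
			return false
		}
	}
	return true
}

// Option 3: plain package-level bools, set once at start-up.
var (
	HasSSSE3  = detected&SSSE3 != 0
	HasPOPCNT = detected&POPCNT != 0
)

func main() {
	fmt.Println(HasNamed("ssse3", "popcnt"))
	fmt.Println(Has(SSSE3, POPCNT))
	fmt.Println(HasSSSE3 && HasPOPCNT)
}
```

Options 1 and 2 loop over their arguments, which is one reason a single-feature check (or a plain bool, as in option 3) is easier for a compiler to inline.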

Another possibility is that we do this in the runtime/internal/sys package. Something like sys.Feature("ssse3") perhaps? There are variables in sys that might be useful to an application as well and could be made part of a public cpu package were we to ever go down that route.

BTW @minux when you say:

it will be processor dependent

Do you mean you want the API to be processor dependent?

@minux
Member Author

minux commented Aug 11, 2016 via email

@laboger
Contributor

laboger commented Aug 11, 2016

I think prefixing them with the arch name would be good. That seems to be the convention used for many other constants in golang (opcodes, relocations, etc.). That way there would be no confusion in case some cpu features are close but don't mean exactly the same thing on different architectures.

@ceseo
Contributor

ceseo commented Aug 11, 2016

One idea is to use the same naming conventions that we have in glibc, for consistency (i.e. glibc/sysdeps/powerpc/dl-procinfo.c:_dl_powerpc_cap_flags). This way, the names match the ones shown when LD_SHOW_AUXV=1:

AT_HWCAP: vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_HWCAP2: tar isel ebb dscr htm arch_2_07

Querying for the capabilities should take into account whether an arch has a HWCAP2 or not, because there is no way to know (a priori) if a capability bit is in HWCAP or HWCAP2. My suggestion here is to use a concatenated 64-bit HWCAP+HWCAP2 mask, so we can easily map the bits to the capabilities. That's what we did in glibc/gcc to implement __builtin_cpu_is() / __builtin_cpu_supports() for Power.
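
A minimal sketch of the concatenated-mask idea, assuming the low 32 bits carry HWCAP and the high 32 bits carry HWCAP2. The type `Feature`, the helpers `combine` and `supports`, and the bit positions are placeholders chosen for illustration; they are not taken from glibc or the kernel headers.

```go
package main

import "fmt"

// Feature is a bit in the combined 64-bit mask: the low 32 bits hold
// HWCAP and the high 32 bits hold HWCAP2.
type Feature uint64

// The bit positions below are placeholders for illustration; real values
// come from the kernel headers for the architecture.
const (
	hwcapAltivec Feature = 1 << 28        // assumed to live in HWCAP
	hwcapVSX     Feature = 1 << 7         // assumed to live in HWCAP
	hwcap2HTM    Feature = 1 << (32 + 30) // assumed to live in HWCAP2
)

// combine concatenates the two auxv words into one mask.
func combine(hwcap, hwcap2 uint32) Feature {
	return Feature(hwcap) | Feature(hwcap2)<<32
}

// supports reports whether every bit of f is set in mask.
func supports(mask, f Feature) bool {
	return mask&f == f
}

func main() {
	// Sample values, not read from a real auxv.
	mask := combine(1<<28|1<<7, 1<<30)
	fmt.Println(supports(mask, hwcapAltivec), supports(mask, hwcap2HTM))
}
```

With a single mask, callers do not need to know which of the two words a capability bit came from.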

@ceseo
Contributor

ceseo commented Aug 31, 2016

@minux are you already working on this? Thanks.

@minux
Member Author

minux commented Aug 31, 2016 via email

@ceseo
Contributor

ceseo commented Aug 31, 2016

@mundaym I think that starting with a runtime/internal/sys approach is a good idea for now. My idea was to create something that would work similarly to __builtin_cpu_supports() in gcc. What do you think?

@mundaym
Member

mundaym commented Sep 1, 2016

I had a play with this last weekend. I ended up getting a bit lost in circular dependencies... You might want to just add (perhaps hackily) what you need directly into the runtime package for now and let it get cleaned up later.

Along those lines it would be easy to add the variables runtime.hwcap (already there for arm) and runtime.hwcap2 and then grab them using assembly in other packages when necessary. Ideally they'd both have the type uint32 and be defined and set in os_linux.go (rather than in an arch-dependent file).

The constants representing features could go in the runtime/internal/sys package but that means they can only be accessed in the runtime package. If you instead put them in a global package like internal/cpu then the runtime package can't get at them. That could be a way to get this proposal started though.

@ceseo
Contributor

ceseo commented Sep 1, 2016

@mundaym OK. We'll need these checks later, when we start to add the new ISA 3.0 (POWER9) instructions and write runtime optimizations using those.

For now, I was thinking about something trivial, like: ceseo@ca09310

then evolve from that.

@gopherbot

CL https://golang.org/cl/31149 mentions this issue.

@rsc rsc modified the milestones: Proposal, Go1.8Early Oct 18, 2016
@rsc rsc changed the title Proposal: internal/cpu: new package to expose processor capabilities proposal: internal/cpu: new package to expose processor capabilities Oct 18, 2016
gopherbot pushed a commit that referenced this issue Oct 19, 2016
This is a more robust method for obtaining the availability of vx.
Since this variable may be checked frequently I've also now
padded it so that it will be in its own cache line.

I've kept the other check (in hash/crc32) the same for now until
I can figure out the best way to update it.

Updates #15403.

Change-Id: I74eed651afc6f6a9c5fa3b88fa6a2b0c9ecf5875
Reviewed-on: https://go-review.googlesource.com/31149
Reviewed-by: Austin Clements <austin@google.com>
ceseo added a commit to ceseo/go that referenced this issue Oct 28, 2016
This implements a check that can be done at runtime for the ISA level and
hardware capability. It follows the same implementation as in s390x.

Updates golang#15403
Fixes golang#16643
@gopherbot

CL https://golang.org/cl/32330 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 1, 2016
…CAP2

This implements a check that can be done at runtime for the ISA level and
hardware capability. It follows the same implementation as in s390x.

These checks will be important as we enable new instructions and write go
asm implementations using those.

Updates #15403
Fixes #16643

Change-Id: Idfee374a3ffd7cf13a7d8cf0a6c83d247d3bee16
Reviewed-on: https://go-review.googlesource.com/32330
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc
Contributor

rsc commented Dec 12, 2016

For an internal package, it seems fine to experiment through the usual code review process. No approval needed here. There's runtime/internal/sys, for example. If some of that gets promoted to plain internal/sys or some other name, that seems OK.

@rsc rsc changed the title proposal: internal/cpu: new package to expose processor capabilities internal/cpu: new package to expose processor capabilities Dec 12, 2016
@martisch
Contributor

@minux @laboger @ceseo
Does anybody plan to work on this in the near future, or has anybody already worked on implementing an internal cpu package?

If not, I would like to start on a CL for a small internal/cpu package and then have separate CLs to clean up the std lib uses of cpu feature detection as listed in #19739.

I would just start by providing:
cpu.GOARCH.FLAG bools
and populate them on package init. That seems most similar to the current uses of the runtime flags I have seen, and it also decouples the std lib packages' usage from the runtime.

FLAG would be named after the names Linux uses but in upper case.
e.g. for AMD64, Linux uses: mmx, sse, sse2, sse3, sse4_1, sse4_2, popcnt, aes, avx2, bmi1, bmi2, erms, ...

which results in: cpu.AMD64.AVX2 and cpu.AMD64.SSE4_1

In further iterations we can add HWCAP bit vectors and if needed more complex query functions or change the init such that the runtime queries the information first and we copy from there.
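
The cpu.GOARCH.FLAG layout described above could be sketched roughly as follows. The type `amd64Features` and the functions `probe` and `impl` are hypothetical; `probe` returns fixed values instead of running real detection so the sketch runs on any machine.

```go
package main

import "fmt"

// amd64Features mirrors the proposed cpu.AMD64 value: one bool per flag,
// named after the Linux flag in upper case.
type amd64Features struct {
	SSE4_1 bool
	POPCNT bool
	AVX2   bool
}

// AMD64 is filled in once at start-up and only read afterwards.
var AMD64 amd64Features

// probe is a hypothetical stand-in for the real detection code (for
// example CPUID on x86). It returns fixed values for illustration.
func probe() amd64Features {
	return amd64Features{SSE4_1: true, POPCNT: true, AVX2: false}
}

// init populates the flags before any other code in the package runs.
func init() {
	AMD64 = probe()
}

// impl picks an implementation name based on the AVX2 flag.
func impl() string {
	if AMD64.AVX2 {
		return "avx2"
	}
	return "generic"
}

func main() {
	fmt.Println(impl())
}
```

Callers read the bools directly, which is close to how packages already consult the runtime's flags.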

@ceseo
Contributor

ceseo commented Mar 28, 2017

@martisch no, it's not in my list. I already have a functional workaround for ppc64x. Feel free to start working on your CL.

I can help later by adding any code to make it work for ppc64x.

@rsc
Contributor

rsc commented Mar 29, 2017

@martisch, sounds reasonable, if LOUD.

@martisch
Contributor

martisch commented Apr 7, 2017

After exploring the implementation details and replacing the uses of feature flags in the std lib, I changed the approach for the first version to:
cpu.Has_avx2, cpu.Has_sse4_2, ...

This allows us to use the bools directly in assembler and Go code.
It also has the advantage that we don't need extra code versions for amd64, 386, and amd64p32, as they can all use the same code to query the internal/cpu package since the naming doesn't differ. The downsides are that we cannot put padding before and after the variables as easily as when they are collected in an arch-specific struct,
and that the naming is less LOUD.

Different architectures can implement additional flags. When a name conflicts between architectures, they can share it, since I expect that for the foreseeable future a single Go runtime will only ever run on one architecture at a time.

I plan to have the first version ready for review next week, after I have finalized the initialization code and tested it on a few different cpus.

gopherbot pushed a commit that referenced this issue May 10, 2017
Implements detection of x86 cpu features that
are used in the go standard library.

Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.

Updates: #15403

Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@martisch martisch self-assigned this Jun 4, 2017
@martisch
Contributor

martisch commented Jun 4, 2017

Basic support for feature flag detection has been added and the std lib usage of cpuid for x86 has been unified.

The runtime x86 cpuid detection has been consolidated a bit too and is separate from internal/cpu. There are some more cleanups I'd like to add in go1.10, e.g. padding of runtime cpu flags and moving the rest of cpuflags_amd64.go into rt0.go.

I also plan to work on extending internal/cpu in go1.10 by allowing cpu features to be masked in internal/cpu, so we can test different code paths in the std lib (#12805).

@ceseo
Contributor

ceseo commented Jun 21, 2017

I have a question: what's the "correct" way of implementing this for architectures that do not have a cpuid instruction? For example, on ppc64x, we currently rely on HWCAP/HWCAP2 bits for that.

@randall77
Contributor

@ceseo, whatever the hardware provides, I guess. If needed (if the instruction is expensive) we can call it once on process startup and cache the results in a global variable.
It can be more challenging if we have to test instructions & catch unimplemented faults, or ask the OS. We can fight that hurdle if it becomes necessary.

@ceseo
Contributor

ceseo commented Jun 22, 2017

@randall77 Power doesn't have anything in the hardware for identifying capabilities/ISA level. That's why we use the HWCAP/HWCAP2 bits exposed by the kernel. In glibc, for instance, I had to modify the TLS ABI and write the HWCAP/HWCAP2 bits inside the TCB, so that __builtin_cpu_is/__builtin_cpu_supports in gcc can read the information from somewhere.

In Go, we currently initialize a struct in runtime/os_linux_ppc64x.go and read from there in runtime to identify capabilities. However, that's heavily tied to the initialization procedures in runtime/os_linux.go.

Do you have any suggestions for adapting what we currently have so that it can reside inside the new internal/cpu package?

@randall77
Contributor

For internal/ you can just use linkname to expose otherwise unexported parts of its API.

For instance, you could have the runtime call into internal/cpu and pass it the auxv array so it can initialize itself. Initialization order is tricky but that's always the case with early initialization like this.

@martisch
Contributor

I would favor keeping as much isolation of code between runtime and internal/cpu as possible.
However, I see that this cannot be achieved 100% for e.g. ppc64.

Since the runtime and internal/cpu Go versions will stay in sync, I think we can expose (not officially export) the HWCAP/HWCAP2 bits the runtime uses and then read those from an internal/cpu (assembler) function.
https://github.com/golang/go/blob/master/src/cmd/compile/internal/gc/builtin/runtime.go

This would keep any code of internal/cpu out of the runtime, and if we ever need to read that information elsewhere, e.g. for something similar to the popcnt intrinsic for x86, we already have that info readable.

There might be downsides to that approach vs. using linkname that I am missing. If we want to stay with Go code, linkname seems the way to go.

@rsc
Contributor

rsc commented Jun 22, 2017

You don't need assembly, you just need to define in internal/cpu:

var hwcap uint32

and then in runtime:

//go:linkname cpu_hwcap internal/cpu.hwcap
var cpu_hwcap uint32

and then when hwcap is processed in runtime today, just set cpu_hwcap too. Then internal/cpu can read the hwcap variable.

Alternately, if runtime can import internal/cpu (I don't particularly see why not), then internal/cpu can export a SetHwcap function and runtime can just call it, no linkname magic required.
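
The SetHwcap alternative could look roughly like the sketch below. It is squeezed into one file so it runs on its own; comments mark which half would live in which package. `HasVX`, `hwcapVX` and `processAuxv` are hypothetical names, and the `hwcapVX` bit is a placeholder rather than a value taken from the kernel headers.

```go
package main

import "fmt"

// --- would live in internal/cpu ---

// hwcap holds the raw capability word handed over by the runtime.
var hwcap uint32

// HasVX is a hypothetical flag derived from hwcap.
var HasVX bool

// hwcapVX is a placeholder bit used only for this illustration.
const hwcapVX = 1 << 11

// SetHwcap records the value the runtime read from auxv and derives flags.
func SetHwcap(v uint32) {
	hwcap = v
	HasVX = hwcap&hwcapVX != 0
}

// --- would live in runtime ---

// processAuxv stands in for the runtime's handling of one auxv entry.
func processAuxv(tag, val uintptr) {
	const atHWCAP = 16 // AT_HWCAP on Linux
	if tag == atHWCAP {
		SetHwcap(uint32(val))
	}
}

func main() {
	// Sample entry, not read from a real auxv.
	processAuxv(16, 1<<11)
	fmt.Println(HasVX)
}
```

This only works if the runtime is allowed to import internal/cpu; otherwise the linkname variant above is the fallback.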

@martisch
Contributor

I think importing internal/cpu in runtime generally won't work at the moment, as it will result in an import cycle.

@rsc
Contributor

rsc commented Jun 23, 2017

I don't see any imports in internal/cpu at all. It should be fine for low-level runtime to import it. I would wait for Go 1.10 though.

@martisch
Contributor

I checked with

go list -f '{{range .Deps}} {{.}} {{end}}' $PACKAGENAME

similar to what cmd/dist/mkdeps.bash is using
which even for an empty package in internal/ gives the dependencies:
runtime runtime/internal/atomic runtime/internal/sys unsafe

in runtime/internal:
runtime/internal/sys

and ./make.bash and cmd/dist/mkdeps.bash give "can't load package: import cycle not allowed" errors for internal/cpu.

Could be it is only the tooling that needs to be adjusted for the import to work without cycle errors.

@ianlancetaylor
Contributor

The go tool knows that every package outside runtime, other than unsafe, depends on runtime. See Package.load in cmd/go/internal/load/pkg.go. If we want the runtime package to import internal/cpu that code will have to be adjusted.

@ceseo
Contributor

ceseo commented Jun 27, 2017

@ianlancetaylor I see. So, do you think using linknames is a more reasonable approach in this case?

@ianlancetaylor
Contributor

@ceseo I think it would be fine to adjust the go tool to permit the internal/cpu package to not depend on the runtime package, assuming of course that internal/cpu never needs to import runtime. It's an internal package, so while doing this would permit core Go developers to make a horrible error it should never affect any users of Go.

@ceseo
Contributor

ceseo commented Jul 7, 2017

What's the correct way of re-generating cmd/dist/deps.go? Just call mkdeps.bash directly?

@bradfitz
Contributor

bradfitz commented Jul 7, 2017

Yes.

@gopherbot

Change https://golang.org/cl/53830 mentions this issue: runtime, internal/cpu: CPU capabilities detection for ppc64x

gopherbot pushed a commit that referenced this issue Aug 14, 2017
This change replaces the current runtime capabilities check for ppc64x with the
new internal/cpu package. It also adds support for the new POWER9 ISA and
capabilities.

Updates #15403

Change-Id: I5b64a79e782f8da3603e5529600434f602986292
Reviewed-on: https://go-review.googlesource.com/53830
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
@martisch
Contributor

martisch commented Feb 4, 2018

Since many interested parties are already in this thread and it is still open, I am replying here (if wanted, I can create a new proposal).

I started prototyping disabling cpu features via internal/cpu, e.g. for testing: https://golang.org/cl/91737
The idea is that, for now, features can only be disabled from the start via an environment variable, and cannot be enabled or disabled at run time. That way packages can cache combined feature variables and initialize lookup tables for special implementations on init.

We already have GODEBUG and GOGC, so I would propose GOCPU to disable cpu features, so that whatever Go package implements cpu feature detection is the only consumer of GOCPU and there are no overlaps with GODEBUG.

An example to run a test with AVX and SSE41 on amd64 disabled could look like:
GOCPU=avx=0,sse41=0 go test ...

I would propose lower case feature names, as they seem more readable and are in line with GODEBUG options.

The special key "all" can be used to set features to the minimal set of features required by the current go implementation. So GOCPU=all=0 can be used going forward to run and test go programs with minimal cpu feature requirements. This way some builders with all=0 could be
set up to detect breakages related to some basic implementations of some algorithms.
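
A minimal sketch of how such a value might be parsed. The `features` map, the function `applyGOCPU` and the starting flag values are made up for this example, and only "=0" is honoured here, on the assumption that features can be switched off but never on.

```go
package main

import (
	"fmt"
	"strings"
)

// features holds the flags that may be switched off; the starting values
// are made up rather than detected.
var features = map[string]bool{"avx": true, "sse41": true, "popcnt": true}

// applyGOCPU processes a value such as "avx=0,sse41=0". Only "=0" is
// honoured, so a feature can be disabled but never enabled.
// The key "all" disables every optional feature.
func applyGOCPU(env string) {
	for _, field := range strings.Split(env, ",") {
		key, val, ok := strings.Cut(field, "=")
		if !ok || val != "0" {
			continue
		}
		if key == "all" {
			for name := range features {
				features[name] = false
			}
			continue
		}
		// Unknown keys are ignored in this sketch.
		if _, known := features[key]; known {
			features[key] = false
		}
	}
}

func main() {
	applyGOCPU("avx=0,sse41=0")
	fmt.Println(features["avx"], features["sse41"], features["popcnt"])
}
```

In a real program the string would come from the environment (for example via `os.Getenv`) and be applied once during start-up, before any package caches derived flags.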

Another feature for internal/cpu that I think would be useful, also for testing internal/cpu itself, is support for a GODEBUG variable, e.g. cpudetail=1, that prints the detected and disabled features. It could also be in GOCPU, but it seems cleaner to me to use GOCPU only for cpu features.

Functions in the runtime and, e.g., the support_popcnt checks emitted by the compiler won't be covered currently, but unifying/merging runtime and internal/cpu into one overall detection functionality in Go would solve that. (Happy to work on this too for go1.11.)

@rasky
Member

rasky commented Feb 4, 2018

I don't understand the goal. Is it just for regression testing? Would it be documented and exposed to the users?

@martisch
Contributor

martisch commented Feb 5, 2018

It would be documented and exposed like GODEBUG.

One use case is testing (#12805), another I see is benchmarking different implementations that require different cpu capabilities against each other without the need for code changes. It could also be used to force running the same code paths on two different machines with different cpu capabilities for better debugging or reproducibility of errors.

@gopherbot

Change https://golang.org/cl/91737 mentions this issue: internal/cpu: use GOCPU environment variable to disable cpu features

gopherbot pushed a commit that referenced this issue May 22, 2018
Needs the Go compiler to be built with GOEXPERIMENT=debugcpu to be active.

The GODEBUGCPU environment variable can be used to disable usage of
specific processor features in the Go standard library.
This is useful for testing and benchmarking different code paths that
are guarded by internal/cpu variable checks.

Use of processor features can not be enabled through GODEBUGCPU.

To disable usage of AVX and SSE41 cpu features on GOARCH amd64 use:
GODEBUGCPU=avx=0,sse41=0

The special "all" option can be used to disable all options:
GODEBUGCPU=all=0

Updates #12805
Updates #15403

Change-Id: I699c2e6f74d98472b6fb4b1e5ffbf29b15697aab
Reviewed-on: https://go-review.googlesource.com/91737
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@martisch
Contributor

Closing this issue as there is internal/cpu and x/sys/cpu now.

@golang golang locked and limited conversation to collaborators May 31, 2019