Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/internal/atomic: improve performance for atomic functions on ppc64le/ppc64 #15469

Closed
laboger opened this issue Apr 27, 2016 · 1 comment

Comments

@laboger
Copy link
Contributor

laboger commented Apr 27, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version devel +0436a89 Mon Apr 25 19:21:40 2016 +0000 linux/ppc64le
  2. What operating system and processor architecture are you using (go env)?
    go env
    GOARCH="ppc64le"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="ppc64le"
    GOHOSTOS="linux"
    GOOS="linux"
    GOPATH="/home/boger/gotools"
    GORACE=""
    GOROOT="/home/boger/golang/atomic/go"
    GOTOOLDIR="/home/boger/golang/atomic/go/pkg/tool/linux_ppc64le"
    CC="gcc"
    GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build937495002=/tmp/go-build -gno-record-gcc-switches"
    CXX="g++"
    CGO_ENABLED="1"
  3. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.
  4. What did you expect to see?
  5. What did you see instead?

During some performance analysis of various applications it was found that some of the atomic functions appeared high on the profile list, and on further inspection of that code it was determined that some of those code sequences could be improved for ppc64le & ppc64.

The files to be changed are:
runtime/internal/atomic/asm_ppc64x.s
sync/atomic/asm_ppc64x.s

The main purpose is to eliminate the extra sync instructions. I have a patch and will post it shortly.

@gopherbot
Copy link

CL https://golang.org/cl/22549 mentions this issue.

@bradfitz bradfitz added this to the Unplanned milestone May 4, 2016
ceseo pushed a commit to powertechpreview/go that referenced this issue May 5, 2016
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:

- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
 sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync

- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.

- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.

Fixes golang#15469

Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:

runtime.test:
BenchmarkChanNonblocking-128         0.88          0.89          +1.14%
BenchmarkChanUncontended-128         569           511           -10.19%
BenchmarkChanContended-128           63110         53231         -15.65%
BenchmarkChanSync-128                691           598           -13.46%
BenchmarkChanSyncWork-128            11355         11649         +2.59%
BenchmarkChanProdCons0-128           2402          2090          -12.99%
BenchmarkChanProdCons10-128          1348          1363          +1.11%
BenchmarkChanProdCons100-128         1002          746           -25.55%
BenchmarkChanProdConsWork0-128       2554          2720          +6.50%
BenchmarkChanProdConsWork10-128      1909          1804          -5.50%
BenchmarkChanProdConsWork100-128     1624          1580          -2.71%
BenchmarkChanCreation-128            237           212           -10.55%
BenchmarkChanSem-128                 705           667           -5.39%
BenchmarkChanPopular-128             5081190       4497566       -11.49%

BenchmarkCreateGoroutines-128             532           473           -11.09%
BenchmarkCreateGoroutinesParallel-128     35.0          34.7          -0.86%
BenchmarkCreateGoroutinesCapture-128      4923          4200          -14.69%

sync.test:
BenchmarkUncontendedSemaphore-128      112           94.2          -15.89%
BenchmarkContendedSemaphore-128        133           128           -3.76%
BenchmarkMutexUncontended-128          1.90          1.67          -12.11%
BenchmarkMutex-128                     353           310           -12.18%
BenchmarkMutexSlack-128                304           283           -6.91%
BenchmarkMutexWork-128                 554           541           -2.35%
BenchmarkMutexWorkSlack-128            567           556           -1.94%
BenchmarkMutexNoSpin-128               275           242           -12.00%
BenchmarkMutexSpin-128                 1129          1030          -8.77%
BenchmarkOnce-128                      1.08          0.96          -11.11%
BenchmarkPool-128                      29.8          27.4          -8.05%
BenchmarkPoolOverflow-128              40564         36583         -9.81%
BenchmarkSemaUncontended-128           3.14          2.63          -16.24%
BenchmarkSemaSyntNonblock-128          1087          1069          -1.66%
BenchmarkSemaSyntBlock-128             897           893           -0.45%
BenchmarkSemaWorkNonblock-128          1034          1028          -0.58%
BenchmarkSemaWorkBlock-128             949           886           -6.64%

Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
@golang golang locked and limited conversation to collaborators May 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants