Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: t*10 seems slower than t<<3+t+t #2671

Closed
griesemer opened this issue Jan 9, 2012 · 14 comments
Closed

cmd/compile: t*10 seems slower than t<<3+t+t #2671

griesemer opened this issue Jan 9, 2012 · 14 comments

Comments

@griesemer
Copy link
Contributor

The code generated by 6g for multiplication with a constant (t*10) appears to run slower
then a "manual" multiplication (t<<3+t+t). Specifically, for math/big,
revision 78944b6b971c, changing line nat.go:869 from:

s[i] = charset[r-t*10]

to:

s[i] = charset[r-t<<3-t-t]

leads to a significant (> 10%) performance improvement for some code:

gotest -bench=String1000*Base10

benchmark                          old ns/op    new ns/op    delta
big.BenchmarkString100Base10            1909         1855   -2.83%
big.BenchmarkString1000Base10          12867        12243   -4.85%
big.BenchmarkString10000Base10         65974        58307  -11.62%
big.BenchmarkString100000Base10     19938530     19831100   -0.54%

The compiler should be able to do the right thing here w/o manual intervention.
@rsc
Copy link
Contributor

rsc commented Sep 12, 2012

Comment 2:

Labels changed: added go1.1maybe.

@davecheney
Copy link
Contributor

Comment 3:

@gri: are you still able to reproduce this issue.
I rolled back the change to nat.go and confirmed that on 6g we get 
                                s[i] = charset[r-t*10]
  43511d:       48 89 f5                mov    %rsi,%rbp
  435120:       48 89 b4 24 d0 00 00    mov    %rsi,0xd0(%rsp)
  435127:       00 
  435128:       4c 39 f6                cmp    %r14,%rsi
  43512b:       73 43                   jae    435170 <math/big.nat.convertWords+0x580>
  43512d:       48 8d 1c 2b             lea    (%rbx,%rbp,1),%rbx
  435131:       48 89 c5                mov    %rax,%rbp
  435134:       48 6b ed 0a             imul   $0xa,%rbp,%rbp
but on my cpu, SNB core i5 there isn't much in it
benchmark                      old ns/op    new ns/op    delta
BenchmarkString100Base10            1318         1317   -0.08%
BenchmarkString1000Base10           7262         7388   +1.74%
BenchmarkString10000Base10         29226        29257   +0.11%
BenchmarkString100000Base10      9287810      9271090   -0.18%

@rsc
Copy link
Contributor

rsc commented Nov 7, 2012

Comment 4:

This was originally on a 2008-era Mac Pro, whatever chip they used.

@griesemer
Copy link
Contributor Author

Comment 5:

I am still seeing a bit of a slowdown:
benchmark                      old ns/op    new ns/op    delta
BenchmarkString100Base10            1662         1695   +1.99%
BenchmarkString1000Base10           9732         9838   +1.09%
BenchmarkString10000Base10         42616        43845   +2.88%
BenchmarkString100000Base10     14408500     15512376   +7.66%
The new numbers are with using t*10 instead of the existing code. With 6g, the generated
code still looks like before as far as I can tell:
5420 (nat.go:887) MOVQ    SI,BP
5421 (nat.go:887) MOVQ    SI,i+-32(SP)
5422 (nat.go:887) CMPQ    SI,R14
5423 (nat.go:887) JCS     $1,5426
5424 (nat.go:887) CALL    ,runtime.panicindex+0(SB)
5425 (nat.go:887) UNDEF   ,
5426 (nat.go:887) LEAQ    (BX)(BP*1),BX
5427 (nat.go:887) MOVQ    AX,BP
5428 (nat.go:887) IMULQ   $10,BP   <<<<<<<<<< not using
shifts and adds for *10
5429 (nat.go:887) MOVQ    DI,R8
5430 (nat.go:887) SUBQ    BP,R8
5431 (nat.go:887) MOVQ    R8,BP
5432 (nat.go:887) MOVQ    charset+48(FP),R8
5433 (nat.go:887) MOVQ    charset+56(FP),R9
5434 (nat.go:887) CMPQ    BP,R9
5435 (nat.go:887) JCS     $1,5438
5436 (nat.go:887) CALL    ,runtime.panicindex+0(SB)
5437 (nat.go:887) UNDEF   ,
5438 (nat.go:887) LEAQ    (R8)(BP*1),R8
5439 (nat.go:887) MOVBQZX (R8),BP
5440 (nat.go:887) MOVB    BP,(BX)
I prefer to leave this open and no make changes to nat.go for a bit longer. For one, the
code is working fine as is and appears to be faster for large strings.

@davecheney
Copy link
Contributor

Comment 6:

Understood. I will add a benchmark to the micro-benchmarking package
for this case.

@robpike
Copy link
Contributor

robpike commented Mar 7, 2013

Comment 7:

Labels changed: removed go1.1maybe.

@minux
Copy link
Member

minux commented Mar 9, 2013

Comment 8:

Issue #4199 has been merged into this issue.

@rsc
Copy link
Contributor

rsc commented Mar 12, 2013

Comment 10:

[The time for maybe has passed.]

@rsc
Copy link
Contributor

rsc commented Jul 30, 2013

Comment 11:

Labels changed: added go1.3.

@robpike
Copy link
Contributor

robpike commented Aug 20, 2013

Comment 12:

Labels changed: removed go1.3.

@rsc
Copy link
Contributor

rsc commented Nov 27, 2013

Comment 13:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 15:

Labels changed: added repo-main.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title cmd/gc: t*10 seems slower than t<<3+t+t cmd/compile: t*10 seems slower than t<<3+t+t Jun 8, 2015
@gopherbot
Copy link

CL https://golang.org/cl/23372 mentions this issue.

@golang golang locked and limited conversation to collaborators May 24, 2017
@rsc rsc removed their assignment Jun 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants