Issue 6094047: code review 6094047: math: ARM assembly implementation for Abs (Closed)

Created:
12 years ago by minux1

Modified:
12 years ago

Reviewers:

CC:
dave_cheney.net, remyoudompheng, mtj1, rsc, golang-dev

Visibility:
Public.

Description

math: ARM assembly implementation for Abs

Obtained on 700MHz OMAP4460:
benchmark       old ns/op    new ns/op    delta
BenchmarkAbs           61           23  -61.63%

Patch Set 2 : diff -r 23c94e7b3fd6 https://code.google.com/p/go/ #

Patch Set 3 : diff -r 23c94e7b3fd6 https://code.google.com/p/go/ #

Total comments: 1

Patch Set 4 : diff -r 69a5c6d983b4 https://code.google.com/p/go/ #

Patch Set 5 : diff -r 69a5c6d983b4 https://code.google.com/p/go/ #

Created: 12 years ago

Modified: src/pkg/math/abs_arm.s (1 chunk, +6 lines, -1 line, 0 comments)

Messages

Total messages: 13

minux1

Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go/

12 years ago (2012-04-21 11:13:21 UTC) #1

dave_cheney.net

I can't speak to the accuracy of this change, but the results are equally impressive ...

12 years ago (2012-04-21 11:22:35 UTC) #2

minux1

I can squeeze one more ns out if we use ldrd and strd. TEXT ·Abs(SB),7,$0 ...

12 years ago (2012-04-21 14:34:11 UTC) #3

remyoudompheng

How does this compare to the following Go code? func abs(x float64) (y float64) { ...

12 years ago (2012-04-21 17:28:10 UTC) #4

minux1

On Sun, Apr 22, 2012 at 1:28 AM, <remyoudompheng@gmail.com> wrote: > How does this compare ...

12 years ago (2012-04-21 17:46:11 UTC) #5

mtj1

Which is the key to all of this tuning -- it is "non floating point" ...

12 years ago (2012-04-21 17:52:03 UTC) #6

minux1

12 years ago (2012-04-21 18:01:54 UTC) #7

On Sun, Apr 22, 2012 at 1:45 AM, minux <minux.ma@gmail.com> wrote:

>
> On Sun, Apr 22, 2012 at 1:28 AM, <remyoudompheng@gmail.com> wrote:
>
>> How does this compare to the following Go code?
>>
>> func abs(x float64) (y float64) {
>>      *(*uint64)(unsafe.Pointer(&y)) = *(*uint64)(unsafe.Pointer(&x)) &^ uint64(1<<63)
>>      return
>> }
>>
> 5g generates this for this function:
> 0000 (flt.go:3) TEXT   abs+0(SB),$0-16
> 0001 (flt.go:3) MOVD   $(0.00000000000000000e+00),F0
> 0002 (flt.go:3) MOVD   F0,y+8(FP)
> 0003 (flt.go:5) MOVW   $y+8(FP),R5
> 0004 (flt.go:5) MOVW   $x+0(FP),R0
> 0005 (flt.go:5) MOVW   0(R0),R4
> 0006 (flt.go:5) MOVW   4(R0),R1
> 0007 (flt.go:5) MOVW   $-1,R2
> 0008 (flt.go:5) AND     R2,R4,R0
> 0009 (flt.go:5) MOVW   $2147483647,R2
> 0010 (flt.go:5) AND     R2,R1,R3
> 0011 (flt.go:5) MOVW   R0,R1
> 0012 (flt.go:5) MOVW   R5,R0
> 0013 (flt.go:5) MOVW   R3,R2
> 0014 (flt.go:5) MOVW   R1,0(R5)
> 0015 (flt.go:5) MOVW   R3,4(R5)
> 0016 (flt.go:6) RET     ,
>
> It will be much slower than the assembly version and, what's worse, it
> unnecessarily loads and stores 0.0 into y, which triggers much more
> expensive soft-float emulation on systems without an FPU.
>
I also tried a C version:
void abs(long long in, long long out) {
    out = in & ~(1ULL<<63);
    USED(out);
}
5c generates:
TEXT abs+0(SB),0,$20-16
MOVW $out+8(FP),R1
MOVW R1,4(R13)
MOVW R13,R1
MOVW $in+0(FP),R2
MOVM.U 0(R2),[R3,R4]
MOVW R3,8(R1)
MOVW R4,12(R1)
MOVW R13,R1
MOVW $-1,R2
MOVW R2,16(R1)
MOVW $2147483647,R2
MOVW R2,20(R1)
BL ,_andv+0(SB)
NOP out+8(FP),
NOP ,R0
NOP ,F0
RET ,
END ,
It calls _andv when it should have done the AND inline.

For comparison, 8c generates this for the same function:
(flt.c:1) TEXT abs+0(SB),(gok(71))
(flt.c:2) MOVL in+0(FP),AX
(flt.c:2) ANDL $-1,AX
(flt.c:2) MOVL in+4(FP),CX
(flt.c:2) ANDL $2147483647,CX
(flt.c:2) MOVL AX,out+8(FP)
(flt.c:2) MOVL CX,out+12(FP)
(flt.c:3) RET ,
(flt.c:3) END ,
This is much better and approaches the quality of hand-written assembly.

For the record, 8g generates this for the Go version:
0000 (flt.go:3) TEXT    abs+0(SB),$0-16
0001 (flt.go:3) FMOVD   $(0.00000000000000000e+00),F0
0002 (flt.go:3) FMOVDP  F0,y+8(FP)
0003 (flt.go:5) LEAL    y+8(FP),BX
0004 (flt.go:5) MOVL    BX,BP
0005 (flt.go:5) LEAL    x+0(FP),BX
0006 (flt.go:5) MOVL    (BX),AX
0007 (flt.go:5) MOVL    4(BX),CX
0008 (flt.go:5) ANDL    $2147483647,CX
0009 (flt.go:5) MOVL    AX,(BP)
0010 (flt.go:5) MOVL    CX,4(BP)
0011 (flt.go:6) RET     ,
Again, a much better job than 5g.

remyoudompheng

On 21 April 2012 19:51, Michael Jones <mtj@google.com> wrote: > Which is the ...

12 years ago (2012-04-21 18:02:19 UTC) #8

dave_cheney.net

> But, to make it acceptable, I must first add LDRD and STRD to 5a ...

12 years ago (2012-04-21 23:55:56 UTC) #9

rsc

http://codereview.appspot.com/6094047/diff/5001/src/pkg/math/abs_arm.s
File src/pkg/math/abs_arm.s (right):

http://codereview.appspot.com/6094047/diff/5001/src/pkg/math/abs_arm.s#newcode7
src/pkg/math/abs_arm.s:7: MOVW 0(FP), R0
please use variable names and try ...

12 years ago (2012-04-23 14:54:09 UTC) #10

minux1

Hello dave@cheney.net, remyoudompheng@gmail.com, mtj@google.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.

12 years ago (2012-04-23 15:40:49 UTC) #11

minux1

12 years ago (2012-04-23 15:47:50 UTC) #13

*** Submitted as http://code.google.com/p/go/source/detail?r=5a1d471de6d2 ***

math: ARM assembly implementation for Abs

Obtained on 700MHz OMAP4460:
benchmark       old ns/op    new ns/op    delta
BenchmarkAbs           61           23  -61.63%

R=dave, remyoudompheng, mtj, rsc
CC=golang-dev
http://codereview.appspot.com/6094047
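
A rough way to take a comparable measurement outside the Go tree is sketched below with testing.Benchmark. The names benchmarkAbs, input and sink are hypothetical, and this is not the benchmark used for the numbers above. Results depend on hardware and compiler, and on a fast machine the figure may be dominated by loop overhead.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// input and sink are package-level so the compiler is less likely to
// fold the call away or drop the result (hypothetical names).
var (
	input = -2.5
	sink  float64
)

// benchmarkAbs times math.Abs over b.N iterations.
func benchmarkAbs(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = math.Abs(input)
	}
}

func main() {
	r := testing.Benchmark(benchmarkAbs)
	fmt.Println("math.Abs:", r.NsPerOp(), "ns/op over", r.N, "iterations")
}
```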