cmd/compile: optimization of constant pool on arm #19844

benshi001 · 2017-04-05T03:04:40Z

For the following code

func ass12345678(a int) int {
return a + 0xffff
}

Currently Go stores 0xffff in the constant pool and loads it at run time:
a.go:6 0x95a04 e59fb06c MOVW 0x6c(R15), R11
a.go:6 0x95a08 e080000b ADD R11, R0, R0
.................................
a.go:9 0x95a78 0000ffff STRD.EQ [R0], -PC, R15, R15

But gcc optimized it to
a = a + 0x10000
a = a - 1

Both 1 and 0x10000 can be encoded directly in the 12-bit immediate field (imm12) of the instructions, without any access to memory.
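To make the encoding argument concrete, here is a minimal sketch of the check. It assumes the classic ARM data-processing immediate format, where the 12-bit field holds an 8-bit value plus a 4-bit rotation (the value is rotated right by twice that amount). The names isARMImm and splitAdd are hypothetical helpers for illustration only. They are not taken from the Go compiler or from gcc, and the search in splitAdd is deliberately naive.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isARMImm reports whether v can be encoded as an ARM data-processing
// immediate: an 8-bit value rotated right by an even number of bits.
func isARMImm(v uint32) bool {
	for rot := 0; rot < 32; rot += 2 {
		// Rotating v left by rot undoes a right rotation by rot.
		if bits.RotateLeft32(v, rot) <= 0xff {
			return true
		}
	}
	return false
}

// splitAdd looks for hi and lo, both encodable, with c == hi - lo
// (mod 2^32), so that a + c could be emitted as ADD $hi then SUB $lo.
// Only single-byte lo values are tried (illustrative, not exhaustive).
func splitAdd(c uint32) (hi, lo uint32, ok bool) {
	for lo = 1; lo <= 0xff; lo++ {
		hi = c + lo
		if isARMImm(hi) {
			return hi, lo, true
		}
	}
	return 0, 0, false
}

func main() {
	fmt.Println(isARMImm(0xffff)) // false: 16 significant bits
	hi, lo, ok := splitAdd(0xffff)
	fmt.Printf("%#x %#x %v\n", hi, lo, ok) // 0x10000 0x1 true
}
```

For 0xffff the sketch finds the same pair as the gcc output quoted above: add 0x10000, then subtract 1.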

benshi001 · 2017-04-05T05:04:31Z

Also,

a | 0xffff ->
orr $0xff, a, a
orr $0xff00, a, a

a & 0xffff0 ->
bic $0xf000000f, a, a
bic $0x0ff00000, a, a

gopherbot · 2017-04-05T11:45:14Z

CL https://golang.org/cl/39552 mentions this issue.

benshi001 · 2017-04-06T08:30:59Z

test.zip

benshi001 · 2017-04-06T08:35:42Z

The attachment above is a rough test. The output log on my Raspberry Pi 2 is:
test0(constant pool) cost 29 seconds, test1(imm12) cost 20 seconds

This means that
in total, 17179869120 (ldr/add) pairs took 29 seconds, while 17179869120 (add/add) pairs took 20 seconds: about 31% less time (roughly 1.45x faster).

Note that in this test the constant pool stays in the data cache. If it did not, cache misses would make the constant-pool version even slower.

bradfitz · 2017-04-06T15:09:52Z

@josharian, can you point @benshi001 to directions on how to run compiler benchmark tests?

@benshi001, you want to make pretty commit messages using https://godoc.org/golang.org/x/perf/cmd/benchstat like this one 50688fc

josharian · 2017-04-06T15:50:39Z

The time has come for me to clean up compiler benchmarking a bit. Let me do that first.

josharian · 2017-04-06T19:30:23Z

Not done, but cleaned up enough for the moment. Do:

go get -u golang.org/x/tools/cmd/compilebench github.com/josharian/compilecmp golang.org/x/perf/cmd/benchstat

Make sure all resulting binaries are in your $PATH.

Commit your work. For memory benchmarking:

compilecmp -n 10

For execution time benchmarking (with everything else closed):

compilecmp -n 50 -cpu

These will compare master to HEAD. compilecmp supports lots of other variations, run with -h for more. Ask if you have questions or feature requests.

benshi001 · 2017-04-07T07:28:21Z

I had trouble running the benchmarks that @josharian suggested, because of our network security policy.

I connected over ssh to a remote host located in the USA and ran:

git clone https://go.googlesource.com/go
git fetch https://go.googlesource.com/go refs/changes/52/39552/2 && git checkout FETCH_HEAD
GOPATH=/root/gopath go get -u golang.org/x/tools/cmd/compilebench
GOPATH=/root/gopath go get -u github.com/josharian/compilecmp
GOPATH=/root/gopath go get -u golang.org/x/perf/cmd/benchstat
PATH=/root/gopath/bin:$PATH
compilecmp -n 10

Then I got the expected result on the remote host via ssh terminal.

I then copied /root/go and /root/gopath from the remote host to my local Raspberry Pi 2:
/root/go -> /home/pi/go
/root/gopath -> /home/pi/gopath

And did
GOPATH=/home/pi/gopath go build golang.org/x/tools/cmd/compilebench
GOPATH=/home/pi/gopath go build github.com/josharian/compilecmp
GOPATH=/home/pi/gopath go build golang.org/x/perf/cmd/benchstat

The tools were then built as ARM ELF binaries.

But when I ran compilecmp -n 10, I got only part of the results:
compilecmp master HEAD
06:48:59 copy tree at master ( 19bd145 ) to /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd
06:49:03 /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd/src/make.bash
07:00:39 copy tree at HEAD ( 0f679e481a7cbcb0fbb930670581fa57cd027eee ) to /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee
07:00:57 /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee/src/make.bash
before: /home/pi/.compilecmp/19bd145d0721a28658b15deb548f22a3405d83bd
after: /home/pi/.compilecmp/0f679e481a7cbcb0fbb930670581fa57cd027eee
benchstat /tmp/590206338 /tmp/787780345
completed 10 of 10, estimated time remaining 0s (eta 7:13AM)
name old text-bytes new text-bytes delta
HelloSize 595k ± 0% 594k ± 0% -0.07% (p=0.000 n=10+10)

name old data-bytes new data-bytes delta
HelloSize 3.59k ± 0% 3.59k ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 75.4k ± 0% 75.4k ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.03M ± 0% 1.03M ± 0% ~ (all equal)

The others like
name old user-ns/op new user-ns/op delta
Template 488M ±11% 488M ± 6% ~ (p=0.920 n=10+9)
Unicode 249M ±10% 242M ±12% ~ (p=0.304 n=10+10)
GoTypes 1.45G ± 5% 1.48G ± 6% ~ (p=0.342 n=10+10)
Flate 216M ±15% 223M ±10% ~ (p=0.201 n=9+10)
GoParser 380M ± 9% 381M ± 8% ~ (p=0.762 n=10+9)
Reflect 603M ± 5% 599M ± 7% ~ (p=0.919 n=9+10)
Tar 258M ±16% 263M ± 6% ~ (p=0.752 n=10+10)
XML 502M ± 8% 522M ± 7% ~ (p=0.106 n=10+9)

are missing. What is wrong with my procedure?

benshi001 · 2017-04-07T07:30:33Z

The network security policy forbids my Raspberry Pi board from accessing the internet, so I have to transfer everything via tar cfz / scp / tar xfz.

benshi001 · 2017-04-07T08:31:40Z

Does any of the three tools need internet access when running? I do not think there is any difference between my board and the remote host.

josharian · 2017-04-07T13:04:02Z

Does any of the three tools need internet access when running?

No. I suspect that the problem is that compilecmp uses \r to update a running estimate of when the task will be done, and that your terminal didn't like it. But (unless the tmp dir has been emptied), the results are still there. Just manually run:

benchstat /tmp/590206338 /tmp/787780345

You could also tweak compilecmp to remove the live update. If that is in fact the problem, I'd be happy to either add a flag to remove it or (better) do some terminal sniffing to decide when not to use it. (Suggestions welcome on the latter front.)

benshi001 · 2017-04-10T06:10:34Z

I encountered two issues when running the benchmark test, and neither was related to the '\r':

can't create compilebench.o: open compilebench.o: permission denied
bexport.go:94:2: can't find import: "cmd/compile/internal/big"

The first one can be fixed by changing line 234 of src/golang.org/x/tools/cmd/compilebench/main.go
from
args := []string{"-o", "compilebench.o"}
to
args := []string{"-o", "/tmp/compilebench.o"}

BTW: I am using Go 1.6.2 as the bootstrap toolchain and to build golang.org/x/tools/cmd/compilebench.
log.zip

However I got the benchmark result, thank you.

Here is the log (/tmp/185436984)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
bexport.go:94:2: can't find import: "cmd/compile/internal/big"
compilebench: cannot find package "cmd/compile/internal/ssa" in any of:
/usr/lib/go-1.6/src/cmd/compile/internal/ssa (from $GOROOT)
($GOPATH not set)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
BenchmarkHelloSize 1 688575 text-bytes 5808 data-bytes 134376 bss-bytes 1058816 exe-bytes
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
bexport.go:94:2: can't find import: "cmd/compile/internal/big"
compilebench: cannot find package "cmd/compile/internal/ssa" in any of:
/usr/lib/go-1.6/src/cmd/compile/internal/ssa (from $GOROOT)
($GOPATH not set)
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
can't create compilebench.o: open compilebench.o: permission denied
BenchmarkHelloSize 1 688575 text-bytes 5808 data-bytes 134376 bss-bytes 1058816 exe-bytes

benshi001 · 2017-04-10T06:15:46Z

Go 1.6.2 is installed in /usr/lib/go-1.6, which requires root privileges to write to.

benshi001 · 2017-04-10T10:31:36Z

Can I treat the following output as "there is improvement after applying my patch CL 39552" ?

pi@raspberrypi:~/pending/go/src $ compilecmp -n 20 -cpu

name old time/op new time/op delta
Template 2.42s ± 2% 2.41s ± 2% ~ (p=0.301 n=20+20)
Unicode 1.31s ± 6% 1.32s ± 4% ~ (p=0.369 n=20+20)
GoTypes 7.92s ± 2% 7.92s ± 1% ~ (p=0.813 n=20+19)
SSA 58.6s ± 3% 58.1s ± 3% ~ (p=0.068 n=20+20)
Flate 1.52s ± 3% 1.52s ± 2% ~ (p=0.647 n=20+19)
GoParser 1.88s ± 2% 1.88s ± 3% ~ (p=0.813 n=20+19)
Reflect 5.29s ± 1% 5.30s ± 2% ~ (p=0.258 n=19+20)
Tar 1.49s ± 5% 1.49s ± 4% ~ (p=0.925 n=20+20)
XML 2.67s ± 2% 2.67s ± 2% ~ (p=0.738 n=20+20)

name old user-ns/op new user-ns/op delta
Template 2.95G ± 2% 2.95G ± 2% ~ (p=0.515 n=19+20)
Unicode 1.60G ± 4% 1.59G ± 3% ~ (p=0.381 n=20+19)
GoTypes 9.49G ± 2% 9.50G ± 1% ~ (p=0.515 n=20+20)
SSA 74.1G ± 1% 73.6G ± 2% -0.65% (p=0.011 n=18+19)
Flate 1.74G ± 3% 1.74G ± 3% ~ (p=0.928 n=19+20)
GoParser 2.23G ± 3% 2.24G ± 4% ~ (p=0.479 n=20+20)
Reflect 6.25G ± 2% 6.25G ± 1% ~ (p=0.884 n=20+19)
Tar 1.85G ± 4% 1.85G ± 4% ~ (p=0.652 n=20+20)
XML 3.15G ± 3% 3.15G ± 2% ~ (p=0.856 n=20+20)

name old text-bytes new text-bytes delta
HelloSize 595k ± 0% 594k ± 0% -0.07% (p=0.000 n=20+20)

name old data-bytes new data-bytes delta
HelloSize 3.59k ± 0% 3.59k ± 0% ~ (all equal)

name old bss-bytes new bss-bytes delta
HelloSize 75.4k ± 0% 75.4k ± 0% ~ (all equal)

name old exe-bytes new exe-bytes delta
HelloSize 1.03M ± 0% 1.03M ± 0% ~ (all equal)

benshi001 · 2017-04-10T11:01:00Z

I also attached the log to CL 39552's commit message. I will try
"compilecmp -n 50 -cpu" tomorrow.

ALTree · 2017-04-10T11:46:19Z

I doubt -n50 will help, those p values are quite high.

But it doesn't really matter, does it? Correct me if I'm wrong, but the goal of your CL was to make the generated code faster, and not to make the compiler faster. So now you've verified that your CL does not make the compiler slower (in fact, it's slightly faster now), but you still have to write benchmarks that show that the code we generate on ARM is executed faster.

@bradfitz Am I missing something? Why are we making OP run the compiler benchmarks? Were you expecting the change to slow down the compiler?

benshi001 · 2017-04-10T13:32:58Z

1. My original intention was to make the generated ARM code both faster and shorter.
2. I am not familiar with this benchmark system. How do I write a proper test program to verify my change? I attached a test above; it showed about a 30% improvement in execution time.

dr2chase · 2017-04-10T14:21:54Z

For benchmarks, you can use some of the go1 benchmarks as a model:
https://github.com/golang/go/blob/master/test/bench/go1/binarytree_test.go
Ending the file name in _test.go is required, naming the benchmark Benchmark... is required I think.

Your benchmark program should contain a loop that runs b.N times so that the benchmark harness can determine an appropriate run count:

func BenchmarkBinaryTree17(b *testing.B) {
    for i := 0; i < b.N; i++ {
        binarytree(17)
    }
}

Compile-and-test is go test -bench Benchmark -count 10, where "Benchmark" is a regular expression to match tests/benchmarks ("Benchmark" matches all the benchmarks, probably "B" would work just as well).

To compile into a binary so you can examine the generated code, rerun later for testing against a different version, use go test -c . .
Run the resulting test binary with (for example) ./go1.test -test.bench Benchmark -test.count 10.

We use benchstat (go get rsc.io/benchstat) to compare before and after results. If you save the output of two runs, you can compare them with

benchstat -geomean before.log after.log

I tend to run the benchmarks for a count of 25 to be sure that I get good numbers, and of course the more you can reduce changes in machine performance during a benchmark run, the better, but benchstat will help mitigate that somewhat or at least let you know that you have a problem.
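Applied to the function from the top of this issue, a benchmark along those lines might look like the sketch below. The names addImm, benchAddImm and sink are made up for illustration. The program uses testing.Benchmark so that it runs as a standalone binary. In a _test.go file you would instead name the function BenchmarkAddImm, drop main, and run it with go test -bench as described above. The go:noinline directive is there on the assumption that the compiler would otherwise inline the call and fold the constant.

```go
package main

import (
	"fmt"
	"testing"
)

// addImm mirrors the function from the top of this issue.
//
//go:noinline
func addImm(a int) int {
	return a + 0xffff
}

// sink keeps the result live so the loop is not optimized away.
var sink int

// benchAddImm runs addImm b.N times, feeding each result back in.
func benchAddImm(b *testing.B) {
	x := 0
	for i := 0; i < b.N; i++ {
		x = addImm(x)
	}
	sink = x
}

func main() {
	// testing.Benchmark picks b.N automatically, as go test does.
	r := testing.Benchmark(benchAddImm)
	fmt.Println("AddImm", r)
}
```

Build it once with the old compiler and once with the patched one, run each binary several times, and feed the saved outputs to benchstat.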

ALTree · 2017-04-10T15:02:30Z

@benshi001 if you think your change will benefit ARM code in general, you can start from the go1 benchmark suite mentioned by @dr2chase, i.e. do:

$ cd go/test/bench/go1
$ go test -bench=. -count 20 > old.txt

from the current master, then again from your patch:

$ go test -bench=. -count 20 > new.txt

and then run benchstat old.txt new.txt to compare them.

benchstat is at golang.org/x/perf/cmd/benchstat (the rsc.io/benchstat repository is old and deprecated).

Otherwise (if you see no change on the go1 benchmark suite) you'll have to write a go function that your CL makes faster and benchmark it as @dr2chase explained.

The archive you posted is not a valid benchmark because it's C and assembly code. Benchmark improvements need to be measured on Go code.

benshi001 · 2017-04-11T02:51:22Z

Thanks to all of you for your help. I will try it later.

benshi001 · 2017-04-12T10:45:59Z

With patch set 5 of CL 39552, I did the go1 benchmark test. Here are my operation steps and results.

Steps:

go test -bench=. -count 20 -timeout 60000s > ~/old.txt
apply patch set 5 of CL 39552
go test -bench=. -count 20 -timeout 60000s > ~/new.txt
~/gopath/bin/benchstat ~/old.txt ~/new.txt

How should I interpret the following results?

name old time/op new time/op delta
BreakImmediate-4 599ns ± 2% 435ns ± 0% -27.31% (p=0.000 n=20+17)
Fannkuch11-4 25.0s ± 0% 25.0s ± 0% +0.14% (p=0.012 n=19+20)
FmtFprintfEmpty-4 895ns ± 2% 893ns ± 0% ~ (p=0.579 n=20+17)
FmtFprintfString-4 1.51µs ± 2% 1.48µs ± 2% -2.16% (p=0.000 n=17+20)
FmtFprintfInt-4 1.50µs ± 2% 1.51µs ± 1% +0.57% (p=0.004 n=18+18)
FmtFprintfIntInt-4 2.19µs ± 1% 2.18µs ± 1% ~ (p=0.292 n=20+17)
FmtFprintfPrefixedInt-4 2.52µs ± 1% 2.52µs ± 0% +0.17% (p=0.010 n=16+20)
FmtFprintfFloat-4 4.60µs ± 1% 4.55µs ± 0% -0.93% (p=0.000 n=19+20)
FmtManyArgs-4 9.02µs ± 1% 8.92µs ± 1% -1.11% (p=0.000 n=20+17)
GobDecode-4 106ms ± 4% 107ms ± 3% +1.00% (p=0.008 n=20+20)
GobEncode-4 91.2ms ± 1% 91.3ms ± 1% ~ (p=0.461 n=19+20)
Gzip-4 4.29s ± 1% 4.30s ± 1% ~ (p=0.355 n=20+20)
Gunzip-4 611ms ± 1% 611ms ± 1% ~ (p=0.301 n=16+19)
HTTPClientServer-4 669µs ± 3% 665µs ± 3% ~ (p=0.277 n=20+20)
JSONEncode-4 284ms ± 2% 282ms ± 1% ~ (p=0.102 n=20+20)
JSONDecode-4 936ms ± 2% 940ms ± 1% ~ (p=0.079 n=20+19)
Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% -0.06% (p=0.030 n=20+18)
GoParse-4 44.8ms ± 1% 45.1ms ± 1% +0.61% (p=0.002 n=16+16)
RegexpMatchEasy0_32-4 1.29µs ± 0% 1.31µs ± 1% +1.82% (p=0.000 n=13+18)
RegexpMatchEasy0_1K-4 7.64µs ± 4% 7.67µs ± 5% ~ (p=0.642 n=20+19)
RegexpMatchEasy1_32-4 1.34µs ± 1% 1.33µs ± 1% -0.62% (p=0.000 n=18+18)
RegexpMatchEasy1_1K-4 10.3µs ± 4% 10.4µs ± 4% ~ (p=0.251 n=20+20)
RegexpMatchMedium_32-4 2.09µs ± 0% 2.12µs ± 1% +1.39% (p=0.000 n=12+20)
RegexpMatchMedium_1K-4 532µs ± 0% 534µs ± 1% +0.41% (p=0.001 n=16+18)
RegexpMatchHard_32-4 29.5µs ± 1% 29.8µs ± 0% +0.91% (p=0.000 n=17+19)
RegexpMatchHard_1K-4 889µs ± 2% 895µs ± 0% +0.60% (p=0.003 n=19+16)
Revcomp-4 84.9ms ± 2% 85.3ms ± 2% ~ (p=0.141 n=20+19)
Template-4 1.07s ± 3% 1.04s ± 2% -1.91% (p=0.000 n=18+20)
TimeParse-4 7.12µs ± 2% 7.19µs ± 2% +0.98% (p=0.001 n=19+19)
TimeFormat-4 13.5µs ± 0% 13.5µs ± 1% ~ (p=0.143 n=18+20)

name old speed new speed delta
GobDecode-4 7.22MB/s ± 4% 7.15MB/s ± 3% -0.87% (p=0.014 n=20+19)
GobEncode-4 8.42MB/s ± 1% 8.40MB/s ± 1% ~ (p=0.439 n=19+20)
Gzip-4 4.52MB/s ± 1% 4.51MB/s ± 1% ~ (p=0.211 n=20+20)
Gunzip-4 31.7MB/s ± 1% 31.8MB/s ± 1% ~ (p=0.295 n=16+19)
JSONEncode-4 6.83MB/s ± 2% 6.88MB/s ± 1% ~ (p=0.097 n=20+20)
JSONDecode-4 2.07MB/s ± 2% 2.07MB/s ± 1% -0.45% (p=0.040 n=20+19)
GoParse-4 1.29MB/s ± 1% 1.28MB/s ± 0% -0.71% (p=0.000 n=12+12)
RegexpMatchEasy0_32-4 24.8MB/s ± 0% 24.4MB/s ± 1% -1.77% (p=0.000 n=13+18)
RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 5% ~ (p=0.474 n=20+20)
RegexpMatchEasy1_32-4 23.9MB/s ± 1% 24.1MB/s ± 1% +0.63% (p=0.000 n=18+18)
RegexpMatchEasy1_1K-4 100MB/s ± 4% 99MB/s ± 4% ~ (p=0.250 n=20+20)
RegexpMatchMedium_32-4 480kB/s ± 0% 470kB/s ± 0% -2.08% (p=0.000 n=16+18)
RegexpMatchMedium_1K-4 1.93MB/s ± 0% 1.92MB/s ± 1% -0.50% (p=0.000 n=16+18)
RegexpMatchHard_32-4 1.09MB/s ± 1% 1.07MB/s ± 1% -1.18% (p=0.000 n=17+19)
RegexpMatchHard_1K-4 1.15MB/s ± 2% 1.14MB/s ± 1% -0.73% (p=0.003 n=19+16)
Revcomp-4 29.9MB/s ± 2% 29.8MB/s ± 2% ~ (p=0.139 n=20+19)
Template-4 1.82MB/s ± 4% 1.86MB/s ± 3% +2.29% (p=0.000 n=20+20)

benshi001 · 2017-04-12T10:48:04Z

I also attached the results to commit message of CL 39552.

ALTree · 2017-04-12T11:47:13Z

Mixed results I'd say? You made 5 benchmarks slightly faster, but 10 are slightly slower (and a dozen are unchanged). You can also pass the -geomean flag to benchstat to make it print a line that summarizes the average effect.

Also please change the commit message in the CL to make clear that BreakImmediate was not part of the go1 benchmark suite. E.g. show the effects on go1 bench suite and then add that you also committed a specific Benchmark function BreakImmediate that shows a 30% improvement with your CL.

At this point I'd say the effects of the change are documented, so you'll have to wait for the opinion of whoever reviews the change.

dr2chase · 2017-04-12T13:57:16Z

Helpful to include the -geomean option (why is it not the default? oh well) so you get the aggregate change.

In practice, for other optimizations, we look for a geomean improvement, and we look at the outliers, and we look skeptically at the inner loops of some of the frequent offenders in benchmark noise (Revcomp in particular).

benshi001 · 2017-04-14T08:47:40Z

I updated my patch, did a new benchmark test, and attached the result in the commit message of patch set 7 of CL 39552 (https://go-review.googlesource.com/?polygerrit=0#/c/39552/).

How should I interpret the "[Geo mean] 410µs 586µs +42.99%"? Is it a very bad result?

ALTree · 2017-04-14T09:33:32Z

That geomean can't be correct, either there's something wrong with the files you passed to benchstat or there's a bug in the computation of the geomean. Most of the benchmarks got faster and the worst one is a +2.65%, the geomean can't be +40%.

benshi001 · 2017-04-17T06:58:14Z

Even when the go1 benchmark is run twice with the same Go build, the results differ.

name old time/op new time/op delta
BinaryTree17-4 42.2s ± 1% 42.5s ± 1% +0.61% (p=0.000 n=30+29)
Fannkuch11-4 23.8s ± 1% 23.8s ± 1% ~ (p=0.523 n=30+30)
FmtFprintfEmpty-4 883ns ± 1% 887ns ± 2% ~ (p=0.948 n=27+30)
FmtFprintfString-4 1.47µs ± 0% 1.44µs ± 1% -2.44% (p=0.000 n=28+27)
FmtFprintfInt-4 1.51µs ± 2% 1.49µs ± 1% -1.24% (p=0.000 n=30+27)
FmtFprintfIntInt-4 2.26µs ± 2% 2.25µs ± 1% -0.54% (p=0.003 n=30+30)
FmtFprintfPrefixedInt-4 2.59µs ± 0% 2.63µs ± 0% +1.21% (p=0.000 n=17+20)
FmtFprintfFloat-4 4.49µs ± 0% 4.52µs ± 1% +0.53% (p=0.000 n=24+26)
FmtManyArgs-4 8.78µs ± 1% 8.74µs ± 1% -0.44% (p=0.000 n=27+28)
GobDecode-4 103ms ± 1% 103ms ± 1% ~ (p=0.307 n=24+29)
GobEncode-4 89.8ms ± 2% 89.9ms ± 1% ~ (p=0.482 n=30+28)
Gzip-4 4.23s ± 1% 4.22s ± 2% ~ (p=0.124 n=30+29)
Gunzip-4 608ms ± 2% 605ms ± 1% ~ (p=0.147 n=27+27)
HTTPClientServer-4 729µs ± 3% 707µs ± 3% -2.94% (p=0.000 n=30+29)
JSONEncode-4 281ms ± 0% 281ms ± 1% ~ (p=0.848 n=20+29)
JSONDecode-4 921ms ± 1% 919ms ± 1% -0.17% (p=0.032 n=26+29)
Mandelbrot200-4 49.4ms ± 0% 49.4ms ± 0% ~ (p=0.057 n=25+23)
GoParse-4 45.1ms ± 2% 44.9ms ± 1% ~ (p=0.110 n=29+29)
RegexpMatchEasy0_32-4 1.32µs ± 2% 1.32µs ± 2% ~ (p=0.193 n=29+30)
RegexpMatchEasy0_1K-4 7.81µs ± 6% 7.69µs ± 6% -1.54% (p=0.013 n=30+29)
RegexpMatchEasy1_32-4 1.34µs ± 1% 1.34µs ± 1% ~ (p=0.242 n=29+28)
RegexpMatchEasy1_1K-4 10.5µs ± 2% 10.4µs ± 4% -1.03% (p=0.042 n=26+30)
RegexpMatchMedium_32-4 2.06µs ± 2% 2.05µs ± 1% ~ (p=0.267 n=30+27)
RegexpMatchMedium_1K-4 531µs ± 0% 531µs ± 1% ~ (p=0.274 n=26+28)
RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 1% ~ (p=0.072 n=25+28)
RegexpMatchHard_1K-4 887µs ± 3% 884µs ± 2% ~ (p=0.255 n=30+28)
Revcomp-4 82.6ms ± 3% 82.4ms ± 2% ~ (p=0.922 n=30+29)
Template-4 1.03s ± 1% 1.04s ± 1% ~ (p=0.166 n=29+26)
TimeParse-4 7.09µs ± 2% 7.10µs ± 2% ~ (p=0.190 n=30+30)
TimeFormat-4 13.4µs ± 0% 13.3µs ± 1% -0.45% (p=0.000 n=24+28)
[Geo mean] 746µs 744µs -0.33%

name old speed new speed delta
GobDecode-4 7.43MB/s ± 1% 7.42MB/s ± 1% ~ (p=0.254 n=24+29)
GobEncode-4 8.55MB/s ± 2% 8.54MB/s ± 2% ~ (p=0.370 n=30+29)
Gzip-4 4.59MB/s ± 1% 4.60MB/s ± 2% ~ (p=0.067 n=30+29)
Gunzip-4 31.9MB/s ± 2% 32.1MB/s ± 1% ~ (p=0.160 n=27+27)
JSONEncode-4 6.91MB/s ± 1% 6.90MB/s ± 1% ~ (p=0.576 n=21+29)
JSONDecode-4 2.11MB/s ± 1% 2.11MB/s ± 1% ~ (p=0.413 n=26+29)
GoParse-4 1.29MB/s ± 3% 1.29MB/s ± 1% ~ (p=0.188 n=30+29)
RegexpMatchEasy0_32-4 24.3MB/s ± 2% 24.3MB/s ± 2% ~ (p=0.140 n=29+30)
RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 5% +1.58% (p=0.012 n=30+29)
RegexpMatchEasy1_32-4 23.9MB/s ± 1% 23.9MB/s ± 1% ~ (p=0.298 n=29+28)
RegexpMatchEasy1_1K-4 97.5MB/s ± 2% 98.5MB/s ± 4% +1.06% (p=0.042 n=26+30)
RegexpMatchMedium_32-4 485kB/s ± 3% 490kB/s ± 0% +0.96% (p=0.001 n=30+24)
RegexpMatchMedium_1K-4 1.93MB/s ± 0% 1.93MB/s ± 1% ~ (p=0.286 n=26+28)
RegexpMatchHard_32-4 1.09MB/s ± 0% 1.09MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 1.15MB/s ± 3% 1.16MB/s ± 2% ~ (p=0.287 n=30+28)
Revcomp-4 30.8MB/s ± 3% 30.8MB/s ± 2% ~ (p=0.921 n=30+29)
Template-4 1.88MB/s ± 1% 1.87MB/s ± 1% -0.28% (p=0.041 n=29+26)
[Geo mean] 6.61MB/s 6.63MB/s +0.29%

benshi001 · 2017-04-17T11:37:39Z

Here is a comparison. Can I treat the result as "there is a -0.19% / +1.08% improvement with my patch (excluding the run-to-run fluctuation)"?

go VS go
[Geo mean] 748µs 749µs +0.25%
[Geo mean] 6.54MB/s 6.55MB/s +0.11%

My patch VS my patch
[Geo mean] 746µs 744µs -0.33%
[Geo mean] 6.61MB/s 6.63MB/s +0.29%

go VS my patch
[Geo mean] 748µs 744µs -0.52%
[Geo mean] 6.54MB/s 6.63MB/s +1.37%

benshi001 · 2017-04-17T11:59:32Z

I also attached a detailed log to the commit message of CL 39552 (https://go-review.googlesource.com/?polygerrit=0#/c/39552/ ).

benshi001 · 2017-04-18T03:11:35Z

I ran the go1 benchmark twice for the original Go and twice for my patch, then made 6 comparisons (old means the original Go, new means my patch):
old_1 vs old_2
new_1 vs new_2
old_1 vs new_1
old_1 vs new_2
old_2 vs new_1
old_2 vs new_2

The conclusions are:

There is run-to-run fluctuation between rounds of the test
Some individual tests (HTTPClientServer-4) vary much more than others between rounds
My patch is an improvement overall (after allowing for the fluctuation)

The attachment is the detailed comparison.
compare.zip

benshi001 · 2017-04-26T03:03:12Z

Current status,

Keith Randall has a better solution https://go-review.googlesource.com/41612
than mine https://go-review.googlesource.com/#/c/39552/

And here is a supplement https://go-review.googlesource.com/#/c/41679/

benshi001 · 2017-04-28T02:49:57Z

update:
Keith Randall's solution https://go-review.googlesource.com/41612 is merged. But more cases remain to be optimized:

a = b + 0x00ffff00 -> a = (b + 0x01000000) - 0x00000100
a = b + 0xfffffff0 -> a = b - 0x10
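Both rewrites rely on 32-bit wraparound arithmetic, and they can be sanity-checked with a few lines of Go. The function names below are hypothetical, and uint32 is used to model the 32-bit registers on ARM.

```go
package main

import "fmt"

// addWide and addSplit are the two sides of the first rewrite.
func addWide(b uint32) uint32  { return b + 0x00ffff00 }
func addSplit(b uint32) uint32 { return (b + 0x01000000) - 0x00000100 }

// addNeg and subSmall are the two sides of the second rewrite;
// 0xfffffff0 is -0x10 modulo 2^32.
func addNeg(b uint32) uint32   { return b + 0xfffffff0 }
func subSmall(b uint32) uint32 { return b - 0x10 }

func main() {
	for _, b := range []uint32{0, 1, 0x12345678, 0xffffffff} {
		fmt.Println(addWide(b) == addSplit(b), addNeg(b) == subSmall(b))
	}
}
```

Every line prints true true, since unsigned addition and subtraction in Go wrap modulo 2^32.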

gopherbot · 2017-05-09T04:00:38Z

CL https://golang.org/cl/42430 mentions this issue.

benshi001 changed the title ~~Opimization of constant pool~~ cmd/internal/obj/arm: Opimization of constant pool Apr 5, 2017

benshi001 changed the title ~~cmd/internal/obj/arm: Opimization of constant pool~~ cmd/compile: Opimization of constant pool Apr 5, 2017

bradfitz added the Performance label Apr 5, 2017

bradfitz added this to the Unplanned milestone Apr 5, 2017

bradfitz changed the title ~~cmd/compile: Opimization of constant pool~~ cmd/compile: optimization of constant pool on arm Apr 5, 2017

bradfitz modified the milestones: Go1.9Maybe, Unplanned Apr 5, 2017

gopherbot closed this as completed in 6897030 May 11, 2017

benshi001 mentioned this issue May 25, 2017

cmd/internal/obj/arm: optimize MVN/MOVW #20493

Closed

golang locked and limited conversation to collaborators May 11, 2018

gopherbot added the FrozenDueToAge label May 11, 2018
