Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: some problems on optimization when takes care about of arrays in struct #20022

Open
zhaozhiqianghw opened this issue Apr 18, 2017 · 7 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@zhaozhiqianghw
Copy link

zhaozhiqianghw commented Apr 18, 2017

As @ianlancetaylor and @ALTree mentioned, I changed the version from 1.7.4 to 1.8.1 to show the bug clearly.

What version of Go are you using (go version)?

go version go1.8.1 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/home/zhaozq/src/go"
GOTOOLDIR="/home/zhaozq/src/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build701526462=/tmp/go-build"
CXX="g++"
CGO_ENABLED="1"

What did you do?

I test several examples to test the performance of iterating an array. All of the version are compile by go build

version I

I put the array into a struct

package main

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(t *T) {
     for i := 0; i < NUM; i++ {
         t.arr[i] = 3
     }
}

func main() {
     var t T
     sub(&t)
}

The assemble of function sub is

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       31 c9                   xor    %ecx,%ecx
  44d5b7:       48 83 f9 64             cmp    $0x64,%rcx
  44d5bb:       7d 13                   jge    44d5d0 <main.sub+0x20>
  44d5bd:       84 00                   test   %al,(%rax)
  44d5bf:       48 c7 04 c8 03 00 00    movq   $0x3,(%rax,%rcx,8)
  44d5c6:       00
  44d5c7:       48 ff c1                inc    %rcx
  44d5ca:       48 83 f9 64             cmp    $0x64,%rcx
  44d5ce:       7c ed                   jl     44d5bd <main.sub+0xd>
  44d5d0:       c3                      retq

And the loop is

  44d5bd:       84 00                   test   %al,(%rax)
  44d5bf:       48 c7 04 c8 03 00 00    movq   $0x3,(%rax,%rcx,8)
  44d5c6:       00
  44d5c7:       48 ff c1                inc    %rcx
  44d5ca:       48 83 f9 64             cmp    $0x64,%rcx
  44d5ce:       7c ed                   jl     44d5bd <main.sub+0xd>

The line 44d5bdis quite strange. What's the usage of test here?? I think it's useless, and it will waste a cycle. We care about it, because we will use this kind of function a lot.

version II

We use a way to make the compiler do not what we are passing.

package main

import "unsafe"

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(p unsafe.Pointer) {
     for i := 0; i < NUM; i++ {
         *(*int)(p) = 3
         p = unsafe.Pointer(uintptr(p) + 8)
     }
}

func main() {
     var t T
     sub(unsafe.Pointer(&t))
}

The assembler of function sub

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       31 c9                   xor    %ecx,%ecx
  44d5b7:       48 83 f9 64             cmp    $0x64,%rcx
  44d5bb:       7d 14                   jge    44d5d1 <main.sub+0x21>
  44d5bd:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5c4:       48 ff c1                inc    %rcx
  44d5c7:       48 83 c0 08             add    $0x8,%rax
  44d5cb:       48 83 f9 64             cmp    $0x64,%rcx
  44d5cf:       7c ec                   jl     44d5bd <main.sub+0xd>
  44d5d1:       c3                      retq

The loop is

  44d5bd:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5c4:       48 ff c1                inc    %rcx
  44d5c7:       48 83 c0 08             add    $0x8,%rax
  44d5cb:       48 83 f9 64             cmp    $0x64,%rcx
  44d5cf:       7c ec                   jl     44d5bd <main.sub+0xd>

The number of operations in version II is the same as version I, but memory read times have been reduced, so this kind of version is faster.

version III

To reduce the numbers of operations in version II, I changed again.

package main

import "unsafe"

const NUM = 100

type T struct {
     arr [NUM]int
}

func sub(p unsafe.Pointer) {
     maxp := uintptr(p) + 8 * NUM
     for ; uintptr(p) < maxp; {
         *(*int)(p) = 3
         p = unsafe.Pointer(uintptr(p) + 8)
     }
}

func main() {
     var t T
     sub(unsafe.Pointer(&t))
}

The assemble of function sub

000000000044d5b0 <main.sub>:
  44d5b0:       48 8b 44 24 08          mov    0x8(%rsp),%rax
  44d5b5:       48 89 c1                mov    %rax,%rcx
  44d5b8:       48 81 c1 20 03 00 00    add    $0x320,%rcx
  44d5bf:       48 89 c2                mov    %rax,%rdx
  44d5c2:       48 39 ca                cmp    %rcx,%rdx
  44d5c5:       73 13                   jae    44d5da <main.sub+0x2a>
  44d5c7:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5ce:       48 83 c0 08             add    $0x8,%rax
  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx
  44d5d8:       72 ed                   jb     44d5c7 <main.sub+0x17>
  44d5da:       c3                      retq

The loop is

  44d5c7:       48 c7 00 03 00 00 00    movq   $0x3,(%rax)
  44d5ce:       48 83 c0 08             add    $0x8,%rax
  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx
  44d5d8:       72 ed                   jb     44d5c7 <main.sub+0x17>

The number of operations are not reduced. The reason is 44d5d2 and 44d5d2.

  44d5d2:       48 89 c2                mov    %rax,%rdx
  44d5d5:       48 39 ca                cmp    %rcx,%rdx```

Why not `cmp %rcx %rax`? Why introduce `%rdx`....?

### What did you expect to see?

Please focus version I and version III.
@ianlancetaylor
Copy link
Contributor

We don't use the issue tracker for discussions. If you just want to talk about this, please raise it on the mailing list golang-dev. If there is a specific bug here that you think should be addressed, please be precise: give one test case, and show why the output is sub-optimal. Right now I find this issue too complex--I don't know what is broken and what should be fixed. I also don't care about the performance of code compiled with -B. Thanks.

@ianlancetaylor ianlancetaylor changed the title [performance] compile: some puzzles on optimization when takes care about of arrays in struct cmd/compile: some puzzles on optimization when takes care about of arrays in struct Apr 18, 2017
@ALTree
Copy link
Member

ALTree commented Apr 18, 2017

Also you should disassemble go1.8 code at least (even better: tip, the current master). Reports that point out deficiencies in the go1.7 compiler are not very useful because whoever process your report will have to re-check everything with tip just to understand if the issues you reported are still valid or not.

@zhaozhiqianghw
Copy link
Author

@ianlancetaylor Actually, I use -B to make the assemble easy to read. And the bug(I think it's a bug) is also there when remove -B. And I hope this can be fixed(by removing useless operations).

@zhaozhiqianghw
Copy link
Author

@ALTree Go 1.8 is the same as Go 1.7,4. I should put Go 1.8 instead of Go 1.7. But I really tried Go 1.8. This situation is always there.

@zhaozhiqianghw
Copy link
Author

@ianlancetaylor @ALTree Changed to 1.8.1, and show the problem more clearly.

@zhaozhiqianghw zhaozhiqianghw changed the title cmd/compile: some puzzles on optimization when takes care about of arrays in struct cmd/compile: some problems on optimization when takes care about of arrays in struct Apr 18, 2017
@randall77
Copy link
Contributor

Both of these cases (II and IV) are still happening at tip.

In II, that extra instruction is a nil check. The compiler does not yet know that it can lift that nil check out of the loop. You could do something like

	_ = t.arr[0]

before the loop to lift that nil check out yourself.

In IV, the extra move is an unfortunate side-effect of how we track unsafe.Pointer <-> uintptr conversions. We are careful in the compiler to keep those two types separate to keep the pointer maps for the garbage collector accurate. This means we end up with an extra MOVQconvert that isn't necessary.
A rule like (CMPQ (MOVQconvert x) y) -> (CMPQ x y) would probably help.

@bradfitz bradfitz added help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance labels Apr 18, 2017
@bradfitz bradfitz added this to the Unplanned milestone Apr 18, 2017
@josharian
Copy link
Contributor

For IV (the extra move), see #21572.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

7 participants