cmd/compile: strange performance difference between two implementations #49785

Open
go101 opened this issue Nov 24, 2021 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

@go101

go101 commented Nov 24, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package pointers

import "testing"

const N = 10000

type T struct {
	x int
}

//go:noinline
func f(t *T) {
	t.x = 0
	for i := 0; i < N; i++ {
		t.x += i
	}
}

//go:noinline
func g(t *T) {
	var x = 0
	for i := 0; i < N; i++ {
		x += i
	}
	t.x = x
}

func Benchmark_f(b *testing.B) {
	var t = &T{}
	for i := 0; i < b.N; i++ { f(t) }
}

func Benchmark_g(b *testing.B) {
	var t = &T{}
	for i := 0; i < b.N; i++ { g(t) }
}

What did you expect to see?

Similar performances.

What did you see instead?

goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_f-4   	   48352	     24403 ns/op
Benchmark_g-4   	  292581	      3956 ns/op

I checked the generated assembly. The instructions differ, but the two loops have similar complexity, so it is somewhat strange that the performance difference is so large.

@randall77
Contributor

The inner loop in f still has writes in it, which is probably why it is slower than g (whose inner loop is completely in registers).

To fix this I think we'd have to promote t.x from memory to register somehow. That seems pretty challenging.
(If the loop were unrolled it might get much of that effect automatically.)
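Until the compiler can do that promotion itself, it can be done by hand. The sketch below is only an illustration under stated assumptions: fStore and fLocal are hypothetical names, and fLocal is essentially g from the report. It accumulates in a local variable and stores to t.x once, which should give the same final value as long as nothing else reads t.x while the loop runs.

```go
package main

import "fmt"

const N = 10000

type T struct {
	x int
}

// fStore mirrors f from the report: the accumulator is the field itself,
// so every iteration updates t.x through the pointer.
func fStore(t *T) {
	t.x = 0
	for i := 0; i < N; i++ {
		t.x += i
	}
}

// fLocal (hypothetical name) is the hand-promoted variant: the running sum
// lives in a local variable and is written back to t.x once, after the loop.
func fLocal(t *T) {
	x := 0
	for i := 0; i < N; i++ {
		x += i
	}
	t.x = x
}

func main() {
	a, b := &T{}, &T{}
	fStore(a)
	fLocal(b)
	// Both variants should end with the same field value.
	fmt.Println(a.x == b.x)
}
```

The two are interchangeable here only because the intermediate values of t.x are never observed; code that reads the field during the loop (for example from another goroutine) would see a difference.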

@randall77 randall77 added this to the Unplanned milestone Nov 24, 2021
@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 24, 2021
@go101
Author

go101 commented Nov 25, 2021

The problem also exists for reads:

package pointers

import "testing"

const N = 1000
var a [N]int
var r int

//go:noinline
func g1(a *[N]int) int {
	var r int
	_ = *a
	for i := range a {
		r += a[i]
	}
	return r
}

//go:noinline
func g0(a *[N]int) int {
	var r int
	for i := range a {
		r += a[i]
	}
	return r
}

func Benchmark_g1(b *testing.B) {
	for i := 0; i < b.N; i++ { r = g1(&a) }
}

func Benchmark_g0(b *testing.B) {
	for i := 0; i < b.N; i++ { r = g0(&a) }
}

Benchmark_g1-4   	 2178316	       556.8 ns/op
Benchmark_g0-4   	 1949654	       611.8 ns/op

@go101
Author

go101 commented Nov 25, 2021

It looks like the read case is different from the write case. The compiler generates one more instruction, TESTB AL, (AX), for the g0 function in the read case.

@randall77
Contributor

@go101 In your example it's just that the nil pointer check is outside the loop in g1 but inside the loop in g0. We'd need to lift the nil check out of the loop to make them the same speed, which I believe is #41666.
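To make the nil check concrete, here is a small sketch. The names sumHoisted, sumPlain and panics are hypothetical; the two sum functions mirror g1 and g0 above. It checks that both variants return the same sum for a valid array and that both panic on a nil pointer. The only difference is where the check happens: once before the loop, or on the indexing inside it.

```go
package main

import "fmt"

const N = 1000

// sumHoisted mirrors g1: the explicit dereference before the loop
// performs the nil check once, up front.
func sumHoisted(a *[N]int) (r int) {
	_ = *a
	for i := range a {
		r += a[i]
	}
	return r
}

// sumPlain mirrors g0: there is no explicit dereference, so a nil
// pointer is only detected when a[i] is evaluated inside the loop.
func sumPlain(a *[N]int) (r int) {
	for i := range a {
		r += a[i]
	}
	return r
}

// panics reports whether calling fn panics.
func panics(fn func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	fn()
	return false
}

func main() {
	var arr [N]int
	for i := range arr {
		arr[i] = i
	}
	// Same result for a valid pointer.
	fmt.Println(sumHoisted(&arr) == sumPlain(&arr))
	// Both variants panic on a nil pointer.
	fmt.Println(panics(func() { sumHoisted(nil) }), panics(func() { sumPlain(nil) }))
}
```

Since the observable behavior is the same in both cases, writing `_ = *a` before the loop is a workaround one can apply by hand in hot loops until the compiler hoists the check on its own.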

@seankhliao seankhliao changed the title cmd/compile: strange performacne difference between two implementations cmd/compile: strange performance difference between two implementations Nov 25, 2021
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022