Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: unexpected benchmark results #64328

Open
go101 opened this issue Nov 22, 2023 · 11 comments
Open

cmd/compile: unexpected benchmark results #64328

go101 opened this issue Nov 22, 2023 · 11 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@go101
Copy link

go101 commented Nov 22, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package main

import "testing"

var n = 1024 * 1024 * 16

type Large = int
var r1 []Large
var r2 []Large

func Benchmark_CompactFunc(b *testing.B) {
	ss1 := make([]Large, n)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r1 = CompactFunc(ss1, func(a, b Large) bool {return a == b})
	}
}

func Benchmark_CompactFunc_2(b *testing.B) {
	ss1 := make([]Large, n)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r2 = CompactFunc_2(ss1, func(a, b Large) bool {return a == b})
	}
}


func CompactFunc[S ~[]E, E any](s S, eq func(E, E) bool) S {
	if len(s) < 2 {
		return s
	}
	i := 1
	for k := 1; k < len(s); k++ {
		if !eq(s[k], s[k-1]) {
			if i != k {
				s[i] = s[k]
			}
			i++
		}
	}
	clear(s[i:]) // zero/nil out the obsolete elements, for GC
	return s[:i]
}

var global bool

func CompactFunc_2[S ~[]E, E any](s S, eq func(E, E) bool) S {
	global = true // This is the only difference

	if len(s) < 2 {
		return s
	}
	i := 1
	for k := 1; k < len(s); k++ {
		if !eq(s[k], s[k-1]) {
			if i != k {
				s[i] = s[k]
			}
			i++
		}
	}
	clear(s[i:]) // zero/nil out the obsolete elements, for GC
	return s[:i]
}

What did you expect to see?

Similar performance for the specified test case.

What did you see instead?

Benchmark result shows CompactFunc_2 is (about 15%) slower, which is quite confusing.

goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Benchmark_CompactFunc-4     	      19	  66914978 ns/op
Benchmark_CompactFunc_2-4   	      18	  75483657 ns/op
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 22, 2023
@Jorropo
Copy link
Member

Jorropo commented Nov 22, 2023

I am not able to reproduce:

> go test -bench=. a_test.go
goos: linux
goarch: amd64
cpu: AMD Ryzen 5 3600 6-Core Processor              
Benchmark_CompactFunc-12      	      30	  38447312 ns/op
Benchmark_CompactFunc_2-12    	      26	  38899574 ns/op
PASS
ok  	command-line-arguments	4.652s

Using:

go version devel go1.22-5f7a408563 Wed Nov 22 02:20:04 2023 +0000 linux/amd64

@Jorropo
Copy link
Member

Jorropo commented Nov 22, 2023

@go101 could you please run 30 interleaved benchmarks and benchstat them ?
(is it not your laptop going into power saving mode, or some other unrelated side effect ?)

@go101
Copy link
Author

go101 commented Nov 22, 2023

How to do it interleavely? I didn't find the option.
I changed the banchmark order in code, same result on my laptop, and the same for go version devel go1.22-aa9dd50095.

(My laptop is always in charging.)

@aimuz
Copy link
Contributor

aimuz commented Nov 22, 2023

go test -bench ^Benchmark_CompactFunc$ -benchmem -run ^$ -count 30 > CompactFunc.out
go test -bench ^Benchmark_CompactFunc_2$ -benchmem -run ^$ -count 30 > CompactFunc_2.out

benchstat CompactFunc.out CompactFunc_2.out

@go101
Copy link
Author

go101 commented Nov 22, 2023

$ benchstat compact_old.txt compact_new.txt
goos: linux
goarch: amd64
pkg: example.com
cpu: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
                 │ compact_old.txt │  compact_new.txt   │
                 │     sec/op      │   sec/op     vs base   │
_CompactFunc-4         54.25m ± 1%
_CompactFunc_2-4                     60.73m ± 0%
geomean                54.25m        60.73m       ? ¹ ²
¹ benchmark set differs from baseline; geomeans may not be comparable
² ratios must be >0 to compute geomean

@aimuz
Copy link
Contributor

aimuz commented Nov 22, 2023

Perhaps if you manually change _CompactFunc_2 to _CompactFunc you can use it as a comparison

@Jorropo
Copy link
Member

Jorropo commented Nov 22, 2023

@aimuz this is not interleaved, it does not help against bias that would randomly start or stop in the benchmark.

@go101 you need to either use -c and then run it in a for loop, but here since it's two functions just the loop should work:

for i in $(seq 30); do go test -bench=. file_test.go >> results; done

@go101
Copy link
Author

go101 commented Nov 22, 2023

I don't get it. Should the -shuffle=on option be used? Same result with this option.

@go101
Copy link
Author

go101 commented Nov 22, 2023

Checked the results of -gcflags=-S, the only difference between the assemble code of the two functions is for the i++ lines:

...
example%2ecom.CompactFunc[go.shape.[]int,go.shape.int] STEXT dupok size=369 args=0x28 locals=0x30 funcid=0x0 align=0x0
...
	0x00cf 00207 (compact_test.go:38)	LEAQ	1(AX), DX
	0x00d3 00211 (compact_test.go:38)	JMP	76
...
example%2ecom.CompactFunc_2[go.shape.[]int,go.shape.int] STEXT dupok size=389 args=0x28 locals=0x30 funcid=0x0 align=0x0
...
	0x00d8 00216 (compact_test.go:59)	LEAQ	1(AX), DX
	0x00dc 00220 (compact_test.go:59)	NOP
	0x00e0 00224 (compact_test.go:59)	JMP	85

A NOP instruction can cause so large performance difference?

@mknyszek mknyszek added this to the Backlog milestone Nov 22, 2023
@mknyszek
Copy link
Contributor

CC @golang/compiler

@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 22, 2023
@thanm
Copy link
Contributor

thanm commented Nov 22, 2023

Many intel processors are sensitive to loop alignment; if the target of a loop is not aligned properly, this can cause slowdowns; this could be what's happening here. The specific rules are processor specific, to make things more complicated.

Suggestion: try doing this to gather more info:

#!/bin/sh
set -x
rm -f t.exe
go test -c -o t.exe -bench=. prog_test.go
./t.exe -test.bench=.
perf record ./t.exe -test.bench=.
perf annotate -s 'command-line-arguments.CompactFunc_2[go.shape.[]int,go.shape.int]' > withglobalstore.txt
perf annotate -s 'command-line-arguments.CompactFunc[go.shape.[]int,go.shape.int]' > base.txt

You can then compare the two to see if anything jumps out in terms of specific instructions taking more cycles.

@mknyszek mknyszek modified the milestones: Backlog, Unplanned Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

6 participants