Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: binary.PutUint64 performance degradation in 1.10 #24078

Closed
pascaldekloe opened this issue Feb 23, 2018 · 14 comments
Closed

cmd/compile: binary.PutUint64 performance degradation in 1.10 #24078

pascaldekloe opened this issue Feb 23, 2018 · 14 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@pascaldekloe
Copy link
Contributor

Two little highly-optimised functions suffer from serious performance when compared to Go version 1.9.

name          old time/op    new time/op    delta
PutUint64-12    4.09ns ± 0%    5.79ns ± 0%  +41.62%  (p=0.000 n=20+19)
Uint64-12       4.24ns ± 0%    6.54ns ± 0%  +54.35%  (p=0.000 n=20+19)

The assembler output from go build -gcflags=-S is exactly the same so I don't know where to look for. If anybody can give me some pointers I'd be happy to build a clean test case.

Here follows a snippet from https://github.com/pascaldekloe/flit for context.

// PutUint64 encodes an integer into buf and returns the serial size.
// If the buffer is smaller than 9 bytes, PutUint64 may panic.
func PutUint64(buf []byte, v uint64) (n int) {
	if v < 128 {
		buf[0] = uint8(v)<<1 | 1
		return 1
	}
	if v >= uint64(1) << 56 {
		buf[0] = 0
		binary.LittleEndian.PutUint64(buf[1:], v)
		return 9
	}

	lz := bits.LeadingZeros64(v)
	// extra bytes = (bits - 1) / 7 = (63 - lz) / 7
	e := ((63 - lz) * 2454267027) >> 34

	v <<= 1
	v |= 1
	v <<= uint(e) & 63
	binary.LittleEndian.PutUint64(buf, v)

	return e + 1
}
@bcmills
Copy link
Contributor

bcmills commented Feb 23, 2018

The assembler output from go build -gcflags=-S is exactly the same so I don't know where to look for.

Last time I saw a performance issue like that, it turned out that the performance of the CPU was sensitive to the alignment of the function in memory.

That said, this function seems small enough that it is likely to be inlined. Do you have a larger benchmark that shows the same regressions, or do they only occur in microbenchmarks? (That is, what is the impact of the regression on the overall program, rather than just the benchmark?)

@mvdan
Copy link
Member

mvdan commented Feb 23, 2018

git diff go1.9..go1.10 -- src/encoding/binary shows no changes to the implementation, so this is likely a compiler issue if there is an issue at all.

Perhaps the function used to be inlined, but isn't being inlined anymore?

@ALTree
Copy link
Member

ALTree commented Feb 23, 2018

The assembler output from go build -gcflags=-S is exactly the same

This is not completely true. The 1.10 function is slightly larger.

1.9:

"".PutUint64 STEXT nosplit size=269 args=0x28 locals=0x8

1.10 (note size=):

"".PutUint64 STEXT nosplit size=276 args=0x28 locals=0x8

the reason the 1.10 is larger is that it has one more panicindex

0x010b 00267 (prova.go:14)	UNDEF
0x010d 00269 (prova.go:10)	PCDATA	$0, $1
0x010d 00269 (prova.go:10)	CALL	runtime.panicindex(SB)

at the bottom. 1.9 has three, 1.10 has four.

@mvdan mvdan changed the title Performance degradation in 1.10 cmd/compile: binary.PutUint64 performance degradation in 1.10 Feb 23, 2018
@mvdan mvdan added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 23, 2018
@pascaldekloe
Copy link
Contributor Author

pascaldekloe commented Feb 23, 2018

Spot on @mvdan! Go 1.10 does no longer inline these functions into the Benchmark functions.

I was just about to apply this function and for as far as I know there are no users yet @bcmills. I care not so much for this specific case yet this could have been something not visible for most cases.

The panic case is not benchmarked @ALTree.

The help is much appreciated. Thanks a bunch.

@mvdan
Copy link
Member

mvdan commented Feb 23, 2018

I'm confused - are you closing this because Go 1.10 no longer inlines calls made in benchmark functions by design?

@ALTree
Copy link
Member

ALTree commented Feb 23, 2018

The panic case is not benchmarked

I know that. But the function size sometime matters in microbenchmarks.

Also why did you close this? Is the fact that the function is no longer inlined expected?

@pascaldekloe
Copy link
Contributor Author

Weather the compiler inlines a function or not can hardly be a bug right? And this might be triggered by the slightly larger size indeed. I was worried that some essential was seriously off.

@mvdan
Copy link
Member

mvdan commented Feb 23, 2018

I would expect that, given that PutUint64 hasn't changed between 1.9 and 1.10, the compiled code should be the same or better - not worse. Unless I'm missing something obvious?

One way or another, no harm in leaving this open in case there is actually a compiler regression somewhere.

@ALTree
Copy link
Member

ALTree commented Feb 23, 2018

Weather the compiler inlines a function or not can hardly be a bug right?

It absolutely can. We are definitely interested in understanding why your function is no longer inlined in 1.10.

@ALTree ALTree reopened this Feb 23, 2018
@ALTree
Copy link
Member

ALTree commented Feb 23, 2018

1.10 "-m -m" says

cannot inline PutUint64: function too complex: cost 216 exceeds budget 80

while 1.9 inlines it without complaining.

@mvdan
Copy link
Member

mvdan commented Feb 23, 2018

/cc @mdempsky @josharian @TocarIP

@ALTree
Copy link
Member

ALTree commented Feb 23, 2018

I believe this is caused by the fix for #19261.

1.9 inlines the function because it incorrectly computed a cost of 0 for cross-package calls like binary.LittleEndian.PutUint64. This was fixed for 1.10 in CL 70151, and since the cost of the binary methods calls is now computed correctly, OP's function is much more heavier and it's no longer marked as inlineable.

So I believe this is WAI.

@TocarIP
Copy link
Contributor

TocarIP commented Feb 23, 2018

I think https://go-review.googlesource.com/c/go/+/95475 is relevant here
Edit:
It fixes PutUint32/16 degradation and not 64, so it isn't relevant here.

@ALTree
Copy link
Member

ALTree commented Mar 1, 2018

Closing this, since it is expected that the function is no longer inlined (i linked the fix that caused this above).

@ALTree ALTree closed this as completed Mar 1, 2018
pascaldekloe added a commit to pascaldekloe/flit that referenced this issue Jul 28, 2018
@golang golang locked and limited conversation to collaborators Mar 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants