Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode/utf8: make DecodeRune and DecodeLastRune inlineable #48195

Open
dsnet opened this issue Sep 5, 2021 · 10 comments
Open

unicode/utf8: make DecodeRune and DecodeLastRune inlineable #48195

dsnet opened this issue Sep 5, 2021 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Sep 5, 2021

Code like the following is unfortunately very common:

r, n := rune(b[0]), 1
if r >= utf8.RuneSelf {
    r, n = utf8.DecodeRune(b)
}

and we have several examples in the standard library itself:

All of these cases ultimate come down to the caller manually performing inlined logic for the ASCII case. We should make it such that DecodeRune and DecodeLastRune are inlineable for the ASCII case. Unfortunately my attempts to do so exceed the inline budget by ever so much:

// cannot inline DecodeRune: function too complex: cost 87 exceeds budget 80
func DecodeRune(p []byte) (r rune, size int) {
	if len(p) > 0 && p[0] < RuneSelf {
		return rune(p[0]), 1
	}
	return decodeRune(p)
}

// cannot inline DecodeLastRune: function too complex: cost 93 exceeds budget 80
func DecodeLastRune(p []byte) (r rune, size int) {
	if len(p) > 0 && p[len(p)-1] < UTFMax {
		return rune(p[len(p)-1]), 1
	}
	return decodeLastRune(p)
}

We should find a way to make these inlinable whether by special casing them, clever tricks to make the cost lower, by increasing the inline budget, etc.

\cc @CAFxX @josharian @mdempsky @martisch

@CAFxX
Copy link
Contributor

CAFxX commented Sep 5, 2021

FWIW, I gave it a shot before: https://go-review.googlesource.com/c/go/+/332770

Unfortunately yes, I really can not see a way to make it work without messing with the inliner (either by changing the heuristics, or by adding special cases) or otherwise adding some kind of annotation.

@renthraysk
Copy link

Seem to remember commenting on a similar issue, that couldn't do the same with binary.Uvarint().

#31666

But now with for loop inlining since added to go, Uvarint() can be made to (replace range use) inline as whole without needing a separate func.

@go101
Copy link

go101 commented Sep 7, 2021

The inline cost of the following implementation is 83:

func DecodeRune(p []byte) (r rune, size int) {
	if len(p) > 0 && p[0] < RuneSelf {
		r, size = rune(p[0]), 1
	} else {
		r, size = decodeRune(p)
	}
	return
}

@go101
Copy link

go101 commented Sep 7, 2021

And this one is 81:

func DecodeRune(p []byte) (r rune, size int) {
	if len(p) > 0 && p[0] < RuneSelf {
		return rune(p[0]), 1
	}
	r, size = decodeRune(p)
	return
}

@go101
Copy link

go101 commented Sep 7, 2021

Maybe the inline cost calculation should consider whether or not bound checks are eliminated.
Here, the bound checks for p[0] are eliminated, so its cost should be some lower.

BTW, why the inline costs of function calls are so large?

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 13, 2021
@cagedmantis cagedmantis added this to the Backlog milestone Sep 13, 2021
@mengzhuo
Copy link
Contributor

The ExtraCall budget is 57 plus two arguments we had at least 64 budget, maybe we can force inline by asm code.
However I've tried modified the ExtraCall budget and it appears only improve ASCII related DecodeRune and regression on slow path.

https://perf.golang.org/search?q=upload:20210922.6

name                               old time/op  new time/op  delta
RuneCountTenASCIIChars             9.27ns ± 1%  9.27ns ± 1%     ~     (p=0.151 n=9+9)
RuneCountTenJapaneseChars          37.8ns ± 2%  38.2ns ± 1%   +1.09%  (p=0.022 n=10+8)
RuneCountInStringTenASCIIChars     9.26ns ± 0%  9.26ns ± 0%     ~     (p=0.333 n=10+10)
RuneCountInStringTenJapaneseChars  39.3ns ± 1%  38.1ns ± 2%   -3.07%  (p=0.000 n=10+10)
EncodeASCIIRune                    1.71ns ± 0%  1.71ns ± 0%     ~     (all equal)
EncodeJapaneseRune                 2.86ns ± 0%  2.83ns ± 0%   -1.12%  (p=0.000 n=10+9)
AppendASCIIRune                    0.26ns ± 0%  0.26ns ± 0%   -2.44%  (p=0.000 n=7+10)
AppendJapaneseRune                 3.09ns ± 0%  3.08ns ± 0%   -0.02%  (p=0.029 n=9+9)
DecodeASCIIRune                    2.06ns ± 0%  0.26ns ± 0%  -87.53%  (p=0.000 n=8+8)
DecodeJapaneseRune                 3.60ns ± 0%  3.73ns ± 0%   +3.52%  (p=0.000 n=10+8)
FullRune/ASCII                     1.03ns ± 0%  1.18ns ± 3%  +14.28%  (p=0.000 n=10+10)
FullRune/Incomplete                2.18ns ± 2%  2.14ns ± 1%   -2.00%  (p=0.000 n=10+10)
FullRune/Japanese                  1.03ns ± 0%  1.18ns ± 3%  +15.03%  (p=0.000 n=9+10)

@dsnet
Copy link
Member Author

dsnet commented Sep 22, 2021

However I've tried modified the ExtraCall budget and it appears only improve ASCII related DecodeRune and regression on slow path.

Isn't that expected? With inlining, we generally want to get the common case to be inlined (i.e., ASCII), and perform the function call for the less common case (i.e., multibyte Unicode).

@gopherbot
Copy link

Change https://golang.org/cl/354769 mentions this issue: cmd/compile: zero cost of byte to rune conversion

@gopherbot
Copy link

Change https://golang.org/cl/354770 mentions this issue: unicode/utf8: make DecodeRune inlined

@gopherbot
Copy link

Change https://golang.org/cl/356769 mentions this issue: unicode/utf8: default ascii fast path for DecodeRune

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

7 participants