Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: emit optimized assembly for checking the first chars before {strings|bytes}.IndexByte call #25844

Closed
valyala opened this issue Jun 12, 2018 · 5 comments
Labels
FrozenDueToAge Performance WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@valyala
Copy link
Contributor

valyala commented Jun 12, 2018

The issue

Currently strings.IndexByte (and bytes.IndexByte) call overhead makes these functions slower comparing to a plain check for the first few bytes on amd64. The following benchmark shows the issue:

package indexbyte_test

import (
        "strings"
        "testing"
)

func stdFunc(s string, ch byte) int {
        // Just call strings.IndexByte
        return strings.IndexByte(s, ch)
}

func prefixCheckFunc(s string, ch byte) int {
        if len(s) >= 4 {
                // Search for ch in the first few bytes of s before resorting
                // to strings.IndexByte call.
                // The compiler may substitute this by compact optimized assembly in order
                // to reduce overhead when ch isn't found in the first few bytes.
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                if s[2] == ch {
                        return 2
                }
                if s[3] == ch {
                        return 3
                }
                return strings.IndexByte(s, ch)
        }

        // The code for short s completely skips strings.IndexByte call overhead.
        if len(s) == 0 {
                return -1
        }
        if len(s) == 1 {
                if s[0] == ch {
                        return 0
                }
                return -1
        }
        if len(s) == 2 {
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                return -1
        }
        if len(s) == 3 {
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                if s[2] == ch {
                        return 2
                }
                return -1
        }
        return -1
}

func BenchmarkCommaSearch(b *testing.B) {
        for _, s := range []string{
                        "",
                        "1",
                        "12",
                        "123",
                        "1234",
                        "12345",
                        "123456",
                        ",",
                        "1,",
                        "12,",
                        "123,",
                        "1234,",
                        "12345,",
                        "123456,",
                        ", 123, 4567890 qwertyuiop",
                        "1, 23, 4567890 qwertyuiop",
                        "12, 34, 567890 qwertyuiop",
                        "123, 4, 567890 qwertyuiop",
                        "1234, 5, 67890 qwertyuiop",
                        "12345, 6, 7890 qwertyuiop",
                        "123456, 7, 890 qwertyuiop",
                } {
                b.Run(s, func(b *testing.B) {
                        b.Run("std", func(b *testing.B) {
                                for i := 0; i < b.N; i++ {
                                        Sink += stdFunc(s, ',')
                                }
                        })
                        b.Run("prefixCheck", func(b *testing.B) {
                                for i := 0; i < b.N; i++ {
                                        Sink += prefixCheckFunc(s, ',')
                                }
                        })
                })
        }
}

var Sink int

Benchmark results:

$ go test -bench=CommaSearch
goos: linux
goarch: amd64
BenchmarkCommaSearch/#00/std-4     	300000000	         5.39 ns/op
BenchmarkCommaSearch/#00/prefixCheck-4         	500000000	         3.04 ns/op
BenchmarkCommaSearch/1/std-4                   	300000000	         5.72 ns/op
BenchmarkCommaSearch/1/prefixCheck-4           	500000000	         3.24 ns/op
BenchmarkCommaSearch/12/std-4                  	300000000	         5.88 ns/op
BenchmarkCommaSearch/12/prefixCheck-4          	500000000	         3.46 ns/op
BenchmarkCommaSearch/123/std-4                 	300000000	         5.52 ns/op
BenchmarkCommaSearch/123/prefixCheck-4         	500000000	         3.94 ns/op
BenchmarkCommaSearch/1234/std-4                	300000000	         5.92 ns/op
BenchmarkCommaSearch/1234/prefixCheck-4        	200000000	         6.22 ns/op
BenchmarkCommaSearch/12345/std-4               	200000000	         5.68 ns/op
BenchmarkCommaSearch/12345/prefixCheck-4       	200000000	         6.32 ns/op
BenchmarkCommaSearch/123456/std-4              	300000000	         5.97 ns/op
BenchmarkCommaSearch/123456/prefixCheck-4      	200000000	         6.24 ns/op
BenchmarkCommaSearch/,/std-4                   	300000000	         6.17 ns/op
BenchmarkCommaSearch/,/prefixCheck-4           	500000000	         3.30 ns/op
BenchmarkCommaSearch/1,/std-4                  	200000000	         6.49 ns/op
BenchmarkCommaSearch/1,/prefixCheck-4          	500000000	         3.76 ns/op
BenchmarkCommaSearch/12,/std-4                 	300000000	         6.23 ns/op
BenchmarkCommaSearch/12,/prefixCheck-4         	500000000	         3.74 ns/op
BenchmarkCommaSearch/123,/std-4                	200000000	         6.08 ns/op
BenchmarkCommaSearch/123,/prefixCheck-4        	300000000	         4.04 ns/op
BenchmarkCommaSearch/1234,/std-4               	200000000	         6.23 ns/op
BenchmarkCommaSearch/1234,/prefixCheck-4       	200000000	         6.55 ns/op
BenchmarkCommaSearch/12345,/std-4              	300000000	         5.95 ns/op
BenchmarkCommaSearch/12345,/prefixCheck-4      	200000000	         6.55 ns/op
BenchmarkCommaSearch/123456,/std-4             	200000000	         6.47 ns/op
BenchmarkCommaSearch/123456,/prefixCheck-4     	200000000	         7.08 ns/op
BenchmarkCommaSearch/,_123,_4567890_qwertyuiop/std-4     	200000000	         6.28 ns/op
BenchmarkCommaSearch/,_123,_4567890_qwertyuiop/prefixCheck-4         	500000000	         3.35 ns/op
BenchmarkCommaSearch/1,_23,_4567890_qwertyuiop/std-4                 	200000000	         6.68 ns/op
BenchmarkCommaSearch/1,_23,_4567890_qwertyuiop/prefixCheck-4         	500000000	         3.60 ns/op
BenchmarkCommaSearch/12,_34,_567890_qwertyuiop/std-4                 	200000000	         6.87 ns/op
BenchmarkCommaSearch/12,_34,_567890_qwertyuiop/prefixCheck-4         	300000000	         3.73 ns/op
BenchmarkCommaSearch/123,_4,_567890_qwertyuiop/std-4                 	200000000	         6.26 ns/op
BenchmarkCommaSearch/123,_4,_567890_qwertyuiop/prefixCheck-4         	300000000	         4.32 ns/op
BenchmarkCommaSearch/1234,_5,_67890_qwertyuiop/std-4                 	200000000	         6.26 ns/op
BenchmarkCommaSearch/1234,_5,_67890_qwertyuiop/prefixCheck-4         	200000000	         6.96 ns/op
BenchmarkCommaSearch/12345,_6,_7890_qwertyuiop/std-4                 	200000000	         6.25 ns/op
BenchmarkCommaSearch/12345,_6,_7890_qwertyuiop/prefixCheck-4         	200000000	         7.47 ns/op
BenchmarkCommaSearch/123456,_7,_890_qwertyuiop/std-4                 	200000000	         6.76 ns/op
BenchmarkCommaSearch/123456,_7,_890_qwertyuiop/prefixCheck-4         	200000000	         6.94 ns/op

The benchmark shows that the prefixCheckFunc outperforms strings.IndexByte for short strings and when the required char is located in the short string prefix.

Solution

Emit optimized assembly for checking the first chars before {strings|bytes}.IndexByte call.

This may speed up parsers extracting small tokens such as short numbers, ids and strings.

cc'ing @TocarIP and @randall77 , who may be interested in the implementation.

@valyala
Copy link
Contributor Author

valyala commented Jun 12, 2018

cc'ing @quasilyte and @josharian , since it looks like this may help CLs like https://go-review.googlesource.com/c/go/+/115616 .

@bcmills bcmills added this to the Unplanned milestone Jun 12, 2018
@josharian
Copy link
Contributor

We could also do something similar for HasPrefix and HasSuffix, particularly when the sought byte/prefix/suffix is a constant. However, it seems to me these are perfect candidates for mid-stack or partial inlining, rather than hand-coded compiler optimizations.

@valyala
Copy link
Contributor Author

valyala commented Jun 12, 2018

However, it seems to me these are perfect candidates for mid-stack or partial inlining, rather than hand-coded compiler optimizations.

But as I know mid-stack inlining cannot inline yet functions written in assembly

@agnivade
Copy link
Contributor

It is not clear to me what this issue is about. {strings,bytes}.IndexByte already emits handcoded assembly for various platforms.

Is this about checking the initial chars before going with the generic algorithm ? What is the heuristic to have 4 as the number ? Why not 3 or 5 ?

Or is this about inlining the function calls ?

If it is the previous, the difference is very minimal now(smaller than a ns), for the cases where prefix is faster than std. And we already have a special-case assembly when the length of the string is less than 16. So I am not sure if it is worth pursuing to optimize for more corner cases when the attained gain is so less.

$benchstat std.txt prefix.txt 
name                   old time/op  new time/op  delta
CommaSearch/#00-4      4.16ns ± 1%  3.77ns ± 1%   -9.19%  (p=0.008 n=5+5)
CommaSearch/1-4        4.55ns ± 1%  4.16ns ± 1%   -8.48%  (p=0.008 n=5+5)
CommaSearch/12-4       4.53ns ± 0%  4.91ns ± 1%   +8.44%  (p=0.008 n=5+5)
CommaSearch/123-4      4.80ns ± 2%  4.94ns ± 1%   +2.75%  (p=0.008 n=5+5)
CommaSearch/1234-4     4.76ns ± 1%  8.42ns ± 1%  +77.04%  (p=0.008 n=5+5)
CommaSearch/12345-4    4.76ns ± 1%  8.42ns ± 0%  +76.83%  (p=0.008 n=5+5)
CommaSearch/123456-4   4.55ns ± 1%  8.32ns ± 1%  +82.97%  (p=0.008 n=5+5)
CommaSearch/,-4        4.72ns ± 1%  3.99ns ± 2%  -15.47%  (p=0.008 n=5+5)
CommaSearch/1,-4       4.69ns ± 1%  4.34ns ± 1%   -7.43%  (p=0.008 n=5+5)
CommaSearch/12,-4      4.99ns ± 1%  4.65ns ± 1%   -6.85%  (p=0.008 n=5+5)
CommaSearch/123,-4     4.98ns ± 0%  4.66ns ± 1%   -6.58%  (p=0.008 n=5+5)
CommaSearch/1234,-4    4.70ns ± 1%  8.73ns ± 1%  +85.78%  (p=0.008 n=5+5)
CommaSearch/12345,-4   4.73ns ± 2%  8.74ns ± 1%  +84.85%  (p=0.008 n=5+5)
CommaSearch/123456,-4  4.72ns ± 1%  8.72ns ± 3%  +84.99%  (p=0.008 n=5+5)

@valyala ?

@agnivade agnivade added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 26, 2019
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators Apr 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Performance WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants