Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: teach prove.go that {strings|bytes}.Index* return value in the range [-1 .. len(s)) #25862

Open
valyala opened this issue Jun 13, 2018 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@valyala
Copy link
Contributor

valyala commented Jun 13, 2018

The issue

Currently prove.go doesn't know that functions like strings.Index* always return value in the range [-1 .. len(s)) where s - input string / byte slice. So it emits unnecessary bounds checks in the following code:

func nextSpace(s string) string {
        n := strings.IndexByte(s, ' ')
        if n >= 0 {
                // unnecessary bounds check is generated here
                return s[n:]
        }
        return ""
}

go tip compiler generates the following assembly code for nextSpace:

TEXT nextSpace(SB)
func nextSpace(s string) string {
  0x50f040              64488b0c25f8ffffff      MOVQ FS:0xfffffff8, CX
  0x50f049              483b6110                CMPQ 0x10(CX), SP
  0x50f04d              0f8684000000            JBE 0x50f0d7
  0x50f053              4883ec28                SUBQ $0x28, SP
  0x50f057              48896c2420              MOVQ BP, 0x20(SP)
  0x50f05c              488d6c2420              LEAQ 0x20(SP), BP
        n := strings.IndexByte(s, ' ')
  0x50f061              488b442430              MOVQ 0x30(SP), AX
  0x50f066              48890424                MOVQ AX, 0(SP)
  0x50f06a              488b4c2438              MOVQ 0x38(SP), CX
  0x50f06f              48894c2408              MOVQ CX, 0x8(SP)
  0x50f074              c644241020              MOVB $0x20, 0x10(SP)
  0x50f079              e8d235efff              CALL strings.IndexByte(SB)
  0x50f07e              488b442418              MOVQ 0x18(SP), AX
        if n < 0 {
  0x50f083              4885c0                  TESTQ AX, AX
  0x50f086              7c36                    JL 0x50f0be
        return s[n:]
  0x50f088              488b4c2438              MOVQ 0x38(SP), CX

        ; The compiler may prove 0 <= n < len(s) on this line,
        ; so the following 2 lines plus runtime.panicslice on 0x50f0d0
        ; may be eliminated.
  0x50f08d              4839c8                  CMPQ CX, AX
  0x50f090              773e                    JA 0x50f0d0

  0x50f092              4829c1                  SUBQ AX, CX

        ; Additionally, the following 4 lines assume &s[0] + (len(s) - n) may exceed
        ; max word value. This is impossible, since &s[0] + len(s) doesn't exceed
        ; max word value.
        ; The compiler has enough info to disprove this in order to eliminate these lines.
  0x50f095              4889ca                  MOVQ CX, DX
  0x50f098              48f7d9                  NEGQ CX
  0x50f09b              48c1f93f                SARQ $0x3f, CX
  0x50f09f              4821c8                  ANDQ CX, AX

  0x50f0a2              488b4c2430              MOVQ 0x30(SP), CX
  0x50f0a7              4801c8                  ADDQ CX, AX
  0x50f0aa              4889442440              MOVQ AX, 0x40(SP)
  0x50f0af              4889542448              MOVQ DX, 0x48(SP)
  0x50f0b4              488b6c2420              MOVQ 0x20(SP), BP
  0x50f0b9              4883c428                ADDQ $0x28, SP
  0x50f0bd              c3                      RET
                return ""
  0x50f0be              0f57c0                  XORPS X0, X0
  0x50f0c1              0f11442440              MOVUPS X0, 0x40(SP)
  0x50f0c6              488b6c2420              MOVQ 0x20(SP), BP
  0x50f0cb              4883c428                ADDQ $0x28, SP
  0x50f0cf              c3                      RET
        return s[n:]
  0x50f0d0              e82b9df1ff              CALL runtime.panicslice(SB)
  0x50f0d5              0f0b                    UD2
func nextSpace(s string) string {
  0x50f0d7              e8347df4ff              CALL runtime.morestack_noctxt(SB)
  0x50f0dc              e95fffffff              JMP nextSpace(SB)

The solution

  1. Teach the compiler that {strings|bytes}.Index* return value in the range [-1 .. len(s))
  2. Teach the compiler that &s[0] + len(s) - n doesn't exceed max word value if 0 <= n < len(s)

This should improve performance of various parsers for text-based messages (json, xml, html, http, etc.)

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 13, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Jun 13, 2018
@valyala
Copy link
Contributor Author

valyala commented Jun 13, 2018

cc'ing @aclements and @rasky , since they recently worked on prove.go improvements.

@josharian
Copy link
Contributor

I don't know how much we want to teach the compiler about specific std functions. It'd be nice if there were some way to infer this instead, although I don't see how. Note that if someone was playing around with these functions in the standard library and added -2 as a special signal value, any calling code might be mis-compiled.

@valyala
Copy link
Contributor Author

valyala commented Jun 15, 2018

I don't know how much we want to teach the compiler about specific std functions. It'd be nice if there were some way to infer this instead, although I don't see how.

Probably, we could annotate return values for functions written in assembly. Such annotations may contain valid ranges for the returned function values. These annotations must be restricted only to assembly-written functions, since in theory the compiler may infer return value ranges for ordinary Go functions using the code from prove pass.

Additionally, I think the underlying functions for bytes / strings located in internal/bytealg/ are special:

  • they are frequently used in hot paths;
  • so they are mostly written in assembly.

So it would be OK adding nasty optimizations like this one for these functions.

Note that if someone was playing around with these functions in the standard library and added -2 as a special signal value, any calling code might be mis-compiled.

This will violate strings.IndexByte contract, which explicitly states it returns values in the range [-1 ... len(s) ). Additionally, if assembly-written functions will contain annotations for return value ranges, it will be harder to shoot the foot when playing around with such functions.

The compiler may have a special validation mode, which, when enabled, would add checks for value ranges returned from annotated assembly functions.

@randall77
Copy link
Contributor

Punting to 1.13, too late for anything major in 1.12.

@randall77 randall77 modified the milestones: Go1.12, Go1.13 Dec 12, 2018
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Status: Triage Backlog
Development

No branches or pull requests

8 participants