Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: combine int tests when comparing against a constant string #37274

Open
josharian opened this issue Feb 18, 2020 · 6 comments
Open
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@josharian
Copy link
Contributor

To compare against "abc", we currently emit code to test whether uint16(first two bytes)==uint16("ab") and then whether (third byte)=="c".

I suspect that it'd be more efficient to load the three bytes into the first three bytes of a uint32 (two loads combined with |) and then compare that to uint32("abc\0").

I'm not sure whether we could do this with comparison-combining rewrite rules (note that the loads have side-effects in general, but we're sure in this case they won't fault) or whether it'd be better to fix this in walk.go, where this optimization is generated.

Low priority.

@josharian josharian modified the milestones: Backlog, Unplanned Feb 18, 2020
@josharian
Copy link
Contributor Author

cc @mundaym who has been thinking about SSA rules to combine branches

@mundaym
Copy link
Member

mundaym commented Feb 18, 2020

Tricky one. This transformation could slow down/bloat some code (generally when uint16(first two bytes)==uint16("ab") is unlikely), so while I think the backend could do it I'm not sure it should unless it has likeliness information.

Note that for longer strings we could overlap the loads given they are unaligned anyway (not sure if we do this already - I don't think we do from what I see in walk):

    x == "abcdefg" -> x[0:4] == "abcd" && x[3:7] == "defg"

I think that transformation could be applied greedily since it doesn't add any extra instructions to the 'not taken' path. It could also be applied in any situation where we want to check the last 3 characters in a string and we also know the length of the string is greater than 3 (i.e. we can access index -1 in the string and be in bounds). I suspect this crops up occasionally in string switch statement lowering. The overlapping characters could be masked out even when known to avoid double-load related issues in racy code. That said, this would all be quite hard in SSA, but maybe string comparison lowering could do it.

@josharian
Copy link
Contributor Author

josharian commented Feb 18, 2020

Overlapping loads is a good idea, thanks! Agreed we should do it during walk.

@toothrot toothrot added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 18, 2020
@josharian josharian added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 18, 2020
@josharian
Copy link
Contributor Author

I just remembered (vaguely) that overlapping loads caused performance problems a few years ago because they interfered with load-store forwarding.

@mundaym
Copy link
Member

mundaym commented Feb 22, 2020

I just remembered (vaguely) that overlapping loads caused performance problems a few years ago because they interfered with load-store forwarding.

Yeah, this might be a problem. Though since Go programs don't tend to copy strings by value that much we might not see load-hit-store hazards very often. My guess would be that strings are rarely still in the store buffer when we are comparing them, but that is just a guess.

@josharian
Copy link
Contributor Author

This all also applies to comparison of small arrays of ints; see walkcompare.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

3 participants