New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: add bounds information from non-control ops to prove pass #25086
Comments
I would be surprised if it matters much right now. If you want to try something, I'd try adding it at the end of To check if the code is really helping, check if |
Should do this for the |
I'll mail a CL that adds bounds derived from:
The result in terms of proved ops can be seen here: It's easy to add |
If it is easy, I think it is worth adding the full math/bits intrisincs set, on pain of having to rediscover it later, even if usage is rare; when math/bits is in use people frequently care lots about performance. I’ll note in passing that we can convert signed ext to zero ext if we can prove non-negative. Not sure if that is very useful in this context, but I can imagine on amd64 that it could eliminate some 32-to-64 extensions by folding them into MOVLs. |
cc @zdjones A couple of days ago, I was saddened to see that To reiterate for Zach from my previous comment: I think it is worth adding the full math/bits intrinsics set. (Plus whatever else you are inspired by from the rest of this issue.) |
I've been tossing around this idea in my head a bit. My intuition is that there is some gains to be had by collecting range information from any number of non-control ops. But I'm also wary of the cost in terms of compute and maintenance complexity. I suppose it'd be a matter of determining which ops justified the cost. My understanding is that prove, at a high level, currently does three passes of it's own:
Adding non control-ops should be a matter of determining whether the range information is static or not. If static, add it to the data range pass, if not, to the control range pass (which I suppose I'd then have to rename). Dynamic op ranges are also more complicated by SSA values being unordered within their block. As to the costs of adding more ops, the partially ordered set that holds all the transitive inequality information (ie v1 <= v2), is effectively O(n) where n is the number of values in the poset. There are 2 posets, one each for the signed and unsigned domains. Each value we add to the poset makes querying later facts slower. Its only linear, so it may be negligible, but I get the feeling that everyone's quite sensitive to changes in compiler speed. This, at least, means we want to be sure that the facts being added are actually helping prove make better decisions. @rasky, please let me know if I'm wrong about this. The small data range pass currently uses a switch to check for the three ops it uses. I assume we'd need to change this lookup in a map[Op]struct, like the one used by the control range pass here. I don't know memory allocation intimately, but I think the map would mean another heap allocation? Or maybe not since it'd never be written to at runtime? Hope this is clear, let me know what you think. |
A few thoughts: (1) math/bits ops are relatively rare, and most likely to be used in performance-sensitive places, so I think they probably offer good bang-for-buck. (2) It's relatively easy to add them (I hacked in mine quickly), so we can simply measure the toolspeed impact to make a more informed decision. Have you found compilecmp? Btw, you are right that we zealously guard compiler performance. (3) Another option is to do a non-poset manual look-up. For example, we have a static isNonNegative that we can use to make local inferences on the fly. We could add an (4) A switch is likely to outperform a map until the number of ops gets pretty high. But: benchmark. :) (5) Somewhat unrelated: I noticed while I was there that the code working with StringLen, SliceLen, and SliceCap goes to some pains to detect and reuse a |
(2) Agreed, not difficult to add these. toolspeed? Saw your mention of compilecmp on the Compiler Developer Guide issue from Daniel. I've been running and comparing compilebench manually. I'd love to hear how you do your compiler comparisons and benchmarking generally, as I'm doing it mostly ad-hoc still. (3) Good thought. If the inferences are only required locally, there is no need to reach for the poset. I'm familiar with the two isNonNegative's, I just mailed a CL to make the dynamic one use the poset. I'll play around with the isBelow idea a bit. I know of at least one related open issue, BCE for (4) Didn't know this. Why is that? (5) Aaah, ConstInt. Looks like an easy win. Maybe I'll just add that into my isNonNegative CL because it touches that code and relies on the zero constant. Do you think it'd be out of place, or require a standalone issue for any reason? Doesn't look like it should change behaviour at all. |
(2) compilecmp: https://github.com/josharian/compilecmp. Sorry it doesn't have docs. One day, one day. It's pretty easy to use, I believe running with (3) There are probably cases where it would be used non-locally, too. I'd say if you're up for it to try it both ways and see how much the extra horsepower buys you and at what cost. (4) switch does binary search, and each search is usually just a plain compare-and-jump, so it is very cheap. It takes a whole lot of mispredicted jumps to add up to the full map machinery (hashing, allocation, etc.). (5) Either way. I personally prefer to separate refactoring-only changes from substantive changes. It makes reviewing easier and bisection/rollback more fine-grained. But I know other folks tend to like fewer CLs. So do what you think is best. |
Change https://golang.org/cl/190657 mentions this issue: |
Probably low priority; I'm not sure.
Some ops have strongly bounded output. For example, Ctz64 is always in [0,64], and Ctz64NonZero is in [0, 63]. Rsh by a constant also provides bounds.
The prove pass seems unaware of this. Does it / should it care? Where/how should it be added?
@rasky
The text was updated successfully, but these errors were encountered: