cmd/compile: generic functions are significantly slower than identical non-generic functions in some cases #50182
Not sure if this is expected at this stage or not, but it could be a consequence of the current implementation.
If you use
I still do not know what the
This is because we convert
@randall77, why does that lead to 14 allocations per iteration, though? I would expect that to result in 1 additional allocation per iteration. Instead, we seem to be allocating for each element of the slice.
It's one allocation per iteration of the loop in
@komuw Further reading:
Is there a reason to add this to Go 1.19 rather than 1.18 itself, considering 1.18 is still in beta?
I suspect any fix for this is going to be large + difficult. We don't want to put changes like that in at this point, especially if it is just for performance. |
Why isn't data.Less() inlined into the loop and then, particularly for integers, it becomes a single comparison instruction? You know the precise type at the call site, amirite? |
No, we don't. We know the type at the callsite implements |
@randall77 It would be useful to represent explicitly the knowledge that all dictionaries are read-only (static). Maybe we should represent access to a dictionary entry as a single operation that survives into the SSA passes, rather than immediately breaking it down into the indexing and loading operations, as we currently do during instantiation (stenciling). That way, we could ensure that the dictionary access operation takes no memory input, and therefore can always be CSE'd when it has identical inputs.

In this case (and probably most cases), it seems like we also need to make sure the CONVIDATA calls can be CSE'd, but they are transformed during the order() part of Walk() into run-time calls. We might want the CONVIDATA call to make it to SSA (so again, it can easily be CSE'd for the same input value), but that seems harder, since all the logic for understanding/translating CONVIDATA is in walk.go.
@randall77 But you do know the exact type; it's passed right there in the brackets: IsSortedGeneric[sort.IntSlice](data). The compiler should be able to deduce that it can short-circuit the interface machinery here, since only sort.IntSlice's Less() implementation can be called. This is a real issue. What's the point of generics if the compiler can't short-circuit the interface machinery when it knows the precise types? You lose the performance benefit of inlining if it can't.
We don't know it when compiling IsSortedGeneric. We don't compile |
@randall77 So, for example, does a generic sorting implementation (think sort.Stable, but generic instead of interface-based) basically compile down to something functionally identical to the existing sort.Stable that takes an interface? I've run into performance problems with sort.Sort and sort.Stable (copying the implementation to a separate file and running s/Interface/MyType/g can be many times faster), and I was hoping that generics would resolve this.
@g-talbot The goal of generics is not to provide faster execution time. The goal is to make it easier to write certain kinds of code with less repetitive boilerplate and more compile-time type checking. Rewriting code to use generics may sometimes make it faster. It may sometimes make it slower (as in this example). While we will continue to improve the compiler balancing the needs of fast builds and fast runtimes, it will never be the case that generic code will always be faster than non-generic code. That is not a goal. |
Change https://golang.org/cl/378178 mentions this issue: |
Just wanted to reference #48849 which to me is a very similar issue. |
Change https://go.dev/cl/385274 mentions this issue: |
Change https://go.dev/cl/385254 mentions this issue: |
Just wanted to share some benchmarks showing the improvement for the example at the top of this issue.
…ctionary

Currently we do quite a dance for method expressions on generic types. We write a new closure, and in that closure convert the receiver to an interface with the required method, then call the target using an interface call.

Instead, in this CL we just allocate a (captureless) closure in the dictionary which implements that method expression.

This CL makes method expressions faster and simpler. But the real win is some follow-on CLs, where we can use the same closure to implement bound method calls, instead of converting to interface and having wrappers convert back. Much faster and simpler.

Still thinking about how to do method values. The receiver still needs to be captured, so there must be some closure involved, I think.

Update #50182

Change-Id: I1fbd57e7105663f8b049955b8f4111649a5f4aa8
Reviewed-on: https://go-review.googlesource.com/c/go/+/385254
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
… calls

When we have x.M(args) where x is a value of type parameter type, we currently cast x to the bound of that type parameter (which is an interface) and then invoke the method on that interface. That's pretty inefficient because:

1) We need to convert x to an interface, which often requires allocation. With CL 378178 it is at least stack allocation, but allocation nonetheless.
2) We need to call through wrapper functions to unpack the interface into the right argument locations for the callee.

Instead, let's just call the target directly. The previous CL to this one added method expression closures to the dictionary: a simple captureless closure that implements T.M for type parameter T and method M. So to implement x.M(args) for x of type T, we use methodexpr(T,M)(x, args). We just need to move x from the receiver slot to the first argument, and use the dictionary entry to implement the polymorphism. This works because we stencil by shape, so we know how to marshal x for the call even though we don't know its exact type.

We should be able to revert CL 378178 after this one, as that optimization will no longer be necessary since we're not converting values to interfaces to implement this language construct anymore.

Update #50182

Change-Id: I813de4510e41ab63626e58bd1167f9ae93016202
Reviewed-on: https://go-review.googlesource.com/c/go/+/385274
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
… allocating

Let T be a type parameter, and say we instantiate it with S, a type that isn't pointer-like (e.g. a pair of ints, or as in #50182, a slice). Then to call a method m on a variable of type T, the compiler does essentially:

    var v T = ...
    i := (interface{ m() })(v)
    i.m()

The conversion on that second line allocates, as we need to make the data word for an interface. And in the general case, that interface may live an arbitrarily long time. But in this case, we know it doesn't.

The data word of i has type *S. When we call i.m, we can't call S.m directly; it is expecting an S, not a *S. We call through a wrapper defined on *S, which looks like:

    func (p *S) m() {
        var s S = *p
        s.m()
    }

The value passed in for p is exactly the data word mentioned above. It never escapes anywhere: the wrapper copies a type S variable out of *p, and p is dead after that. That means that in the situation where we build an interface for the explicit purpose of calling a method on it, and use that built interface nowhere else, the allocation of the data word for that interface is known to die before the call returns and thus can be stack allocated.

One tricky case: although the allocation of the backing store of the interface conversion doesn't escape, pointers we store *inside* that allocation might escape (in fact they definitely will, unless we can devirtualize the receiver).

Fixes golang#50182

Change-Id: I40e893955c2e6871c54ccecf1b9f0cae17871b0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/378178
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Dan Scales <danscales@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dan Scales <danscales@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

It reproduces on the 1.18 Beta 1 release.

What operating system and processor architecture are you using (go env)?

go env Output

What did you do?
sort_test.go:
What did you expect to see?
IsSortedGeneric to be faster than, or at least very similar in runtime to, the otherwise identical sort.IsSorted.
What did you see instead?
This appears to be a problem specifically when methods are part of the type constraint; you may note above that the number of allocs (14) is exactly equal to the number of method calls (1 Len call, 13 Less calls). Compiler output points to runtime.convTslice, triggered at each method call, as the cause of the allocs.
A generic IsSorted implementation that restricts to a slice of constraints.Ordered does not have this slowdown. Experimentation also finds that supplying a pointer type instead of a value does not have this issue: