cmd/compile, runtime: a design for low-cost defers via inline code #34481
Change https://golang.org/cl/196964 mentions this issue:

This change adds a design document for reducing the cost of defers significantly by essentially inlining defer calls on normal exits in most cases.

Updates golang/go#34481

Change-Id: If64787e591864ab7843503b8f09c4d6dd6a7a535
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/196964
Reviewed-by: Keith Randall <khr@golang.org>
Change https://golang.org/cl/197017 mentions this issue:

@danscales, should this be milestoned to 1.14, or is it a longer-term effort?

Yes, the Go1.14 milestone is correct -- that's what we're aiming for.
This proposal only allows up to 64 defers per function; is that correct? Imagine test code opening every file in a directory and deferring each file close. This won't work for large directories and may break backwards compatibility in some cases. Also, what would the error be?

Yes, there is some limit to the number of defers per function (the size of the deferBits bitmask). In the current implementation, it is 8, but easily changed to 64. But to clarify: if open-coded (inlined) defers cannot be used in a function, we just revert back to the current way of doing defers (create a "defer object" and add it to the defer chain, then process the appropriate objects on the defer chain at all function exits). Our goal is to reduce the defer overhead in the common cases where it occurs, which is typically smallish functions with a few defers. In unusual cases (functions with defers in loops or lots of defer statements), we just revert to the current method. I will add some text to the design document to make this clearer.
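The deferBits idea described above can be sketched by hand. This is a conceptual illustration of what the compiler emits, not its actual output; all names (`deferBits`, `arg`) are local stand-ins:

```go
package main

import "fmt"

// A conceptual sketch of how a function with one conditional defer
// site could be open-coded: a local bitmask records which defer sites
// were reached, and the exit code replays the recorded calls in
// reverse order instead of walking a runtime defer chain.
func example(cond bool) string {
	var deferBits uint8
	var arg string // stack slot holding the deferred call's argument

	if cond {
		deferBits |= 1 // defer site 0 was reached
		arg = "closed"
	}
	// ... normal function body ...

	out := "done"
	// inlined exit code: check each defer site's bit, latest first
	if deferBits&1 != 0 {
		out += " " + arg
	}
	return out
}

func main() {
	fmt.Println(example(true))  // done closed
	fmt.Println(example(false)) // done
}
```

The argument is saved at the point of the defer statement, matching Go's rule that deferred arguments are evaluated when `defer` executes, not at function exit.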
Is there a way to avoid checking deferBits for defers that are always executed? I don't mind that the respective bits get set, but there is no compiler optimization that's able to remove checks for those bits (unless they are all always set and thus the variable is constant). It looks like it should be sufficient to check whether the defer is called in a block that dominates all exits (or something like that) and not emit the bit check in that case.
@rasky dominating all exits is not enough; consider a run-time panic that occurs before control reaches the defer statement. The defer was never registered, so the panic path still needs deferBits to know which defers are actually active.
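A small sketch of the situation being described (the function and names here are illustrative, not from the thread): the deferred call dominates every `return`, yet a run-time panic can fire before the defer statement executes, and the deferred call must not run in that case.

```go
package main

import "fmt"

// f's deferred increment dominates every return statement, yet it
// must not run if the dereference panics first, because the defer
// statement was never reached. deferBits is what lets the panic
// path tell these two situations apart.
func f(p *int) (n int) {
	n = *p                 // may panic before the defer below is registered
	defer func() { n++ }() // dominates all normal exits
	return n
}

func main() {
	x := 41
	fmt.Println(f(&x)) // 42: the defer ran

	defer func() {
		fmt.Println("recovered:", recover() != nil) // recovered: true
	}()
	f(nil) // panics before reaching the defer statement; no increment happens
}
```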
The special panic handler can still check the bits if it needs to. My comment is about not generating bit checks in open-coded defers within the function body.
Yes, the deferBits conditionals (in the inline code) are eliminated completely in the simple cases where all defers are unconditional. If all defers are unconditional, then the stream of operations on deferBits are all unconditional and just involve constants, so the value is known everywhere through constant propagation. The 'opt' phase of the compiler does the constant propagation and then eliminates the if-check on deferBits which is always true.

As you point out (good point!), if one defer is always executed (dominates the exit) but others are not, we could potentially remove the deferBits check for that specific defer on any exits after that defer. I will put that on the TODO list (and add that as an idea for the next rev of the design doc). I may not do that optimization for the initial checkin.
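The constant-propagation argument can be shown by hand. This is a sketch with illustrative names, folding the bit tests manually the way the compiler's opt phase would:

```go
package main

import "fmt"

// When every defer in a function is unconditional, deferBits has the
// same constant value on every path, so the exit-time bit tests fold
// away under constant propagation.
func asGenerated() string {
	const deferBits = 1 | 2 // both defer sites are always reached
	s := ""
	if deferBits&2 != 0 { // always true: the opt phase removes this test
		s += "B"
	}
	if deferBits&1 != 0 { // always true: removed as well
		s += "A"
	}
	return s
}

func afterOpt() string {
	// What remains after folding: straight calls to the deferred
	// functions, in LIFO order, with no bitmask checks at all.
	return "B" + "A"
}

func main() {
	fmt.Println(asGenerated(), afterOpt()) // BA BA
}
```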
Wasn't @dr2chase asking about bitwise constant-propagation a while back? |
@danscales inlining happens before this lowering, so the open-coded defer calls themselves can't be inlined, right?
Yes, inlining happens much earlier in the compiler, so for my implementation, even though the defer calls will be directly expressed (open-coded) at exit, the calls cannot be replaced by their definition (i.e. inlined).
Add a bunch of extra tests and benchmarks for defer, in preparation for the new low-cost (open-coded) implementation of defers (see #34481):

- New file defer_test.go that tests a bunch more unusual defer scenarios, including things that might be problematic for open-coded defers.
- Additions to callers_test.go verifying what the stack trace looks like for various panic or panic-recover scenarios.
- Additions to crash_test.go testing several more crash scenarios involving recursive panics.
- New benchmark in runtime_test.go measuring the speed of panic-recover.
- New CGo benchmark in cgo_test.go calling from Go to C back to Go, which shows defer overhead.

Updates #34481

Change-Id: I423523f3e05fc0229d4277dd00073289a5526188
Reviewed-on: https://go-review.googlesource.com/c/go/+/197017
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
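One of the trickier scenarios the new crash tests exercise is a recursive panic: a deferred function panics while an earlier panic is still being processed. A minimal sketch of that behavior (illustrative code, not the actual test from crash_test.go):

```go
package main

import "fmt"

// nested panics, then a deferred call panics again during unwinding.
// Per the spec, recover returns the value of the most recent panic,
// so the first panic's value is superseded.
func nested() (msg string) {
	defer func() {
		msg = fmt.Sprint(recover()) // recovers the latest panic value
	}()
	defer func() {
		panic("second") // fires while "first" is still in flight
	}()
	panic("first")
}

func main() {
	fmt.Println(nested()) // second
}
```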
Change https://golang.org/cl/190098 mentions this issue:
I'm not too familiar with the internals of the compiler, but would it be possible to mix unconditional, conditional, and looped defers in the same function with this design? For example, insert hard-coded calls to any unconditional defers at the exits, regardless of the others, insert the bit checks for conditional ones, and insert code to deal with looped defers. In other words,

```go
func Example(t int) {
	defer Unconditional()
	if t > 3 {
		defer Conditional()
	}
	defer AlsoUnconditional()
	for i := 0; i < t; i++ {
		defer Looped(i)
	}
}
```

would essentially have the equivalent of all of the following inserted at the returns:

```go
for i := len(deferredLoopedCalls) - 1; i >= 0; i-- {
	deferredLoopedCalls[i]()
}
AlsoUnconditional()
if conditionalBitSet {
	Conditional()
}
Unconditional()
```
We already handle conditional and unconditional defers with this proposal. You have to set the bitmask in all cases, since panics (especially run-time panics) can occur at many points in the function, so it is not easy to know otherwise (without a detailed pc map) whether you hit a particular unconditional defer statement before the panic happened. For actual exit code, you can eliminate some bitmask checks if a particular defer statement dominates the exit that you are generating for. That optimization is not in this change, but seems reasonable to do soon; it is definitely on my TODO list.

It would be fairly complex to combine the low-cost defers with the looped defers (which would have to be handled by the current defer-record approach) in the same function. A bunch more information would have to be saved, and new code generated, to make sure that the looped defers and the inlined defers were executed in the right order. So I'm not planning to do that (especially since those cases are very rare and not as likely to be performance-sensitive).
…ode and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order).

These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded).

Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%. The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [ cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
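The commit's "no defers in loops" restriction can be illustrated with a pair of functions (names are illustrative; whether a given toolchain actually open-codes the first one is internal to the compiler and can be checked with `-d defer`):

```go
package main

import "fmt"

var trace []string

// Qualifies for open-coding: a fixed set of defer sites, none in a
// loop, so the compiler knows statically how many defers can exist.
func openCodedCandidate() {
	defer func() { trace = append(trace, "one") }()
	defer func() { trace = append(trace, "two") }()
}

// Does not qualify: the number of pending defers is only known at run
// time, so the runtime must fall back to the defer-chain approach.
func deferInLoop(n int) {
	for i := 0; i < n; i++ {
		i := i // capture the loop variable per iteration
		defer func() { trace = append(trace, fmt.Sprint("loop", i)) }()
	}
}

func main() {
	openCodedCandidate()
	deferInLoop(2)
	fmt.Println(trace) // [two one loop1 loop0]
}
```

Either way the observable LIFO semantics are identical; only the implementation strategy, and hence the cost, differs.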
Change https://golang.org/cl/202340 mentions this issue:
…ode and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order).

These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded).

Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%. The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [ cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Fixed by be64a19
Could a defer propagate an error value through the function's normal return values? Defers are most often used for closing I/O handles and the like, which often carry important error information. Workarounds:
@mcandre I'm sorry, I don't understand what you are asking. In any case, this does not seem like the right place to ask it. This issue is about a specific design of the implementation of defer. It's not about how defer is defined to work in the language. You may want to use a forum; see https://golang.org/wiki/Questions. Thanks.
We are working on an implementation of defer statements that makes most defers no more expensive than open-coding the deferred call, hence eliminating the incentive to avoid using this language feature to its fullest extent. We wanted to document the design in the proposal/design repo. See issue #14939 for discussion of defer performance issues. The design document is posted here. The current, mostly complete implementation is here.
Comments are welcome on either the design or the implementation.