
cmd/compile, runtime: a design for low-cost defers via inline code #34481

Closed
danscales opened this issue Sep 23, 2019 · 20 comments
Labels: FrozenDueToAge, NeedsInvestigation, Performance

danscales (Contributor) commented Sep 23, 2019

We are working on an implementation of defer statements that makes most defers no more expensive than open-coding the deferred call, hence eliminating the incentive to avoid using this language feature to its fullest extent. We wanted to document the design in the proposal/design repo.
See issue #14939 for discussion of defer performance issues.

The design document is posted here. The current, mostly complete implementation is here.

Comments are welcome on either the design or the implementation.

gopherbot commented

Change https://golang.org/cl/196964 mentions this issue: design: add 34481-opencoded-defers.md

gopherbot pushed a commit to golang/proposal that referenced this issue Sep 23, 2019
This change adds a design document for reducing the cost of defers significantly
by essentially inlining defer calls on normal exits in most cases.

Updates golang/go#34481

Change-Id: If64787e591864ab7843503b8f09c4d6dd6a7a535
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/196964
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot commented

Change https://golang.org/cl/197017 mentions this issue: misc, runtime, test: extra tests and benchmarks for defer

bcmills added the NeedsInvestigation and Performance labels Sep 24, 2019
bcmills (Contributor) commented Sep 24, 2019

@danscales, should this be milestoned to 1.14, or is it a longer-term effort?

danscales (Contributor, author) commented Sep 24, 2019

Yes, Go1.14 milestone is correct -- that's what we're aiming for.

tsatke commented Sep 24, 2019

This proposal only allows up to 64 defers per function, is that correct?
I imagine this is enough for most hand-written code, but I can imagine some frameworks generating a lot more than that...

Or imagine test code opening every file in a directory and deferring the file close. This won’t work for large directories and may break backwards compatibility in some cases. Also, what would be the error?

danscales (Contributor, author) commented Sep 24, 2019

This proposal only allows up to 64 defers per function, is that correct?

Yes, there is some limit to the number of defers per function (the size of the deferBits bitmask). In the current implementation, it is 8, but easily changed to 64.

But to clarify: if open-coded (inlined) defers cannot be used in a function, we just revert to the current way of doing defers (create a "defer object", add it to the defer chain, and process the appropriate objects on the defer chain at all function exits).

Our goal is to reduce the defer overhead in the common cases where it occurs, which is typically smallish functions with a few defers. In unusual cases (functions with defers in loops or lots of defer statements), we just revert to the current method.

I will add some text to the design document to make this clearer.
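
To illustrate the scheme described above, here is a rough sketch, in ordinary Go, of what the compiler effectively generates for a function with one unconditional and one conditional defer. The explicit deferBits variable, the bit assignments, and the helper names are purely illustrative; the real implementation operates on SSA values and stack slots.

package main

func a() {}
func b() {}

// f sketches the lowering of:
//
//	func f(cond bool) {
//		defer a()
//		if cond {
//			defer b()
//		}
//	}
func f(cond bool) {
	var deferBits uint8 // stack-allocated bitmask, one bit per defer site

	deferBits |= 1 << 0 // the defer a() statement was reached
	if cond {
		deferBits |= 1 << 1 // the defer b() statement was reached
	}

	// ... function body ...

	// Inlined exit code: run the recorded defers in reverse order.
	if deferBits&(1<<1) != 0 {
		b()
	}
	if deferBits&(1<<0) != 0 {
		a()
	}
}

func main() { f(true) }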

rasky (Member) commented Sep 24, 2019

Is there a way to avoid checking deferBits for defers that are always executed? I don’t mind that the respective bits get set, but there is no compiler optimization that’s able to remove checks for those bits (unless they are all always set and thus the variable is constant). It looks like it should be sufficient to check whether the defer is called in a block that dominates all exits (or something like that) and not emit the bitcheck in that case.

CAFxX (Contributor) commented Sep 24, 2019

@rasky dominating all exits is not enough; consider:

func foo() {
  bar()
  defer baz()
  // ...
}

If bar() panics, baz() should not be executed (even though the defer dominates all exits).

rasky (Member) commented Sep 24, 2019

The special panic handler can still check the bits if it needs to. My comment is about not generating bit checks in open-coded defers within the function body.

danscales (Contributor, author) commented

Is there a way to avoid checking deferBits for defers that are always executed? I don’t mind that the respective bits get set, but there is no compiler optimization that’s able to remove checks for those bits (unless they are all always set and thus the variable is constant). It looks like it should be sufficient to check whether the defer is called in a block that dominates all exits (or something like that) and not emit the bitcheck in that case.

Yes, the deferBits conditionals (in the inline code) are eliminated completely in the simple cases where all defers are unconditional. If all defers are unconditional, then the stream of operations on deferBits are all unconditional and just involve constants, so the value is known everywhere through constant propagation. The 'opt' phase of the compiler does the constant propagation and then eliminates the if-check on deferBits which is always true.

As you point out (good point!), if one defer is always executed (dominates the exit) but others are not, we could potentially remove the deferBits check for that specific defer on any exits after that defer. I will put that on the TODO list (and add that as an idea for the next rev of the design doc). I may not do that optimization for the initial checkin.
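
A small illustration of why the checks vanish when every defer is unconditional (again only a sketch; the folding is done by the compiler's constant propagation on SSA, not by rewriting source):

package main

func a() {}
func b() {}

// g has only unconditional defers, so every update to deferBits is a
// constant store on every path. Constant propagation proves both exit-time
// checks are always true, so they fold away, leaving plain calls b(); a().
func g() {
	deferBits := uint8(0)
	deferBits |= 1 << 0 // defer a()
	deferBits |= 1 << 1 // defer b()

	// ... function body ...

	if deferBits&(1<<1) != 0 { // provably true -> folds to just b()
		b()
	}
	if deferBits&(1<<0) != 0 { // provably true -> folds to just a()
		a()
	}
}

func main() { g() }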

bcmills (Contributor) commented Sep 24, 2019

Wasn't @dr2chase asking about bitwise constant-propagation a while back?

CAFxX (Contributor) commented Sep 25, 2019

@danscales inlining happens before defers are lowered in your implementation, so in the following example Unlock() is not inlined in foo(), correct? 😭

func foo(m *sync.Mutex) {
  m.Lock()
  defer m.Unlock()
  // ...
}

danscales (Contributor, author) commented

Yes, inlining happens much earlier in the compiler, so for my implementation, even though the defer calls will be directly expressed (open-coded) at exit, the calls cannot be replaced by their definition (i.e. inlined).

gopherbot pushed a commit that referenced this issue Sep 25, 2019
Add a bunch of extra tests and benchmarks for defer, in preparation for the new
low-cost (open-coded) implementation of defers (see #34481).

 - New file defer_test.go that tests a bunch more unusual defer scenarios,
   including things that might have problems for open-coded defers.
 - Additions to callers_test.go actually verifying what the stack trace looks like
   for various panic or panic-recover scenarios.
 - Additions to crash_test.go testing several more crash scenarios involving
   recursive panics.
 - New benchmark in runtime_test.go measuring speed of panic-recover
 - New CGo benchmark in cgo_test.go calling from Go to C back to Go that
   shows defer overhead

Updates #34481

Change-Id: I423523f3e05fc0229d4277dd00073289a5526188
Reviewed-on: https://go-review.googlesource.com/c/go/+/197017
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented

Change https://golang.org/cl/190098 mentions this issue: cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

DeedleFake commented Oct 10, 2019

I'm not too familiar with the internals of the compiler, but would it be possible to mix unconditional, conditional, and looped defers in the same function with this design? For example, insert hard-coded calls to any unconditional defers at the exits regardless of the others, insert the bit checks for the conditional ones, and insert code to deal with the looped defers?

In other words,

func Example(t int) {
  defer Unconditional()
  if t > 3 {
    defer Conditional()
  }
  defer AlsoUnconditional()
  for i := 0; i < t; i++ {
    defer Looped(i)
  }
}

would essentially have the equivalent of all of the following inserted at the returns:

for i := len(deferredLoopedCalls) - 1; i >= 0; i-- {
  deferredLoopedCalls[i]()
}
AlsoUnconditional()
if conditionalBitSet {
  Conditional()
}
Unconditional()

danscales modified the milestones: Backlog, Go1.14 (Oct 10, 2019)
danscales (Contributor, author) commented

We already handle conditional and non-conditional defers with this proposal. You have to set the bitmask in all cases, since you can have panics (especially run-time panics) at many points in the function, and so it is not easy to know otherwise (without a detailed pc map) whether you hit a particular unconditional defer statement before the panic happened.

For actual exit code, you can eliminate some bitmask checks if a particular defer statement dominates the exit that you are generating for. That optimization is not in this change, but seems reasonable to do soon as an optimization - definitely on my TODO list.

It would seem to be fairly complex to combine the low-cost defers with the looped defers (which would have to be handled by the current defer-record approach) in the same function. A bunch more information would have to be saved, and new code generated, to make sure that the looped defers and the inlined defers were executed in the right order. So, I'm not planning to do that (especially since those cases are very rare and not as likely to be performance-sensitive).
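
To make the point about panics concrete, here is a small runnable example (not from the proposal) showing why even defers that dominate every normal exit still need their reached-bit recorded:

package main

import "fmt"

// h's two defer statements dominate every normal exit, but a run-time panic
// can still occur between them. When h(nil) panics on the nil dereference,
// only the first defer statement has been reached, so only it may run during
// panic processing; the bitmask (or, today, the defer chain) records that.
func h(p *int) (n int) {
	defer fmt.Println("first defer")  // reached before the risky load
	n = *p                            // panics when p is nil
	defer fmt.Println("second defer") // never reached in that case
	return n
}

func main() {
	defer func() { _ = recover() }()
	h(nil) // prints only "first defer"
}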

gopherbot pushed a commit that referenced this issue Oct 16, 2019
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented

Change https://golang.org/cl/202340 mentions this issue: cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

gopherbot pushed a commit that referenced this issue Oct 24, 2019
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
danscales (Contributor, author) commented

Fixed by be64a19

mcandre commented Nov 7, 2019

Could a defer propagate an error value through the function's normal return values? Defers are most often used for closing I/O handles and the like, which often have important error information.

Workarounds:

  • Log, but otherwise remain quiet about the error.
  • Emit the error to an external monitoring service.
  • Panic on error.
  • Don't use defer.
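
For completeness, the usual way today to surface an error from a deferred Close through the function's return value is a named result plus a deferred closure. This is an existing Go pattern, not something this proposal changes; the file name and helper below are just for illustration.

package main

import "os"

// writeFile returns the Write error if there was one, otherwise the Close
// error, by assigning to the named result err from the deferred closure.
func writeFile(name string, data []byte) (err error) {
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = f.Write(data)
	return err
}

func main() {
	_ = writeFile("example.txt", []byte("hello"))
}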

ianlancetaylor (Contributor) commented

@mcandre I'm sorry, I don't understand what you are asking. In any case, this does not seem like the right place to ask it. This issue is about a specific design of the implementation of defer. It's not about how defer is defined to work in the language. You may want to use a forum; see https://golang.org/wiki/Questions. Thanks.

golang locked and limited conversation to collaborators Nov 6, 2020