
cmd/compile, runtime: a design for low-cost defers via inline code #34481

Closed
danscales opened this issue Sep 23, 2019 · 20 comments
Labels: FrozenDueToAge, NeedsInvestigation, Performance

danscales (Contributor) commented Sep 23, 2019

We are working on an implementation of defer statements that makes most defers no more expensive than open-coding the deferred call, hence eliminating the incentive to avoid using this language feature to its fullest extent. We wanted to document the design in the proposal/design repo.
See issue #14939 for discussion of defer performance issues.

The design document is posted here. The current, mostly complete implementation is here.

Comments are welcome on either the design or the implementation.

gopherbot commented

Change https://golang.org/cl/196964 mentions this issue: design: add 34481-opencoded-defers.md

gopherbot pushed a commit to golang/proposal that referenced this issue Sep 23, 2019
This change adds a design document for reducing the cost of defers significantly
by essentially inlining defer calls on normal exits in most cases.

Updates golang/go#34481

Change-Id: If64787e591864ab7843503b8f09c4d6dd6a7a535
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/196964
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot commented

Change https://golang.org/cl/197017 mentions this issue: misc, runtime, test: extra tests and benchmarks for defer

bcmills added the NeedsInvestigation and Performance labels Sep 24, 2019
bcmills (Contributor) commented Sep 24, 2019

@danscales, should this be milestoned to 1.14, or is it a longer-term effort?

danscales (Contributor, author) commented Sep 24, 2019

Yes, Go1.14 milestone is correct -- that's what we're aiming for.

tsatke commented Sep 24, 2019

This proposal only allows up to 64 defers per function, is that correct?
I imagine this is enough for most hand-written code, but I can imagine some frameworks generating a lot more than that...

Or imagine test code opening every file in a directory and deferring the file close. This won’t work for large directories and may break backwards compatibility in some cases. Also, what would be the error?

danscales (Contributor, author) commented Sep 24, 2019

This proposal only allows up to 64 defers per function, is that correct?

Yes, there is some limit to the number of defers per function (the size of the deferBits bitmask). In the current implementation, it is 8, but easily changed to 64.

But to clarify: if open-coded (inlined) defers cannot be used in a function, we just revert to the current way of doing defers (create a "defer object", add it to the defer chain, and process the appropriate objects on the defer chain at all function exits).

Our goal is to reduce the defer overhead in the common cases where it occurs, which is typically smallish functions with a few defers. In unusual cases (functions with defers in loops or lots of defer statements), we just revert to the current method.

I will add some text to the design document to make this clearer.
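
To illustrate the scheme described above, here is a rough sketch, in ordinary Go, of what the compiler effectively generates for a function with one unconditional and one conditional defer. The explicit deferBits variable, the bit assignments, and the helper names are purely illustrative; the real implementation operates on SSA values and stack slots.

package main

func a() {}
func b() {}

// f sketches the lowering of:
//
//	func f(cond bool) {
//		defer a()
//		if cond {
//			defer b()
//		}
//	}
func f(cond bool) {
	var deferBits uint8 // stack-allocated bitmask, one bit per defer site

	deferBits |= 1 << 0 // the defer a() statement was reached
	if cond {
		deferBits |= 1 << 1 // the defer b() statement was reached
	}

	// ... function body ...

	// Inlined exit code: run the recorded defers in reverse order.
	if deferBits&(1<<1) != 0 {
		b()
	}
	if deferBits&(1<<0) != 0 {
		a()
	}
}

func main() { f(true) }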

rasky (Member) commented Sep 24, 2019

Is there a way to avoid checking deferBits for defers that are always executed? I don’t mind that the respective bits get set, but there is no compiler optimization that’s able to remove checks for those bits (unless they are all always set and thus the variable is constant). It looks like it should be sufficient to check whether the defer is called in a block that dominates all exits (or something like that) and not emit the bitcheck in that case.

CAFxX (Contributor) commented Sep 24, 2019

@rasky dominating all exits is not enough; consider:

func foo() {
  bar()
  defer baz()
  // ...
}

If bar() panics, baz() should not be executed (even though the defer dominates all exits).

rasky (Member) commented Sep 24, 2019

The special panic handler can still check the bits if it needs to. My comment is about not generating bit checks in open-coded defers within the function body.

danscales (Contributor, author) commented

Is there a way to avoid checking deferBits for defers that are always executed? I don’t mind that the respective bits get set, but there is no compiler optimization that’s able to remove checks for those bits (unless they are all always set and thus the variable is constant). It looks like it should be sufficient to check whether the defer is called in a block that dominates all exits (or something like that) and not emit the bitcheck in that case.

Yes, the deferBits conditionals (in the inline code) are eliminated completely in the simple cases where all defers are unconditional. If all defers are unconditional, then the stream of operations on deferBits are all unconditional and just involve constants, so the value is known everywhere through constant propagation. The 'opt' phase of the compiler does the constant propagation and then eliminates the if-check on deferBits which is always true.

As you point out (good point!), if one defer is always executed (dominates the exit) but others are not, we could potentially remove the deferBits check for that specific defer on any exits after that defer. I will put that on the TODO list (and add that as an idea for the next rev of the design doc). I may not do that optimization for the initial checkin.
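
A small illustration of why the checks vanish when every defer is unconditional (again only a sketch; the folding is done by the compiler's constant propagation on SSA, not by rewriting source):

package main

func a() {}
func b() {}

// g has only unconditional defers, so every update to deferBits is a
// constant store on every path. Constant propagation proves both exit-time
// checks are always true, so they fold away, leaving plain calls b(); a().
func g() {
	deferBits := uint8(0)
	deferBits |= 1 << 0 // defer a()
	deferBits |= 1 << 1 // defer b()

	// ... function body ...

	if deferBits&(1<<1) != 0 { // provably true -> folds to just b()
		b()
	}
	if deferBits&(1<<0) != 0 { // provably true -> folds to just a()
		a()
	}
}

func main() { g() }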

bcmills (Contributor) commented Sep 24, 2019

Wasn't @dr2chase asking about bitwise constant-propagation a while back?

CAFxX (Contributor) commented Sep 25, 2019

@danscales inlining happens before defers are lowered in your implementation, so in the following example Unlock() is not inlined in foo(), correct? 😭

func foo(m *sync.Mutex) {
  m.Lock()
  defer m.Unlock()
  // ...
}

danscales (Contributor, author) commented

Yes, inlining happens much earlier in the compiler, so for my implementation, even though the defer calls will be directly expressed (open-coded) at exit, the calls cannot be replaced by their definition (i.e. inlined).

gopherbot pushed a commit that referenced this issue Sep 25, 2019
Add a bunch of extra tests and benchmarks for defer, in preparation for the new
low-cost (open-coded) implementation of defers (see #34481).

 - New file defer_test.go that tests a bunch more unusual defer scenarios,
   including things that might have problems for open-coded defers.
 - Additions to callers_test.go actually verifying what the stack trace looks like
   for various panic or panic-recover scenarios.
 - Additions to crash_test.go testing several more crash scenarios involving
   recursive panics.
 - New benchmark in runtime_test.go measuring speed of panic-recover
 - New CGo benchmark in cgo_test.go calling from Go to C back to Go that
   shows defer overhead

Updates #34481

Change-Id: I423523f3e05fc0229d4277dd00073289a5526188
Reviewed-on: https://go-review.googlesource.com/c/go/+/197017
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented

Change https://golang.org/cl/190098 mentions this issue: cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

DeedleFake commented Oct 10, 2019

I'm not too familiar with the internals of the compiler, but would it be possible to mix unconditional, conditional, and looped defers in the same function with this design? For example, insert hard-coded calls to any unconditional defers at the exits regardless of the others, insert the bit checks for the conditional ones, and insert code to deal with the looped defers?

In other words,

func Example(t int) {
  defer Unconditional()
  if t > 3 {
    defer Conditional()
  }
  defer AlsoUnconditional()
  for i := 0; i < t; i++ {
    defer Looped(i)
  }
}

would essentially have the equivalent of all of the following inserted at the returns:

for i := len(deferredLoopedCalls) - 1; i >= 0; i-- {
  deferredLoopedCalls[i]()
}
AlsoUnconditional()
if conditionalBitSet {
  Conditional()
}
Unconditional()

danscales modified the milestones: Backlog, Go1.14 (Oct 10, 2019)
danscales (Contributor, author) commented

We already handle conditional and non-conditional defers with this proposal. You have to set the bitmask in all cases, since you can have panics (especially run-time panics) at many points in the function, and so it is not easy to know otherwise (without a detailed pc map) whether you hit a particular unconditional defer statement before the panic happened.

For actual exit code, you can eliminate some bitmask checks if a particular defer statement dominates the exit that you are generating for. That optimization is not in this change, but seems reasonable to do soon as an optimization - definitely on my TODO list.

It would seem to be fairly complex to combine the low-cost defers with the looped defers (which would have to be handled by the current defer-record approach) in the same function. A bunch more information would have to be saved, and new code generated, to make sure that the looped defers and the inlined defers were executed in the right order. So, I'm not planning to do that (especially since those cases are very rare and not as likely to be performance-sensitive).
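
To make the point about panics concrete, here is a small runnable example (not from the proposal) showing why even defers that dominate every normal exit still need their reached-bit recorded:

package main

import "fmt"

// h's two defer statements dominate every normal exit, but a run-time panic
// can still occur between them. When h(nil) panics on the nil dereference,
// only the first defer statement has been reached, so only it may run during
// panic processing; the bitmask (or, today, the defer chain) records that.
func h(p *int) (n int) {
	defer fmt.Println("first defer")  // reached before the risky load
	n = *p                            // panics when p is nil
	defer fmt.Println("second defer") // never reached in that case
	return n
}

func main() {
	defer func() { _ = recover() }()
	h(nil) // prints only "first defer"
}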

gopherbot pushed a commit that referenced this issue Oct 16, 2019
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
gopherbot commented

Change https://golang.org/cl/202340 mentions this issue: cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

gopherbot pushed a commit that referenced this issue Oct 24, 2019
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata

Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
danscales (Contributor, author) commented

Fixed by be64a19

mcandre commented Nov 7, 2019

Could a defer propagate an error value through the function's normal return values? Defers are most often used for closing I/O handles and the like, which often have important error information.

Workarounds:

  • Log, but otherwise remain quiet about the error.
  • Emit the error to an external monitoring service.
  • Panic on error.
  • Don't use defer.
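
For completeness, the usual way today to surface an error from a deferred Close through the function's return value is a named result plus a deferred closure. This is an existing Go pattern, not something this proposal changes; the file name and helper below are just for illustration.

package main

import "os"

// writeFile returns the Write error if there was one, otherwise the Close
// error, by assigning to the named result err from the deferred closure.
func writeFile(name string, data []byte) (err error) {
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = f.Write(data)
	return err
}

func main() {
	_ = writeFile("example.txt", []byte("hello"))
}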

ianlancetaylor (Contributor) commented

@mcandre I'm sorry, I don't understand what you are asking. In any case, this does not seem like the right place to ask it. This issue is about a specific design of the implementation of defer. It's not about how defer is defined to work in the language. You may want to use a forum; see https://golang.org/wiki/Questions. Thanks.

golang locked and limited conversation to collaborators Nov 6, 2020