Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context: add APIs for writing and reading cancelation cause #51365

Closed
Sajmani opened this issue Feb 25, 2022 · 75 comments
Closed

context: add APIs for writing and reading cancelation cause #51365

Sajmani opened this issue Feb 25, 2022 · 75 comments

Comments

@Sajmani
Copy link
Contributor

Sajmani commented Feb 25, 2022

This proposal addresses issue #26356 by extending package context with new functions for writing and reading the "cause" of the context's cancelation. This approach is similar to what was proposed in #46273.

Design notes and alternatives

Naming

I chose the name Cause because it's short. Alternatives include Reason (as in #46273) and WhyDone.

Cause is an error

Based on user feedback in #26356, we've made cause an error so that the programmer can include structured data and introspect it with errors.Is/As. Alternatives would be a string or a stack trace (captured automatically).

Compatibility

As package context is covered by the Go 1 compatibility guarantee, we are unable to change any of the existing types or functions in the package. In particular

  • we cannot add Cause as a method of the Context interface
  • we cannot change the existing CancelFunc type to take a cause
  • we cannot change the existing WithCancel, WithDeadline, or WithTimeout function signatures
  • we cannot change Err to return any values other than nil, Canceled, or DeadlineExceeded

Performance considerations

Based on user feedback, our goal is to introduce no new performance overhead for existing uses of Context (hence the rejection of the design that captured stack traces automatically for existing cancelations). We consider it acceptable to increase the size of some context implementation structs by a small amount (specifically, adding an error field to cancelCtx).

Cause(ctx) vs. ctx.Err()

@andreimatei pointed out that in the current prototype, it's possible for Cause(ctx) to be nil when ctx.Err() is non-nil. This is working as intended: we expect users to check ctx.Err() before reading Cause(ctx). However, @andreimatei suggested users might want to check Cause(ctx) instead of ctx.Err(), in which case we'd need to guarantee that Cause(ctx) is non-nil exactly when ctx.Err() is non-nil. If we do this, I'd recommend Cause(ctx) == ctx.Err() at all times except when the cause has been explicitly set to non-nil. This would complicate the handling of code that sets the cause to nil.

Concerns for custom Context implementations

@andreimatei raised concerns about the fact that this design does not provide a way for custom Context implementations to control the behavior of Cause(ctx). As a result, it's possible for Cause(ctx) to return a non-nil error while the custom Context's Done channel has not yet been closed. I pointed out that a custom Context can control Cause(ctx) by wrapping a standard Context returned by WithCancelCause; @andreimatei pointed out that this makes it difficult for custom Context implementations to achieve their efficiency goals.

We could address this concern by exposing a key for Context.Value that looks up the Cause, which would allow custom Context implementations to override the lookup for that key. This would require changing the current prototype that reuses the existing cancelCtxKey to find the cause.

Proposed API changes

I've prototyped this change in CL https://go-review.googlesource.com/c/go/+/375977, including several tests to exercise various interleaving of cancelations with and without causes.

package context

// WithCancelCause behaves like WithCancel but returns a CancelCauseFunc instead of a CancelFunc.
// Calling cancel with a non-nil error (the "cause") records that error in ctx;
// it can then be retrieved using Cause(ctx).
//
// Example use:
//   ctx, cancel := context.WithCancelCause(parent)
//   cancel(myError)
//   ctx.Err() // returns context.Canceled
//   context.Cause(ctx) // returns myError
func WithCancelCause(parent Context) (ctx Context, cancel CancelCauseFunc)

// A CancelCauseFunc behaves like a CancelFunc but additionally sets the cancelation cause.
// This cause can be retrieved by calling Cause on the canceled Context or any of its derived Contexts.
// If the context has already been canceled, CancelCauseFunc does not set the cause.
type CancelCauseFunc func(cause error)

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

// WithDeadlineCause behaves like WithDeadline but also sets the cause of the
// returned Context when the deadline is exceeded. The returned CancelFunc does
// not set the cause.
func WithDeadlineCause(parent Context, d time.Time, cause error) (Context, CancelFunc)

// WithTimeoutCause behaves like WithDeadlineCause(parent, time.Now().Add(timeout), cause).
func WithTimeoutCause(parent Context, timeout time.Duration, cause error) (Context, CancelFunc)
@gopherbot gopherbot added this to the Proposal milestone Feb 25, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 25, 2022
@wizardishungry
Copy link

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

It would be nice to use errors.Is or errors.Unwrap directly on ctx.Err(), but I assume a ton of existing code does err == context.Canceled.

@Sajmani
Copy link
Contributor Author

Sajmani commented Feb 25, 2022

@wizardishungry That's right: the Context API guarantees that Err only ever returns nil, Canceled, or DeadlineExceeded, and plenty of client code relies on this. We can't break it.

@rsc
Copy link
Contributor

rsc commented Mar 2, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

Commented on #26356 and #46273 to try to get feedback from people who are interested in this problem.

@OneOfOne
Copy link
Contributor

OneOfOne commented Mar 9, 2022

I've needed this in multiple projects.

@bullgare
Copy link

I think, it could work if will be adopted by packages like net/http, sql, and so on.

@Sajmani
Copy link
Contributor Author

Sajmani commented Mar 10, 2022

Hi Dmitry, what would integration with http and sql look like? They already integrate with Context.

@egonelbre
Copy link
Contributor

@Sajmani from the documentation https://pkg.go.dev/context#Context.Err it's not clear to me that it only returns those values. To me it looks like it just lists some of the usual errors you might get. I've always assumed that custom context implementations can return different errors.

@bullgare
Copy link

@Sajmani If that libraries would add a cancellation reason.
To know at least which library did it and, ideally, why.
I had a case when on production I got random "context canceled" log messages. And in the case like that you don't even know where to dig and how to investigate it further. Or how to reproduce it on a local machine.
Cancellation cause could help with that potentially.

@Sajmani
Copy link
Contributor Author

Sajmani commented Mar 14, 2022

@egonelbre While we might reasonably interpret the documentation of Err to mean it could return other values, I believe there is code that already assumes it only returns those values. I believe it's best to leave Err's behavior unchanged to avoid breaking such code.

@egonelbre
Copy link
Contributor

@Sajmani sure, I do fear that changing the behavior in that case would cause more problems. However, it also means that code checking Err() == context.Canceled might be wrong already, when custom contexts are used. It might be good to either suggest in context.Context documentation to use errors.Is for checking the error or alternatively change the wording to make clear that only those values can be returned.

I haven't found an easy way to find how common that problem is. One that I did find is https://github.com/pion/ice/blob/39ae3a4fa890e29069c764039a98dff44ca94255/context.go#L23 -- which suggests that there are more such contexts.

@ChrisHines
Copy link
Contributor

FWIW, I have always interpreted the Err docs to mean it can only return nil, Canceled, or DeadlineExceeded, which is why I asked #46273 (comment).

If we add WithCancelReason how do you propose to update the docs for the Err method to reflect the new behavior?

I do prefer to use errors.Is(err, context.DeadlineExceeded) in new code since Go 1.13.

But that's not the end of it. In order for #46273 to be acceptable I think the context.Context interface docs should probably specify that all implementations only return errors for which errors.Is(err, DeadlineExceeded) || errors.Is(err, Canceled) is true.

My rationale is that the current docs pretty clearly (at least to me) specify that there are only two ways a context can become canceled and the context package provides error values to distinguish those two dispositions. Code that wants to make choices about how to handle a context cancelation based on the reason currently has a finite set of cases to handle whether you do it with == or errors.Is. Allowing Err to return arbitrary error values changes that equation.

The proposal sidesteps that by allowing contexts to provide two errors.

Questions about this proposal:

The proposed design of WithDeadlineCause and WithTimeoutCause are curious. I am trying to think through how to use them and why I might want to.

This seems maybe useful:

func orchestrate(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    doWork(ctx)
}

func doWork(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Second, errors.New("inner timeout"))
    defer cancel()
    ...
    if err := ctx.Err(); err != nil {
        log.Printf("aborting work: %v, because: %v", err, context.Cause(ctx))
    }
}

Now our logs will tell us which of the two timeouts triggered the work to abort. It may still be ambiguous in the case that the inner timeout ended up having the same deadline as its parent due to it potentially being started near the end of the parent context's time range.

But what if orchestrate also has a reason to explicitly cancel the work before the timeout?

func orchestrate(ctx context.Context) {
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    go doWork(ctx)
    ...
    if err := isSomethingWrong(); err != nil {
        cancel()
    }
}

Now we lack the ability to provide a cause without adding another wrapping context like this:

func orchestrate(ctx context.Context) {
    ctx, hardCancel := context.WithCancelCause(ctx)
    defer cancel(nil) // Is this idiomatic?
    ctx, cancel := context.WithTimeoutCause(ctx, time.Minute, errors.New("outer timeout"))
    defer cancel()
    ...
    go doWork(ctx)
    ...
    if err := isSomethingWrong(); err != nil {
        hardCancel(err)
    }
}

This seems a bit unfortunate and I wonder if there is a better way.

@Sajmani
Copy link
Contributor Author

Sajmani commented Mar 14, 2022

Great point about having causes both on deadline and direct cancelation.

IMO the code that calls WithCancelCause and WithTimeoutCause is the clearest: it makes it clear that there are two different reasons why this context might be canceled and provides them both.

cancel(nil) should work fine: it will cancel the context and record a nil Cause. Later cancelations won't write the cause because the Context has already been canceled.

While you don't technically need to call the second cancel—since the first deferred cancel will cascade down to the derived context and cancel it—it seems like good hygiene to write it as you suggest.

@bcmills
Copy link
Contributor

bcmills commented Mar 14, 2022

That example does suggest a synchronization consideration, though: both WithTimeoutCause and CancelCauseFunc should probably avoid annotating the cause if the Context was already done for some reason (for example, a parent cancellation).

That is: there should be at most one reason attributed to the Context, and it should be the one (if any) that actually caused the transition from not-done to done.

@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

Does anyone object to adding this API? Specifically (quoting the top comment):

package context

// WithCancelCause behaves like WithCancel but returns a CancelCauseFunc instead of a CancelFunc.
// Calling cancel with a non-nil error (the "cause") records that error in ctx;
// it can then be retrieved using Cause(ctx).
//
// Example use:
//   ctx, cancel := context.WithCancelCause(parent)
//   cancel(myError)
//   ctx.Err() // returns context.Canceled
//   context.Cause(ctx) // returns myError
func WithCancelCause(parent Context) (ctx Context, cancel CancelCauseFunc)

// A CancelCauseFunc behaves like a CancelFunc but additionally sets the cancelation cause.
// This cause can be retrieved by calling Cause on the canceled Context or any of its derived Contexts.
// If the context has already been canceled, CancelCauseFunc does not set the cause.
type CancelCauseFunc func(cause error)

// Cause returns a non-nil error if a parent Context was canceled using a CancelCauseFunc that was passed that error.
// Otherwise Cause returns nil.
func Cause(c Context) error

// WithDeadlineCause behaves like WithDeadline but also sets the cause of the
// returned Context when the deadline is exceeded. The returned CancelFunc does
// not set the cause.
func WithDeadlineCause(parent Context, d time.Time, cause error) (Context, CancelFunc)

// WithTimeoutCause behaves like WithDeadlineCause(parent, time.Now().Add(timeout), cause).
func WithTimeoutCause(parent Context, timeout time.Duration, cause error) (Context, CancelFunc)

@hherman1
Copy link

Which cancel cause will be reported if multiple parent contexts have been cancelled with cause? I’m assuming it’s the nearest parent?

@andreimatei
Copy link
Contributor

I would make the following changes and/or clarifications to the API:

  1. Make and specify Cause(ctx) to return nil if ctx.Error() == nil.
  2. Make Cause(ctx) return Canceled or DeadlineExceeded (depending on the case) when ctx was canceled with through CancelFunc and not through a CancelCauseFunc, and when the timeout expired.
  3. Specify cancel(nil) to result in Cause(ctx) == Canceled.
  4. Make WithDeadlineCause and WithTimeoutCause return CancelCauseFunc instead of CancelFunc, with semantics similar to WithCancelCause. (*)
  5. Specify how a custom ctx implementation can participate in the Cause(ctx) determination, and export the stdlib key required to do so.

(*) About this point, @Sajmani you've said, in response to @ChrisHines:

IMO the code that calls WithCancelCause and WithTimeoutCause is the clearest: it makes it clear that there are two different reasons why this context might be canceled and provides them both.

I personally don't buy it. A single call to WithTimeoutCause returning a CancelCauseFunc that you immediately defer would also make it clear that there are two reasons why the ctx might be canceled.

@bcmills
Copy link
Contributor

bcmills commented Mar 31, 2022

@hherman1

Which cancel cause will be reported if multiple parent contexts have been cancelled with cause?

Personally, I would prefer that it be left unspecified (because synchronization makes guarantees difficult), but implementations should try to preserve the Cause that caused the Done channel to become closed. The other potential “causes” are not actually causal: they don't have any other observable effect on the Context.

@Sajmani
Copy link
Contributor Author

Sajmani commented Apr 1, 2022

@hherman1 @bcmills Indeed the behavior of the current prototype is that the cause associated with the first cancelation is propagated all the way down the chain, just like the Err is. This means a later cancelation, even if it's "nearer", won't update the cause, since the Context has already been canceled.

@Sajmani
Copy link
Contributor Author

Sajmani commented Apr 1, 2022

@andreimatei Of your suggestions, #5 is the one I'm most in agreement with, as that's a fundamental issue with the ability to define custom Contexts. The rest are all attempts to make Cause(ctx) an alternative to ctx.Err(), which is not my proposal. IMO it's best for code to check ctx.Err() as they do today, and consult Cause(ctx) for additional information. I'm happy to hear other points of view on this, of course, and if the proposal committee prefers your semantics, I'm OK with that.

@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key. What stops someone from using WithCancelCause and then wrapping the result with their own implementation? Why do we need to expose the key?

@andreimatei
Copy link
Contributor

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key. What stops someone from using WithCancelCause and then wrapping the result with their own implementation? Why do we need to expose the key?

Nothing technically stops a custom ctx implementation from wrapping WithCancelCause, but that would mean at least one extra memory allocation.

On the one hand I'd like to enable custom contexts, but on the other hand I am not sure about exposing the implementation detail that is the context key.

I'd also bring a principle argument: Context is an interface to allow for custom implementations that interoperate with the standard library, but the proposal is to reserve functionality exclusively to the standard library; that seems wrong. I'd argue that, by making Cause use a special key, that "key" is no longer specific to the implementation of the standard library. Rather, it's logically part of the Context interface.

@rsc
Copy link
Contributor

rsc commented May 4, 2022

I'd argue that, by making Cause use a special key, that "key" is no longer specific to the implementation of the standard library. Rather, it's logically part of the Context interface.

Right, and that's a larger API change, which it would be good to avoid, at least at first. What if we want to optimize the implementation at some future point?

How often do custom contexts need this and would it be so awful for them to call WithCancelCause?

@rsc rsc modified the milestones: Proposal, Backlog Aug 12, 2022
@rsc
Copy link
Contributor

rsc commented Aug 12, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@Sajmani Sajmani self-assigned this Aug 12, 2022
@gopherbot
Copy link

Change https://go.dev/cl/426215 mentions this issue: context: add APIs for writing and reading cancelation cause

@rekby
Copy link

rekby commented Aug 28, 2022

@Sajmani Hello.

In #51365 (comment) rsc suggested about Cause will return Context.Err if no cancel cause, but context cancelled.
And no discussions later.

In finish proposed Api changes and in PR - Cause will return nil if no Cancel cause (even if context cancelled)
Is it mistake / forgot?

@Sajmani
Copy link
Contributor Author

Sajmani commented Sep 14, 2022

@rekby Cause should return nil if no cause is set

@gopherbot
Copy link

Change https://go.dev/cl/375977 mentions this issue: context: add APIs for writing and reading cancelation cause

@Sajmani
Copy link
Contributor Author

Sajmani commented Nov 6, 2022

@rekby Actually I had forgotten Russ's comment; you are correct that Cause will return the same value as Err unless cause is set explicitly to a non-nil error. The current CL implements this behavior.

@Sajmani
Copy link
Contributor Author

Sajmani commented Nov 6, 2022

I think @ChrisHines is correct that we cannot easily arrange for a timeout or deadline to cancel a context with a custom cause, at least, not without an extra goroutine or timer. @ianlancetaylor 's sketch in #51365 (comment) doesn't handle the common case when a context is canceled before the timer expires. If we want to make this case work well, then I think we will need WithDeadlineCause and WithTimeoutCause as I had in my original prototype CL. I think we can wait to add those until people have had some time to use the new WithCancelCause API.

@Sajmani Sajmani reopened this Nov 8, 2022
@Sajmani
Copy link
Contributor Author

Sajmani commented Nov 8, 2022

I have submitted the CL to add WithCancelCause per @rsc 's spec in #51365 (comment)

I'm reopening this issue so that the proposal committee can discuss whether we should add WithDeadlineCause and WithTimeoutCause as well. As far as I can tell, there is no efficient way to implement their behavior by stacking the existing APIs, and I believe users will want to use causes to identify which deadline or timeout has expired in a context chain.

@dmitshur dmitshur modified the milestones: Backlog, Go1.20 Nov 8, 2022
@nightlyone
Copy link
Contributor

@Sajmani wouldn't that be a new proposal?

Since (business/community related) value is already achieved with the currently proposal as implemented right now and the proposed additions are optimizations on top that can get done in the next cycle.

If you propose to not ship the current version without the two additions discussed, keeping it open starts making more sense to me. Otherwise let's be fair and open a new proposal.

@rsc
Copy link
Contributor

rsc commented Nov 8, 2022

Closing this proposal as implemented but we can add a new one to focus specifically on WithDeadlineCause and WithTimeoutCause, which are subtly different from WithCancelCause.

WithCancelCause lets the cancel func set the cause at the cancellation moment.

WithDeadlineCause and WithTimeoutCause require you to say ahead of time what the cause will be when the timer goes off, and then that cause is used in place of the generic DeadlineExceeded. The cancel functions they return are plain CancelFuncs (with no user-specified cause), not CancelCauseFuncs, the reasoning being that the cancel on one of these is typically just for cleanup and/or to signal teardown that doesn't look at the cause anyway.

That distinction makes sense, but it makes WithDeadlineCause and WithTimeoutCause different in an important, subtle way from WithCancelCause. We missed that in the discussion, and although @ChrisHines did point it out after my July 27 message, I still don't think I understood it well enough until a discussion with @Sajmani just now.

Sameer is going to open a separate proposal for these two additions.

@mitar
Copy link
Contributor

mitar commented Oct 23, 2023

I have been following this issue for some time and was really happy that it landed. But now I tried to finally update code to use the new API and I was a bit surprised by it, primarily that it makes it hard to work with logic which expects that the error is context.Canceled and at the same time be able to retrieve the error at the end.

What I mean is that I have code which 1) obtains the error from the context, then it b) passes it through by returning the error through a chain of function call returns, and then 3) inspects the error and determines that it is about the context cancellation and wants to obtain the cause for it (e.g., log it).

Now, the tricky thing is that at point 3) I cannot know both if the error is context cancellation and what was the cause. I can do that only at point 1).

Options to me seems to be:

  • At point 1) do errors.Join(ctx.Err(), context.Cause(ctx)), but this is ugly if there was no cause, because context.Cause(ctx) in that case returns the same as ctx.Err()`, so I am joining two same errors. So more logic to combine only if they are different. Also returned error message is ugly.
  • I could use gitlab.com/tozd/go/errors's WrapWith which was made so that one error can be made a cause of another error, without modifying either of them. The error message is nicer, but the logic to check if they are different has still to be added.

I understand that originally ctx.Err could not return an context.Canceled error which would unwrap to the cause error to assure that err == context.Canceled continues to work. But why not make context.Cause return such an error already? So in my view, it would be better if context.Cause returned:

  • Same error as ctx.Err if there is no cause.
  • An error which satisfies both context.Is(context.Canceled) and can be unwrapped to the cause error. This can be done by having a custom Is method on the error.

For context.Cause returned errors we do not have to assure that err == context.Canceled holds.

I am not sure if we can still change context.Cause now? Probably not? But could we then introduce yet another context.CauseError or something, which would return an error with the behavior above? It can probably be made in 3rd party library as well, it is not too long, but I think it would be great to have it in standard library.

@rhcarvalho
Copy link
Contributor

What I mean is that I have code which 1) obtains the error from the context, then it b) passes it through by returning the error through a chain of function call returns, and then 3) inspects the error and determines that it is about the context cancellation and wants to obtain the cause for it (e.g., log it).

If you have a reference to the context that was possibly canceled (and possibly you have it because you presumably passed it down the function call chain), you could do at (3) the same that you'd do at (1) and consider both the error and the context.

Based on what I've seen of the bug tracker etiquette, I think a better place to continue the conversation is not this closed issue but the mailing list or forum/discord/slack/irc, and if there's a new proposal/feature request/bug we'd open a new GitHub issue.

@mitar
Copy link
Contributor

mitar commented Oct 23, 2023

you could do at (3) the same that you'd do at (1) and consider both the error and the context.

Not necessary, because the context at 3) might not be the cancelable context used in 1) (but its parent context).

if there's a new proposal/feature request/bug we'd open a new GitHub issue.

I am trying to first see if I missed something obvious and gauge interest before I open a new proposal.

@mitar
Copy link
Contributor

mitar commented Oct 26, 2023

I made #63759.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Proposals (old)
Likely Accept
Development

No branches or pull requests