
context: panic in parentCancelCtx when context timeout #64307

Closed
wawababe opened this issue Nov 21, 2023 · 5 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@wawababe

wawababe commented Nov 21, 2023

What version of Go are you using (go version)?

$ go version
1.17.3 

Does this issue reproduce with the latest release?

Yes. It panics in version 1.21 too.

What operating system and processor architecture are you using (go env)?

uname -a Output

Linux Tencent-SNG 5.4.119-1-tlinux4-0010.3 #1 SMP Thu Jan 5 17:31:23 CST 2023 x86_64 x86_64 x86_64 GNU/Linux

What did you do?

The service is a typical TCP RPC service.
In my business handler, I copy the context passed in from the RPC handler or from the redo machine and pass it to RPC calling functions, both synchronously and asynchronously (a sketch of this usage follows the function below). It panics occasionally in our production environment, and I haven't found the exact path. But when NewThsContext creates a new context from context.Background instead, the panic no longer happens.

Here is the function I used to copy the context.
func NewThsContext(ctx context.Context, uid int64) *ths.Context {
	nctx := ths.NewContext(ctx)
	nctx.Uin = uint64(uid)
	h := th.NewTmeHeader()
	nauth := h.GetAuthInfo()
	nauth.UID = uint64(uid)
	h.SetAuthInfo(&nauth)
	if h.GetLocalServiceLen() == 0 {
		h.SetLocalServiceName("go_unified_ktv_chorus_game_svr")
	}
	nctx.Head = h
	return nctx
}
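
To make the description above concrete, here is a minimal sketch of how the copy is passed downstream. Only NewThsContext comes from the service code; handleBusiness and callDownstream are made-up names, and it assumes *ths.Context satisfies context.Context (which the framework code further below suggests, since it wraps a parent context).

func handleBusiness(ctx context.Context, uid int64) {
	nctx := NewThsContext(ctx, uid) // copy of the RPC/redo context

	// sync path: the downstream call observes the parent's deadline and cancelation
	callDownstream(nctx)

	// async path: the copied context keeps being read after the handler returns
	go callDownstream(nctx)
}

// callDownstream stands in for the real RPC calling function.
func callDownstream(ctx context.Context) {
	select {
	case <-ctx.Done():
	case <-time.After(10 * time.Millisecond):
	}
}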

The context is passed along many call paths, and I haven't found the one that causes the panic.
The paths mainly include:

1. RPC timeout context passed to the business handler
func (c *conn) serve(ctx context.Context) {
	ctx = context.WithValue(ctx, ClientAddr, c.remoteAddr)
	ctx, cancelCtx := context.WithCancel(ctx)
	c.cancelCtx = cancelCtx
	defer c.rwc.Close()

	c.wg.Add(3)
	go c.readRequests(ctx)
	go c.handle(ctx)
	go c.writeResponses(ctx)
	c.wg.Wait()
}

func (c *conn) handle(ctx context.Context) {
	defer func() {
		c.cancelCtx()
		c.wg.Done()
		if err := recover(); err != nil {
			c.rwc.SetDeadline(time.Now())
			buf := make([]byte, RecoverStackSize)
			buf = buf[:runtime.Stack(buf, false)]
			log.Output(1, fmt.Sprintf("[PANIC] %v: %v\n%s", c.remoteAddr, err, buf))
		}
	}()

	for {
		select {
		case <-ctx.Done():
			return
		case req := <-c.cin:
			go func() {
				subCtx, cancel := context.WithTimeout(ctx, c.server.msgTimeout)
				rsp, err := c.server.handler.Serve(subCtx, req)
				cancel()
				if err != nil {
					SysLog(err)
					return
				}
				select {
				case <-ctx.Done():
					SysLog("tcp handle business goroutine context done")
					return
				case c.cout <- rsp:
					SysLog("tcp handle business goroutine write rsp to cout channel")
				}
			}()
		}
	}
}

func (sm ServeMutex) Serve(baseCtx context.Context, reqBuf []byte) ([]byte, error) {
	conf := cat.GetConfig()
	defer func() {
		if e := recover(); e != nil {
			dataBuf := make([]byte, 4*1024*1024)
			dataBuf = dataBuf[:runtime.Stack(dataBuf, false)]
			cat.Logf("[PANIC]%v\n>> %s", e, dataBuf)
		}
	}()

	// parse the business protocol head
	head := parseHead(reqBuf)

	// set the context with info from the head
	baseCtx = context.WithValue(baseCtx, RemoteServiceName, head.GetLocalServiceName())
	baseCtx = context.WithValue(baseCtx, LocalServiceName, head.GetServiceName())
	baseCtx = context.WithValue(baseCtx, TraceID, head.GetTraceID())
	baseCtx = context.WithValue(baseCtx, SpanID, head.GetSpanID())
	baseCtx = context.WithValue(baseCtx, ParentSpanID, head.GetParentSpanID())
	baseCtx = context.WithValue(baseCtx, Env, head.GetEnv())

	ctx := NewContext(baseCtx)
	defer ctx.WriteLog()

	// set the protocol head
	ctx.Head = head

	// find the handler
	entry, ok := sm.mapEntries[head.GetCmdPattern()]
	if !ok {
		return nil, fmt.Errorf("invalid cmd pattern:%s", head.GetCmdPattern())
	}

	// set the request and response types
	ctx.Req = reflect.New(entry.reqType.Elem()).Interface()
	ctx.Rsp = reflect.New(entry.rspType.Elem()).Interface()

	// decode the request
	if err := BodyUnmarshal(head, reqBuf, ctx.Req); err != nil {
		cat.Logf("protocol Decode buf failed, msg:%v", err)
		return nil, err
	}

	// run the business handler
	entry.h.Process(ctx)

	ctx.ErrCode = ctx.Head.GetResultCode()

	// pack header and response
	// .....
	return pkgBuf, nil
}

// NewContext creates a new context.
func NewContext(ctx context.Context) *Context {
	newCtx := Context{
		Context:   cat.NewContext(ctx, cat.GetConfig().Log.LogWriter),
		startTime: time.Now(),
	}
	newCtx.LogLevel = int(cat.GetConfig().Log.Level)
	newCtx.Head = NewHeader()
	return &newCtx
}

2. Redo timeout context

func (r *Redo) ExecAsync(obj interface{}, opts ...ExecOptionFunc) {
	r.execWithWAL(true, r.ctx, obj, opts...)
}

func (r *Redo) execWithWAL(async bool, ctx context.Context, obj interface{}, opts ...ExecOptionFunc) error {
	// .....
	f := func() error {
		err := r.exec(ctx, key, obj)
		if requireCommit {
			r.wal.Commit(key)
		}
		return err
	}

	if async {
		go f()
		return nil
	}
	return f()
}

The default redo context is created in function New.

func New(path string, h Handler, fs ...optionFunc) (*Redo, error) {

	r := Redo{
		h:      h,
		keyGen: newKeyGenerator(),
	}
	...
	r.ctx, r.cancel = context.WithCancel(context.Background())

	if r.opts.Retry.Thread > 1 {
		r.retryCh = make(chan []byte)
		for i := 0; i < int(r.opts.Retry.Thread); i++ {
			r.wg.Go(r.retryLoop)
		}
	}

	r.wg.Go(r.mainLoop)
	return &r, nil
}

What did you expect to see?

What did you see instead?

Here is a recent stack trace dumped with Go 1.17.3:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4bc9c9]

goroutine 1684597594 [running]:
context.(*valueCtx).Done(0xc000042068)
        :1 +0x29
context.parentCancelCtx({0x1b9d600, 0xc0121dbc20})
        /usr/local/go/src/context/context.go:298 +0x2b
context.removeChild({0x1b9d600, 0xc0121dbc20}, {0x1b7fe30, 0xc0121dbce0})
        /usr/local/go/src/context/context.go:315 +0x31
context.(*timerCtx).cancel(0xc0121dbce0, 0x1, {0x1b731a0, 0x69709e0})
        /usr/local/go/src/context/context.go:486 +0x55
context.WithDeadline.func2()
        /usr/local/go/src/context/context.go:456 +0x30
created by time.goFunc
        /usr/local/go/src/time/sleep.go:180 +0x31

In both Go 1.17.3 and Go 1.21, our services (running in Linux VMs and in containers) panic at the parent.Done() call in parentCancelCtx in context/context.go:

// parentCancelCtx returns the underlying *cancelCtx for parent.
// It does this by looking up parent.Value(&cancelCtxKey) to find
// the innermost enclosing *cancelCtx and then checking whether
// parent.Done() matches that *cancelCtx. (If not, the *cancelCtx
// has been wrapped in a custom implementation providing a
// different done channel, in which case we should not bypass it.)
func parentCancelCtx(parent Context) (*cancelCtx, bool) {
	done := parent.Done()
	if done == closedchan || done == nil {
		return nil, false
	}
	p, ok := parent.Value(&cancelCtxKey).(*cancelCtx)
	if !ok {
		return nil, false
	}
	pdone, _ := p.done.Load().(chan struct{})
	if pdone != done {
		return nil, false
	}
	return p, true
}
@bcmills bcmills changed the title affected/context: panic in parentCancelCtx when context timeout context: panic in parentCancelCtx when context timeout Nov 21, 2023
@bcmills
Contributor

bcmills commented Nov 21, 2023

> What did you do?

Please fill out the complete issue template. What is the surrounding code that creates the context.Context involved in the panic?

If possible, provide a runnable example using https://go.dev/play/.

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 21, 2023
@bcmills bcmills added this to the Backlog milestone Nov 21, 2023
@wawababe
Author

I have added more details to the "What did you do?" section. The context passes through too many call paths, and the service code cannot be run in the Go playground.

> What did you do?
>
> Please fill out the complete issue template. What is the surrounding code that creates the context.Context involved in the panic?
>
> If possible, provide a runnable example using https://go.dev/play/.

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 27, 2023
@bcmills
Contributor

bcmills commented Nov 27, 2023

Somehow the context is becoming corrupted. Any uses of unsafe in the surrounding code, or data races? If you build and run the program with the race detector enabled (the -race flag), does it report anything?
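
For what it's worth, here is a minimal, playground-style sketch (not the service's code; all names are made up) of the kind of data race that can corrupt a context this way: one goroutine replaces a struct's embedded Context while other goroutines derive timeout contexts from it through the context.Context interface. Running it with go run -race flags the pattern immediately.

package main

import (
	"context"
	"time"
)

type ctxKey struct{}

// sharedCtx mimics a request-scoped wrapper whose embedded Context gets
// replaced without synchronization (hypothetical, for illustration only).
type sharedCtx struct {
	context.Context
}

func main() {
	sc := &sharedCtx{Context: context.Background()}

	// writer: keeps swapping the embedded parent context
	go func() {
		for {
			sc.Context = context.WithValue(context.Background(), ctxKey{}, "v")
		}
	}()

	// readers: derive timeout contexts from the shared wrapper; WithTimeout
	// reads sc's embedded Context through the promoted methods, racing with
	// the writer above. A torn interface value here is the kind of corruption
	// that could later surface as a nil dereference inside parentCancelCtx.
	for i := 0; i < 4; i++ {
		go func() {
			for {
				ctx, cancel := context.WithTimeout(sc, time.Millisecond)
				<-ctx.Done()
				cancel()
			}
		}()
	}

	time.Sleep(2 * time.Second)
}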

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 27, 2023
@wawababe
Author

> Somehow the context is becoming corrupted. Any uses of unsafe in the surrounding code, or data races? If you build and run the program with the race detector enabled (the -race flag), does it report anything?

I see no uses of unsafe in the surrounding code. The panic only occurs in the production environment, so only one machine can be used to run the old panicking service with the race flag enabled. It may take a while to detect the data race.
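
If running the whole service under -race is too slow, a focused test on a development machine might surface it sooner. A minimal sketch, assuming NewThsContext is reachable from a test in the same package and that *ths.Context exposes Done via its embedded context; the fan-out of 100 goroutines is arbitrary. Run it with go test -race.

func TestNewThsContextConcurrent(t *testing.T) {
	parent, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			nctx := NewThsContext(parent, 42) // the copy path from the report
			<-nctx.Done()                     // wait for the inherited timeout
		}()
	}
	wg.Wait()
}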

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 12, 2023
@gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 12, 2024