Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGSEGV in the new timer code #35367

Closed
lni opened this issue Nov 5, 2019 · 8 comments
Closed

runtime: SIGSEGV in the new timer code #35367

lni opened this issue Nov 5, 2019 · 8 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@lni
Copy link

lni commented Nov 5, 2019

What version of Go are you using (go version)?

go version devel +3c0fbeea7d Tue Nov 5 05:22:07 2019 +0000 darwin/amd64

What operating system and processor architecture are you using (go env)?

GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/lni/gobin"
GOCACHE="/Users/lni/Library/Caches/go-build"
GOENV="/Users/lni/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/lni/go"
GOPRIVATE=""
GOPROXY="mirrors.aliyun.com/goproxy"
GOROOT="/Users/lni/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/lni/src/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/lni/src/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/b5/ndb6rntd4ys0451zchd64p_m0000gn/T/go-build169346949=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

One of my tests uses net.Conn.SetWriteDeadline(), it keeps crashing when using the latest devel version specified above.

What did you expect to see?

No SIGSEGV

What did you see instead?

SIGSEGV with the following stack trace:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x10524a9]

goroutine 87 [running]:
runtime.throw(0x14bc56a, 0x2a)
	/Users/lni/src/go/src/runtime/panic.go:1106 +0x72 fp=0xc0005d7290 sp=0xc0005d7260 pc=0x10321e2
runtime.sigpanic()
	/Users/lni/src/go/src/runtime/signal_unix.go:648 +0x46a fp=0xc0005d72c0 sp=0xc0005d7290 pc=0x104842a
runtime.modtimer(0x57626e0, 0x45d790aa5784e, 0x0, 0x14cb580, 0x14036e0, 0x5762640, 0x172)
	/Users/lni/src/go/src/runtime/time.go:640 +0xf9 fp=0xc0005d72e0 sp=0xc0005d72c0 pc=0x10524a9
internal/poll.runtime_pollSetDeadline(0x5762640, 0xb2d05cfe, 0x77)
	/Users/lni/src/go/src/runtime/netpoll.go:280 +0x357 fp=0xc0005d7368 sp=0xc0005d72e0 pc=0x102da37
internal/poll.setDeadlineImpl(0xc000374580, 0xbf686cdb238b51c8, 0xb9ac916d, 0x182d040, 0x77, 0x0, 0x0)
	/Users/lni/src/go/src/internal/poll/fd_poll_runtime.go:155 +0x149 fp=0xc0005d73d0 sp=0xc0005d7368 pc=0x108ec29
internal/poll.(*FD).SetWriteDeadline(...)
	/Users/lni/src/go/src/internal/poll/fd_poll_runtime.go:137
net.(*netFD).SetWriteDeadline(...)
	/Users/lni/src/go/src/net/fd_unix.go:280
net.(*conn).SetWriteDeadline(0xc0005de000, 0xbf686cdb238b51c8, 0xb9ac916d, 0x182d040, 0x0, 0x0)
	/Users/lni/src/go/src/net/net.go:262 +0x89 fp=0xc0005d7428 sp=0xc0005d73d0 pc=0x11f8149
github.com/lni/dragonboat/v3/internal/transport.(*connection).SetWriteDeadline(0xc000c66180, 0xbf686cdb238b51c8, 0xb9ac916d, 0x182d040, 0xbf686cdb238b51c8, 0xb9ac916d)
	/Users/lni/src/dragonboat/internal/transport/tcp.go:316 +0x51 fp=0xc0005d7468 sp=0xc0005d7428 pc=0x13b1c31
github.com/lni/dragonboat/v3/internal/transport.writeMessage(0x15521c0, 0xc000c66180, 0xc8, 0x35, 0x30922f7b, 0xc00024f840, 0x35, 0x35, 0xc00037e260, 0x12, ...)
	/Users/lni/src/dragonboat/internal/transport/tcp.go:157 +0x116 fp=0xc0005d74d8 sp=0xc0005d7468 pc=0x13b0986
github.com/lni/dragonboat/v3/internal/transport.(*TCPSnapshotConnection).SendSnapshotChunk(0xc000c661b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/lni/src/dragonboat/internal/transport/tcp.go:400 +0x120 fp=0xc0005d7568 sp=0xc0005d74d8 pc=0x13b2130
github.com/lni/dragonboat/v3/internal/transport.(*lane).sendChunk(0xc0002343f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/lni/src/dragonboat/internal/transport/lane.go:236 +0x32f fp=0xc0005d78c0 sp=0xc0005d7568 pc=0x13abadf
github.com/lni/dragonboat/v3/internal/transport.(*lane).streamSnapshot(0xc0002343f0, 0x138530e, 0xc0000b6550)
	/Users/lni/src/dragonboat/internal/transport/lane.go:169 +0x1e7 fp=0xc0005d7da0 sp=0xc0005d78c0 pc=0x13aab37
github.com/lni/dragonboat/v3/internal/transport.(*lane).process(0xc0002343f0, 0xc0000ad2f0, 0xe)
	/Users/lni/src/dragonboat/internal/transport/lane.go:149 +0x40 fp=0xc0005d7dd8 sp=0xc0005d7da0 pc=0x13aa8b0
github.com/lni/dragonboat/v3/internal/transport.(*Transport).connectAndProcessSnapshot.func1(0xc0002343f0, 0xc0000ad2f0, 0xe, 0x64, 0x2, 0xc00012c1c0, 0xc0000cf680, 0x3, 0x0, 0x0, ...)
	/Users/lni/src/dragonboat/internal/transport/snapshot.go:160 +0x349 fp=0xc0005d7e68 sp=0xc0005d7dd8 pc=0x13cb929
github.com/lni/dragonboat/v3/internal/transport.(*Transport).connectAndProcessSnapshot(0xc00012c1c0, 0xc0002343f0, 0xc0000ad2f0, 0xe)
	/Users/lni/src/dragonboat/internal/transport/snapshot.go:166 +0xcb fp=0xc0005d7ee8 sp=0xc0005d7e68 pc=0x13aee6b
github.com/lni/dragonboat/v3/internal/transport.(*Transport).createConnection.func2()
	/Users/lni/src/dragonboat/internal/transport/snapshot.go:132 +0x64 fp=0xc0005d7f48 sp=0xc0005d7ee8 pc=0x13cb4f4
github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1(0xc0000ba720, 0xc0000ad5c8, 0x0, 0x0, 0xc0000e1dc0)
	/Users/lni/go/pkg/mod/github.com/lni/goutils@v1.0.3/syncutil/stopper.go:104 +0x5f fp=0xc0005d7fb8 sp=0xc0005d7f48 pc=0x13a560f
runtime.goexit()
	/Users/lni/src/go/src/runtime/asm_amd64.s:1375 +0x1 fp=0xc0005d7fc0 sp=0xc0005d7fb8 pc=0x1063ad1
created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
	/Users/lni/go/pkg/mod/github.com/lni/goutils@v1.0.3/syncutil/stopper.go:94 +0x98
@lni
Copy link
Author

lni commented Nov 5, 2019

81a74b4 (one commit before the new timer switch) works fine but SIGSEGV in 6becb03 (new timer switch).

@ianlancetaylor ianlancetaylor changed the title SIGSEGV in the new timer code runtime: SIGSEGV in the new timer code Nov 5, 2019
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Nov 5, 2019
@ianlancetaylor ianlancetaylor added this to the Go1.14 milestone Nov 5, 2019
@ianlancetaylor
Copy link
Contributor

Is there a way that we can reproduce the problem ourselves?

@cuonglm
Copy link
Member

cuonglm commented Nov 5, 2019

@ianlancetaylor As I observe, somehow in modtimer, t.pp.ptr() can be nil. Does it mean the t was finished in other p, or the t itself is not on any p's heap?

A simple fix can be:

if tpp != nil && adjust != 0 {
    atomic.Xadd(&tpp.adjustTimers, adjust)
}

But I'm not sure it follows the right logic flow of new timer.

@ianlancetaylor
Copy link
Contributor

(That simple fix would be clearly wrong, sorry.)

I think it's a problem with a late change to adjusttimers. I think that adjusttimers sees timerModifiedEarlier or timerModifiedLater. It changes to timerMoving, updates the when field, and calls dodeltimer. dodeltimer sets the .pp field to 0. Then adjusttimers sets the timer to timerWaiting. But that is an inconsistent state: the timer is timerWaiting but is not attached to any P. That will be fixed shortly by the call to addAdjustedTimers. But if the program calls modtimer during that inconsistent state, it can see a crash like the one you describe.

@cuonglm
Copy link
Member

cuonglm commented Nov 5, 2019

(That simple fix would be clearly wrong, sorry.)

I think it's a problem with a late change to adjusttimers. I think that adjusttimers sees timerModifiedEarlier or timerModifiedLater. It changes to timerMoving, updates the when field, and calls dodeltimer. dodeltimer sets the .pp field to 0. Then adjusttimers sets the timer to timerWaiting. But that is an inconsistent state: the timer is timerWaiting but is not attached to any P. That will be fixed shortly by the call to addAdjustedTimers. But if the program calls modtimer during that inconsistent state, it can see a crash like the one you describe.

Thanks for details explanation. That's why I said I'm not sure it follow the right logic.

FWIW, I tried above simple fix, and go tool dist test passed.

@ianlancetaylor
Copy link
Contributor

The problem with skipping the change to adjustTimers is that we could be left with timers in timerModifiedEarlier status that were not recorded in the adjustTimers field for their P. With an unfortunate ordering of timers, that could cause the P to not observe the time that such a timer needs to fire, and for the P to sleep longer than it should, and for the timer to fire at an arbitrarily later time.

@gopherbot
Copy link

Change https://golang.org/cl/205418 mentions this issue: runtime: keep adjusted timers in timerMoving status until moved

@lni
Copy link
Author

lni commented Nov 6, 2019

@ianlancetaylor thanks for the detailed explanation. The change fixed the SIGSEGV issue in my tests.

@golang golang locked and limited conversation to collaborators Nov 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

4 participants