Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: periodic crash in runtime/asm_amd64.s:623 #27978

Closed
rcoreilly opened this issue Oct 3, 2018 · 7 comments
Closed

runtime: periodic crash in runtime/asm_amd64.s:623 #27978

rcoreilly opened this issue Oct 3, 2018 · 7 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@rcoreilly
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.11 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

MacOS 10.13.6 Macbook Pro, amd64

What did you do?

calling cocoa api using cgo for gui app: https://github.com/goki/gi -- I tried to make a small standalone app to reproduce it and was unable to reproduce: https://github.com/rcoreilly/bug

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

What did you expect to see?

no crashing :)

What did you see instead?

crashing :( This is very random -- mostly works fine but randomly crashes. I have attempted to protect everything with mutexes etc and nothing seems to make any difference. I have no idea what the proximal cause of the crash is -- can't seem to get good debugging in asm.s file or in the call above it. Couldn't find similar issues. Any ideas as to what might be going on here and how I could get more useful debugging info? Is it just some kind of random memory corruption? The C.doCloseWindow call is protected by a mutex that is used for all cgo calls. The "goroutines" list shows that a few other routines are at this exact same point in asm_amd64 -- both are in same thread..

cocoa.go:311
func closeWindow(id uintptr) {
	theApp.mu.Lock()
	w := theApp.windows[id]
	theApp.mu.Unlock()
	w.glctxMu.Lock()
	C.doCloseWindow(C.uintptr_t(id))  <- cgo call causing crash; w, id both look good / normal
	w.glctxMu.Unlock()
}

Command failed: bad access
> runtime.asmcgocall() /usr/local/Cellar/go/1.11/libexec/src/runtime/asm_amd64.s:623 (PC: 0x4061232)
Warning: debugging optimized function
   618:		JEQ	nosave
   619:		
   620:		// Switch to system stack.
   621:		MOVQ	m_g0(R8), SI
   622:		CALL	gosave<>(SB)
=> 623:		MOVQ	SI, g(CX)
   624:		MOVQ	(g_sched+gobuf_sp)(SI), SP
   625:	
   626:		// Now on a scheduling stack (a pthread-created stack).
   627:		// Make sure we have enough room for 4 stack-backed fast-call
   628:		// registers as per windows amd64 calling convention.

(dlv) bt
 0  0x0000000004061232 in runtime.asmcgocall
    at /usr/local/Cellar/go/1.11/libexec/src/runtime/asm_amd64.s:623
 1  0x0000000004005672 in runtime.cgocall
    at /usr/local/Cellar/go/1.11/libexec/src/runtime/cgocall.go:131
 2  0x0000000004884675 in github.com/goki/gi/oswin/driver/macdriver._Cfunc_doCloseWindow
    at _cgo_gotypes.go:334
 3  0x0000000004887f7a in github.com/goki/gi/oswin/driver/macdriver.closeWindow
    at /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/cocoa.go:316
 4  0x0000000004883db1 in github.com/goki/gi/oswin/driver/macdriver.(*windowImpl).Close
    at /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/window.go:436
 5  0x00000000043fc603 in github.com/goki/gi.(*Dialog).Close
    at /Users/oreilly/go/src/github.com/goki/gi/dialogs.go:205
 6  0x00000000043fc711 in github.com/goki/gi.(*Dialog).Accept
    at /Users/oreilly/go/src/github.com/goki/gi/dialogs.go:223
 7  0x000000000448cfc5 in github.com/goki/gi.(*Dialog).Open.func2
    at /Users/oreilly/go/src/github.com/goki/gi/dialogs.go:157
 8  0x0000000004482006 in github.com/goki/gi.(*WinEventRecv).Call
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1411
 9  0x0000000004482f17 in github.com/goki/gi.(*Window).SendEventSignal
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1532
10  0x0000000004480549 in github.com/goki/gi.(*Window).EventLoop
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1285

Frame 1: /usr/local/Cellar/go/1.11/libexec/src/runtime/cgocall.go:131 (PC: 4005672)
   126:		// and then re-enter the "system call" reusing the PC and SP
   127:		// saved by entersyscall here.
   128:		entersyscall()
   129:	
   130:		mp.incgo = true
=> 131:		errno := asmcgocall(fn, arg)
   132:	
   133:		// Call endcgo before exitsyscall because exitsyscall may
   134:		// reschedule us on to a different M.
   135:		endcgo(mp)
   136:	
(dlv) p fn 
(unreadable could not find loclist entry at 0xc4e for address 0x4005671)
(dlv) p mp
(unreadable could not find loclist entry at 0xc71 for address 0x4005671)

I get the same "could not find loclist entry.." errors in the other routines..

Thanks for any insights!

@agnivade agnivade changed the title periodic crash in runtime/asm_amd64.s:623 runtime: periodic crash in runtime/asm_amd64.s:623 Oct 3, 2018
@agnivade
Copy link
Contributor

agnivade commented Oct 3, 2018

Can you try with 1.10 and see if it crashes there too ? It would be good to see if this is a new bug or was always there.

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Oct 3, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Oct 3, 2018
@rcoreilly
Copy link
Author

yes it does happen with 1.10 -- just got it. Line 674 in this version:

Frame 0: /usr/local/Cellar/go@1.10/1.10.4/libexec/src/runtime/asm_amd64.s:674 (PC: 405e912)
   669:		JEQ	nosave
   670:		
   671:		// Switch to system stack.
   672:		MOVQ	m_g0(R8), SI
   673:		CALL	gosave<>(SB)
=> 674:		MOVQ	SI, g(CX)
   675:		MOVQ	(g_sched+gobuf_sp)(SI), SP
   676:	

@rcoreilly
Copy link
Author

I turned off the cursor blinking which uses a time.NewTicker -- that is what was those other routines sitting at asm_amd64.s:623 were. But just got the crash again. So that doesn't seem relevant, which is consistent with the "bug" version that also fails to replicate the crash, despite starting a lot of those tickers..

Anyway, would definitely appreciate if there is anything you could tell me about WHY it might be randomly crashing so reliably at this one place (i.e., what is that line of code doing, and how can I figure out what bit of memory is now corrupted / wrong etc to cause the crash)? The random nature of the crash combined with the fact that it ALWAYS happens at this one particular line of code seems like it can't be just pure memory corruption...

  Goroutine 1 - User: _cgo_gotypes.go:688 github.com/goki/gi/oswin/driver/macdriver._Cfunc_startDriver (0x4885af1)
  Goroutine 2 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 3 - User: /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/cocoa.go:235 github.com/goki/gi/oswin/driver/macdriver.drawLoop (0x48877e4)
  Goroutine 4 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:510 sync.runtime_notifyListWait (0x404633b)
  Goroutine 5 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 18 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 19 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 20 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 21 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 22 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 34 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 35 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 36 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 37 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 38 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 50 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 51 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 52 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 66 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:56 sync.runtime_Semacquire (0x4044f89)
  Goroutine 68 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 82 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 88 - User: /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/cocoa.go:235 github.com/goki/gi/oswin/driver/macdriver.drawLoop (0x48877e4)
  Goroutine 160 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/sema.go:510 sync.runtime_notifyListWait (0x404633b)
  Goroutine 162 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 658 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/proc.go:303 runtime.gopark (0x4033ed4)
  Goroutine 1050 - User: /usr/local/Cellar/go/1.11/libexec/src/runtime/netpoll.go:173 internal/poll.runtime_pollWait (0x402e8ae)
  Goroutine 2818 - User: _cgo_gotypes.go:334 github.com/goki/gi/oswin/driver/macdriver._Cfunc_doCloseWindow (0x4884675)
  Goroutine 2819 - User: /Users/oreilly/go/src/github.com/goki/gi/window.go:1510 github.com/goki/gi.(*Window).SendEventSignal (0x4482c1c) (thread 1651889)

@rcoreilly
Copy link
Author

Also, it isn't just the close window call that causes the crash: this clipClear is the ONLY other one that also causes the crash. There is NOTHING special I can see about this clipClear call, so it is very puzzling.

(dlv) bt
 0  0x0000000004060ba2 in runtime.asmcgocall
    at /usr/local/Cellar/go/1.11/libexec/src/runtime/asm_amd64.s:623
 1  0x0000000004004fe2 in runtime.cgocall
    at /usr/local/Cellar/go/1.11/libexec/src/runtime/cgocall.go:131
 2  0x0000000004884291 in github.com/goki/gi/oswin/driver/macdriver._Cfunc_clipClear
    at _cgo_gotypes.go:237
 3  0x000000000488aaec in github.com/goki/gi/oswin/driver/macdriver.(*clipImpl).Clear
    at /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/cocoa.go:762
 4  0x000000000488a89a in github.com/goki/gi/oswin/driver/macdriver.(*clipImpl).Write
    at /Users/oreilly/go/src/github.com/goki/gi/oswin/driver/macdriver/cocoa.go:739
 5  0x00000000047e42be in github.com/goki/gi/giv.(*TextView).Copy
    at /Users/oreilly/go/src/github.com/goki/gi/giv/textview.go:1602
 6  0x00000000047ed946 in github.com/goki/gi/giv.(*TextView).KeyInput
    at /Users/oreilly/go/src/github.com/goki/gi/giv/textview.go:2773
 7  0x00000000048122ff in github.com/goki/gi/giv.(*TextView).TextViewEvents.func4
    at /Users/oreilly/go/src/github.com/goki/gi/giv/textview.go:3041
 8  0x0000000004481b16 in github.com/goki/gi.(*WinEventRecv).Call
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1411
 9  0x0000000004482a27 in github.com/goki/gi.(*Window).SendEventSignal
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1532
10  0x0000000004480059 in github.com/goki/gi.(*Window).EventLoop
    at /Users/oreilly/go/src/github.com/goki/gi/window.go:1285

@rcoreilly
Copy link
Author

Actually, I think I might have found an issue with clipClear -- will test and report back later. If so, it suggests that this is a failure mode for multiple free in the objective C code -- clipClear was potentially freeing something that it didn't own anymore..

@bradfitz bradfitz added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 3, 2018
@rcoreilly
Copy link
Author

no crashing so far in clipClear -- it seems highly likely that this is just a double-free corruption kind of issue.

The one remaining question is: shouldn't this be a relatively common type of crash when using cgo, and would it be possible to add just a tiny bit of checking at the point of the crash, along with some more useful diagnostic message that this is likely some kind of c-side memory corruption issue?

@rcoreilly
Copy link
Author

in case anyone finds this later: the runtime/asm_amd64.s:623 appears to be some kind of wait state around the cgo call, and is NOT where the crash is happening -- it is just the delve doesn't give you accurate debugging info for the cgo part of the code. Now that I'm using lldb (couldn't get gdb working on mac despite all attempts), I can see where the crash is actually happening deep inside of AppKit.. Still no idea about what is actually causing the crash, but anyway it is not a go issue.

@golang golang locked and limited conversation to collaborators Oct 6, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants