runtime: split stack overflow on *-386 #35349

Closed
FiloSottile opened this issue Nov 4, 2019 · 12 comments
Labels
FrozenDueToAge, NeedsInvestigation, release-blocker
Milestone

Comments

@FiloSottile
Contributor

https://storage.googleapis.com/go-build-log/2a4850e8/linux-386_a62e5e86.log from the https://golang.org/cl/205057 TryBot.

runtime: newstack sp=0x8e98e8c stack=[0x8e99000, 0x8e9a000]
	morebuf={pc:0x8126ddd sp:0x8e98e94 lr:0x0}
	sched={pc:0x8127353 sp:0x8e98e90 lr:0x0 ctxt:0x0}
runtime: gp=0x8c5a2a0, goid=37, gp->status=0x2
 runtime: split stack overflow: 0x8e98e8c < 0x8e99000
fatal error: runtime: split stack overflow

/cc @ianlancetaylor @aclements

@FiloSottile FiloSottile added the NeedsInvestigation and release-blocker labels Nov 4, 2019
@FiloSottile FiloSottile added this to the Go1.14 milestone Nov 4, 2019
@FiloSottile
Contributor Author

The only hit on greplogs in recent months which does not look like #22553 is 2019-09-23T16:50:00-a14efb1/plan9-386-0intro.

@ianlancetaylor
Contributor

Well that's strange. I can't think of any reason why that error would not always happen. There aren't any random elements that I can see.

@bcmills
Contributor

bcmills commented Nov 7, 2019

> There aren't any random elements that I can see.

With a garbage collector, there are always random elements. (Is it possible that this has something to do with the GC's stack-shrinking?)

@bcmills
Contributor

bcmills commented Nov 8, 2019

@bcmills bcmills changed the title from "runtime: split stack overflow" to "runtime: split stack overflow o6" Nov 8, 2019
@bcmills bcmills changed the title from "runtime: split stack overflow o6" to "runtime: split stack overflow on linux-386" Nov 8, 2019
@bcmills bcmills changed the title from "runtime: split stack overflow on linux-386" to "runtime: split stack overflow on *-386" Nov 8, 2019
@ianlancetaylor
Contributor

In each case the stack trace looks approximately like

cmd/compile/internal/gc.(*ssafn).Log(0xa0678a0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:6792 +0x24 fp=0x985a564 sp=0x985a560 pc=0x887bc04
cmd/compile/internal/ssa.(*Func).Log(...)
	/workdir/go/src/cmd/compile/internal/ssa/func.go:624
cmd/compile/internal/ssa.Compile(0xa004cc0)
	/workdir/go/src/cmd/compile/internal/ssa/compile.go:32 +0x6a fp=0x985d878 sp=0x985a564 pc=0x820c31a
cmd/compile/internal/gc.buildssa(0x988a0d0, 0x0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/ssa.go:444 +0xa71 fp=0x985d948 sp=0x985d878 pc=0x884ad31
cmd/compile/internal/gc.compileSSA(0x988a0d0, 0x0)
	/workdir/go/src/cmd/compile/internal/gc/pgen.go:298 +0x52 fp=0x985d9e4 sp=0x985d948 pc=0x881c132

@bcmills
Contributor

bcmills commented Nov 8, 2019

That makes it even stranger, because that function is literally trivial:

func (e *ssafn) Log() bool {
	return e.log
}

@bcmills
Contributor

bcmills commented Nov 8, 2019

So, maybe something to do with inlining?

@ianlancetaylor
Contributor

That function is trivial, but that's just the function where the problem is noticed. The function that is pushing the stack too low is cmd/compile/internal/ssa.Compile, which has quite a large stack frame. You can see this by looking at the changes in sp values in the traceback.
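
The frame sizes can be read off the traceback by subtracting each entry's sp from its fp. The following is a minimal sketch of that arithmetic; the `frame` type and `frameSize` helper are illustrative names, not anything from the runtime, and the values are copied from the traceback quoted earlier in the thread.

```go
package main

import "fmt"

// frame holds the fp and sp values printed for one traceback entry.
type frame struct {
	name   string
	fp, sp uint32
}

// frameSize returns the number of stack bytes between a frame's fp and sp.
func frameSize(f frame) uint32 {
	return f.fp - f.sp
}

func main() {
	// Values copied from the traceback quoted in this thread.
	frames := []frame{
		{"gc.(*ssafn).Log", 0x985a564, 0x985a560},
		{"ssa.Compile", 0x985d878, 0x985a564},
		{"gc.buildssa", 0x985d948, 0x985d878},
		{"gc.compileSSA", 0x985d9e4, 0x985d948},
	}
	for _, f := range frames {
		fmt.Printf("%-16s %#x bytes\n", f.name, frameSize(f))
	}
}
```

The ssa.Compile entry comes out at 0x3314 bytes, far larger than any of its neighbours in the trace.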

@ianlancetaylor
Contributor

OK, let's consider the possibility of an ill-timed shrinkstack. The total available stack is 0x2000 bytes. Of that, the ssa.Compile function needs 0x3314 bytes, which overflows. Each time the stack shrinks, it is cut in half. So let's say that the stack was originally 0x4000 bytes, which would leave enough room for ssa.Compile. So we would be in trouble if the stack shrank after ssa.Compile decided it had enough space but before ssa.Compile actually adjusted the stack pointer.

The prologue of ssa.Compile looks like this:

  compile.go:29         0x82104d0               658b0d00000000          MOVL GS:0, CX                                                   
  compile.go:29         0x82104d7               8b89fcffffff            MOVL 0xfffffffc(CX), CX                                         
  compile.go:29         0x82104dd               8b7108                  MOVL 0x8(CX), SI                                                
  compile.go:29         0x82104e0               81fedefaffff            CMPL $0xfffffade, SI                                            
  compile.go:29         0x82104e6               0f8433130000            JE 0x821181f                                                    
  compile.go:29         0x82104ec               8d842480030000          LEAL 0x380(SP), AX                                              
  compile.go:29         0x82104f3               29f0                    SUBL SI, AX                                                     
  compile.go:29         0x82104f5               3d10360000              CMPL $0x3610, AX                                                
  compile.go:29         0x82104fa               0f861f130000            JBE 0x821181f                                                   
  compile.go:29         0x8210500               81ec10330000            SUBL $0x3310, SP                                                

So it seems to me that we would be in trouble if something shrank the stack while executing from address 0x82104f3 through address 0x82104fa.

But I don't see any way that could happen. The runtime will not shrink the stack of a goroutine that was preempted due to a signal.

I have not been able to recreate the problem on my laptop.
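
To make the window concrete, here is a rough model of the stack check in the prologue above. It is only a sketch: the constants are read off the disassembly, while the function names, the addresses in `main`, and the assumption that the guard sits 0x380 bytes above the low end of the stack are all illustrative. It also keeps the stack's high end and SP fixed across the hypothetical shrink, which a real shrink would not do.

```go
package main

import "fmt"

// Constants read off the disassembly; the names are illustrative.
const (
	stackPreempt = 0xfffffade // CMPL $0xfffffade, SI
	guardOffset  = 0x380      // LEAL 0x380(SP), AX
	checkLimit   = 0x3610     // CMPL $0x3610, AX
	frameBytes   = 0x3310     // SUBL $0x3310, SP
)

// needsMoreStack mirrors the JE and LEAL/SUBL/CMPL/JBE sequence: it reports
// whether the prologue would branch to the slow path at 0x821181f.
func needsMoreStack(sp, stackguard uint32) bool {
	if stackguard == stackPreempt {
		return true
	}
	// Unsigned wraparound matches the unsigned JBE comparison.
	return sp+guardOffset-stackguard <= checkLimit
}

// overflows reports whether allocating the frame at sp would drop below lo.
func overflows(sp, lo uint32) bool {
	return sp-frameBytes < lo
}

func main() {
	// Hypothetical addresses, chosen only to make the arithmetic easy to follow.
	const hi, sp = 0x9860000, 0x985fe00
	bigLo := uint32(hi - 0x4000)   // stack before a hypothetical shrink
	smallLo := uint32(hi - 0x2000) // stack after being cut in half

	// Assumption: the guard sits guardOffset bytes above the low end of the stack.
	fmt.Println("slow path with 0x4000 stack:", needsMoreStack(sp, bigLo+guardOffset))
	fmt.Println("slow path with 0x2000 stack:", needsMoreStack(sp, smallLo+guardOffset))
	fmt.Println("frame overflows 0x4000 stack:", overflows(sp, bigLo))
	fmt.Println("frame overflows 0x2000 stack:", overflows(sp, smallLo))
}
```

In this model the check passes against the larger stack's guard, yet the same frame lands below the low end of the halved stack. That is the failure a stale stack bound would produce, whatever the reason the bound is stale.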

@cherrymui
Member

Just a guess:

  compile.go:29         0x82104d0               658b0d00000000          MOVL GS:0, CX                                                   
  compile.go:29         0x82104d7               8b89fcffffff            MOVL 0xfffffffc(CX), CX                                         

This is loading G from TLS. If the goroutine is preempted between these two instructions, parked, and then resumed on a different thread, the TLS address in CX may be stale (still pointing to the TLS of the old thread), so the second instruction may load the wrong G and therefore the wrong stack bound.

If this is the case, two-instruction TLS access probably needs to be marked nonpreemptible.
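
The suspected interleaving can be mimicked with a toy model. Everything here is hypothetical: `g`, `thread`, and `loadG` are stand-ins rather than runtime types, the second goroutine and both stackguard values are invented, and the "preemption" is just a callback run between the two simulated loads.

```go
package main

import "fmt"

// g is a stand-in for the runtime's goroutine descriptor.
type g struct {
	id         int
	stackguard uint32
}

// thread is a stand-in for an OS thread whose TLS slot holds its current g.
type thread struct {
	tls *g
}

// loadG models the two-instruction sequence. The preempt callback runs
// between the two loads and stands in for an asynchronous preemption.
func loadG(current *thread, preempt func()) *g {
	base := current // MOVL GS:0, CX: TLS base of the thread we are on now
	preempt()
	return base.tls // MOVL 0xfffffffc(CX), CX: g read through the saved base
}

// demo returns the g that the sequence loaded and the g that was running it.
func demo() (got, want *g) {
	g37 := &g{id: 37, stackguard: 0x985e380} // hypothetical guard value
	g99 := &g{id: 99, stackguard: 0x985c380} // hypothetical second goroutine
	threadA := &thread{tls: g37}
	threadB := &thread{}
	got = loadG(threadA, func() {
		// g37 is parked and later resumed on thread B; thread A now runs g99.
		threadA.tls, threadB.tls = g99, g37
	})
	return got, g37
}

func main() {
	got, want := demo()
	fmt.Printf("running g%d, loaded g%d (stackguard %#x, want %#x)\n",
		want.id, got.id, got.stackguard, want.stackguard)
}
```

In the model, the code running as g37 ends up comparing against g99's stack bound, which is the kind of mismatch that marking the pair of instructions nonpreemptible would rule out.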

@gopherbot

Change https://golang.org/cl/206903 mentions this issue: cmd/internal/obj/x86: mark 2-instruction TLS access nonpreemptible

@golang golang locked and limited conversation to collaborators Nov 14, 2020