Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: spanBytesAlloc underflow on windows 32bit #18043

Closed
fmzquant opened this issue Nov 25, 2016 · 24 comments
Closed

runtime: spanBytesAlloc underflow on windows 32bit #18043

fmzquant opened this issue Nov 25, 2016 · 24 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@fmzquant
Copy link

bug

this happened in windows 32 bit some times, never happen in linux 64 bit,
Very difficult to reproduce.

@fmzquant
Copy link
Author

go version is 1.7.3

@odeke-em odeke-em changed the title spanBytesAlloc underflow under windows 32 bit runtime: spanBytesAlloc underflow on windows 32bit Nov 25, 2016
@odeke-em
Copy link
Member

/cc @aclements @RLH

@aclements aclements added this to the Go1.8 milestone Nov 29, 2016
@aclements aclements added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 29, 2016
@aclements
Copy link
Member

@botvs, thanks for the report. I think I know what's going on. You mentioned that this is "very difficult to reproduce", which suggests that you have at least been able to reproduce it. If I sent you a patch to add some additional debugging to the runtime to confirm my suspicion, how hard do you think it would be for you to test?

@aclements
Copy link
Member

As for what I think it going on: if mcentral.cacheSpan is preempted by mark termination between the call to deductSweepCredit and the call to reimbuseSweepCredit, then mheap_.spanBytesAlloc will be zeroed and reimbuseSweepCredit will underflow. Probably the right fix is for reimbuseSweepCredit to check if we're still in the same GC cycle as the deductSweepCredit it's supposed to be balanced with and just do nothing if we're not.

@aclements
Copy link
Member

On the other hand, mcentral.cacheSpan happens while g.m.locks != 0, so it's not, in fact, preemptible.

@aclements
Copy link
Member

Another plausible but incorrect theory is that mheap_.spanBytesAlloc wasn't 8-byte aligned, since 1.7.3's runtime/internal/atomic.Xadd64 didn't check this (it does on master), but it appears it is in fact aligned. (And I'm pretty sure Xadd64 genuinely doesn't require alignment on 386 like the other 64-bit atomics do, though that's hard to confirm.)

@fmzquant
Copy link
Author

fmzquant commented Dec 2, 2016

I'm sorry, I just saw this. How to patch it ?

@aclements
Copy link
Member

@botvs, are you set up to compile your own Go toolchain? If not, that's the first step to patching. If so, I can upload a change to the code review tool and you can patch that in pretty easily. I'll have to figure out exactly what debug info I want, though, since none of my theories so far have actually panned out.

@aclements
Copy link
Member

@botvs, ping.

@aclements aclements added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 13, 2016
@aclements aclements modified the milestones: Go1.8Maybe, Go1.8 Dec 13, 2016
@nathanjackson
Copy link

Hello, I am facing the exact same issue in Go 1.6.3 installed from the Fedora EPEL on CentOS 7.3.1611 64-bit. Any chance your patch might fix it? Its really not clear where the issue is coming from on my end.

@aclements
Copy link
Member

@nathanjackson, can you post more details, such as the error and stack trace?

I don't have a patch to fix this yet because I don't know what the problem is and haven't been able to reproduce this locally. If I made a patch against Go 1.6.3 to add more debugging information, would you be able to apply it and reproduce the issue?

@nathanjackson
Copy link

@aclements I should be able to do that. I'll see what I can gather and post.

If it helps at all, setting GODEBUG to gcstoptheworld=2 prevents the crash.

@nathanjackson
Copy link

@aclements Is there any way I can e-mail you my stack trace? I don't think I can post it publicly.

@aclements
Copy link
Member

aclements commented Dec 14, 2016

@nathanjackson, you can email it to austin <at> google.com. Or feel free to redact it. I really only need to see the frames in package runtime.

@nathanjackson
Copy link

Alright, so I have to get approval in order to send the stack trace unfortunately. However, I can say it looks nearly identical to the above stack trace from OP's screenshot.

@aclements
Copy link
Member

@botvs or @nathanjackson, if either of you could try to reproduce this with https://go-review.googlesource.com/34612 applied to your Go tree, it could really help me figure this out. It's based on current master, but I'm pretty sure you can cherry-pick it cleanly to any 1.7 checkout using the following command: git fetch https://go.googlesource.com/go refs/changes/12/34612/1 && git cherry-pick FETCH_HEAD

@gopherbot
Copy link

CL https://golang.org/cl/34612 mentions this issue.

@nathanjackson
Copy link

@aclements You should have an e-mail.

I will see what I can do with your patch. Something interesting that I found on my end, if I use 1.7.3 or better, the error does go away, but one of three things happens, randomly.

  1. The program executes completes without error.
  2. Panic with a leftover defer error.
  3. Deadlock, the program seems to just stop executing.

All of these issues go away with GODEBUG=gcstoptheworld=1.

@aclements
Copy link
Member

@nathanjackson, thanks for the traceback. Does your application use unsafe, cgo, or fail in -race mode? I ask because the original report shows a reasonable argument to reimburseSweepCredit (even though it then crashes because the global state is wrong), but your traceback shows a huge argument to reimbuseSweepCredit, which suggests a different root cause in which the span state (or possibly the stack) got corrupted. Incorrect use of unsafe, cgo, or having races could also cause the other failures you're seeing.

@nathanjackson
Copy link

@aclements Sorry for not getting back with you sooner. I looked into this issue and it turns out that one of the libraries I was using was incorrectly using cgo\unsafe. Writing my own interface around the C library corrected the issue in 1.6.3 and 1.7.4. Thank you for your help.

@aclements
Copy link
Member

@nathanjackson, no worries. Sorry for taking so long to look at your traceback. Thanks for the update!

@aclements
Copy link
Member

@botvs, were you able to try the patch that I posted? Also, does your application use unsafe, cgo, or fail in -race mode (I should have asked this at the beginning).

gopherbot pushed a commit that referenced this issue Jan 10, 2017
Updates #18043.

Change-Id: I24e687fdd5521c48b672987f15f0d5de9f308884
Reviewed-on: https://go-review.googlesource.com/34612
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@aclements
Copy link
Member

Moving to Go 1.9. Hopefully we'll get some more diagnostics from Go 1.8.

@aclements aclements modified the milestones: Go1.9, Go1.8Maybe Jan 17, 2017
@bradfitz bradfitz removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 21, 2017
@gopherbot
Copy link

CL https://golang.org/cl/39832 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 21, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants