runtime: goroutines aren't scheduled in time #29394

Closed
zhao-kun opened this issue Dec 22, 2018 · 4 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@zhao-kun

zhao-kun commented Dec 22, 2018

What version of Go are you using (go version)?

$ go 1.8.3

Does this issue reproduce with the latest release?

Unknown

What operating system and processor architecture are you using (go env)?

The program runs on CentOS 7.4.

What did you do?

I have a Kubernetes cluster running version 1.7.4, built with Go 1.8.3. Yesterday at noon, one of the nodes in the cluster stopped working: it no longer reported its status to the master. The cause was that the kubelet process on that node hung. I checked the logs and found that the program stopped printing log output at 12:09. About 20 minutes later I sent SIGABRT to the kubelet process; it dumped the stack traces of all goroutines and exited.

I grepped the dump for "goroutine"; the result is:

...
3144:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6413899 [select, 22 minutes]:
3149:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6121144 [select, 22 minutes]:
3154:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6147441 [sleep, 22 minutes]:
3161:12月 21 12:31:29 machine-name kubelet[28486]: goroutine 16532140 [chan receive, 1283 minutes]:
...

I noticed that many goroutines had been blocked for about 20 minutes, which is almost exactly how long the kubelet had been hung.
I checked the stack trace of goroutine 6147441, which is:

...
12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6147441 [sleep, 22 minutes]:
12月 21 12:31:29 machine-name kubelet[28486]: time.Sleep(0x2ba79f8ef)
12月 21 12:31:29 machine-name kubelet[28486]: /usr/local/go/src/runtime/time.go:59 +0xf9
12月 21 12:31:29 machine-name kubelet[28486]: k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager.(*containerData).housekeeping(0xc422794f00)
12月 21 12:31:29 machine-name kubelet[28486]: /workspace/anago-v1.7.16-beta.0.18+e8846c1d7e7e63/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/manager/container.go:457 +0x340
....

The goroutine was blocked in time.Sleep(0x2ba79f8ef). The argument 0x2ba79f8ef is a duration in nanoseconds, about 11.7 seconds, which is what the code logic is expected to pass.
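For reference, the argument in the stack trace can be decoded as follows. This is a minimal sketch that relies only on the fact that time.Duration is an int64 count of nanoseconds; the helper name sleepArg is made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// sleepArg (hypothetical helper) interprets the raw argument printed in the
// stack trace as a time.Duration, i.e. an int64 number of nanoseconds.
func sleepArg(raw int64) time.Duration {
	return time.Duration(raw)
}

func main() {
	// 0x2ba79f8ef is the value shown in the time.Sleep frame above.
	fmt.Println(sleepArg(0x2ba79f8ef)) // prints 11.718490351s
}
```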

My question is: the goroutine should have slept for about 11 seconds, so why was it blocked for almost 22 minutes? I suspect that the Go runtime did not schedule it in time. What happened at 12:09 that stopped the scheduler from working?

PS: About 83 goroutines were blocked in sleep.
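A count like this can be taken from the dump with a small program instead of grep. The following is only a sketch: it assumes the header format seen in the excerpt above ("goroutine N [state, M minutes]:"), and the names headerRE and countByState are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// headerRE matches a goroutine header such as
// "goroutine 6147441 [sleep, 22 minutes]:" anywhere in a log line and
// captures the goroutine ID and its wait state.
var headerRE = regexp.MustCompile(`goroutine (\d+) \[([^,\]]+)`)

// countByState (hypothetical helper) tallies goroutines per wait state.
func countByState(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		if m := headerRE.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[2]]++
		}
	}
	return counts
}

func main() {
	// The four header lines quoted earlier in this report.
	dump := `12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6413899 [select, 22 minutes]:
12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6121144 [select, 22 minutes]:
12月 21 12:31:29 machine-name kubelet[28486]: goroutine 6147441 [sleep, 22 minutes]:
12月 21 12:31:29 machine-name kubelet[28486]: goroutine 16532140 [chan receive, 1283 minutes]:`
	c := countByState(dump)
	fmt.Println(c["select"], c["sleep"], c["chan receive"]) // prints 2 1 1
}
```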

The attachment is the complete goroutine stack trace; I redacted the machine name.

abort.log

@agnivade
Contributor

Hi @zhao-kun - 1.8.3 is quite an old version. Please give a try with 1.12 beta1 and let us know.

Also, without proper steps to reproduce this issue, it is very hard to understand what's going on just by looking at the stack trace. Is there a way you can give us the exact steps to reproduce this issue?

@agnivade agnivade added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 24, 2018
@zhao-kun
Author

Hi @agnivade, it's hard to reproduce. We don't do anything unusual with our K8s cluster (or the kubelet); we run it in the normal way. We do know there are some issues in K8s 1.7.4, especially in the cAdvisor implementation, and we plan to upgrade to the latest K8s version in the future.

So if the current information is too little to diagnose the problem, can you give me some advice from the Go side on what we can do to help diagnose it the next time the issue occurs?

@agnivade
Contributor

You could try with the 1.12beta version and see.

Overall, it is hard to say whether the problem is with Go or K8s. I would advise filing an issue on the K8s repo and investigating that. And only when there is some concrete evidence that this is a Go issue, then file a new issue with proper repro steps.

@odeke-em odeke-em changed the title Goroutines aren't scheduled in time runtime: goroutines aren't scheduled in time Dec 25, 2018
@agnivade agnivade added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 27, 2018
@gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators Jan 27, 2020
3 participants