net: 1.5rc1 testing: stuck for long time if there are many TCP connections #12051
Can you please try one or more of the following and report your results on this issue.
Thanks Dave
Is your code available for us to try to recreate this problem?
@davecheney I can do the first and second now; here is the output:
Then there is no more output and the process is stuck.
On the server, the last log line before the push is:
Then I push messages and it gets stuck.
@yangzhe1991 If you make it get stuck many times, and you kill -3 it each time it is stuck, is that first stack always munmap called from sysFree called from gcCopySpans, or does it move around? Thanks.
Also, did you remove all your own goroutines from the stack trace? I see nothing but system goroutines. Maybe the problem is that your program is deadlocked. Even if you can't show us the stack traces, please look at what your goroutines are doing and make sure they're not just staring at each other. Thanks.
The client code is very simple: each goroutine dials a TCP connection to the server and blocks in Read, like this:
or
The only difference is the goroutine state: IO wait versus runnable. I also have two other goroutines that I didn't post: one sleeps forever at the end of main() and the other writes a log line every second.
@rsc I think the goroutine "munmap called from sysFree called from gcCopySpans" is only if there is a fatal
And if I only open 500,000 TCP connections, it works well. So I suspect it is a performance issue.
Phil, to echo Ian's request, the simplest path to a solution is if you can …
Thanks, Dave
On Fri, Aug 7, 2015 at 3:20 PM, Phil Yang notifications@github.com wrote:
I put a simplified version of the code here: https://gist.github.com/yangzhe1991/c43f66aba140d651ffde
When I build them with Go 1.4.2, the server cannot handle 1.5 million connections either :( So is 1.5's performance slightly lower than 1.4's? And is it expected that it stays stuck for many minutes?
Thanks for the test case. Your code has a number of race conditions. The one on the global variable "list" in server.go looks particularly serious. One goroutine is appending values to list while a different goroutine is iterating over it. Your first step in analyzing this problem has to be to remove all these race conditions. You should see them if you build and run your programs with the -race option. See http://blog.golang.org/race-detector for background on the race detector.
Hi, I found another issue in the client code (which I cannot post, sorry): when GC is triggered, there is a short stall of two or three seconds during which the client does not open any connections at all. Log:
and
and
and more. You can see that hbCount does not increase for 2-3 seconds before the GC log line is printed. I have no idea what the timing fields in the GC log mean (e.g. …). Moreover, if I turn off the client's GC, pushing works well, just like with 1.4.2, and it doesn't get stuck any more.
And when I
There should be about 1 million goroutines, but the log stop when it has printed only 29309 goroutines |
I tried again, and the output is not the same every time. After
the last part:
To make progress on this we really need working, non-racy, code that demonstrates this problem. Thanks.
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
Hi,
I am using Go 1.5rc1 to test my program, which holds millions of TCP connections and pushes messages to clients.
I test the program with a fake client (also written in Go) that opens 1 million TCP connections to the server. The server binds 100 ports and the client opens 10,000 connections per port, 1,000,000 in total. The server and the fake client run on the same machine, and both call:
runtime.GOMAXPROCS(runtime.NumCPU())
If I build them with Go 1.4.2 (cross-compiled from my MacBook to a linux/amd64 server) and push a message to all TCP connections almost concurrently, CPU usage is high but everything works as expected: every client connection receives the message in under 20 seconds.
However, if I build the same code with Go 1.5rc1 (also cross-compiled), then when I push messages, both the server and the client get stuck and the system load stays high for several minutes. Both processes' status sometimes changes to 'D' ("uninterruptible sleep"), even though CPU usage may be 0.0%. If I kill the client, the server returns to normal.
Why does 1.4.2 work well but 1.5rc1 does not? Does 1.5 have a TCP performance issue or a goroutine scheduling issue?
Thanks