runtime: failed to create new OS thread (have 7466 already; errno=1455) on Windows 10 #23613
Comments
FYI: errno 1455 is ERROR_COMMITMENT_LIMIT. And the limit on the number of threads on Windows is around 6600.
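The Go runtime also enforces its own thread ceiling (10,000 by default), which can be lowered with the standard runtime/debug API so a runaway program fails earlier with a clean traceback, before hitting the OS commit limit. A minimal sketch; the limit of 2000 is an arbitrary illustrative value:

```go
package main

import (
	"runtime/debug"
)

func main() {
	// Lower the runtime's OS-thread ceiling from the default of 10,000.
	// If the program tries to create more threads than this, the runtime
	// crashes with a full stack dump instead of first exhausting the
	// Windows commit limit.
	debug.SetMaxThreads(2000) // illustrative value

	// ... rest of the program ...
}
```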
syscall.Socket is a blocking call, so each call to syscall.Socket will take up one thread. If you issue 10000 simultaneous syscall.Socket calls and they take a long time to complete, you will run out of threads / memory. Is that what is happening in your program? Alex
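To illustrate the failure mode Alex describes: each goroutine blocked in a syscall pins an OS thread, so unbounded concurrent dials translate directly into unbounded threads. A minimal sketch of bounding them with a semaphore channel; boundedDial and the limit of 100 are illustrative, not taken from the program under discussion:

```go
package main

import (
	"fmt"
	"net"
)

// sem bounds how many goroutines may sit in a blocking dial (and thus
// pin an OS thread) at any one time.
var sem = make(chan struct{}, 100)

func boundedDial(addr string) (net.Conn, error) {
	sem <- struct{}{}        // acquire a slot
	defer func() { <-sem }() // release it when the dial returns
	return net.Dial("tcp", addr)
}

func main() {
	conn, err := boundedDial("example.com:80")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	conn.Close()
}
```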
My program spawns fewer than 40 goroutines in total, so it's quite weird to see over 7000 threads created. That's why I'm reporting the issue here. By avoiding the bulk update API in Elasticsearch, I can at least work around the problem even on Windows, so this is not about me being unable to solve my own issue. :) Memory stays stable around 220MB according to Task Manager (when no OS threads stack up), so I'm quite sure it's not a memory issue. As I described above, on Mac there is no such problem with exactly the same code, against the same Elasticsearch node, and with the same input. As I mentioned, I plan to make a small code sample that reproduces the problem around Elasticsearch, but I wanted to let others know before the sample is available, in the hope that someone may recognize what's going wrong from the symptoms alone.
I can post the whole log files if it helps. But if you all think this issue is not worth further investigation with only the currently available information, feel free to close it. I'll put more effort into understanding the situation and come back when I have enough context. Thanks.
I think this is worth more investigation. Please do make the complete stack dump available somewhere, perhaps in a gist. Thanks.
Here goes a full log file. The REST call I make to an Elasticsearch node in this case is a bulk update API call (actually 20 updates in a batch). If I switch to 20 separate non-bulk updates, the problem goes away. But AFAIK the two APIs share the same networking code. I'll try to create a sample project that makes the bulk API calls from a fixed number of goroutines, in the hope that the problem can at least be consistently reproduced, and come back.
Here goes another. This and the previous one were both taken under
gitlab.com/fluidic/insight-ingester is closed source?
@mattn yes, unfortunately. That's why I want to separate the troublesome part out into a separate project. If I succeed, I will share it here after making it available on GitHub.
@dynaxis Thanks for the logs. The first log shows 14,962 goroutines, so I question your statement above that your program spawns fewer than 40 goroutines total. Something in your program is creating many more goroutines than that. There are 7454 goroutines waiting in
Essentially all the
Those line numbers are not plausible for Go 1.10rc1, so I assume this log was generated using Go 1.9.3. These goroutines are waiting for an HTTP connection to complete. The
This is the other side of the goroutines waiting for a dial to complete: it's the goroutines trying to dial. They are all waiting for
Can you show us a stack dump with
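For reference, one way to produce the kind of full dump requested here on Windows (where a SIGQUIT-style quit signal isn't available) is the standard runtime/pprof goroutine profile at debug level 2, which prints every goroutine's stack in the same format as a fatal panic. A minimal sketch:

```go
package main

import (
	"os"
	"runtime/pprof"
)

// dumpGoroutines writes every goroutine's stack to stderr. Debug level 2
// uses the same format as an unrecovered panic traceback, which is also
// what the runtime prints when thread creation fails.
func dumpGoroutines() {
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}

func main() {
	dumpGoroutines()
}
```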
I'll try to run the troublesome code under
BTW, I've also been very curious about why Winsock
The following log is taken with
I tried on
OK, this time on
Hope this helps.
Thanks. That wasn't as helpful as I hoped. Your new4.log file shows 3193 goroutines, mostly waiting to complete a call to
I think these goroutines are being started by
I recommend that you take a close look at how often that function is being called. If it is being called thousands of times, and the operation is then canceled, that would cause what you are seeing.
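One cheap way to measure that call frequency is to wrap the suspect dial path with an atomic counter. A hypothetical sketch; countedDial is an illustrative name, and in a real program it would be installed as the http.Transport's DialContext:

```go
package main

import (
	"context"
	"log"
	"net"
	"sync/atomic"
)

var dialCount int64 // incremented on every dial attempt

// countedDial tallies each call before delegating to a plain dialer.
// It has the same signature as http.Transport.DialContext, so it can be
// plugged in there to count dials made by an HTTP client.
func countedDial(ctx context.Context, network, addr string) (net.Conn, error) {
	atomic.AddInt64(&dialCount, 1)
	var d net.Dialer
	return d.DialContext(ctx, network, addr)
}

func main() {
	if conn, err := countedDial(context.Background(), "tcp", "example.com:80"); err == nil {
		conn.Close()
	}
	log.Printf("dials so far: %d", atomic.LoadInt64(&dialCount))
}
```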
@ianlancetaylor the
Anyway, examining its source code more closely, I found that it creates a goroutine per Elasticsearch node (in my case, only one) to test whether each Elasticsearch node is alive, with a timeout of 1 second. I previously overlooked that part. But it kicks in only once a minute, or when all nodes have previously been marked as dead on
Also, I don't set any deadline on the context used for HTTP requests inside the Elasticsearch client, and I don't cancel it. I only use
Anyway, I'll try to separate the case out into a simple standalone project in the hope that it will make examining this issue easier. Thanks.
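To test whether those per-node liveness probes contribute to the goroutine count, the client's health checks (and sniffing) can be disabled at construction time. A sketch assuming the SetHealthcheck and SetSniff options of github.com/olivere/elastic; the URL is a placeholder, and the exact option names may vary across the library's major versions:

```go
package main

import (
	"log"

	"github.com/olivere/elastic"
)

func main() {
	// Disable background health checks and node sniffing so the only
	// goroutines and connections are the ones our own code creates.
	client, err := elastic.NewClient(
		elastic.SetURL("http://127.0.0.1:9200"), // placeholder address
		elastic.SetHealthcheck(false),
		elastic.SetSniff(false),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = client
}
```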
@ianlancetaylor I'll also take a closer look at the frequency of
I did my homework, and I think the following message is the key to understanding the cause of this issue: "Only one usage of each socket address (protocol/network address/port) is normally permitted". It comes from Windows, and is caused by too many TCP connections in TIME_WAIT. On Mac, the default timeout for TIME_WAIT
But since my code keeps issuing many REST calls (from 38 goroutines via the Elasticsearch client), I don't clearly understand why that many TCP connections are stacked up in TIME_WAIT. If it's perfectly normal to have this kind of situation, then I think it's safe to close this issue.
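A common reason for Go HTTP clients to pile up connections in TIME_WAIT is that response bodies are not fully drained and closed, so the Transport cannot return the connection to its idle pool and must dial a fresh one per request; the default Transport also keeps at most 2 idle connections per host. A hedged sketch of the reuse-friendly pattern, with illustrative limits:

```go
package main

import (
	"io"
	"io/ioutil"
	"net/http"
)

// client raises the idle-connection limits so keep-alive connections to a
// single host are actually reused instead of discarded.
var client = &http.Client{
	Transport: &http.Transport{
		MaxIdleConns:        100, // illustrative values
		MaxIdleConnsPerHost: 100,
	},
}

func get(url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	// Drain and close the body; otherwise the connection cannot go back
	// into the idle pool, and the next request dials a new one, leaving
	// the old connection behind in TIME_WAIT.
	defer resp.Body.Close()
	_, err = io.Copy(ioutil.Discard, resp.Body)
	return err
}

func main() {
	_ = get("http://127.0.0.1:9200") // placeholder address
}
```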
The
I think it's better to investigate further by inserting logs here and there in the current code than to prepare a minimal kernel of it that manifests the problem, for now. I'll dig a little more.
I turned off all possible requests made by spawning new goroutines, and all the retries. But I still see an immediate and relatively fast increase in the number of
According to the manual, Elasticsearch defaults to keeping connections alive, so I guess Elasticsearch is not the one terminating all the HTTP connections. First, I plan to check whether
Next I'll try HTTP tracing to see how the default HTTP client uses connections. This will take a few days.
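The tracing step mentioned above can be done with the standard net/http/httptrace package; the following minimal sketch logs, per request, whether the Transport reused an idle keep-alive connection or dialed a new one (the URL is a placeholder):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptrace"
)

func main() {
	req, err := http.NewRequest("GET", "http://127.0.0.1:9200", nil)
	if err != nil {
		log.Fatal(err)
	}

	// GotConn fires once per request with details about the connection
	// the Transport chose; Reused reports whether it came from the idle
	// pool rather than a fresh dial.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			log.Printf("conn reused=%v wasIdle=%v", info.Reused, info.WasIdle)
		},
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}
```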
Any useful followup? In any case, this doesn't seem like a bug in the Go runtime. Closing in the hope that it is really a problem with the program. Please comment if you disagree.
What version of Go are you using (go version)?
go version go1.10rc1 windows/amd64
1.9.2 and 1.9.3 have this problem too. But when run on a MacBook, the same problem doesn't manifest at all. So I suspect it is Windows only.

Does this issue reproduce with the latest release?
Yes. As I mentioned above, even on 1.10rc1.

What operating system and processor architecture are you using (go env)?

What did you do?
My code issues concurrent bulk update API calls to an Elasticsearch node, which is not overloaded. Over time, OS threads stack up and the program panics. I use github.com/olivere/elastic to make calls to Elasticsearch. The above is where the over 7000 OS threads are blocking.
It's easily reproducible, but at least in the current setting it's difficult to set the environment up elsewhere. I don't have enough time right now, but I will try to narrow it down to a simple example that can reproduce the problem soon.
What did you expect to see?
Even though my code issues many concurrent HTTP calls (to an Elasticsearch node), the number of goroutines I use is fixed, so there should be no such thing as over 7000 threads blocked in the Windows socket function.
What did you see instead?
Over 7000 OS threads were created, and the program panicked when it failed to create more.
The following is part of a log captured during such a failure; over 7000 threads are blocked in the Winsock socket function: