runtime: failed to create new OS thread (have 7466 already; errno=1455) on Windows 10 #23613

Closed
dynaxis opened this issue Jan 30, 2018 · 21 comments
Labels: FrozenDueToAge, NeedsInvestigation, OS-Windows
Milestone: Go1.11

Comments

dynaxis commented Jan 30, 2018

What version of Go are you using (go version)?

go version go1.10rc1 windows/amd64
Go 1.9.2 and 1.9.3 have this problem too. When the same code runs on a MacBook, the problem doesn't manifest at all, so I suspect it is Windows-only.

Does this issue reproduce with the latest release?

Yes. As I mentioned above, even on 1.10rc1.

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\dynaxis\AppData\Local\go-build
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=c:\Users\dynaxis\go
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\dynaxis\AppData\Local\Temp\go-build003648370=/tmp/go-build -gno-record-gcc-switches

What did you do?

My code issues concurrent bulk update API calls to an Elasticsearch node, which is not overloaded. Over time, OS threads stack up and the program panics. I use github.com/olivere/elastic to make calls to Elasticsearch.

syscall.Syscall(0x7ff8ac7ec9b0, 0x3, 0x2, 0x1, 0x0, 0x1ed06a8, 0x0, 0xc04e1b9348)
	C:/Go/src/runtime/syscall_windows.go:163 +0x69

The above is where the more than 7,000 OS threads are blocked.

It's easily reproducible here, but the current environment is difficult to set up elsewhere. I don't have enough time right now, but I will try to narrow it down to a simple example that reproduces the problem soon.

What did you expect to see?

Even though my code issues many concurrent HTTP calls (to an Elasticsearch node), the number of goroutines I use is fixed, so there should be no such thing as over 7,000 threads blocking on the Windows socket function.

What did you see instead?

Over 7,000 OS threads were created, and the program panicked when it failed to create more.
The following is part of a log captured during such a failure; over 7,000 threads are blocked in the Winsock socket function:

runtime: failed to create new OS thread (have 7466 already; errno=1455)
fatal error: runtime.newosproc

runtime stack:
runtime.throw(0xb525c0, 0x11)
	C:/Go/src/runtime/panic.go:605 +0x9c
runtime.newosproc(0xc05a8c9180, 0x0)
	C:/Go/src/runtime/os_windows.go:641 +0x15a
runtime.newm(0xb74280, 0xc042017300)
	C:/Go/src/runtime/proc.go:1657 +0x154
runtime.startm(0xc042017300, 0xc0506b2301)
	C:/Go/src/runtime/proc.go:1728 +0x146
runtime.handoffp(0xc042017300)
	C:/Go/src/runtime/proc.go:1766 +0x2a1
runtime.entersyscallblock_handoff()
	C:/Go/src/runtime/proc.go:2613 +0x39
runtime.systemstack(0xc042018600)
	C:/Go/src/runtime/asm_amd64.s:344 +0x7e
runtime.mstart()
	C:/Go/src/runtime/proc.go:1135

goroutine 5 [syscall]:
runtime.notetsleepg(0xf403a0, 0x3b8b84a0, 0x1)
	C:/Go/src/runtime/lock_sema.go:280 +0x59 fp=0xc042021f60 sp=0xc042021f20 pc=0x410179
runtime.timerproc()
	C:/Go/src/runtime/time.go:216 +0x313 fp=0xc042021fe0 sp=0xc042021f60 pc=0x449653
runtime.goexit()
	C:/Go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042021fe8 sp=0xc042021fe0 pc=0x458031
created by runtime.addtimerLocked
	C:/Go/src/runtime/time.go:122 +0xf4

goroutine 1 [semacquire, 1 minutes]:
sync.runtime_Semacquire(0xc0421e21f4)
	C:/Go/src/runtime/sema.go:56 +0x40
sync.(*WaitGroup).Wait(0xc0421e21e8)
	C:/Go/src/sync/waitgroup.go:131 +0x79
gitlab.com/fluidic/insight-ingester/ingest.(*Service).Wait(0xc0421e21c0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/service.go:74 +0x38
gitlab.com/fluidic/insight-ingester/pipeline.Run(0xf406c0, 0xc0421c8360, 0x0, 0xc0421c8360)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/pipeline/run.go:43 +0x48b
gitlab.com/fluidic/insight-ingester/cmd.run(0xf388a0, 0xc042130a80, 0x1, 0x1, 0x0, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/cmd/run.go:59 +0x23d
gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra.(*Command).execute(0xf388a0, 0xc042130a50, 0x1, 0x1, 0xf388a0, 0xc042130a50)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra/command.go:698 +0x481
gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xf38680, 0x2f, 0xf387c0, 0xc042059f58)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra/command.go:783 +0x315
gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra.(*Command).Execute(0xf38680, 0xc042059f48, 0x3)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/spf13/cobra/command.go:736 +0x32
gitlab.com/fluidic/insight-ingester/cmd.Execute()
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/cmd/root.go:38 +0xa3
main.main()
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/main.go:6 +0x27

goroutine 153668 [chan receive]:
net/http.(*Transport).getConn.func2.1(0xc051453860, 0xf37560, 0xb73f30)
	C:/Go/src/net/http/transport.go:932 +0x57
created by net/http.(*Transport).getConn.func2
	C:/Go/src/net/http/transport.go:931 +0x81

goroutine 157928 [chan receive]:
net/http.(*Transport).getConn.func2.1(0xc044177bc0, 0xf37560, 0xb73f30)
	C:/Go/src/net/http/transport.go:932 +0x57
created by net/http.(*Transport).getConn.func2
	C:/Go/src/net/http/transport.go:931 +0x81

goroutine 33 [select, 1 minutes]:
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).sniffer(0xc04201ca80)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:799 +0x197
created by gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.NewClient
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:277 +0x882

goroutine 146006 [runnable, locked to thread]:
syscall.Syscall(0x7ff8ac7ec9b0, 0x3, 0x2, 0x1, 0x0, 0x1ed06a8, 0x0, 0xc04e1b9348)
	C:/Go/src/runtime/syscall_windows.go:163 +0x69
syscall.socket(0x100000002, 0xc000000000, 0x0, 0x0, 0xc050cea480)
	C:/Go/src/syscall/zsyscall_windows.go:1407 +0x73
syscall.Socket(0x2, 0x1, 0x0, 0xc050cea480, 0x0, 0xc04e1b9440)
	C:/Go/src/syscall/syscall_windows.go:706 +0x58
net.sysSocket(0x2, 0x1, 0x0, 0xc0432bf320, 0xa, 0x10)
	C:/Go/src/net/sock_windows.go:21 +0x61
net.socket(0xf021a0, 0xc0464c1e00, 0xb4690d, 0x3, 0x2, 0x1, 0x0, 0x0, 0xf04120, 0x0, ...)
	C:/Go/src/net/sock_posix.go:42 +0x61
net.internetSocket(0xf021a0, 0xc0464c1e00, 0xb4690d, 0x3, 0xf04120, 0x0, 0xf04120, 0xc0461f6c60, 0x1, 0x0, ...)
	C:/Go/src/net/ipsock_posix.go:141 +0x158
net.doDialTCP(0xf021a0, 0xc0464c1e00, 0xb4690d, 0x3, 0x0, 0xc0461f6c60, 0xf5ded8, 0x0, 0x0)
	C:/Go/src/net/tcpsock_posix.go:62 +0xc0
net.dialTCP(0xf021a0, 0xc0464c1e00, 0xb4690d, 0x3, 0x0, 0xc0461f6c60, 0xbe9236e19a2ea3c8, 0x18c0e42339, 0xf40420)
	C:/Go/src/net/tcpsock_posix.go:58 +0xeb
net.dialSingle(0xf021a0, 0xc0464c1e00, 0xc050999680, 0xeff6a0, 0xc0461f6c60, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:547 +0x3e9
net.dialSerial(0xf021a0, 0xc0464c1e00, 0xc050999680, 0xc050cea4a0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:515 +0x24e
net.(*Dialer).DialContext(0xc042034240, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0433734a0, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/dial.go:397 +0x6f5
net.(*Dialer).DialContext-fm(0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0433734a0, 0x12, 0x4035d0, 0x60, 0x728de0, 0xc04200a0f0)
	C:/Go/src/net/http/transport.go:46 +0x7a
net/http.(*Transport).dial(0xf37560, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0433734a0, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/http/transport.go:884 +0x22a
net/http.(*Transport).dialConn(0xf37560, 0xf02160, 0xc04200a0f0, 0x0, 0xc0435afa40, 0x4, 0xc0433734a0, 0x12, 0x0, 0xc0458914a0, ...)
	C:/Go/src/net/http/transport.go:1060 +0x1d69
net/http.(*Transport).getConn.func4(0xf37560, 0xf02160, 0xc04200a0f0, 0xc0461f6ba0, 0xc042ac1260)
	C:/Go/src/net/http/transport.go:943 +0x7f
created by net/http.(*Transport).getConn
	C:/Go/src/net/http/transport.go:942 +0x39a

goroutine 160290 [syscall, locked to thread]:
syscall.Syscall(0x7ff8ac7ec9b0, 0x3, 0x2, 0x1, 0x0, 0x1ed06a8, 0x0, 0xc051951348)
	C:/Go/src/runtime/syscall_windows.go:163 +0x69
syscall.socket(0x100000002, 0xc000000000, 0x0, 0x0, 0xc053fbb4f0)
	C:/Go/src/syscall/zsyscall_windows.go:1407 +0x73
syscall.Socket(0x2, 0x1, 0x0, 0xc053fbb4f0, 0x0, 0xc051951440)
	C:/Go/src/syscall/syscall_windows.go:706 +0x58
net.sysSocket(0x2, 0x1, 0x0, 0xc04389ad40, 0xa, 0x10)
	C:/Go/src/net/sock_windows.go:21 +0x61
net.socket(0xf021a0, 0xc055db3bc0, 0xb4690d, 0x3, 0x2, 0x1, 0x0, 0x0, 0xf04120, 0x0, ...)
	C:/Go/src/net/sock_posix.go:42 +0x61
net.internetSocket(0xf021a0, 0xc055db3bc0, 0xb4690d, 0x3, 0xf04120, 0x0, 0xf04120, 0xc05702a150, 0x1, 0x0, ...)
	C:/Go/src/net/ipsock_posix.go:141 +0x158
net.doDialTCP(0xf021a0, 0xc055db3bc0, 0xb4690d, 0x3, 0x0, 0xc05702a150, 0xf5ded8, 0x0, 0x0)
	C:/Go/src/net/tcpsock_posix.go:62 +0xc0
net.dialTCP(0xf021a0, 0xc055db3bc0, 0xb4690d, 0x3, 0x0, 0xc05702a150, 0xbe9236e2d5bdccec, 0x19e6793e5d, 0xf40420)
	C:/Go/src/net/tcpsock_posix.go:58 +0xeb
net.dialSingle(0xf021a0, 0xc055db3bc0, 0xc0466fff80, 0xeff6a0, 0xc05702a150, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:547 +0x3e9
net.dialSerial(0xf021a0, 0xc055db3bc0, 0xc0466fff80, 0xc053fbb500, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:515 +0x24e
net.(*Dialer).DialContext(0xc042034240, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc045310c20, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/dial.go:397 +0x6f5
net.(*Dialer).DialContext-fm(0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc045310c20, 0x12, 0x4035d0, 0x60, 0x728de0, 0xc04200a0f0)
	C:/Go/src/net/http/transport.go:46 +0x7a
net/http.(*Transport).dial(0xf37560, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc045310c20, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/http/transport.go:884 +0x22a
net/http.(*Transport).dialConn(0xf37560, 0xf02160, 0xc04200a0f0, 0x0, 0xc043c45620, 0x4, 0xc045310c20, 0x12, 0x101, 0xc04ac89fb8, ...)
	C:/Go/src/net/http/transport.go:1060 +0x1d69
net/http.(*Transport).getConn.func4(0xf37560, 0xf02160, 0xc04200a0f0, 0xc05702a0c0, 0xc054075140)
	C:/Go/src/net/http/transport.go:943 +0x7f
created by net/http.(*Transport).getConn
	C:/Go/src/net/http/transport.go:942 +0x39a

goroutine 34 [select]:
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).healthchecker(0xc04201ca80)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:986 +0x19c
created by gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.NewClient
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:280 +0x858

goroutine 35 [select]:
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*bulkWorker).work(0xc0421e2180, 0xf02160, 0xc04200a0f0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/bulk_processor.go:439 +0x142
created by gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*BulkProcessor).Start
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/bulk_processor.go:302 +0x1c1

goroutine 36 [runnable]:
net/http.(*Transport).getConn.func2()
	C:/Go/src/net/http/transport.go:929 +0x90
net/http.(*Transport).getConn(0xf37560, 0xc052fdd9e0, 0x0, 0xc0422260f0, 0x4, 0xc05093cac0, 0x12, 0x0, 0x0, 0x199)
	C:/Go/src/net/http/transport.go:982 +0xb5f
net/http.(*Transport).RoundTrip(0xf37560, 0xc04d2dbf00, 0xf37560, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:400 +0x6ad
net/http.send(0xc04d2dbf00, 0xefc060, 0xf37560, 0x0, 0x0, 0x0, 0xc0480c08a0, 0xc042207bf0, 0xc046cb1318, 0x1)
	C:/Go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0xf401c0, 0xc04d2dbf00, 0x0, 0x0, 0x0, 0xc0480c08a0, 0x0, 0x1, 0x1dbb450)
	C:/Go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0xf401c0, 0xc04d2dbf00, 0xc04200a0f0, 0xc04d2dbf00, 0x0)
	C:/Go/src/net/http/client.go:602 +0x294
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc05093c9e0, 0x12, 0xc052fdd830, 0xa14740, 0xc0569e00c0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x41b
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc046cb18d0, 0xf02160, 0xc04200a0f0, 0xa8ed01, 0x1, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc045d8e04c, 0x2, 0xb745e8, 0xc046cb1990, 0x494f16)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).sortedTokens(0xc0422063f0, 0x0, 0x0, 0xc045d8e04c, 0x2, 0x5a628680, 0x0, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:32 +0x6d
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).SizeAndColor(0xc0422063f0, 0xc04889be00, 0xf40400, 0x0, 0x0, 0x0, 0x0, 0xed1f3fef0, 0xf40420, 0x0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:51 +0xa6
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).buildLogs(0xc042243520, 0x1cb3, 0x0, 0xed1f2ad70, 0xf40420, 0xc05704ce40)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:164 +0x936
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestInventories(0xc042243520)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestivts.go:39 +0xa07
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc042243520, 0xc044c05fb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:98 +0xe2
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 37 [sleep]:
time.Sleep(0x2540be400)
	C:/Go/src/runtime/time.go:65 +0x145
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc043b86a18, 0x6, 0xc0520a1080, 0xa3d860, 0xc056cf89e0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1230 +0x14e2
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*BulkService).Do(0xc04ee63b98, 0xf02160, 0xc04200a0f0, 0x1, 0xc04ee63b98, 0xc04ee63a78)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/bulk.go:246 +0x33a
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).normalizeReleaseDates(0xc044d68820, 0xc0564adf18, 0x3)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/reldate.go:53 +0xb18
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc044d68820, 0xc0564adfb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:89 +0x7a
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 38 [select]:
net/http.(*Transport).getConn(0xf37560, 0xc052d8db00, 0x0, 0xc0487c82a0, 0x4, 0xc04340d280, 0x12, 0x0, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:948 +0x5c6
net/http.(*Transport).RoundTrip(0xf37560, 0xc044255900, 0xf37560, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:400 +0x6ad
net/http.send(0xc044255900, 0xefc060, 0xf37560, 0x0, 0x0, 0x0, 0xc049b94a70, 0xc042207bf0, 0xc042b71688, 0x1)
	C:/Go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0xf401c0, 0xc044255900, 0x0, 0x0, 0x0, 0xc049b94a70, 0x0, 0x1, 0x1df3018)
	C:/Go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0xf401c0, 0xc044255900, 0xc04200a0f0, 0xc044255900, 0x0)
	C:/Go/src/net/http/client.go:602 +0x294
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc04340d240, 0x12, 0xc052d8d9e0, 0xa14740, 0xc056e9faf0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x41b
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc042b71c40, 0xf02160, 0xc04200a0f0, 0x30, 0xa3d860, 0xc00000cf54)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc0434cf000, 0x3e, 0x3, 0x4444e0, 0xc0448c23a6)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).Materials(0xc0422063f0, 0xc04502e838, 0xc0448c23a6, 0x9, 0x40b600)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:30 +0xbc
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).fillGoodsIn(0xc042c36340, 0xc046ceee20, 0xc04502e838, 0xc049382870, 0x14, 0x14)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestitems.go:109 +0x3b8
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestItems(0xc042c36340)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestitems.go:23 +0x326
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc042c36340, 0xc04a0a7fb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:87 +0x69
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 39 [select]:
net/http.(*persistConn).roundTrip(0xc054780d80, 0xc0520a1e60, 0x0, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:1970 +0x60f
net/http.(*Transport).RoundTrip(0xf37560, 0xc04f504f00, 0xf37560, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:413 +0x999
net/http.send(0xc04f504f00, 0xefc060, 0xf37560, 0x0, 0x0, 0x0, 0xc04869c688, 0xc042207bf0, 0xc05820b318, 0x1)
	C:/Go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0xf401c0, 0xc04f504f00, 0x0, 0x0, 0x0, 0xc04869c688, 0x0, 0x1, 0x18dd1d0)
	C:/Go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0xf401c0, 0xc04f504f00, 0xc04200a0f0, 0xc04f504f00, 0x0)
	C:/Go/src/net/http/client.go:602 +0x294
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc050b708c0, 0x12, 0xc0520a1bf0, 0xa14740, 0xc056cf9130, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x41b
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc05820b8d0, 0xf02160, 0xc04200a0f0, 0xa8ed00, 0x2, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc045fd7520, 0x6, 0xb745e8, 0xc05820b990, 0x494f16)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).sortedTokens(0xc0422063f0, 0x0, 0x0, 0xc045fd7520, 0x6, 0x5a613500, 0x3c0000000000, 0x310)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:32 +0x6d
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).SizeAndColor(0xc0422063f0, 0xc04bb5b400, 0x0, 0xc05ab7fc00, 0x6, 0x6, 0x0, 0xed1f2ad70, 0xf40420, 0x0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:51 +0xa6
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).buildLogs(0xc048978680, 0x1e32, 0x0, 0xed1f944f0, 0xf40420, 0xc042f270b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:164 +0x936
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestInventories(0xc048978680)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestivts.go:39 +0xa07
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc048978680, 0xc04c975fb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:98 +0xe2
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 40 [select]:
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).healthcheck(0xc04201ca80, 0x3b9aca00, 0xc0420e8e00)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1044 +0x3b3
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc050299a20, 0x12, 0xc052df8d50, 0xa14740, 0xc056e15ee0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1220 +0x1583
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc04ddb38d0, 0xf02160, 0xc04200a0f0, 0x0, 0xefc3e0, 0xc052df8c60)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc0424d1360, 0x12, 0x0, 0x0, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).sortedTokens(0xc0422063f0, 0x0, 0x0, 0xc0424d1360, 0x12, 0x0, 0x0, 0x0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:32 +0x6d
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).SizeAndColor(0xc0422063f0, 0xc04b895f80, 0xf40400, 0x0, 0x0, 0x0, 0x0, 0xed1f3fef0, 0xf40420, 0x0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:65 +0x1de
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).buildLogs(0xc042243860, 0x1d60, 0x0, 0xed1f6a1f0, 0xf40420, 0xc04472a690)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:164 +0x936
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestInventories(0xc042243860)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestivts.go:39 +0xa07
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc042243860, 0xc04ddb3fb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:98 +0xe2
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 41 [select]:
net/http.(*Transport).getConn(0xf37560, 0xc052c4e2d0, 0x0, 0xc0487c82d0, 0x4, 0xc0518f82e0, 0x12, 0x0, 0x0, 0x199)
	C:/Go/src/net/http/transport.go:948 +0x5c6
net/http.(*Transport).RoundTrip(0xf37560, 0xc044255b00, 0xf37560, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:400 +0x6ad
net/http.send(0xc044255b00, 0xefc060, 0xf37560, 0x0, 0x0, 0x0, 0xc049b94aa8, 0xc042207bf0, 0xc0453e5318, 0x1)
	C:/Go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0xf401c0, 0xc044255b00, 0x0, 0x0, 0x0, 0xc049b94aa8, 0x0, 0x1, 0x1db9738)
	C:/Go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0xf401c0, 0xc044255b00, 0xc04200a0f0, 0xc044255b00, 0x0)
	C:/Go/src/net/http/client.go:602 +0x294
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc0518f8260, 0x12, 0xc052c4e0c0, 0xa14740, 0xc0564ae200, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x41b
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc0453e58d0, 0xf02160, 0xc04200a0f0, 0xa8ed20, 0x1, 0x60000000001)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc0446ce8a0, 0x10, 0x2, 0x233a, 0xc0453e5a50)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).sortedTokens(0xc0422063f0, 0x0, 0x0, 0xc0446ce8a0, 0x10, 0x2, 0x2338, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:32 +0x6d
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).SizeAndColor(0xc0422063f0, 0xc04b388480, 0xf40400, 0x0, 0x0, 0x0, 0x0, 0xed1f3fef0, 0xf40420, 0xc0446ce8a0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:65 +0x1de
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).buildLogs(0xc042363ba0, 0x1e44, 0x0, 0xed1f944f0, 0xf40420, 0xc04d11fda0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:164 +0x936
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestInventories(0xc042363ba0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestivts.go:39 +0xa07
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc042363ba0, 0xc044a4bfb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:98 +0xe2
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6

goroutine 42 [select]:
net/http.(*persistConn).roundTrip(0xc048ce2480, 0xc052ce0420, 0x0, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:1970 +0x60f
net/http.(*Transport).RoundTrip(0xf37560, 0xc04f505100, 0xf37560, 0x0, 0x0)
	C:/Go/src/net/http/transport.go:413 +0x999
net/http.send(0xc04f505100, 0xefc060, 0xf37560, 0x0, 0x0, 0x0, 0xc04869c6a8, 0xc042207bf0, 0xc05768f318, 0x1)
	C:/Go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0xf401c0, 0xc04f505100, 0x0, 0x0, 0x0, 0xc04869c6a8, 0x0, 0x1, 0x18dd1d0)
	C:/Go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0xf401c0, 0xc04f505100, 0xc04200a0f0, 0xc04f505100, 0x0)
	C:/Go/src/net/http/client.go:602 +0x294
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc04201ca80, 0xf02160, 0xc04200a0f0, 0xb46d1b, 0x4, 0xc050b70a00, 0x12, 0xc052ce02d0, 0xa14740, 0xc056cf9520, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x41b
gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*IndicesAnalyzeService).Do(0xc05768f8d0, 0xf02160, 0xc04200a0f0, 0xa8ed00, 0x4, 0x2)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/indices_analyze.go:192 +0x154
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).tokens(0xc0422063f0, 0x0, 0x0, 0xc043b67ac0, 0x1b, 0xb745e8, 0xc05768f990, 0x494f16)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:20 +0x15f
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).sortedTokens(0xc0422063f0, 0x0, 0x0, 0xc043b67ac0, 0x1b, 0x5a628680, 0x380000000000, 0x288)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/tokens.go:32 +0x6d
gitlab.com/fluidic/insight-ingester/job/goods/internal/refine.(*Refiner).SizeAndColor(0xc0422063f0, 0xc048e0e000, 0x0, 0xc05049d080, 0x5, 0x5, 0x0, 0xed1f3fef0, 0xf40420, 0x0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/internal/refine/get.go:51 +0xa6
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).buildLogs(0xc044d681a0, 0x1e60, 0x0, 0xed1f944f0, 0xf40420, 0xc0551205d0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:164 +0x936
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).ingestInventories(0xc044d681a0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/ingestivts.go:39 +0xa07
gitlab.com/fluidic/insight-ingester/job/goods.(*ingestJob).Run(0xc044d681a0, 0xc05823ffb8, 0x1)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/job/goods/job.go:98 +0xe2
gitlab.com/fluidic/insight-ingester/ingest/internal/worker.(*Pool).loop(0xc0422061b0)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:47 +0xe0
created by gitlab.com/fluidic/insight-ingester/ingest/internal/worker.New
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/ingest/internal/worker/worker.go:21 +0xd6
ALTree changed the title from "[Windows 10 64-bit] runtime: failed to create new OS thread (have 7466 already; errno=1455)" to "runtime: failed to create new OS thread (have 7466 already; errno=1455) on Windows 10" on Jan 30, 2018
ALTree added the OS-Windows and NeedsInvestigation labels on Jan 30, 2018

mattn (Member) commented Jan 30, 2018

FYI: errno 1455 is ERROR_COMMITMENT_LIMIT, and the limit on the number of threads on Windows is around 6,600.

The paging file is too small for this operation to complete.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms681385%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
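
For reference, the Go runtime's own thread ceiling can be lowered so that a thread leak crashes the program quickly, with a clear fatal error, well before the Windows commit limit is reached. A minimal sketch (the value 500 is an arbitrary example, not a recommendation):

package main

import "runtime/debug"

func main() {
	// The default limit is 10000 threads; lowering it makes a slow
	// thread build-up fail fast instead of hitting errno=1455 later.
	debug.SetMaxThreads(500)

	// ... rest of the program ...
}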

alexbrainman (Member) commented:

> Even though my code issues many concurrent HTTP calls (to an Elasticsearch node), the number of goroutines I use is fixed, so there should be no such thing as over 7,000 threads blocking on the Windows socket function.

syscall.Socket is a blocking call, so each call to syscall.Socket will take up one OS thread. If you issue 10,000 simultaneous syscall.Socket calls and they take a long time to complete, you will run out of threads / memory. Is that what is happening in your program?
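
One way to check is to watch how many OS threads the runtime has created while the load is running. A minimal sketch (the 5-second interval is arbitrary); run logThreads in its own goroutine next to the real workload:

package main

import (
	"fmt"
	"runtime/pprof"
	"time"
)

// logThreads periodically prints how many OS threads the runtime has
// created so far, which makes a slow thread build-up visible long
// before the process hits the Windows limit.
func logThreads() {
	for range time.Tick(5 * time.Second) {
		fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count())
	}
}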

Alex

dynaxis (Author) commented Jan 30, 2018

My program spawns fewer than 40 goroutines in total, so it's quite strange to see over 7,000 threads created. That's why I'm reporting the issue here. By avoiding the Elasticsearch bulk update API, I can at least work around the problem even on Windows, so this is not about me being unable to solve my own issue. :)

Memory stays stable at around 220 MB according to Task Manager (when no OS threads stack up), so I'm quite sure it's not a memory issue.

As I described above, on macOS there is no such problem with exactly the same code, against the same Elasticsearch node, and with the same input.

As I mentioned, I plan to write a small sample program that reproduces the problem against Elasticsearch, but I wanted to let others know before the sample is available, in the hope that someone may recognize what's going wrong from the symptoms alone.

dynaxis (Author) commented Jan 30, 2018

I can post the whole log files if it helps. But if you all think this issue is not worth further investigation with only the currently available information, then feel free to close it. I'll put more effort into understanding the situation and come back when I have enough context. Thanks.

ianlancetaylor added this to the Go1.11 milestone on Jan 30, 2018

ianlancetaylor (Contributor) commented:

I think this is worth more investigation. Please do make the complete stack dump available somewhere, perhaps in a gist. Thanks.

dynaxis (Author) commented Jan 30, 2018

i.log

Here is a full log file.

The REST call I make to an Elasticsearch node in this case is a bulk update API call (actually 20 updates in a batch). If I switch to 20 separate non-bulk updates, then the problem goes away. But AFAIK, the two APIs share the same networking code.

I'll try to create a sample project that makes the bulk API calls from a fixed number of goroutines, in the hope that the problem is at least consistently reproducible, and come back.

dynaxis (Author) commented Jan 30, 2018

ingest.log

Here is another. This and the previous one were both taken under go1.9.2, but there is no difference on go1.10rc1.

mattn (Member) commented Jan 30, 2018

Is gitlab.com/fluidic/insight-ingester closed source?

dynaxis (Author) commented Jan 30, 2018

@mattn Yes, unfortunately. That's why I want to separate out the troublesome part into a separate project. If I succeed, I will share it here after making it available on GitHub.

ianlancetaylor (Contributor) commented:

@dynaxis Thanks for the logs. The first log shows 14,962 goroutines, so I question your statement above that your program spawns fewer than 40 goroutines in total. Something in your program is creating many more goroutines than that.

There are 7454 goroutines waiting in chan receive state and 7459 goroutines marked runnable, locked to thread.

Essentially all the chan receive goroutines have this stack trace:

net/http.(*Transport).getConn.func2.1(0xc0521253e0, 0xf37560, 0xb73f30)
        C:/Go/src/net/http/transport.go:932 +0x57
created by net/http.(*Transport).getConn.func2
        C:/Go/src/net/http/transport.go:931 +0x81

Those line numbers are not plausible for Go1.10rc1, so I assume this log was generated using Go1.9.3. These goroutines are waiting for an HTTP connection to complete.

The runnable, locked to thread goroutines look like this:

syscall.Syscall(0x7ff8ac7ec9b0, 0x3, 0x2, 0x1, 0x0, 0x1ed0000, 0x0, 0xc05065f348)
	C:/Go/src/runtime/syscall_windows.go:163 +0x69
syscall.socket(0x100000002, 0xc000000000, 0x0, 0x0, 0xc046883480)
	C:/Go/src/syscall/zsyscall_windows.go:1407 +0x73
syscall.Socket(0x2, 0x1, 0x0, 0xc046883480, 0x0, 0xc05065f440)
	C:/Go/src/syscall/syscall_windows.go:706 +0x58
net.sysSocket(0x2, 0x1, 0x0, 0xc042642680, 0xa, 0x10)
	C:/Go/src/net/sock_windows.go:21 +0x61
net.socket(0xf021a0, 0xc048d5fe00, 0xb4690d, 0x3, 0x2, 0x1, 0x0, 0x0, 0xf04120, 0x0, ...)
	C:/Go/src/net/sock_posix.go:42 +0x61
net.internetSocket(0xf021a0, 0xc048d5fe00, 0xb4690d, 0x3, 0xf04120, 0x0, 0xf04120, 0xc044380600, 0x1, 0x0, ...)
	C:/Go/src/net/ipsock_posix.go:141 +0x158
net.doDialTCP(0xf021a0, 0xc048d5fe00, 0xb4690d, 0x3, 0x0, 0xc044380600, 0xf5ded8, 0x0, 0x0)
	C:/Go/src/net/tcpsock_posix.go:62 +0xc0
net.dialTCP(0xf021a0, 0xc048d5fe00, 0xb4690d, 0x3, 0x0, 0xc044380600, 0xbe9236e265bffae8, 0x197f45d859, 0xf40420)
	C:/Go/src/net/tcpsock_posix.go:58 +0xeb
net.dialSingle(0xf021a0, 0xc048d5fe00, 0xc04300cc80, 0xeff6a0, 0xc044380600, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:547 +0x3e9
net.dialSerial(0xf021a0, 0xc048d5fe00, 0xc04300cc80, 0xc0468834c0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
	C:/Go/src/net/dial.go:515 +0x24e
net.(*Dialer).DialContext(0xc042034240, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0466be660, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/dial.go:397 +0x6f5
net.(*Dialer).DialContext-fm(0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0466be660, 0x12, 0x4035d0, 0x60, 0x728de0, 0xc04200a0f0)
	C:/Go/src/net/http/transport.go:46 +0x7a
net/http.(*Transport).dial(0xf37560, 0xf02160, 0xc04200a0f0, 0xb4690d, 0x3, 0xc0466be660, 0x12, 0x0, 0x0, 0x0, ...)
	C:/Go/src/net/http/transport.go:884 +0x22a
net/http.(*Transport).dialConn(0xf37560, 0xf02160, 0xc04200a0f0, 0x0, 0xc044895620, 0x4, 0xc0466be660, 0x12, 0x101, 0xc0469d5fb8, ...)
	C:/Go/src/net/http/transport.go:1060 +0x1d69
net/http.(*Transport).getConn.func4(0xf37560, 0xf02160, 0xc04200a0f0, 0xc044380540, 0xc04247be00)
	C:/Go/src/net/http/transport.go:943 +0x7f
created by net/http.(*Transport).getConn
	C:/Go/src/net/http/transport.go:942 +0x39a

This is the other side of the goroutines waiting for a dial to complete: it's the goroutines trying to dial. They are all waiting for syscall.Socket to create a socket. I have no idea why all these goroutines are sitting there waiting for a socket, an operation that should be quite fast.

Can you show us a stack dump with GOTRACEBACK=system set in the environment? Thanks.
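
If setting the environment variable is inconvenient, the same traceback level can also be requested from inside the program. A minimal sketch:

package main

import "runtime/debug"

func main() {
	// Equivalent to running with GOTRACEBACK=system: crash dumps
	// include runtime-internal goroutines and frames.
	debug.SetTraceback("system")

	// ... rest of the program ...
}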

dynaxis (Author) commented Jan 31, 2018

I'll try to run the troublesome code under 1.10rc1 with GOTRACEBACK=system and come back. Hope I can do it today.

BTW, I've also been very curious about why the Winsock socket() call blocks that way. As you mentioned, it's supposed to return quickly; no special case is mentioned in MSDN.

dynaxis (Author) commented Jan 31, 2018

The following log was taken with GOTRACEBACK=system on go1.9.3; it is compressed due to its size.
new2.zip

I tried go1.10rc1 again too, but this time the Elasticsearch client spits out an error instead of a panic. The weird thing is that I originally upgraded to go1.10rc1 solely to test this problem and got the same panic, yet this time I got different results with no code changes since the last try. The error from the Elasticsearch client is also abnormal, though, and I need to look into it anyway. The only change in my environment since then is that Microsoft upgraded my Windows 10 to the Fall Creators Update (I thought it was just a minor update, since Windows was not very forthcoming about it. :( ).

dynaxis (Author) commented Jan 31, 2018

OK, this time on go1.10rc1 with GOTRACEBACK=system. I called panic() when I got an error back from the Elasticsearch client. The number of goroutines is quite excessive (over 6,000, even though as far as I know neither my code nor the Elasticsearch client creates them that way).

new4.zip

Hope this helps.

ianlancetaylor (Contributor) commented:

Thanks. That wasn't as helpful as I hoped. Your new4.log file shows 3193 goroutines mostly waiting to complete a call to WSASocket and 3188 goroutines mostly waiting for a socket connection to complete.

I think these goroutines are being started by

gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic.(*Client).PerformRequest(0xc042052300, 0xb910c0, 0xc042062098, 0xb123f7, 0x4, 0xc04aa22760, 0x12, 0xc0435b17a0, 0x9dbbc0, 0xc043090bd0, ...)
	c:/Users/dynaxis/go/src/gitlab.com/fluidic/insight-ingester/vendor/github.com/olivere/elastic/client.go:1264 +0x408 fp=0xc0486a97b0 sp=0xc0486a9598 pc=0x79aff8

I recommend that you take a close look at how often that function is being called. If that is being called thousands of times, and then the operation is canceled, that would cause what you are seeing.
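
One low-friction way to measure that, assuming the elastic client can be handed a custom *http.Client (olivere/elastic has a SetHttpClient option), is to wrap the transport and count round trips. A sketch:

package main

import (
	"net/http"
	"sync/atomic"
)

// countingTransport wraps a RoundTripper and counts how many HTTP
// requests pass through it, so the request rate can be compared with
// the number of goroutines stuck dialing.
type countingTransport struct {
	base http.RoundTripper
	n    int64
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	atomic.AddInt64(&t.n, 1)
	return t.base.RoundTrip(req)
}

func (t *countingTransport) Count() int64 { return atomic.LoadInt64(&t.n) }

Installing it as &http.Client{Transport: &countingTransport{base: http.DefaultTransport}} and logging Count() periodically would show whether requests really are issued only from the fixed pool of workers.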

dynaxis (Author) commented Jan 31, 2018

@ianlancetaylor PerformRequest is part of github.com/olivere/elastic, which is open source.

Anyway, looking further into its source code, I found that it creates a goroutine per Elasticsearch node (in my case, only one) to test whether each node is alive, with a timeout of 1 second. I previously overlooked that part.

But it kicks in only once a minute, or when all nodes were previously marked as dead in PerformRequest, so it is very unlikely to cause the problem, IMHO. The problem manifests within a couple of minutes.

Also, I don't set any deadline on the context used for HTTP requests inside the Elasticsearch client, and I don't cancel it; I only use context.Background().

Anyway, I'll try to separate the case out into a simple standalone project in the hope that it will make examining this issue easier.

Thanks.

dynaxis (Author) commented Jan 31, 2018

@ianlancetaylor I'll also take a closer look at the frequency of PerformRequest.

dynaxis (Author) commented Jan 31, 2018

I did my homework, and I think the following message is the key to understanding the cause of this issue: "Only one usage of each socket address (protocol/network address/port) is normally permitted".

It's from Windows and is caused by too many TCP connections in the TIME_WAIT state exhausting all available ephemeral ports, as you all know.

On macOS, the default TIME_WAIT timeout is 15 seconds; on Windows, it's 4 minutes. I guess the difference in defaults is why this issue seemingly appears only on Windows.

But since my code keeps issuing many REST calls (from 38 goroutines via the Elasticsearch client), I don't quite understand why that many TCP connections stack up in TIME_WAIT, considering the default keep-alive time is 30 seconds.

If it's perfectly normal to end up in this kind of situation, then I think it's safe to close this issue.
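
One possible explanation, assuming the default transport is in use: http.DefaultMaxIdleConnsPerHost is only 2, so with ~38 workers hitting a single node most connections cannot be returned to the idle pool and are closed after their request, and those closed connections are exactly what accumulates in TIME_WAIT. Response bodies also have to be fully read and closed before a connection can be reused. A sketch of a transport sized for the actual concurrency (the numbers are illustrative, not tuned):

package main

import (
	"net/http"
	"time"
)

// newPooledClient returns an http.Client whose transport keeps enough
// idle connections per host to match the worker concurrency, so requests
// reuse connections instead of dialing (and later TIME_WAITing) a fresh
// one each time.
func newPooledClient(workers int) *http.Client {
	tr := &http.Transport{
		MaxIdleConns:        workers,
		MaxIdleConnsPerHost: workers, // default is http.DefaultMaxIdleConnsPerHost (2)
		IdleConnTimeout:     90 * time.Second,
	}
	return &http.Client{Transport: tr}
}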

ianlancetaylor (Contributor) commented:

The TIME_WAIT state shouldn't cause WSASocket to hang creating a socket. At least, that would not happen on Unix systems and I don't see why it would happen on Windows.

dynaxis (Author) commented Feb 1, 2018

For now, I think it's better to investigate by inserting more logging here and there into the current code than by preparing a minimal kernel of it that manifests this problem. I'll dig a little more.

dynaxis (Author) commented Feb 1, 2018

I turned off all requests that are made by spawning new goroutines, and all the retries. But I still see an immediate and relatively fast increase in the number of connections in TIME_WAIT after starting my program.

According to the manual, Elasticsearch keeps connections alive by default, so I guess Elasticsearch is not the one terminating all the HTTP connections.

First, I plan to check whether WSASocket blocks when all TCP ports are exhausted. I know that it shouldn't, but just to be sure.

Next, I'll try HTTP tracing to see how the default HTTP client uses connections.

Will take a few days.
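
For the tracing step, a minimal sketch with net/http/httptrace that logs, per request, whether the transport reused an idle connection or had to dial a new one (the client and URL are whatever the real code uses):

package main

import (
	"context"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptrace"
)

// doTraced performs one GET and reports whether the underlying TCP
// connection was reused; a steady stream of reused=false lines would
// explain the TIME_WAIT build-up.
func doTraced(client *http.Client, url string) error {
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			log.Printf("%s: conn=%v reused=%v wasIdle=%v",
				url, info.Conn.RemoteAddr(), info.Reused, info.WasIdle)
		},
	}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req = req.WithContext(httptrace.WithClientTrace(context.Background(), trace))
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the body so the connection is eligible for reuse.
	_, err = io.Copy(ioutil.Discard, resp.Body)
	return err
}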

ianlancetaylor (Contributor) commented:

Any useful follow-up?

In any case, this doesn't seem like a bug in the Go runtime. Closing in the hopes that it is really a problem with the program. Please comment if you disagree.

The golang organization locked and limited the conversation to collaborators on Jul 9, 2019