-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: GC 3x slower in tip vs 1.1.2 for large numbers of goroutines #6482
Labels
Milestone
Comments
Simple, don't ship 1.2 until it is fixed, or rip it out until it is ready. I'm not trying to be flip, but Go's main selling point is the ease of creating large quantities of concurrent processes with acceptable performance. This is hardly an edge condition. This is such a severe problem that it will prevent many serious Go users from upgrading. From where I stand, the additional precision is meaningless if it increases GC time 9x. With 1.1.x the VM grows to about 2x resident, but the resident stays fairly constant, and the VM doesn't appear to grow without bounds. Not ideal, but a mild problem compared to a 9x increase in GC time. |
I think your micro benchmark is leading you astray. In the Go1 benchmarks, gc bound tasks like binary tree have _decreased_ in runtime, which runs counter to your assertion. Full results: lucky(~/devel/autobench) % make /home/dfc/devel/autobench/work/go.11/misc/benchcmp /home/dfc/devel/autobench/work/go1-11.txt /home/dfc/devel/autobench/work/go1-tip.txt benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 2147483647 2147483647 -16.35% BenchmarkFannkuch11 2147483647 2147483647 -2.93% BenchmarkFmtFprintfEmpty 103 100 -2.91% BenchmarkFmtFprintfString 302 253 -16.23% BenchmarkFmtFprintfInt 200 221 +10.50% BenchmarkFmtFprintfIntInt 335 327 -2.39% BenchmarkFmtFprintfPrefixedInt 373 291 -21.98% BenchmarkFmtFprintfFloat 529 455 -13.99% BenchmarkFmtManyArgs 1408 1207 -14.28% BenchmarkGobDecode 13340638 10182422 -23.67% BenchmarkGobEncode 14378134 15727007 +9.38% BenchmarkGzip 511413238 466426341 -8.80% BenchmarkGunzip 141615409 115645734 -18.34% BenchmarkHTTPClientServer 58614 49320 -15.86% BenchmarkJSONEncode 51195827 29764960 -41.86% BenchmarkJSONDecode 109983875 110148725 +0.15% BenchmarkMandelbrot200 4878674 4846368 -0.66% BenchmarkGoParse 7006652 6030072 -13.94% BenchmarkRegexpMatchEasy0_32 136 140 +2.94% BenchmarkRegexpMatchEasy0_1K 350 383 +9.43% BenchmarkRegexpMatchEasy1_32 107 116 +8.41% BenchmarkRegexpMatchEasy1_1K 922 934 +1.30% BenchmarkRegexpMatchMedium_32 197 203 +3.05% BenchmarkRegexpMatchMedium_1K 74180 73690 -0.66% BenchmarkRegexpMatchHard_32 3744 3428 -8.44% BenchmarkRegexpMatchHard_1K 118619 111374 -6.11% BenchmarkRevcomp 973314967 761606833 -21.75% BenchmarkTemplate 147874541 133148104 -9.96% BenchmarkTimeParse 521 448 -14.01% BenchmarkTimeFormat 583 450 -22.81% benchmark old MB/s new MB/s speedup BenchmarkGobDecode 57.53 75.38 1.31x BenchmarkGobEncode 53.38 48.80 0.91x BenchmarkGzip 37.94 41.60 1.10x BenchmarkGunzip 137.02 167.79 1.22x BenchmarkJSONEncode 37.90 65.19 1.72x BenchmarkJSONDecode 17.64 17.62 1.00x BenchmarkGoParse 8.27 9.61 1.16x BenchmarkRegexpMatchEasy0_32 235.21 227.39 0.97x BenchmarkRegexpMatchEasy0_1K 2924.41 2671.68 0.91x BenchmarkRegexpMatchEasy1_32 297.64 275.47 0.93x BenchmarkRegexpMatchEasy1_1K 1110.12 1095.48 0.99x BenchmarkRegexpMatchMedium_32 5.06 4.92 0.97x BenchmarkRegexpMatchMedium_1K 13.80 13.90 1.01x BenchmarkRegexpMatchHard_32 8.55 9.33 1.09x BenchmarkRegexpMatchHard_1K 8.63 9.19 1.06x BenchmarkRevcomp 261.14 333.72 1.28x BenchmarkTemplate 13.12 14.57 1.11x /home/dfc/devel/autobench/work/go.11/misc/benchcmp /home/dfc/devel/autobench/work/runtime-11.txt /home/dfc/devel/autobench/work/runtime-tip.txt benchmark old ns/op new ns/op delta BenchmarkAppend 104 104 +0.00% BenchmarkAppend1Byte 213 142 -33.33% BenchmarkAppend4Bytes 299 132 -55.85% BenchmarkAppend8Bytes 382 134 -64.92% BenchmarkAppend16Bytes 529 232 -56.14% BenchmarkAppend32Bytes 531 139 -73.82% BenchmarkAppendSpecialCase 45 39 -13.35% BenchmarkSelectUncontended 265 272 +2.64% BenchmarkSelectContended 266 266 +0.00% BenchmarkSelectNonblock 112 112 +0.00% BenchmarkChanUncontended 69 67 -1.88% BenchmarkChanContended 70 68 -2.42% BenchmarkChanSync 152 153 +0.66% BenchmarkChanProdCons0 158 155 -1.90% BenchmarkChanProdCons10 93 95 +1.82% BenchmarkChanProdCons100 71 72 +1.12% BenchmarkChanProdConsWork0 777 782 +0.64% BenchmarkChanProdConsWork10 720 722 +0.28% BenchmarkChanProdConsWork100 699 698 -0.14% BenchmarkChanCreation 194 194 +0.00% BenchmarkChanSem 66 65 -0.30% BenchmarkCallClosure 2 2 +0.00% BenchmarkCallClosure1 3 3 +5.83% BenchmarkCallClosure2 55 45 -17.09% BenchmarkCallClosure3 56 45 -18.51% BenchmarkCallClosure4 55 46 -16.99% BenchmarkComplex128DivNormal 24 24 +0.00% BenchmarkComplex128DivNisNaN 15 15 +5.33% BenchmarkComplex128DivDisNaN 15 15 +1.31% BenchmarkComplex128DivNisInf 10 12 +14.68% BenchmarkComplex128DivDisInf 11 11 +3.60% BenchmarkConvT2ESmall 10 10 +3.92% BenchmarkConvT2EUintptr 0 0 +4.69% BenchmarkConvT2ELarge 63 61 -3.31% BenchmarkConvT2ISmall 12 12 -0.79% BenchmarkConvT2IUintptr 1 1 +0.90% BenchmarkConvT2ILarge 66 61 -8.41% BenchmarkConvI2E 4 4 +7.64% BenchmarkConvI2I 16 16 +2.42% BenchmarkAssertE2T 11 11 -4.20% BenchmarkAssertE2TLarge 12 12 +0.00% BenchmarkAssertE2I 18 18 +0.00% BenchmarkAssertI2T 12 11 -4.17% BenchmarkAssertI2I 25 18 -28.63% BenchmarkAssertI2E 4 4 -1.11% BenchmarkAssertE2E 0 0 +0.00% BenchmarkMalloc8 52 42 -19.70% BenchmarkMalloc16 56 47 -16.13% BenchmarkMallocTypeInfo8 75 60 -19.41% BenchmarkMallocTypeInfo16 79 65 -16.84% BenchmarkHashStringSpeed 27 28 +2.54% BenchmarkHashInt32Speed 17 20 +18.24% BenchmarkHashInt64Speed 17 18 +3.45% BenchmarkHashStringArraySpeed 85 85 -0.35% BenchmarkMegMap 18 23 +25.95% BenchmarkMegOneMap 25 14 -43.97% BenchmarkMegEqMap 80808 77276 -4.37% BenchmarkMegEmptyMap 3 3 +7.91% BenchmarkSmallStrMap 20 22 +10.40% BenchmarkMapStringKeysEight_16 21 25 +21.43% BenchmarkMapStringKeysEight_32 18 22 +19.57% BenchmarkMapStringKeysEight_64 18 22 +19.57% BenchmarkMapStringKeysEight_1M 18 22 +19.57% BenchmarkIntMap 13 19 +46.67% BenchmarkRepeatedLookupStrMapKey32 40 40 -0.74% BenchmarkRepeatedLookupStrMapKey1M 340521 336266 -1.25% BenchmarkNewEmptyMap 135 137 +1.48% BenchmarkMemmove32 22 5 -76.68% BenchmarkMemmove4K 185 177 -4.32% BenchmarkMemmove64K 3593 3547 -1.28% BenchmarkMemmove4M 830265 823422 -0.82% BenchmarkMemmove64M 13271665 13210000 -0.46% BenchmarkFinalizer 171 179 +4.68% BenchmarkFinalizerRun 717 698 -2.65% BenchmarkStackGrowth 738 878 +18.97% BenchmarkStackGrowthDeep 105341 105403 +0.06% BenchmarkCreateGoroutines 115 118 +2.61% BenchmarkCreateGoroutinesParallel 114 118 +3.51% BenchmarkMatmult 10 11 +2.78% BenchmarkIfaceCmp100 236 227 -3.81% BenchmarkIfaceCmpNil100 174 172 -1.15% BenchmarkDefer 97 71 -26.57% BenchmarkDefer10 90 65 -27.56% BenchmarkDeferMany 130 103 -20.77% BenchmarkCompareStringEqual 10 11 +3.70% BenchmarkCompareStringIdentical 4 4 +10.99% BenchmarkCompareStringSameLength 9 9 +3.58% BenchmarkCompareStringDifferentLength 1 2 +3.63% BenchmarkCompareStringBigUnaligned 86791 81568 -6.02% BenchmarkCompareStringBig 77180 77602 +0.55% benchmark old MB/s new MB/s speedup BenchmarkMemmove32 1396.98 5987.51 4.29x BenchmarkMemmove4K 22137.54 23092.03 1.04x BenchmarkMemmove64K 18237.54 18475.72 1.01x BenchmarkMemmove4M 5051.76 5093.75 1.01x BenchmarkMemmove64M 5056.55 5080.16 1.00x BenchmarkCompareStringBigUnaligned 12081.62 12855.27 1.06x BenchmarkCompareStringBig 13586.08 13512.33 0.99x /home/dfc/devel/autobench/work/go.11/misc/benchcmp /home/dfc/devel/autobench/work/http-11.txt /home/dfc/devel/autobench/work/http-tip.txt benchmark old ns/op new ns/op delta BenchmarkHeaderWriteSubset 1455 1119 -23.09% BenchmarkReadRequestChrome 7697 6959 -9.59% BenchmarkReadRequestCurl 4065 3573 -12.10% BenchmarkReadRequestApachebench 4017 3555 -11.50% BenchmarkReadRequestSiege 5321 4675 -12.14% BenchmarkReadRequestWrk 2698 2651 -1.74% BenchmarkClientServer 58534 57838 -1.19% BenchmarkClientServerParallel4 54188 55070 +1.63% BenchmarkClientServerParallel64 56747 53766 -5.25% BenchmarkServer 335560 299159 -10.85% BenchmarkServerFakeConnNoKeepAlive 16659 12884 -22.66% BenchmarkServerFakeConnWithKeepAlive 13830 10271 -25.73% BenchmarkServerFakeConnWithKeepAliveLite 9030 6427 -28.83% BenchmarkServerHandlerTypeLen 11513 8706 -24.38% BenchmarkServerHandlerNoLen 10593 7930 -25.14% BenchmarkServerHandlerNoType 11033 8248 -25.24% BenchmarkServerHandlerNoHeader 8611 5716 -33.62% benchmark old MB/s new MB/s speedup BenchmarkReadRequestChrome 79.37 87.79 1.11x BenchmarkReadRequestCurl 19.19 21.83 1.14x BenchmarkReadRequestApachebench 20.41 23.06 1.13x BenchmarkReadRequestSiege 28.38 32.29 1.14x BenchmarkReadRequestWrk 14.83 15.08 1.02x /home/dfc/devel/autobench/work/go.11/misc/benchcmp /home/dfc/devel/autobench/work/floats-11.txt /home/dfc/devel/autobench/work/floats-tip.txt benchmark old ns/op new ns/op delta BenchmarkMinSmall 12 13 +0.78% BenchmarkMinMed 693 692 -0.14% BenchmarkMinLarge 67208 66055 -1.72% BenchmarkMinHuge 8936179 9025296 +1.00% BenchmarkAddTwoSmall 21 25 +17.59% BenchmarkAddFourSmall 43 48 +11.81% BenchmarkAddTwoMed 982 1018 +3.67% BenchmarkAddFourMed 2903 2929 +0.90% BenchmarkAddTwoLarge 103809 101980 -1.76% BenchmarkAddFourLarge 343695 342168 -0.44% BenchmarkAddTwoHuge 16345354 16320656 -0.15% BenchmarkAddFourHuge 49105734 49138748 +0.07% BenchmarkLogSumExpSmall 265 264 -0.38% BenchmarkLogSumExpMed 20086 20072 -0.07% BenchmarkLogSumExpLarge 2015764 2012523 -0.16% BenchmarkLogSumExpHuge 206482185 206366944 -0.06% BenchmarkDotSmall 15 16 +5.23% BenchmarkDotMed 1024 1032 +0.78% BenchmarkDotLarge 106982 109836 +2.67% BenchmarkDotHuge 13493495 13540756 +0.35% |
This issue was closed by revision c3dadca. Status changed to Fixed. |
Hi Russ, Thanks for checking this out. I ran my app with the pre- and post-fix versions and it looks like the GC time is back under control (and slightly better than 1.1.2's performance, as Dave predicted). With 7266a3768bfa (post-fix) and around 280k goroutines I see GC times like: gc59(8): 564+287+183 ms, 4586 -> 2621 MB roughly 3.7µs per goroutine (ignoring the effect that heap size has on GC time :) ) With 4971493b0e14 (pre-fix) and around 300k goroutines I see GC times like: gc78(8): 5391+282+1142 ms, 4527 -> 2511 MB roughly 22.7µs per goroutine And with go1.1.2 and around 540k goroutines I see GC times like: gc167(8): 1566+381+335 ms, 11746 -> 9231 MB roughly 4.2µs per goroutine This looks resolved to me, thank you! |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: