Source file src/runtime/mgc.go
Documentation: runtime
1 // Copyright 2009 The Go Authors. All rights reserved. 2 // Use of this source code is governed by a BSD-style 3 // license that can be found in the LICENSE file. 4 5 // Garbage collector (GC). 6 // 7 // The GC runs concurrently with mutator threads, is type accurate (aka precise), allows multiple 8 // GC thread to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is 9 // non-generational and non-compacting. Allocation is done using size segregated per P allocation 10 // areas to minimize fragmentation while eliminating locks in the common case. 11 // 12 // The algorithm decomposes into several steps. 13 // This is a high level description of the algorithm being used. For an overview of GC a good 14 // place to start is Richard Jones' gchandbook.org. 15 // 16 // The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see 17 // Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978. 18 // On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978), 19 // 966-975. 20 // For journal quality proofs that these steps are complete, correct, and terminate see 21 // Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world. 22 // Concurrency and Computation: Practice and Experience 15(3-5), 2003. 23 // 24 // 1. GC performs sweep termination. 25 // 26 // a. Stop the world. This causes all Ps to reach a GC safe-point. 27 // 28 // b. Sweep any unswept spans. There will only be unswept spans if 29 // this GC cycle was forced before the expected time. 30 // 31 // 2. GC performs the mark phase. 32 // 33 // a. Prepare for the mark phase by setting gcphase to _GCmark 34 // (from _GCoff), enabling the write barrier, enabling mutator 35 // assists, and enqueueing root mark jobs. No objects may be 36 // scanned until all Ps have enabled the write barrier, which is 37 // accomplished using STW. 38 // 39 // b. Start the world. From this point, GC work is done by mark 40 // workers started by the scheduler and by assists performed as 41 // part of allocation. The write barrier shades both the 42 // overwritten pointer and the new pointer value for any pointer 43 // writes (see mbarrier.go for details). Newly allocated objects 44 // are immediately marked black. 45 // 46 // c. GC performs root marking jobs. This includes scanning all 47 // stacks, shading all globals, and shading any heap pointers in 48 // off-heap runtime data structures. Scanning a stack stops a 49 // goroutine, shades any pointers found on its stack, and then 50 // resumes the goroutine. 51 // 52 // d. GC drains the work queue of grey objects, scanning each grey 53 // object to black and shading all pointers found in the object 54 // (which in turn may add those pointers to the work queue). 55 // 56 // e. Because GC work is spread across local caches, GC uses a 57 // distributed termination algorithm to detect when there are no 58 // more root marking jobs or grey objects (see gcMarkDone). At this 59 // point, GC transitions to mark termination. 60 // 61 // 3. GC performs mark termination. 62 // 63 // a. Stop the world. 64 // 65 // b. Set gcphase to _GCmarktermination, and disable workers and 66 // assists. 67 // 68 // c. Perform housekeeping like flushing mcaches. 69 // 70 // 4. GC performs the sweep phase. 71 // 72 // a. Prepare for the sweep phase by setting gcphase to _GCoff, 73 // setting up sweep state and disabling the write barrier. 74 // 75 // b. Start the world. From this point on, newly allocated objects 76 // are white, and allocating sweeps spans before use if necessary. 77 // 78 // c. GC does concurrent sweeping in the background and in response 79 // to allocation. See description below. 80 // 81 // 5. When sufficient allocation has taken place, replay the sequence 82 // starting with 1 above. See discussion of GC rate below. 83 84 // Concurrent sweep. 85 // 86 // The sweep phase proceeds concurrently with normal program execution. 87 // The heap is swept span-by-span both lazily (when a goroutine needs another span) 88 // and concurrently in a background goroutine (this helps programs that are not CPU bound). 89 // At the end of STW mark termination all spans are marked as "needs sweeping". 90 // 91 // The background sweeper goroutine simply sweeps spans one-by-one. 92 // 93 // To avoid requesting more OS memory while there are unswept spans, when a 94 // goroutine needs another span, it first attempts to reclaim that much memory 95 // by sweeping. When a goroutine needs to allocate a new small-object span, it 96 // sweeps small-object spans for the same object size until it frees at least 97 // one object. When a goroutine needs to allocate large-object span from heap, 98 // it sweeps spans until it frees at least that many pages into heap. There is 99 // one case where this may not suffice: if a goroutine sweeps and frees two 100 // nonadjacent one-page spans to the heap, it will allocate a new two-page 101 // span, but there can still be other one-page unswept spans which could be 102 // combined into a two-page span. 103 // 104 // It's critical to ensure that no operations proceed on unswept spans (that would corrupt 105 // mark bits in GC bitmap). During GC all mcaches are flushed into the central cache, 106 // so they are empty. When a goroutine grabs a new span into mcache, it sweeps it. 107 // When a goroutine explicitly frees an object or sets a finalizer, it ensures that 108 // the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish). 109 // The finalizer goroutine is kicked off only when all spans are swept. 110 // When the next GC starts, it sweeps all not-yet-swept spans (if any). 111 112 // GC rate. 113 // Next GC is after we've allocated an extra amount of memory proportional to 114 // the amount already in use. The proportion is controlled by GOGC environment variable 115 // (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M 116 // (this mark is tracked in next_gc variable). This keeps the GC cost in linear 117 // proportion to the allocation cost. Adjusting GOGC just changes the linear constant 118 // (and also the amount of extra memory used). 119 120 // Oblets 121 // 122 // In order to prevent long pauses while scanning large objects and to 123 // improve parallelism, the garbage collector breaks up scan jobs for 124 // objects larger than maxObletBytes into "oblets" of at most 125 // maxObletBytes. When scanning encounters the beginning of a large 126 // object, it scans only the first oblet and enqueues the remaining 127 // oblets as new scan jobs. 128 129 package runtime 130 131 import ( 132 "internal/cpu" 133 "runtime/internal/atomic" 134 "unsafe" 135 ) 136 137 const ( 138 _DebugGC = 0 139 _ConcurrentSweep = true 140 _FinBlockSize = 4 * 1024 141 142 // debugScanConservative enables debug logging for stack 143 // frames that are scanned conservatively. 144 debugScanConservative = false 145 146 // sweepMinHeapDistance is a lower bound on the heap distance 147 // (in bytes) reserved for concurrent sweeping between GC 148 // cycles. 149 sweepMinHeapDistance = 1024 * 1024 150 ) 151 152 // heapminimum is the minimum heap size at which to trigger GC. 153 // For small heaps, this overrides the usual GOGC*live set rule. 154 // 155 // When there is a very small live set but a lot of allocation, simply 156 // collecting when the heap reaches GOGC*live results in many GC 157 // cycles and high total per-GC overhead. This minimum amortizes this 158 // per-GC overhead while keeping the heap reasonably small. 159 // 160 // During initialization this is set to 4MB*GOGC/100. In the case of 161 // GOGC==0, this will set heapminimum to 0, resulting in constant 162 // collection even when the heap size is small, which is useful for 163 // debugging. 164 var heapminimum uint64 = defaultHeapMinimum 165 166 // defaultHeapMinimum is the value of heapminimum for GOGC==100. 167 const defaultHeapMinimum = 4 << 20 168 169 // Initialized from $GOGC. GOGC=off means no GC. 170 var gcpercent int32 171 172 func gcinit() { 173 if unsafe.Sizeof(workbuf{}) != _WorkbufSize { 174 throw("size of Workbuf is suboptimal") 175 } 176 177 // No sweep on the first cycle. 178 mheap_.sweepdone = 1 179 180 // Set a reasonable initial GC trigger. 181 memstats.triggerRatio = 7 / 8.0 182 183 // Fake a heap_marked value so it looks like a trigger at 184 // heapminimum is the appropriate growth from heap_marked. 185 // This will go into computing the initial GC goal. 186 memstats.heap_marked = uint64(float64(heapminimum) / (1 + memstats.triggerRatio)) 187 188 // Set gcpercent from the environment. This will also compute 189 // and set the GC trigger and goal. 190 _ = setGCPercent(readgogc()) 191 192 work.startSema = 1 193 work.markDoneSema = 1 194 lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters) 195 lockInit(&work.assistQueue.lock, lockRankAssistQueue) 196 lockInit(&work.wbufSpans.lock, lockRankWbufSpans) 197 } 198 199 func readgogc() int32 { 200 p := gogetenv("GOGC") 201 if p == "off" { 202 return -1 203 } 204 if n, ok := atoi32(p); ok { 205 return n 206 } 207 return 100 208 } 209 210 // gcenable is called after the bulk of the runtime initialization, 211 // just before we're about to start letting user code run. 212 // It kicks off the background sweeper goroutine, the background 213 // scavenger goroutine, and enables GC. 214 func gcenable() { 215 // Kick off sweeping and scavenging. 216 c := make(chan int, 2) 217 go bgsweep(c) 218 go bgscavenge(c) 219 <-c 220 <-c 221 memstats.enablegc = true // now that runtime is initialized, GC is okay 222 } 223 224 //go:linkname setGCPercent runtime/debug.setGCPercent 225 func setGCPercent(in int32) (out int32) { 226 // Run on the system stack since we grab the heap lock. 227 systemstack(func() { 228 lock(&mheap_.lock) 229 out = gcpercent 230 if in < 0 { 231 in = -1 232 } 233 gcpercent = in 234 heapminimum = defaultHeapMinimum * uint64(gcpercent) / 100 235 // Update pacing in response to gcpercent change. 236 gcSetTriggerRatio(memstats.triggerRatio) 237 unlock(&mheap_.lock) 238 }) 239 240 // If we just disabled GC, wait for any concurrent GC mark to 241 // finish so we always return with no GC running. 242 if in < 0 { 243 gcWaitOnMark(atomic.Load(&work.cycles)) 244 } 245 246 return out 247 } 248 249 // Garbage collector phase. 250 // Indicates to write barrier and synchronization task to perform. 251 var gcphase uint32 252 253 // The compiler knows about this variable. 254 // If you change it, you must change builtin/runtime.go, too. 255 // If you change the first four bytes, you must also change the write 256 // barrier insertion code. 257 var writeBarrier struct { 258 enabled bool // compiler emits a check of this before calling write barrier 259 pad [3]byte // compiler uses 32-bit load for "enabled" field 260 needed bool // whether we need a write barrier for current GC phase 261 cgo bool // whether we need a write barrier for a cgo check 262 alignme uint64 // guarantee alignment so that compiler can use a 32 or 64-bit load 263 } 264 265 // gcBlackenEnabled is 1 if mutator assists and background mark 266 // workers are allowed to blacken objects. This must only be set when 267 // gcphase == _GCmark. 268 var gcBlackenEnabled uint32 269 270 const ( 271 _GCoff = iota // GC not running; sweeping in background, write barrier disabled 272 _GCmark // GC marking roots and workbufs: allocate black, write barrier ENABLED 273 _GCmarktermination // GC mark termination: allocate black, P's help GC, write barrier ENABLED 274 ) 275 276 //go:nosplit 277 func setGCPhase(x uint32) { 278 atomic.Store(&gcphase, x) 279 writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination 280 writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo 281 } 282 283 // gcMarkWorkerMode represents the mode that a concurrent mark worker 284 // should operate in. 285 // 286 // Concurrent marking happens through four different mechanisms. One 287 // is mutator assists, which happen in response to allocations and are 288 // not scheduled. The other three are variations in the per-P mark 289 // workers and are distinguished by gcMarkWorkerMode. 290 type gcMarkWorkerMode int 291 292 const ( 293 // gcMarkWorkerDedicatedMode indicates that the P of a mark 294 // worker is dedicated to running that mark worker. The mark 295 // worker should run without preemption. 296 gcMarkWorkerDedicatedMode gcMarkWorkerMode = iota 297 298 // gcMarkWorkerFractionalMode indicates that a P is currently 299 // running the "fractional" mark worker. The fractional worker 300 // is necessary when GOMAXPROCS*gcBackgroundUtilization is not 301 // an integer. The fractional worker should run until it is 302 // preempted and will be scheduled to pick up the fractional 303 // part of GOMAXPROCS*gcBackgroundUtilization. 304 gcMarkWorkerFractionalMode 305 306 // gcMarkWorkerIdleMode indicates that a P is running the mark 307 // worker because it has nothing else to do. The idle worker 308 // should run until it is preempted and account its time 309 // against gcController.idleMarkTime. 310 gcMarkWorkerIdleMode 311 ) 312 313 // gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes 314 // to use in execution traces. 315 var gcMarkWorkerModeStrings = [...]string{ 316 "GC (dedicated)", 317 "GC (fractional)", 318 "GC (idle)", 319 } 320 321 // gcController implements the GC pacing controller that determines 322 // when to trigger concurrent garbage collection and how much marking 323 // work to do in mutator assists and background marking. 324 // 325 // It uses a feedback control algorithm to adjust the memstats.gc_trigger 326 // trigger based on the heap growth and GC CPU utilization each cycle. 327 // This algorithm optimizes for heap growth to match GOGC and for CPU 328 // utilization between assist and background marking to be 25% of 329 // GOMAXPROCS. The high-level design of this algorithm is documented 330 // at https://golang.org/s/go15gcpacing. 331 // 332 // All fields of gcController are used only during a single mark 333 // cycle. 334 var gcController gcControllerState 335 336 type gcControllerState struct { 337 // scanWork is the total scan work performed this cycle. This 338 // is updated atomically during the cycle. Updates occur in 339 // bounded batches, since it is both written and read 340 // throughout the cycle. At the end of the cycle, this is how 341 // much of the retained heap is scannable. 342 // 343 // Currently this is the bytes of heap scanned. For most uses, 344 // this is an opaque unit of work, but for estimation the 345 // definition is important. 346 scanWork int64 347 348 // bgScanCredit is the scan work credit accumulated by the 349 // concurrent background scan. This credit is accumulated by 350 // the background scan and stolen by mutator assists. This is 351 // updated atomically. Updates occur in bounded batches, since 352 // it is both written and read throughout the cycle. 353 bgScanCredit int64 354 355 // assistTime is the nanoseconds spent in mutator assists 356 // during this cycle. This is updated atomically. Updates 357 // occur in bounded batches, since it is both written and read 358 // throughout the cycle. 359 assistTime int64 360 361 // dedicatedMarkTime is the nanoseconds spent in dedicated 362 // mark workers during this cycle. This is updated atomically 363 // at the end of the concurrent mark phase. 364 dedicatedMarkTime int64 365 366 // fractionalMarkTime is the nanoseconds spent in the 367 // fractional mark worker during this cycle. This is updated 368 // atomically throughout the cycle and will be up-to-date if 369 // the fractional mark worker is not currently running. 370 fractionalMarkTime int64 371 372 // idleMarkTime is the nanoseconds spent in idle marking 373 // during this cycle. This is updated atomically throughout 374 // the cycle. 375 idleMarkTime int64 376 377 // markStartTime is the absolute start time in nanoseconds 378 // that assists and background mark workers started. 379 markStartTime int64 380 381 // dedicatedMarkWorkersNeeded is the number of dedicated mark 382 // workers that need to be started. This is computed at the 383 // beginning of each cycle and decremented atomically as 384 // dedicated mark workers get started. 385 dedicatedMarkWorkersNeeded int64 386 387 // assistWorkPerByte is the ratio of scan work to allocated 388 // bytes that should be performed by mutator assists. This is 389 // computed at the beginning of each cycle and updated every 390 // time heap_scan is updated. 391 assistWorkPerByte float64 392 393 // assistBytesPerWork is 1/assistWorkPerByte. 394 assistBytesPerWork float64 395 396 // fractionalUtilizationGoal is the fraction of wall clock 397 // time that should be spent in the fractional mark worker on 398 // each P that isn't running a dedicated worker. 399 // 400 // For example, if the utilization goal is 25% and there are 401 // no dedicated workers, this will be 0.25. If the goal is 402 // 25%, there is one dedicated worker, and GOMAXPROCS is 5, 403 // this will be 0.05 to make up the missing 5%. 404 // 405 // If this is zero, no fractional workers are needed. 406 fractionalUtilizationGoal float64 407 408 _ cpu.CacheLinePad 409 } 410 411 // startCycle resets the GC controller's state and computes estimates 412 // for a new GC cycle. The caller must hold worldsema. 413 func (c *gcControllerState) startCycle() { 414 c.scanWork = 0 415 c.bgScanCredit = 0 416 c.assistTime = 0 417 c.dedicatedMarkTime = 0 418 c.fractionalMarkTime = 0 419 c.idleMarkTime = 0 420 421 // Ensure that the heap goal is at least a little larger than 422 // the current live heap size. This may not be the case if GC 423 // start is delayed or if the allocation that pushed heap_live 424 // over gc_trigger is large or if the trigger is really close to 425 // GOGC. Assist is proportional to this distance, so enforce a 426 // minimum distance, even if it means going over the GOGC goal 427 // by a tiny bit. 428 if memstats.next_gc < memstats.heap_live+1024*1024 { 429 memstats.next_gc = memstats.heap_live + 1024*1024 430 } 431 432 // Compute the background mark utilization goal. In general, 433 // this may not come out exactly. We round the number of 434 // dedicated workers so that the utilization is closest to 435 // 25%. For small GOMAXPROCS, this would introduce too much 436 // error, so we add fractional workers in that case. 437 totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization 438 c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5) 439 utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1 440 const maxUtilError = 0.3 441 if utilError < -maxUtilError || utilError > maxUtilError { 442 // Rounding put us more than 30% off our goal. With 443 // gcBackgroundUtilization of 25%, this happens for 444 // GOMAXPROCS<=3 or GOMAXPROCS=6. Enable fractional 445 // workers to compensate. 446 if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal { 447 // Too many dedicated workers. 448 c.dedicatedMarkWorkersNeeded-- 449 } 450 c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs) 451 } else { 452 c.fractionalUtilizationGoal = 0 453 } 454 455 // In STW mode, we just want dedicated workers. 456 if debug.gcstoptheworld > 0 { 457 c.dedicatedMarkWorkersNeeded = int64(gomaxprocs) 458 c.fractionalUtilizationGoal = 0 459 } 460 461 // Clear per-P state 462 for _, p := range allp { 463 p.gcAssistTime = 0 464 p.gcFractionalMarkTime = 0 465 } 466 467 // Compute initial values for controls that are updated 468 // throughout the cycle. 469 c.revise() 470 471 if debug.gcpacertrace > 0 { 472 print("pacer: assist ratio=", c.assistWorkPerByte, 473 " (scan ", memstats.heap_scan>>20, " MB in ", 474 work.initialHeapLive>>20, "->", 475 memstats.next_gc>>20, " MB)", 476 " workers=", c.dedicatedMarkWorkersNeeded, 477 "+", c.fractionalUtilizationGoal, "\n") 478 } 479 } 480 481 // revise updates the assist ratio during the GC cycle to account for 482 // improved estimates. This should be called either under STW or 483 // whenever memstats.heap_scan, memstats.heap_live, or 484 // memstats.next_gc is updated (with mheap_.lock held). 485 // 486 // It should only be called when gcBlackenEnabled != 0 (because this 487 // is when assists are enabled and the necessary statistics are 488 // available). 489 func (c *gcControllerState) revise() { 490 gcpercent := gcpercent 491 if gcpercent < 0 { 492 // If GC is disabled but we're running a forced GC, 493 // act like GOGC is huge for the below calculations. 494 gcpercent = 100000 495 } 496 live := atomic.Load64(&memstats.heap_live) 497 498 // Assume we're under the soft goal. Pace GC to complete at 499 // next_gc assuming the heap is in steady-state. 500 heapGoal := int64(memstats.next_gc) 501 502 // Compute the expected scan work remaining. 503 // 504 // This is estimated based on the expected 505 // steady-state scannable heap. For example, with 506 // GOGC=100, only half of the scannable heap is 507 // expected to be live, so that's what we target. 508 // 509 // (This is a float calculation to avoid overflowing on 510 // 100*heap_scan.) 511 scanWorkExpected := int64(float64(memstats.heap_scan) * 100 / float64(100+gcpercent)) 512 513 if live > memstats.next_gc || c.scanWork > scanWorkExpected { 514 // We're past the soft goal, or we've already done more scan 515 // work than we expected. Pace GC so that in the worst case it 516 // will complete by the hard goal. 517 const maxOvershoot = 1.1 518 heapGoal = int64(float64(memstats.next_gc) * maxOvershoot) 519 520 // Compute the upper bound on the scan work remaining. 521 scanWorkExpected = int64(memstats.heap_scan) 522 } 523 524 // Compute the remaining scan work estimate. 525 // 526 // Note that we currently count allocations during GC as both 527 // scannable heap (heap_scan) and scan work completed 528 // (scanWork), so allocation will change this difference 529 // slowly in the soft regime and not at all in the hard 530 // regime. 531 scanWorkRemaining := scanWorkExpected - c.scanWork 532 if scanWorkRemaining < 1000 { 533 // We set a somewhat arbitrary lower bound on 534 // remaining scan work since if we aim a little high, 535 // we can miss by a little. 536 // 537 // We *do* need to enforce that this is at least 1, 538 // since marking is racy and double-scanning objects 539 // may legitimately make the remaining scan work 540 // negative, even in the hard goal regime. 541 scanWorkRemaining = 1000 542 } 543 544 // Compute the heap distance remaining. 545 heapRemaining := heapGoal - int64(live) 546 if heapRemaining <= 0 { 547 // This shouldn't happen, but if it does, avoid 548 // dividing by zero or setting the assist negative. 549 heapRemaining = 1 550 } 551 552 // Compute the mutator assist ratio so by the time the mutator 553 // allocates the remaining heap bytes up to next_gc, it will 554 // have done (or stolen) the remaining amount of scan work. 555 c.assistWorkPerByte = float64(scanWorkRemaining) / float64(heapRemaining) 556 c.assistBytesPerWork = float64(heapRemaining) / float64(scanWorkRemaining) 557 } 558 559 // endCycle computes the trigger ratio for the next cycle. 560 func (c *gcControllerState) endCycle() float64 { 561 if work.userForced { 562 // Forced GC means this cycle didn't start at the 563 // trigger, so where it finished isn't good 564 // information about how to adjust the trigger. 565 // Just leave it where it is. 566 return memstats.triggerRatio 567 } 568 569 // Proportional response gain for the trigger controller. Must 570 // be in [0, 1]. Lower values smooth out transient effects but 571 // take longer to respond to phase changes. Higher values 572 // react to phase changes quickly, but are more affected by 573 // transient changes. Values near 1 may be unstable. 574 const triggerGain = 0.5 575 576 // Compute next cycle trigger ratio. First, this computes the 577 // "error" for this cycle; that is, how far off the trigger 578 // was from what it should have been, accounting for both heap 579 // growth and GC CPU utilization. We compute the actual heap 580 // growth during this cycle and scale that by how far off from 581 // the goal CPU utilization we were (to estimate the heap 582 // growth if we had the desired CPU utilization). The 583 // difference between this estimate and the GOGC-based goal 584 // heap growth is the error. 585 goalGrowthRatio := gcEffectiveGrowthRatio() 586 actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1 587 assistDuration := nanotime() - c.markStartTime 588 589 // Assume background mark hit its utilization goal. 590 utilization := gcBackgroundUtilization 591 // Add assist utilization; avoid divide by zero. 592 if assistDuration > 0 { 593 utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs)) 594 } 595 596 triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio) 597 598 // Finally, we adjust the trigger for next time by this error, 599 // damped by the proportional gain. 600 triggerRatio := memstats.triggerRatio + triggerGain*triggerError 601 602 if debug.gcpacertrace > 0 { 603 // Print controller state in terms of the design 604 // document. 605 H_m_prev := memstats.heap_marked 606 h_t := memstats.triggerRatio 607 H_T := memstats.gc_trigger 608 h_a := actualGrowthRatio 609 H_a := memstats.heap_live 610 h_g := goalGrowthRatio 611 H_g := int64(float64(H_m_prev) * (1 + h_g)) 612 u_a := utilization 613 u_g := gcGoalUtilization 614 W_a := c.scanWork 615 print("pacer: H_m_prev=", H_m_prev, 616 " h_t=", h_t, " H_T=", H_T, 617 " h_a=", h_a, " H_a=", H_a, 618 " h_g=", h_g, " H_g=", H_g, 619 " u_a=", u_a, " u_g=", u_g, 620 " W_a=", W_a, 621 " goalΔ=", goalGrowthRatio-h_t, 622 " actualΔ=", h_a-h_t, 623 " u_a/u_g=", u_a/u_g, 624 "\n") 625 } 626 627 return triggerRatio 628 } 629 630 // enlistWorker encourages another dedicated mark worker to start on 631 // another P if there are spare worker slots. It is used by putfull 632 // when more work is made available. 633 // 634 //go:nowritebarrier 635 func (c *gcControllerState) enlistWorker() { 636 // If there are idle Ps, wake one so it will run an idle worker. 637 // NOTE: This is suspected of causing deadlocks. See golang.org/issue/19112. 638 // 639 // if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 { 640 // wakep() 641 // return 642 // } 643 644 // There are no idle Ps. If we need more dedicated workers, 645 // try to preempt a running P so it will switch to a worker. 646 if c.dedicatedMarkWorkersNeeded <= 0 { 647 return 648 } 649 // Pick a random other P to preempt. 650 if gomaxprocs <= 1 { 651 return 652 } 653 gp := getg() 654 if gp == nil || gp.m == nil || gp.m.p == 0 { 655 return 656 } 657 myID := gp.m.p.ptr().id 658 for tries := 0; tries < 5; tries++ { 659 id := int32(fastrandn(uint32(gomaxprocs - 1))) 660 if id >= myID { 661 id++ 662 } 663 p := allp[id] 664 if p.status != _Prunning { 665 continue 666 } 667 if preemptone(p) { 668 return 669 } 670 } 671 } 672 673 // findRunnableGCWorker returns the background mark worker for _p_ if it 674 // should be run. This must only be called when gcBlackenEnabled != 0. 675 func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g { 676 if gcBlackenEnabled == 0 { 677 throw("gcControllerState.findRunnable: blackening not enabled") 678 } 679 if _p_.gcBgMarkWorker == 0 { 680 // The mark worker associated with this P is blocked 681 // performing a mark transition. We can't run it 682 // because it may be on some other run or wait queue. 683 return nil 684 } 685 686 if !gcMarkWorkAvailable(_p_) { 687 // No work to be done right now. This can happen at 688 // the end of the mark phase when there are still 689 // assists tapering off. Don't bother running a worker 690 // now because it'll just return immediately. 691 return nil 692 } 693 694 decIfPositive := func(ptr *int64) bool { 695 if *ptr > 0 { 696 if atomic.Xaddint64(ptr, -1) >= 0 { 697 return true 698 } 699 // We lost a race 700 atomic.Xaddint64(ptr, +1) 701 } 702 return false 703 } 704 705 if decIfPositive(&c.dedicatedMarkWorkersNeeded) { 706 // This P is now dedicated to marking until the end of 707 // the concurrent mark phase. 708 _p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode 709 } else if c.fractionalUtilizationGoal == 0 { 710 // No need for fractional workers. 711 return nil 712 } else { 713 // Is this P behind on the fractional utilization 714 // goal? 715 // 716 // This should be kept in sync with pollFractionalWorkerExit. 717 delta := nanotime() - gcController.markStartTime 718 if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal { 719 // Nope. No need to run a fractional worker. 720 return nil 721 } 722 // Run a fractional worker. 723 _p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode 724 } 725 726 // Run the background mark worker 727 gp := _p_.gcBgMarkWorker.ptr() 728 casgstatus(gp, _Gwaiting, _Grunnable) 729 if trace.enabled { 730 traceGoUnpark(gp, 0) 731 } 732 return gp 733 } 734 735 // pollFractionalWorkerExit reports whether a fractional mark worker 736 // should self-preempt. It assumes it is called from the fractional 737 // worker. 738 func pollFractionalWorkerExit() bool { 739 // This should be kept in sync with the fractional worker 740 // scheduler logic in findRunnableGCWorker. 741 now := nanotime() 742 delta := now - gcController.markStartTime 743 if delta <= 0 { 744 return true 745 } 746 p := getg().m.p.ptr() 747 selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime) 748 // Add some slack to the utilization goal so that the 749 // fractional worker isn't behind again the instant it exits. 750 return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal 751 } 752 753 // gcSetTriggerRatio sets the trigger ratio and updates everything 754 // derived from it: the absolute trigger, the heap goal, mark pacing, 755 // and sweep pacing. 756 // 757 // This can be called any time. If GC is the in the middle of a 758 // concurrent phase, it will adjust the pacing of that phase. 759 // 760 // This depends on gcpercent, memstats.heap_marked, and 761 // memstats.heap_live. These must be up to date. 762 // 763 // mheap_.lock must be held or the world must be stopped. 764 func gcSetTriggerRatio(triggerRatio float64) { 765 // Compute the next GC goal, which is when the allocated heap 766 // has grown by GOGC/100 over the heap marked by the last 767 // cycle. 768 goal := ^uint64(0) 769 if gcpercent >= 0 { 770 goal = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100 771 } 772 773 // Set the trigger ratio, capped to reasonable bounds. 774 if gcpercent >= 0 { 775 scalingFactor := float64(gcpercent) / 100 776 // Ensure there's always a little margin so that the 777 // mutator assist ratio isn't infinity. 778 maxTriggerRatio := 0.95 * scalingFactor 779 if triggerRatio > maxTriggerRatio { 780 triggerRatio = maxTriggerRatio 781 } 782 783 // If we let triggerRatio go too low, then if the application 784 // is allocating very rapidly we might end up in a situation 785 // where we're allocating black during a nearly always-on GC. 786 // The result of this is a growing heap and ultimately an 787 // increase in RSS. By capping us at a point >0, we're essentially 788 // saying that we're OK using more CPU during the GC to prevent 789 // this growth in RSS. 790 // 791 // The current constant was chosen empirically: given a sufficiently 792 // fast/scalable allocator with 48 Ps that could drive the trigger ratio 793 // to <0.05, this constant causes applications to retain the same peak 794 // RSS compared to not having this allocator. 795 minTriggerRatio := 0.6 * scalingFactor 796 if triggerRatio < minTriggerRatio { 797 triggerRatio = minTriggerRatio 798 } 799 } else if triggerRatio < 0 { 800 // gcpercent < 0, so just make sure we're not getting a negative 801 // triggerRatio. This case isn't expected to happen in practice, 802 // and doesn't really matter because if gcpercent < 0 then we won't 803 // ever consume triggerRatio further on in this function, but let's 804 // just be defensive here; the triggerRatio being negative is almost 805 // certainly undesirable. 806 triggerRatio = 0 807 } 808 memstats.triggerRatio = triggerRatio 809 810 // Compute the absolute GC trigger from the trigger ratio. 811 // 812 // We trigger the next GC cycle when the allocated heap has 813 // grown by the trigger ratio over the marked heap size. 814 trigger := ^uint64(0) 815 if gcpercent >= 0 { 816 trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio)) 817 // Don't trigger below the minimum heap size. 818 minTrigger := heapminimum 819 if !isSweepDone() { 820 // Concurrent sweep happens in the heap growth 821 // from heap_live to gc_trigger, so ensure 822 // that concurrent sweep has some heap growth 823 // in which to perform sweeping before we 824 // start the next GC cycle. 825 sweepMin := atomic.Load64(&memstats.heap_live) + sweepMinHeapDistance 826 if sweepMin > minTrigger { 827 minTrigger = sweepMin 828 } 829 } 830 if trigger < minTrigger { 831 trigger = minTrigger 832 } 833 if int64(trigger) < 0 { 834 print("runtime: next_gc=", memstats.next_gc, " heap_marked=", memstats.heap_marked, " heap_live=", memstats.heap_live, " initialHeapLive=", work.initialHeapLive, "triggerRatio=", triggerRatio, " minTrigger=", minTrigger, "\n") 835 throw("gc_trigger underflow") 836 } 837 if trigger > goal { 838 // The trigger ratio is always less than GOGC/100, but 839 // other bounds on the trigger may have raised it. 840 // Push up the goal, too. 841 goal = trigger 842 } 843 } 844 845 // Commit to the trigger and goal. 846 memstats.gc_trigger = trigger 847 memstats.next_gc = goal 848 if trace.enabled { 849 traceNextGC() 850 } 851 852 // Update mark pacing. 853 if gcphase != _GCoff { 854 gcController.revise() 855 } 856 857 // Update sweep pacing. 858 if isSweepDone() { 859 mheap_.sweepPagesPerByte = 0 860 } else { 861 // Concurrent sweep needs to sweep all of the in-use 862 // pages by the time the allocated heap reaches the GC 863 // trigger. Compute the ratio of in-use pages to sweep 864 // per byte allocated, accounting for the fact that 865 // some might already be swept. 866 heapLiveBasis := atomic.Load64(&memstats.heap_live) 867 heapDistance := int64(trigger) - int64(heapLiveBasis) 868 // Add a little margin so rounding errors and 869 // concurrent sweep are less likely to leave pages 870 // unswept when GC starts. 871 heapDistance -= 1024 * 1024 872 if heapDistance < _PageSize { 873 // Avoid setting the sweep ratio extremely high 874 heapDistance = _PageSize 875 } 876 pagesSwept := atomic.Load64(&mheap_.pagesSwept) 877 pagesInUse := atomic.Load64(&mheap_.pagesInUse) 878 sweepDistancePages := int64(pagesInUse) - int64(pagesSwept) 879 if sweepDistancePages <= 0 { 880 mheap_.sweepPagesPerByte = 0 881 } else { 882 mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance) 883 mheap_.sweepHeapLiveBasis = heapLiveBasis 884 // Write pagesSweptBasis last, since this 885 // signals concurrent sweeps to recompute 886 // their debt. 887 atomic.Store64(&mheap_.pagesSweptBasis, pagesSwept) 888 } 889 } 890 891 gcPaceScavenger() 892 } 893 894 // gcEffectiveGrowthRatio returns the current effective heap growth 895 // ratio (GOGC/100) based on heap_marked from the previous GC and 896 // next_gc for the current GC. 897 // 898 // This may differ from gcpercent/100 because of various upper and 899 // lower bounds on gcpercent. For example, if the heap is smaller than 900 // heapminimum, this can be higher than gcpercent/100. 901 // 902 // mheap_.lock must be held or the world must be stopped. 903 func gcEffectiveGrowthRatio() float64 { 904 egogc := float64(memstats.next_gc-memstats.heap_marked) / float64(memstats.heap_marked) 905 if egogc < 0 { 906 // Shouldn't happen, but just in case. 907 egogc = 0 908 } 909 return egogc 910 } 911 912 // gcGoalUtilization is the goal CPU utilization for 913 // marking as a fraction of GOMAXPROCS. 914 const gcGoalUtilization = 0.30 915 916 // gcBackgroundUtilization is the fixed CPU utilization for background 917 // marking. It must be <= gcGoalUtilization. The difference between 918 // gcGoalUtilization and gcBackgroundUtilization will be made up by 919 // mark assists. The scheduler will aim to use within 50% of this 920 // goal. 921 // 922 // Setting this to < gcGoalUtilization avoids saturating the trigger 923 // feedback controller when there are no assists, which allows it to 924 // better control CPU and heap growth. However, the larger the gap, 925 // the more mutator assists are expected to happen, which impact 926 // mutator latency. 927 const gcBackgroundUtilization = 0.25 928 929 // gcCreditSlack is the amount of scan work credit that can 930 // accumulate locally before updating gcController.scanWork and, 931 // optionally, gcController.bgScanCredit. Lower values give a more 932 // accurate assist ratio and make it more likely that assists will 933 // successfully steal background credit. Higher values reduce memory 934 // contention. 935 const gcCreditSlack = 2000 936 937 // gcAssistTimeSlack is the nanoseconds of mutator assist time that 938 // can accumulate on a P before updating gcController.assistTime. 939 const gcAssistTimeSlack = 5000 940 941 // gcOverAssistWork determines how many extra units of scan work a GC 942 // assist does when an assist happens. This amortizes the cost of an 943 // assist by pre-paying for this many bytes of future allocations. 944 const gcOverAssistWork = 64 << 10 945 946 var work struct { 947 full lfstack // lock-free list of full blocks workbuf 948 empty lfstack // lock-free list of empty blocks workbuf 949 pad0 cpu.CacheLinePad // prevents false-sharing between full/empty and nproc/nwait 950 951 wbufSpans struct { 952 lock mutex 953 // free is a list of spans dedicated to workbufs, but 954 // that don't currently contain any workbufs. 955 free mSpanList 956 // busy is a list of all spans containing workbufs on 957 // one of the workbuf lists. 958 busy mSpanList 959 } 960 961 // Restore 64-bit alignment on 32-bit. 962 _ uint32 963 964 // bytesMarked is the number of bytes marked this cycle. This 965 // includes bytes blackened in scanned objects, noscan objects 966 // that go straight to black, and permagrey objects scanned by 967 // markroot during the concurrent scan phase. This is updated 968 // atomically during the cycle. Updates may be batched 969 // arbitrarily, since the value is only read at the end of the 970 // cycle. 971 // 972 // Because of benign races during marking, this number may not 973 // be the exact number of marked bytes, but it should be very 974 // close. 975 // 976 // Put this field here because it needs 64-bit atomic access 977 // (and thus 8-byte alignment even on 32-bit architectures). 978 bytesMarked uint64 979 980 markrootNext uint32 // next markroot job 981 markrootJobs uint32 // number of markroot jobs 982 983 nproc uint32 984 tstart int64 985 nwait uint32 986 ndone uint32 987 988 // Number of roots of various root types. Set by gcMarkRootPrepare. 989 nFlushCacheRoots int 990 nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int 991 992 // Each type of GC state transition is protected by a lock. 993 // Since multiple threads can simultaneously detect the state 994 // transition condition, any thread that detects a transition 995 // condition must acquire the appropriate transition lock, 996 // re-check the transition condition and return if it no 997 // longer holds or perform the transition if it does. 998 // Likewise, any transition must invalidate the transition 999 // condition before releasing the lock. This ensures that each 1000 // transition is performed by exactly one thread and threads 1001 // that need the transition to happen block until it has 1002 // happened. 1003 // 1004 // startSema protects the transition from "off" to mark or 1005 // mark termination. 1006 startSema uint32 1007 // markDoneSema protects transitions from mark to mark termination. 1008 markDoneSema uint32 1009 1010 bgMarkReady note // signal background mark worker has started 1011 bgMarkDone uint32 // cas to 1 when at a background mark completion point 1012 // Background mark completion signaling 1013 1014 // mode is the concurrency mode of the current GC cycle. 1015 mode gcMode 1016 1017 // userForced indicates the current GC cycle was forced by an 1018 // explicit user call. 1019 userForced bool 1020 1021 // totaltime is the CPU nanoseconds spent in GC since the 1022 // program started if debug.gctrace > 0. 1023 totaltime int64 1024 1025 // initialHeapLive is the value of memstats.heap_live at the 1026 // beginning of this GC cycle. 1027 initialHeapLive uint64 1028 1029 // assistQueue is a queue of assists that are blocked because 1030 // there was neither enough credit to steal or enough work to 1031 // do. 1032 assistQueue struct { 1033 lock mutex 1034 q gQueue 1035 } 1036 1037 // sweepWaiters is a list of blocked goroutines to wake when 1038 // we transition from mark termination to sweep. 1039 sweepWaiters struct { 1040 lock mutex 1041 list gList 1042 } 1043 1044 // cycles is the number of completed GC cycles, where a GC 1045 // cycle is sweep termination, mark, mark termination, and 1046 // sweep. This differs from memstats.numgc, which is 1047 // incremented at mark termination. 1048 cycles uint32 1049 1050 // Timing/utilization stats for this cycle. 1051 stwprocs, maxprocs int32 1052 tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start 1053 1054 pauseNS int64 // total STW time this cycle 1055 pauseStart int64 // nanotime() of last STW 1056 1057 // debug.gctrace heap sizes for this cycle. 1058 heap0, heap1, heap2, heapGoal uint64 1059 } 1060 1061 // GC runs a garbage collection and blocks the caller until the 1062 // garbage collection is complete. It may also block the entire 1063 // program. 1064 func GC() { 1065 // We consider a cycle to be: sweep termination, mark, mark 1066 // termination, and sweep. This function shouldn't return 1067 // until a full cycle has been completed, from beginning to 1068 // end. Hence, we always want to finish up the current cycle 1069 // and start a new one. That means: 1070 // 1071 // 1. In sweep termination, mark, or mark termination of cycle 1072 // N, wait until mark termination N completes and transitions 1073 // to sweep N. 1074 // 1075 // 2. In sweep N, help with sweep N. 1076 // 1077 // At this point we can begin a full cycle N+1. 1078 // 1079 // 3. Trigger cycle N+1 by starting sweep termination N+1. 1080 // 1081 // 4. Wait for mark termination N+1 to complete. 1082 // 1083 // 5. Help with sweep N+1 until it's done. 1084 // 1085 // This all has to be written to deal with the fact that the 1086 // GC may move ahead on its own. For example, when we block 1087 // until mark termination N, we may wake up in cycle N+2. 1088 1089 // Wait until the current sweep termination, mark, and mark 1090 // termination complete. 1091 n := atomic.Load(&work.cycles) 1092 gcWaitOnMark(n) 1093 1094 // We're now in sweep N or later. Trigger GC cycle N+1, which 1095 // will first finish sweep N if necessary and then enter sweep 1096 // termination N+1. 1097 gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1}) 1098 1099 // Wait for mark termination N+1 to complete. 1100 gcWaitOnMark(n + 1) 1101 1102 // Finish sweep N+1 before returning. We do this both to 1103 // complete the cycle and because runtime.GC() is often used 1104 // as part of tests and benchmarks to get the system into a 1105 // relatively stable and isolated state. 1106 for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) { 1107 sweep.nbgsweep++ 1108 Gosched() 1109 } 1110 1111 // Callers may assume that the heap profile reflects the 1112 // just-completed cycle when this returns (historically this 1113 // happened because this was a STW GC), but right now the 1114 // profile still reflects mark termination N, not N+1. 1115 // 1116 // As soon as all of the sweep frees from cycle N+1 are done, 1117 // we can go ahead and publish the heap profile. 1118 // 1119 // First, wait for sweeping to finish. (We know there are no 1120 // more spans on the sweep queue, but we may be concurrently 1121 // sweeping spans, so we have to wait.) 1122 for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 { 1123 Gosched() 1124 } 1125 1126 // Now we're really done with sweeping, so we can publish the 1127 // stable heap profile. Only do this if we haven't already hit 1128 // another mark termination. 1129 mp := acquirem() 1130 cycle := atomic.Load(&work.cycles) 1131 if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) { 1132 mProf_PostSweep() 1133 } 1134 releasem(mp) 1135 } 1136 1137 // gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has 1138 // already completed this mark phase, it returns immediately. 1139 func gcWaitOnMark(n uint32) { 1140 for { 1141 // Disable phase transitions. 1142 lock(&work.sweepWaiters.lock) 1143 nMarks := atomic.Load(&work.cycles) 1144 if gcphase != _GCmark { 1145 // We've already completed this cycle's mark. 1146 nMarks++ 1147 } 1148 if nMarks > n { 1149 // We're done. 1150 unlock(&work.sweepWaiters.lock) 1151 return 1152 } 1153 1154 // Wait until sweep termination, mark, and mark 1155 // termination of cycle N complete. 1156 work.sweepWaiters.list.push(getg()) 1157 goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceEvGoBlock, 1) 1158 } 1159 } 1160 1161 // gcMode indicates how concurrent a GC cycle should be. 1162 type gcMode int 1163 1164 const ( 1165 gcBackgroundMode gcMode = iota // concurrent GC and sweep 1166 gcForceMode // stop-the-world GC now, concurrent sweep 1167 gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user) 1168 ) 1169 1170 // A gcTrigger is a predicate for starting a GC cycle. Specifically, 1171 // it is an exit condition for the _GCoff phase. 1172 type gcTrigger struct { 1173 kind gcTriggerKind 1174 now int64 // gcTriggerTime: current time 1175 n uint32 // gcTriggerCycle: cycle number to start 1176 } 1177 1178 type gcTriggerKind int 1179 1180 const ( 1181 // gcTriggerHeap indicates that a cycle should be started when 1182 // the heap size reaches the trigger heap size computed by the 1183 // controller. 1184 gcTriggerHeap gcTriggerKind = iota 1185 1186 // gcTriggerTime indicates that a cycle should be started when 1187 // it's been more than forcegcperiod nanoseconds since the 1188 // previous GC cycle. 1189 gcTriggerTime 1190 1191 // gcTriggerCycle indicates that a cycle should be started if 1192 // we have not yet started cycle number gcTrigger.n (relative 1193 // to work.cycles). 1194 gcTriggerCycle 1195 ) 1196 1197 // test reports whether the trigger condition is satisfied, meaning 1198 // that the exit condition for the _GCoff phase has been met. The exit 1199 // condition should be tested when allocating. 1200 func (t gcTrigger) test() bool { 1201 if !memstats.enablegc || panicking != 0 || gcphase != _GCoff { 1202 return false 1203 } 1204 switch t.kind { 1205 case gcTriggerHeap: 1206 // Non-atomic access to heap_live for performance. If 1207 // we are going to trigger on this, this thread just 1208 // atomically wrote heap_live anyway and we'll see our 1209 // own write. 1210 return memstats.heap_live >= memstats.gc_trigger 1211 case gcTriggerTime: 1212 if gcpercent < 0 { 1213 return false 1214 } 1215 lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime)) 1216 return lastgc != 0 && t.now-lastgc > forcegcperiod 1217 case gcTriggerCycle: 1218 // t.n > work.cycles, but accounting for wraparound. 1219 return int32(t.n-work.cycles) > 0 1220 } 1221 return true 1222 } 1223 1224 // gcStart starts the GC. It transitions from _GCoff to _GCmark (if 1225 // debug.gcstoptheworld == 0) or performs all of GC (if 1226 // debug.gcstoptheworld != 0). 1227 // 1228 // This may return without performing this transition in some cases, 1229 // such as when called on a system stack or with locks held. 1230 func gcStart(trigger gcTrigger) { 1231 // Since this is called from malloc and malloc is called in 1232 // the guts of a number of libraries that might be holding 1233 // locks, don't attempt to start GC in non-preemptible or 1234 // potentially unstable situations. 1235 mp := acquirem() 1236 if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" { 1237 releasem(mp) 1238 return 1239 } 1240 releasem(mp) 1241 mp = nil 1242 1243 // Pick up the remaining unswept/not being swept spans concurrently 1244 // 1245 // This shouldn't happen if we're being invoked in background 1246 // mode since proportional sweep should have just finished 1247 // sweeping everything, but rounding errors, etc, may leave a 1248 // few spans unswept. In forced mode, this is necessary since 1249 // GC can be forced at any point in the sweeping cycle. 1250 // 1251 // We check the transition condition continuously here in case 1252 // this G gets delayed in to the next GC cycle. 1253 for trigger.test() && sweepone() != ^uintptr(0) { 1254 sweep.nbgsweep++ 1255 } 1256 1257 // Perform GC initialization and the sweep termination 1258 // transition. 1259 semacquire(&work.startSema) 1260 // Re-check transition condition under transition lock. 1261 if !trigger.test() { 1262 semrelease(&work.startSema) 1263 return 1264 } 1265 1266 // For stats, check if this GC was forced by the user. 1267 work.userForced = trigger.kind == gcTriggerCycle 1268 1269 // In gcstoptheworld debug mode, upgrade the mode accordingly. 1270 // We do this after re-checking the transition condition so 1271 // that multiple goroutines that detect the heap trigger don't 1272 // start multiple STW GCs. 1273 mode := gcBackgroundMode 1274 if debug.gcstoptheworld == 1 { 1275 mode = gcForceMode 1276 } else if debug.gcstoptheworld == 2 { 1277 mode = gcForceBlockMode 1278 } 1279 1280 // Ok, we're doing it! Stop everybody else 1281 semacquire(&gcsema) 1282 semacquire(&worldsema) 1283 1284 if trace.enabled { 1285 traceGCStart() 1286 } 1287 1288 // Check that all Ps have finished deferred mcache flushes. 1289 for _, p := range allp { 1290 if fg := atomic.Load(&p.mcache.flushGen); fg != mheap_.sweepgen { 1291 println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen) 1292 throw("p mcache not flushed") 1293 } 1294 } 1295 1296 gcBgMarkStartWorkers() 1297 1298 systemstack(gcResetMarkState) 1299 1300 work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs 1301 if work.stwprocs > ncpu { 1302 // This is used to compute CPU time of the STW phases, 1303 // so it can't be more than ncpu, even if GOMAXPROCS is. 1304 work.stwprocs = ncpu 1305 } 1306 work.heap0 = atomic.Load64(&memstats.heap_live) 1307 work.pauseNS = 0 1308 work.mode = mode 1309 1310 now := nanotime() 1311 work.tSweepTerm = now 1312 work.pauseStart = now 1313 if trace.enabled { 1314 traceGCSTWStart(1) 1315 } 1316 systemstack(stopTheWorldWithSema) 1317 // Finish sweep before we start concurrent scan. 1318 systemstack(func() { 1319 finishsweep_m() 1320 }) 1321 1322 // clearpools before we start the GC. If we wait they memory will not be 1323 // reclaimed until the next GC cycle. 1324 clearpools() 1325 1326 work.cycles++ 1327 1328 gcController.startCycle() 1329 work.heapGoal = memstats.next_gc 1330 1331 // In STW mode, disable scheduling of user Gs. This may also 1332 // disable scheduling of this goroutine, so it may block as 1333 // soon as we start the world again. 1334 if mode != gcBackgroundMode { 1335 schedEnableUser(false) 1336 } 1337 1338 // Enter concurrent mark phase and enable 1339 // write barriers. 1340 // 1341 // Because the world is stopped, all Ps will 1342 // observe that write barriers are enabled by 1343 // the time we start the world and begin 1344 // scanning. 1345 // 1346 // Write barriers must be enabled before assists are 1347 // enabled because they must be enabled before 1348 // any non-leaf heap objects are marked. Since 1349 // allocations are blocked until assists can 1350 // happen, we want enable assists as early as 1351 // possible. 1352 setGCPhase(_GCmark) 1353 1354 gcBgMarkPrepare() // Must happen before assist enable. 1355 gcMarkRootPrepare() 1356 1357 // Mark all active tinyalloc blocks. Since we're 1358 // allocating from these, they need to be black like 1359 // other allocations. The alternative is to blacken 1360 // the tiny block on every allocation from it, which 1361 // would slow down the tiny allocator. 1362 gcMarkTinyAllocs() 1363 1364 // At this point all Ps have enabled the write 1365 // barrier, thus maintaining the no white to 1366 // black invariant. Enable mutator assists to 1367 // put back-pressure on fast allocating 1368 // mutators. 1369 atomic.Store(&gcBlackenEnabled, 1) 1370 1371 // Assists and workers can start the moment we start 1372 // the world. 1373 gcController.markStartTime = now 1374 1375 // In STW mode, we could block the instant systemstack 1376 // returns, so make sure we're not preemptible. 1377 mp = acquirem() 1378 1379 // Concurrent mark. 1380 systemstack(func() { 1381 now = startTheWorldWithSema(trace.enabled) 1382 work.pauseNS += now - work.pauseStart 1383 work.tMark = now 1384 }) 1385 1386 // Release the world sema before Gosched() in STW mode 1387 // because we will need to reacquire it later but before 1388 // this goroutine becomes runnable again, and we could 1389 // self-deadlock otherwise. 1390 semrelease(&worldsema) 1391 releasem(mp) 1392 1393 // Make sure we block instead of returning to user code 1394 // in STW mode. 1395 if mode != gcBackgroundMode { 1396 Gosched() 1397 } 1398 1399 semrelease(&work.startSema) 1400 } 1401 1402 // gcMarkDoneFlushed counts the number of P's with flushed work. 1403 // 1404 // Ideally this would be a captured local in gcMarkDone, but forEachP 1405 // escapes its callback closure, so it can't capture anything. 1406 // 1407 // This is protected by markDoneSema. 1408 var gcMarkDoneFlushed uint32 1409 1410 // debugCachedWork enables extra checks for debugging premature mark 1411 // termination. 1412 // 1413 // For debugging issue #27993. 1414 const debugCachedWork = false 1415 1416 // gcWorkPauseGen is for debugging the mark completion algorithm. 1417 // gcWork put operations spin while gcWork.pauseGen == gcWorkPauseGen. 1418 // Only used if debugCachedWork is true. 1419 // 1420 // For debugging issue #27993. 1421 var gcWorkPauseGen uint32 = 1 1422 1423 // gcMarkDone transitions the GC from mark to mark termination if all 1424 // reachable objects have been marked (that is, there are no grey 1425 // objects and can be no more in the future). Otherwise, it flushes 1426 // all local work to the global queues where it can be discovered by 1427 // other workers. 1428 // 1429 // This should be called when all local mark work has been drained and 1430 // there are no remaining workers. Specifically, when 1431 // 1432 // work.nwait == work.nproc && !gcMarkWorkAvailable(p) 1433 // 1434 // The calling context must be preemptible. 1435 // 1436 // Flushing local work is important because idle Ps may have local 1437 // work queued. This is the only way to make that work visible and 1438 // drive GC to completion. 1439 // 1440 // It is explicitly okay to have write barriers in this function. If 1441 // it does transition to mark termination, then all reachable objects 1442 // have been marked, so the write barrier cannot shade any more 1443 // objects. 1444 func gcMarkDone() { 1445 // Ensure only one thread is running the ragged barrier at a 1446 // time. 1447 semacquire(&work.markDoneSema) 1448 1449 top: 1450 // Re-check transition condition under transition lock. 1451 // 1452 // It's critical that this checks the global work queues are 1453 // empty before performing the ragged barrier. Otherwise, 1454 // there could be global work that a P could take after the P 1455 // has passed the ragged barrier. 1456 if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) { 1457 semrelease(&work.markDoneSema) 1458 return 1459 } 1460 1461 // forEachP needs worldsema to execute, and we'll need it to 1462 // stop the world later, so acquire worldsema now. 1463 semacquire(&worldsema) 1464 1465 // Flush all local buffers and collect flushedWork flags. 1466 gcMarkDoneFlushed = 0 1467 systemstack(func() { 1468 gp := getg().m.curg 1469 // Mark the user stack as preemptible so that it may be scanned. 1470 // Otherwise, our attempt to force all P's to a safepoint could 1471 // result in a deadlock as we attempt to preempt a worker that's 1472 // trying to preempt us (e.g. for a stack scan). 1473 casgstatus(gp, _Grunning, _Gwaiting) 1474 forEachP(func(_p_ *p) { 1475 // Flush the write barrier buffer, since this may add 1476 // work to the gcWork. 1477 wbBufFlush1(_p_) 1478 // For debugging, shrink the write barrier 1479 // buffer so it flushes immediately. 1480 // wbBuf.reset will keep it at this size as 1481 // long as throwOnGCWork is set. 1482 if debugCachedWork { 1483 b := &_p_.wbBuf 1484 b.end = uintptr(unsafe.Pointer(&b.buf[wbBufEntryPointers])) 1485 b.debugGen = gcWorkPauseGen 1486 } 1487 // Flush the gcWork, since this may create global work 1488 // and set the flushedWork flag. 1489 // 1490 // TODO(austin): Break up these workbufs to 1491 // better distribute work. 1492 _p_.gcw.dispose() 1493 // Collect the flushedWork flag. 1494 if _p_.gcw.flushedWork { 1495 atomic.Xadd(&gcMarkDoneFlushed, 1) 1496 _p_.gcw.flushedWork = false 1497 } else if debugCachedWork { 1498 // For debugging, freeze the gcWork 1499 // until we know whether we've reached 1500 // completion or not. If we think 1501 // we've reached completion, but 1502 // there's a paused gcWork, then 1503 // that's a bug. 1504 _p_.gcw.pauseGen = gcWorkPauseGen 1505 // Capture the G's stack. 1506 for i := range _p_.gcw.pauseStack { 1507 _p_.gcw.pauseStack[i] = 0 1508 } 1509 callers(1, _p_.gcw.pauseStack[:]) 1510 } 1511 }) 1512 casgstatus(gp, _Gwaiting, _Grunning) 1513 }) 1514 1515 if gcMarkDoneFlushed != 0 { 1516 if debugCachedWork { 1517 // Release paused gcWorks. 1518 atomic.Xadd(&gcWorkPauseGen, 1) 1519 } 1520 // More grey objects were discovered since the 1521 // previous termination check, so there may be more 1522 // work to do. Keep going. It's possible the 1523 // transition condition became true again during the 1524 // ragged barrier, so re-check it. 1525 semrelease(&worldsema) 1526 goto top 1527 } 1528 1529 if debugCachedWork { 1530 throwOnGCWork = true 1531 // Release paused gcWorks. If there are any, they 1532 // should now observe throwOnGCWork and panic. 1533 atomic.Xadd(&gcWorkPauseGen, 1) 1534 } 1535 1536 // There was no global work, no local work, and no Ps 1537 // communicated work since we took markDoneSema. Therefore 1538 // there are no grey objects and no more objects can be 1539 // shaded. Transition to mark termination. 1540 now := nanotime() 1541 work.tMarkTerm = now 1542 work.pauseStart = now 1543 getg().m.preemptoff = "gcing" 1544 if trace.enabled { 1545 traceGCSTWStart(0) 1546 } 1547 systemstack(stopTheWorldWithSema) 1548 // The gcphase is _GCmark, it will transition to _GCmarktermination 1549 // below. The important thing is that the wb remains active until 1550 // all marking is complete. This includes writes made by the GC. 1551 1552 if debugCachedWork { 1553 // For debugging, double check that no work was added after we 1554 // went around above and disable write barrier buffering. 1555 for _, p := range allp { 1556 gcw := &p.gcw 1557 if !gcw.empty() { 1558 printlock() 1559 print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork) 1560 if gcw.wbuf1 == nil { 1561 print(" wbuf1=<nil>") 1562 } else { 1563 print(" wbuf1.n=", gcw.wbuf1.nobj) 1564 } 1565 if gcw.wbuf2 == nil { 1566 print(" wbuf2=<nil>") 1567 } else { 1568 print(" wbuf2.n=", gcw.wbuf2.nobj) 1569 } 1570 print("\n") 1571 if gcw.pauseGen == gcw.putGen { 1572 println("runtime: checkPut already failed at this generation") 1573 } 1574 throw("throwOnGCWork") 1575 } 1576 } 1577 } else { 1578 // For unknown reasons (see issue #27993), there is 1579 // sometimes work left over when we enter mark 1580 // termination. Detect this and resume concurrent 1581 // mark. This is obviously unfortunate. 1582 // 1583 // Switch to the system stack to call wbBufFlush1, 1584 // though in this case it doesn't matter because we're 1585 // non-preemptible anyway. 1586 restart := false 1587 systemstack(func() { 1588 for _, p := range allp { 1589 wbBufFlush1(p) 1590 if !p.gcw.empty() { 1591 restart = true 1592 break 1593 } 1594 } 1595 }) 1596 if restart { 1597 getg().m.preemptoff = "" 1598 systemstack(func() { 1599 now := startTheWorldWithSema(true) 1600 work.pauseNS += now - work.pauseStart 1601 }) 1602 semrelease(&worldsema) 1603 goto top 1604 } 1605 } 1606 1607 // Disable assists and background workers. We must do 1608 // this before waking blocked assists. 1609 atomic.Store(&gcBlackenEnabled, 0) 1610 1611 // Wake all blocked assists. These will run when we 1612 // start the world again. 1613 gcWakeAllAssists() 1614 1615 // Likewise, release the transition lock. Blocked 1616 // workers and assists will run when we start the 1617 // world again. 1618 semrelease(&work.markDoneSema) 1619 1620 // In STW mode, re-enable user goroutines. These will be 1621 // queued to run after we start the world. 1622 schedEnableUser(true) 1623 1624 // endCycle depends on all gcWork cache stats being flushed. 1625 // The termination algorithm above ensured that up to 1626 // allocations since the ragged barrier. 1627 nextTriggerRatio := gcController.endCycle() 1628 1629 // Perform mark termination. This will restart the world. 1630 gcMarkTermination(nextTriggerRatio) 1631 } 1632 1633 func gcMarkTermination(nextTriggerRatio float64) { 1634 // World is stopped. 1635 // Start marktermination which includes enabling the write barrier. 1636 atomic.Store(&gcBlackenEnabled, 0) 1637 setGCPhase(_GCmarktermination) 1638 1639 work.heap1 = memstats.heap_live 1640 startTime := nanotime() 1641 1642 mp := acquirem() 1643 mp.preemptoff = "gcing" 1644 _g_ := getg() 1645 _g_.m.traceback = 2 1646 gp := _g_.m.curg 1647 casgstatus(gp, _Grunning, _Gwaiting) 1648 gp.waitreason = waitReasonGarbageCollection 1649 1650 // Run gc on the g0 stack. We do this so that the g stack 1651 // we're currently running on will no longer change. Cuts 1652 // the root set down a bit (g0 stacks are not scanned, and 1653 // we don't need to scan gc's internal state). We also 1654 // need to switch to g0 so we can shrink the stack. 1655 systemstack(func() { 1656 gcMark(startTime) 1657 // Must return immediately. 1658 // The outer function's stack may have moved 1659 // during gcMark (it shrinks stacks, including the 1660 // outer function's stack), so we must not refer 1661 // to any of its variables. Return back to the 1662 // non-system stack to pick up the new addresses 1663 // before continuing. 1664 }) 1665 1666 systemstack(func() { 1667 work.heap2 = work.bytesMarked 1668 if debug.gccheckmark > 0 { 1669 // Run a full non-parallel, stop-the-world 1670 // mark using checkmark bits, to check that we 1671 // didn't forget to mark anything during the 1672 // concurrent mark process. 1673 gcResetMarkState() 1674 initCheckmarks() 1675 gcw := &getg().m.p.ptr().gcw 1676 gcDrain(gcw, 0) 1677 wbBufFlush1(getg().m.p.ptr()) 1678 gcw.dispose() 1679 clearCheckmarks() 1680 } 1681 1682 // marking is complete so we can turn the write barrier off 1683 setGCPhase(_GCoff) 1684 gcSweep(work.mode) 1685 }) 1686 1687 _g_.m.traceback = 0 1688 casgstatus(gp, _Gwaiting, _Grunning) 1689 1690 if trace.enabled { 1691 traceGCDone() 1692 } 1693 1694 // all done 1695 mp.preemptoff = "" 1696 1697 if gcphase != _GCoff { 1698 throw("gc done but gcphase != _GCoff") 1699 } 1700 1701 // Record next_gc and heap_inuse for scavenger. 1702 memstats.last_next_gc = memstats.next_gc 1703 memstats.last_heap_inuse = memstats.heap_inuse 1704 1705 // Update GC trigger and pacing for the next cycle. 1706 gcSetTriggerRatio(nextTriggerRatio) 1707 1708 // Update timing memstats 1709 now := nanotime() 1710 sec, nsec, _ := time_now() 1711 unixNow := sec*1e9 + int64(nsec) 1712 work.pauseNS += now - work.pauseStart 1713 work.tEnd = now 1714 atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user 1715 atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us 1716 memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS) 1717 memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow) 1718 memstats.pause_total_ns += uint64(work.pauseNS) 1719 1720 // Update work.totaltime. 1721 sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm) 1722 // We report idle marking time below, but omit it from the 1723 // overall utilization here since it's "free". 1724 markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime 1725 markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm) 1726 cycleCpu := sweepTermCpu + markCpu + markTermCpu 1727 work.totaltime += cycleCpu 1728 1729 // Compute overall GC CPU utilization. 1730 totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs) 1731 memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu) 1732 1733 // Reset sweep state. 1734 sweep.nbgsweep = 0 1735 sweep.npausesweep = 0 1736 1737 if work.userForced { 1738 memstats.numforcedgc++ 1739 } 1740 1741 // Bump GC cycle count and wake goroutines waiting on sweep. 1742 lock(&work.sweepWaiters.lock) 1743 memstats.numgc++ 1744 injectglist(&work.sweepWaiters.list) 1745 unlock(&work.sweepWaiters.lock) 1746 1747 // Finish the current heap profiling cycle and start a new 1748 // heap profiling cycle. We do this before starting the world 1749 // so events don't leak into the wrong cycle. 1750 mProf_NextCycle() 1751 1752 systemstack(func() { startTheWorldWithSema(true) }) 1753 1754 // Flush the heap profile so we can start a new cycle next GC. 1755 // This is relatively expensive, so we don't do it with the 1756 // world stopped. 1757 mProf_Flush() 1758 1759 // Prepare workbufs for freeing by the sweeper. We do this 1760 // asynchronously because it can take non-trivial time. 1761 prepareFreeWorkbufs() 1762 1763 // Free stack spans. This must be done between GC cycles. 1764 systemstack(freeStackSpans) 1765 1766 // Ensure all mcaches are flushed. Each P will flush its own 1767 // mcache before allocating, but idle Ps may not. Since this 1768 // is necessary to sweep all spans, we need to ensure all 1769 // mcaches are flushed before we start the next GC cycle. 1770 systemstack(func() { 1771 forEachP(func(_p_ *p) { 1772 _p_.mcache.prepareForSweep() 1773 }) 1774 }) 1775 1776 // Print gctrace before dropping worldsema. As soon as we drop 1777 // worldsema another cycle could start and smash the stats 1778 // we're trying to print. 1779 if debug.gctrace > 0 { 1780 util := int(memstats.gc_cpu_fraction * 100) 1781 1782 var sbuf [24]byte 1783 printlock() 1784 print("gc ", memstats.numgc, 1785 " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ", 1786 util, "%: ") 1787 prev := work.tSweepTerm 1788 for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} { 1789 if i != 0 { 1790 print("+") 1791 } 1792 print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev)))) 1793 prev = ns 1794 } 1795 print(" ms clock, ") 1796 for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} { 1797 if i == 2 || i == 3 { 1798 // Separate mark time components with /. 1799 print("/") 1800 } else if i != 0 { 1801 print("+") 1802 } 1803 print(string(fmtNSAsMS(sbuf[:], uint64(ns)))) 1804 } 1805 print(" ms cpu, ", 1806 work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ", 1807 work.heapGoal>>20, " MB goal, ", 1808 work.maxprocs, " P") 1809 if work.userForced { 1810 print(" (forced)") 1811 } 1812 print("\n") 1813 printunlock() 1814 } 1815 1816 semrelease(&worldsema) 1817 semrelease(&gcsema) 1818 // Careful: another GC cycle may start now. 1819 1820 releasem(mp) 1821 mp = nil 1822 1823 // now that gc is done, kick off finalizer thread if needed 1824 if !concurrentSweep { 1825 // give the queued finalizers, if any, a chance to run 1826 Gosched() 1827 } 1828 } 1829 1830 // gcBgMarkStartWorkers prepares background mark worker goroutines. 1831 // These goroutines will not run until the mark phase, but they must 1832 // be started while the work is not stopped and from a regular G 1833 // stack. The caller must hold worldsema. 1834 func gcBgMarkStartWorkers() { 1835 // Background marking is performed by per-P G's. Ensure that 1836 // each P has a background GC G. 1837 for _, p := range allp { 1838 if p.gcBgMarkWorker == 0 { 1839 go gcBgMarkWorker(p) 1840 notetsleepg(&work.bgMarkReady, -1) 1841 noteclear(&work.bgMarkReady) 1842 } 1843 } 1844 } 1845 1846 // gcBgMarkPrepare sets up state for background marking. 1847 // Mutator assists must not yet be enabled. 1848 func gcBgMarkPrepare() { 1849 // Background marking will stop when the work queues are empty 1850 // and there are no more workers (note that, since this is 1851 // concurrent, this may be a transient state, but mark 1852 // termination will clean it up). Between background workers 1853 // and assists, we don't really know how many workers there 1854 // will be, so we pretend to have an arbitrarily large number 1855 // of workers, almost all of which are "waiting". While a 1856 // worker is working it decrements nwait. If nproc == nwait, 1857 // there are no workers. 1858 work.nproc = ^uint32(0) 1859 work.nwait = ^uint32(0) 1860 } 1861 1862 func gcBgMarkWorker(_p_ *p) { 1863 gp := getg() 1864 1865 type parkInfo struct { 1866 m muintptr // Release this m on park. 1867 attach puintptr // If non-nil, attach to this p on park. 1868 } 1869 // We pass park to a gopark unlock function, so it can't be on 1870 // the stack (see gopark). Prevent deadlock from recursively 1871 // starting GC by disabling preemption. 1872 gp.m.preemptoff = "GC worker init" 1873 park := new(parkInfo) 1874 gp.m.preemptoff = "" 1875 1876 park.m.set(acquirem()) 1877 park.attach.set(_p_) 1878 // Inform gcBgMarkStartWorkers that this worker is ready. 1879 // After this point, the background mark worker is scheduled 1880 // cooperatively by gcController.findRunnable. Hence, it must 1881 // never be preempted, as this would put it into _Grunnable 1882 // and put it on a run queue. Instead, when the preempt flag 1883 // is set, this puts itself into _Gwaiting to be woken up by 1884 // gcController.findRunnable at the appropriate time. 1885 notewakeup(&work.bgMarkReady) 1886 1887 for { 1888 // Go to sleep until woken by gcController.findRunnable. 1889 // We can't releasem yet since even the call to gopark 1890 // may be preempted. 1891 gopark(func(g *g, parkp unsafe.Pointer) bool { 1892 park := (*parkInfo)(parkp) 1893 1894 // The worker G is no longer running, so it's 1895 // now safe to allow preemption. 1896 releasem(park.m.ptr()) 1897 1898 // If the worker isn't attached to its P, 1899 // attach now. During initialization and after 1900 // a phase change, the worker may have been 1901 // running on a different P. As soon as we 1902 // attach, the owner P may schedule the 1903 // worker, so this must be done after the G is 1904 // stopped. 1905 if park.attach != 0 { 1906 p := park.attach.ptr() 1907 park.attach.set(nil) 1908 // cas the worker because we may be 1909 // racing with a new worker starting 1910 // on this P. 1911 if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) { 1912 // The P got a new worker. 1913 // Exit this worker. 1914 return false 1915 } 1916 } 1917 return true 1918 }, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0) 1919 1920 // Loop until the P dies and disassociates this 1921 // worker (the P may later be reused, in which case 1922 // it will get a new worker) or we failed to associate. 1923 if _p_.gcBgMarkWorker.ptr() != gp { 1924 break 1925 } 1926 1927 // Disable preemption so we can use the gcw. If the 1928 // scheduler wants to preempt us, we'll stop draining, 1929 // dispose the gcw, and then preempt. 1930 park.m.set(acquirem()) 1931 1932 if gcBlackenEnabled == 0 { 1933 throw("gcBgMarkWorker: blackening not enabled") 1934 } 1935 1936 startTime := nanotime() 1937 _p_.gcMarkWorkerStartTime = startTime 1938 1939 decnwait := atomic.Xadd(&work.nwait, -1) 1940 if decnwait == work.nproc { 1941 println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc) 1942 throw("work.nwait was > work.nproc") 1943 } 1944 1945 systemstack(func() { 1946 // Mark our goroutine preemptible so its stack 1947 // can be scanned. This lets two mark workers 1948 // scan each other (otherwise, they would 1949 // deadlock). We must not modify anything on 1950 // the G stack. However, stack shrinking is 1951 // disabled for mark workers, so it is safe to 1952 // read from the G stack. 1953 casgstatus(gp, _Grunning, _Gwaiting) 1954 switch _p_.gcMarkWorkerMode { 1955 default: 1956 throw("gcBgMarkWorker: unexpected gcMarkWorkerMode") 1957 case gcMarkWorkerDedicatedMode: 1958 gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit) 1959 if gp.preempt { 1960 // We were preempted. This is 1961 // a useful signal to kick 1962 // everything out of the run 1963 // queue so it can run 1964 // somewhere else. 1965 lock(&sched.lock) 1966 for { 1967 gp, _ := runqget(_p_) 1968 if gp == nil { 1969 break 1970 } 1971 globrunqput(gp) 1972 } 1973 unlock(&sched.lock) 1974 } 1975 // Go back to draining, this time 1976 // without preemption. 1977 gcDrain(&_p_.gcw, gcDrainFlushBgCredit) 1978 case gcMarkWorkerFractionalMode: 1979 gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1980 case gcMarkWorkerIdleMode: 1981 gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit) 1982 } 1983 casgstatus(gp, _Gwaiting, _Grunning) 1984 }) 1985 1986 // Account for time. 1987 duration := nanotime() - startTime 1988 switch _p_.gcMarkWorkerMode { 1989 case gcMarkWorkerDedicatedMode: 1990 atomic.Xaddint64(&gcController.dedicatedMarkTime, duration) 1991 atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1) 1992 case gcMarkWorkerFractionalMode: 1993 atomic.Xaddint64(&gcController.fractionalMarkTime, duration) 1994 atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration) 1995 case gcMarkWorkerIdleMode: 1996 atomic.Xaddint64(&gcController.idleMarkTime, duration) 1997 } 1998 1999 // Was this the last worker and did we run out 2000 // of work? 2001 incnwait := atomic.Xadd(&work.nwait, +1) 2002 if incnwait > work.nproc { 2003 println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode, 2004 "work.nwait=", incnwait, "work.nproc=", work.nproc) 2005 throw("work.nwait > work.nproc") 2006 } 2007 2008 // If this worker reached a background mark completion 2009 // point, signal the main GC goroutine. 2010 if incnwait == work.nproc && !gcMarkWorkAvailable(nil) { 2011 // Make this G preemptible and disassociate it 2012 // as the worker for this P so 2013 // findRunnableGCWorker doesn't try to 2014 // schedule it. 2015 _p_.gcBgMarkWorker.set(nil) 2016 releasem(park.m.ptr()) 2017 2018 gcMarkDone() 2019 2020 // Disable preemption and prepare to reattach 2021 // to the P. 2022 // 2023 // We may be running on a different P at this 2024 // point, so we can't reattach until this G is 2025 // parked. 2026 park.m.set(acquirem()) 2027 park.attach.set(_p_) 2028 } 2029 } 2030 } 2031 2032 // gcMarkWorkAvailable reports whether executing a mark worker 2033 // on p is potentially useful. p may be nil, in which case it only 2034 // checks the global sources of work. 2035 func gcMarkWorkAvailable(p *p) bool { 2036 if p != nil && !p.gcw.empty() { 2037 return true 2038 } 2039 if !work.full.empty() { 2040 return true // global work available 2041 } 2042 if work.markrootNext < work.markrootJobs { 2043 return true // root scan work available 2044 } 2045 return false 2046 } 2047 2048 // gcMark runs the mark (or, for concurrent GC, mark termination) 2049 // All gcWork caches must be empty. 2050 // STW is in effect at this point. 2051 func gcMark(start_time int64) { 2052 if debug.allocfreetrace > 0 { 2053 tracegc() 2054 } 2055 2056 if gcphase != _GCmarktermination { 2057 throw("in gcMark expecting to see gcphase as _GCmarktermination") 2058 } 2059 work.tstart = start_time 2060 2061 // Check that there's no marking work remaining. 2062 if work.full != 0 || work.markrootNext < work.markrootJobs { 2063 print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n") 2064 panic("non-empty mark queue after concurrent mark") 2065 } 2066 2067 if debug.gccheckmark > 0 { 2068 // This is expensive when there's a large number of 2069 // Gs, so only do it if checkmark is also enabled. 2070 gcMarkRootCheck() 2071 } 2072 if work.full != 0 { 2073 throw("work.full != 0") 2074 } 2075 2076 // Clear out buffers and double-check that all gcWork caches 2077 // are empty. This should be ensured by gcMarkDone before we 2078 // enter mark termination. 2079 // 2080 // TODO: We could clear out buffers just before mark if this 2081 // has a non-negligible impact on STW time. 2082 for _, p := range allp { 2083 // The write barrier may have buffered pointers since 2084 // the gcMarkDone barrier. However, since the barrier 2085 // ensured all reachable objects were marked, all of 2086 // these must be pointers to black objects. Hence we 2087 // can just discard the write barrier buffer. 2088 if debug.gccheckmark > 0 || throwOnGCWork { 2089 // For debugging, flush the buffer and make 2090 // sure it really was all marked. 2091 wbBufFlush1(p) 2092 } else { 2093 p.wbBuf.reset() 2094 } 2095 2096 gcw := &p.gcw 2097 if !gcw.empty() { 2098 printlock() 2099 print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork) 2100 if gcw.wbuf1 == nil { 2101 print(" wbuf1=<nil>") 2102 } else { 2103 print(" wbuf1.n=", gcw.wbuf1.nobj) 2104 } 2105 if gcw.wbuf2 == nil { 2106 print(" wbuf2=<nil>") 2107 } else { 2108 print(" wbuf2.n=", gcw.wbuf2.nobj) 2109 } 2110 print("\n") 2111 throw("P has cached GC work at end of mark termination") 2112 } 2113 // There may still be cached empty buffers, which we 2114 // need to flush since we're going to free them. Also, 2115 // there may be non-zero stats because we allocated 2116 // black after the gcMarkDone barrier. 2117 gcw.dispose() 2118 } 2119 2120 throwOnGCWork = false 2121 2122 cachestats() 2123 2124 // Update the marked heap stat. 2125 memstats.heap_marked = work.bytesMarked 2126 2127 // Update other GC heap size stats. This must happen after 2128 // cachestats (which flushes local statistics to these) and 2129 // flushallmcaches (which modifies heap_live). 2130 memstats.heap_live = work.bytesMarked 2131 memstats.heap_scan = uint64(gcController.scanWork) 2132 2133 if trace.enabled { 2134 traceHeapAlloc() 2135 } 2136 } 2137 2138 // gcSweep must be called on the system stack because it acquires the heap 2139 // lock. See mheap for details. 2140 // 2141 // The world must be stopped. 2142 // 2143 //go:systemstack 2144 func gcSweep(mode gcMode) { 2145 if gcphase != _GCoff { 2146 throw("gcSweep being done but phase is not GCoff") 2147 } 2148 2149 lock(&mheap_.lock) 2150 mheap_.sweepgen += 2 2151 mheap_.sweepdone = 0 2152 if !go115NewMCentralImpl && mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 { 2153 // We should have drained this list during the last 2154 // sweep phase. We certainly need to start this phase 2155 // with an empty swept list. 2156 throw("non-empty swept list") 2157 } 2158 mheap_.pagesSwept = 0 2159 mheap_.sweepArenas = mheap_.allArenas 2160 mheap_.reclaimIndex = 0 2161 mheap_.reclaimCredit = 0 2162 unlock(&mheap_.lock) 2163 2164 if go115NewMCentralImpl { 2165 sweep.centralIndex.clear() 2166 } 2167 2168 if !_ConcurrentSweep || mode == gcForceBlockMode { 2169 // Special case synchronous sweep. 2170 // Record that no proportional sweeping has to happen. 2171 lock(&mheap_.lock) 2172 mheap_.sweepPagesPerByte = 0 2173 unlock(&mheap_.lock) 2174 // Sweep all spans eagerly. 2175 for sweepone() != ^uintptr(0) { 2176 sweep.npausesweep++ 2177 } 2178 // Free workbufs eagerly. 2179 prepareFreeWorkbufs() 2180 for freeSomeWbufs(false) { 2181 } 2182 // All "free" events for this mark/sweep cycle have 2183 // now happened, so we can make this profile cycle 2184 // available immediately. 2185 mProf_NextCycle() 2186 mProf_Flush() 2187 return 2188 } 2189 2190 // Background sweep. 2191 lock(&sweep.lock) 2192 if sweep.parked { 2193 sweep.parked = false 2194 ready(sweep.g, 0, true) 2195 } 2196 unlock(&sweep.lock) 2197 } 2198 2199 // gcResetMarkState resets global state prior to marking (concurrent 2200 // or STW) and resets the stack scan state of all Gs. 2201 // 2202 // This is safe to do without the world stopped because any Gs created 2203 // during or after this will start out in the reset state. 2204 // 2205 // gcResetMarkState must be called on the system stack because it acquires 2206 // the heap lock. See mheap for details. 2207 // 2208 //go:systemstack 2209 func gcResetMarkState() { 2210 // This may be called during a concurrent phase, so make sure 2211 // allgs doesn't change. 2212 lock(&allglock) 2213 for _, gp := range allgs { 2214 gp.gcscandone = false // set to true in gcphasework 2215 gp.gcAssistBytes = 0 2216 } 2217 unlock(&allglock) 2218 2219 // Clear page marks. This is just 1MB per 64GB of heap, so the 2220 // time here is pretty trivial. 2221 lock(&mheap_.lock) 2222 arenas := mheap_.allArenas 2223 unlock(&mheap_.lock) 2224 for _, ai := range arenas { 2225 ha := mheap_.arenas[ai.l1()][ai.l2()] 2226 for i := range ha.pageMarks { 2227 ha.pageMarks[i] = 0 2228 } 2229 } 2230 2231 work.bytesMarked = 0 2232 work.initialHeapLive = atomic.Load64(&memstats.heap_live) 2233 } 2234 2235 // Hooks for other packages 2236 2237 var poolcleanup func() 2238 2239 //go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup 2240 func sync_runtime_registerPoolCleanup(f func()) { 2241 poolcleanup = f 2242 } 2243 2244 func clearpools() { 2245 // clear sync.Pools 2246 if poolcleanup != nil { 2247 poolcleanup() 2248 } 2249 2250 // Clear central sudog cache. 2251 // Leave per-P caches alone, they have strictly bounded size. 2252 // Disconnect cached list before dropping it on the floor, 2253 // so that a dangling ref to one entry does not pin all of them. 2254 lock(&sched.sudoglock) 2255 var sg, sgnext *sudog 2256 for sg = sched.sudogcache; sg != nil; sg = sgnext { 2257 sgnext = sg.next 2258 sg.next = nil 2259 } 2260 sched.sudogcache = nil 2261 unlock(&sched.sudoglock) 2262 2263 // Clear central defer pools. 2264 // Leave per-P pools alone, they have strictly bounded size. 2265 lock(&sched.deferlock) 2266 for i := range sched.deferpool { 2267 // disconnect cached list before dropping it on the floor, 2268 // so that a dangling ref to one entry does not pin all of them. 2269 var d, dlink *_defer 2270 for d = sched.deferpool[i]; d != nil; d = dlink { 2271 dlink = d.link 2272 d.link = nil 2273 } 2274 sched.deferpool[i] = nil 2275 } 2276 unlock(&sched.deferlock) 2277 } 2278 2279 // Timing 2280 2281 // itoaDiv formats val/(10**dec) into buf. 2282 func itoaDiv(buf []byte, val uint64, dec int) []byte { 2283 i := len(buf) - 1 2284 idec := i - dec 2285 for val >= 10 || i >= idec { 2286 buf[i] = byte(val%10 + '0') 2287 i-- 2288 if i == idec { 2289 buf[i] = '.' 2290 i-- 2291 } 2292 val /= 10 2293 } 2294 buf[i] = byte(val + '0') 2295 return buf[i:] 2296 } 2297 2298 // fmtNSAsMS nicely formats ns nanoseconds as milliseconds. 2299 func fmtNSAsMS(buf []byte, ns uint64) []byte { 2300 if ns >= 10e6 { 2301 // Format as whole milliseconds. 2302 return itoaDiv(buf, ns/1e6, 0) 2303 } 2304 // Format two digits of precision, with at most three decimal places. 2305 x := ns / 1e3 2306 if x == 0 { 2307 buf[0] = '0' 2308 return buf[:1] 2309 } 2310 dec := 3 2311 for x >= 100 { 2312 x /= 10 2313 dec-- 2314 } 2315 return itoaDiv(buf, x, dec) 2316 } 2317