sync: bad placement of multiple contested locks can cause drastic slowdown #17953

Closed
ianlancetaylor opened this issue Nov 17, 2016 · 5 comments

@ianlancetaylor (Contributor)

This program runs in a couple of seconds in the normal case. When invoked with the argument -offset=251, it takes over four minutes. The problem is the hash table in runtime/sema.go. When two contested mutexes happen to wind up using the same hash table entry, the linear search in semrelease is a significant performance drag. While this is not a likely occurrence, 1 in 251 is not so terribly unlikely either, and the performance effect is extreme. This actually occurred in an internal Google program.

CC @dvyukov

package main

import (
    "flag"
    "sync"
)

var offset = flag.Int("offset", 1, "Offset for the second lock")

func main() {
	flag.Parse()
	locks := make([]sync.RWMutex, *offset+1)
	ch := make(chan struct{}) // closed to release all workers at once

	var wg sync.WaitGroup

	// 10000 goroutines repeatedly take locks[0] and locks[*offset] together.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			<-ch
			for i := 0; i < 10; i++ {
				locks[0].Lock()
				locks[*offset].Lock()
				locks[*offset].Unlock()
				locks[0].Unlock()
			}
			wg.Done()
		}()
	}

	// 100 goroutines keep locks[*offset] contested for the whole run.
	for i := 0; i < 100; i++ {
		go func() {
			for {
				locks[*offset].Lock()
				locks[*offset].Unlock()
			}
		}()
	}

	close(ch)
	wg.Wait()
}
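
For the record, here is why -offset=251 in particular is the unlucky value. This is a minimal sketch assuming the bucket computation runtime/sema.go used at the time, roughly (addr >> 3) % 251 over the 251-entry table; the bucket helper below is illustrative, not the runtime's code.

package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// bucket mimics the semaphore table's slot choice (assumed here):
// the semaphore address shifted right by 3, modulo the 251-entry table.
func bucket(p unsafe.Pointer) uintptr {
	return (uintptr(p) >> 3) % 251
}

func main() {
	locks := make([]sync.RWMutex, 252)
	// sync.RWMutex is 24 bytes, so locks[0] and locks[251] (and any field
	// at the same offset inside them, including the semaphores the runtime
	// actually hashes) are 251*24 = 6024 bytes apart. 6024 is a multiple
	// of 8*251, so both map to the same bucket.
	fmt.Println(bucket(unsafe.Pointer(&locks[0])) == bucket(unsafe.Pointer(&locks[251]))) // true
}
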
@ianlancetaylor ianlancetaylor added this to the Go1.9 milestone Nov 17, 2016
@dvyukov (Member) commented Nov 17, 2016

This is not specific to RWMutex, right? Mutex should be affected the same way; if RWMutex is used only for write locks, it behaves exactly like Mutex.

It's solvable if we increase the size of Mutex by at least one word to store a *semaRoot directly. Maybe we could even pack all the state into two words without indirection.
Solving it without a Mutex size increase would require playing dirty games like packing pointers into integers (which should work provided the pointee is non-movable and is kept alive by other pointers, as is true for, e.g., a G).
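
To make the size tradeoff concrete, here is a rough sketch using a hypothetical mutexWithRoot type, not the real sync.Mutex: today's Mutex is two 32-bit words (state and sema), and adding one pointer-sized word for a per-mutex wait-queue root would double it to 16 bytes on 64-bit platforms.

package main

import (
	"fmt"
	"unsafe"
)

// mutexWithRoot sketches the layout suggested above: the sync.Mutex fields
// plus one pointer-sized word that could point at a per-mutex wait-queue
// root instead of hashing into the shared semaphore table. Hypothetical.
type mutexWithRoot struct {
	state int32          // fast-path lock state, as in sync.Mutex
	sema  uint32         // semaphore the runtime parks goroutines on
	root  unsafe.Pointer // hypothetical per-mutex *semaRoot
}

func main() {
	// 8 bytes today vs. 16 bytes with the extra word on 64-bit platforms.
	fmt.Println(unsafe.Sizeof(mutexWithRoot{}))
}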

@ianlancetaylor (Contributor, Author)

As far as I can see, it would be OK to grow Mutex by a word. It seems unlikely that any program's memory usage is dominated by the number of Mutex values it creates.

@rsc (Contributor) commented Feb 10, 2017

I'd really prefer not to store internal runtime state (pointers) in a user-corruptible data structure. We've seen malloc implementations have to move their metadata into separate pages to make sure that user corruption doesn't cause what look like internal errors in malloc, and I think that lesson applies here.

I have a different, simpler idea. Instead of having each bucket of the 251-entry hash table hold a list of all goroutines blocked on addresses hashing to that bucket, have it hold a list of only the goroutines blocked on unique addresses hashing to that bucket. Instead of inserting an address into the list a second time, a secondary list can hang off each of these unique representatives. Because the wakeups are FIFO, the length of those secondary lists doesn't matter (the ops on them are O(1)), and moving the large number of goroutines blocked on the same lock off the main list eliminates that list as a cause of slowdown when another lock hits the same hash bucket.
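
One way to picture this scheme is the minimal sketch below, using made-up names (waiter, entry, bucket) rather than the runtime's sudog and semaRoot: each bucket keeps one entry per unique blocked-on address, each entry carries its own FIFO of waiters, and the scan in the release path touches unique addresses only.

package semasketch

// waiter stands in for one blocked goroutine.
type waiter struct {
	addr *uint32 // semaphore address the goroutine is blocked on
	next *waiter // next waiter blocked on the same address (FIFO)
}

// entry is the unique representative for one address in a bucket.
type entry struct {
	addr       *uint32
	head, tail *waiter // FIFO: enqueue at tail, dequeue at head
	nextUnique *entry  // next unique address hashing to this bucket
}

// bucket holds one entry per unique address, so a popular lock adds a
// single entry no matter how many goroutines block on it.
type bucket struct {
	uniques *entry
}

// enqueue adds w to the FIFO for w.addr, creating an entry if needed.
func (b *bucket) enqueue(w *waiter) {
	for e := b.uniques; e != nil; e = e.nextUnique {
		if e.addr == w.addr {
			e.tail.next = w // O(1) append to the existing FIFO
			e.tail = w
			return
		}
	}
	b.uniques = &entry{addr: w.addr, head: w, tail: w, nextUnique: b.uniques}
}

// dequeue removes and returns the oldest waiter for addr, if any.
// The linear scan is over unique addresses, not over all waiters.
func (b *bucket) dequeue(addr *uint32) *waiter {
	var prev *entry
	for e := b.uniques; e != nil; prev, e = e, e.nextUnique {
		if e.addr != addr {
			continue
		}
		w := e.head
		if e.head = w.next; e.head == nil {
			// Last waiter for this address: drop the entry itself.
			if prev == nil {
				b.uniques = e.nextUnique
			} else {
				prev.nextUnique = e.nextUnique
			}
		}
		w.next = nil
		return w
	}
	return nil
}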

I've implemented this, and it eliminates the problem in @ianlancetaylor's test program. Ian, if you can send me more details about the Google problem off-list then I'll see if it would fix that too.

@gopherbot

CL https://golang.org/cl/36792 mentions this issue.

@gopherbot

CL https://golang.org/cl/37103 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 16, 2017
CL 36792 fixed #17953, a linear scan caused by n goroutines piling into
two different locks that hashed to the same bucket in the semaphore table.
In that CL, n goroutines contending for 2 unfortunately chosen locks
went from O(n²) to O(n).

This CL fixes a different linear scan, when n goroutines are contending for
n/2 different locks that all hash to the same bucket in the semaphore table.
In this CL, n goroutines contending for n/2 unfortunately chosen locks
go from O(n²) to O(n log n). This case is much less likely, but any linear
scan eventually hurts, so we might as well fix it while the problem is fresh
in our minds.

The new test in this CL checks for both linear scans.

The effect of this CL on the sync benchmarks is negligible
(but it fixes the new test).

name                      old time/op    new time/op    delta
Cond1-48                     576ns ±10%     575ns ±13%     ~     (p=0.679 n=71+71)
Cond2-48                    1.59µs ± 8%    1.61µs ± 9%     ~     (p=0.107 n=73+69)
Cond4-48                    4.56µs ± 7%    4.55µs ± 7%     ~     (p=0.670 n=74+72)
Cond8-48                    9.87µs ± 9%    9.90µs ± 7%     ~     (p=0.507 n=69+73)
Cond16-48                   20.4µs ± 7%    20.4µs ±10%     ~     (p=0.588 n=69+71)
Cond32-48                   45.4µs ±10%    45.4µs ±14%     ~     (p=0.944 n=73+73)
UncontendedSemaphore-48     19.7ns ±12%    19.7ns ± 8%     ~     (p=0.589 n=65+63)
ContendedSemaphore-48       55.4ns ±26%    54.9ns ±32%     ~     (p=0.441 n=75+75)
MutexUncontended-48         0.63ns ± 0%    0.63ns ± 0%     ~     (all equal)
Mutex-48                     210ns ± 6%     213ns ±10%   +1.30%  (p=0.035 n=70+74)
MutexSlack-48                210ns ± 7%     211ns ± 9%     ~     (p=0.184 n=71+72)
MutexWork-48                 299ns ± 5%     300ns ± 5%     ~     (p=0.678 n=73+75)
MutexWorkSlack-48            302ns ± 6%     300ns ± 5%     ~     (p=0.149 n=74+72)
MutexNoSpin-48               135ns ± 6%     135ns ±10%     ~     (p=0.788 n=67+75)
MutexSpin-48                 693ns ± 5%     689ns ± 6%     ~     (p=0.092 n=65+74)
Once-48                     0.22ns ±25%    0.22ns ±24%     ~     (p=0.882 n=74+73)
Pool-48                     5.88ns ±36%    5.79ns ±24%     ~     (p=0.655 n=69+69)
PoolOverflow-48             4.79µs ±18%    4.87µs ±20%     ~     (p=0.233 n=75+75)
SemaUncontended-48          0.80ns ± 1%    0.82ns ± 8%   +2.46%  (p=0.000 n=60+74)
SemaSyntNonblock-48          103ns ± 4%     102ns ± 5%   -1.11%  (p=0.003 n=75+75)
SemaSyntBlock-48             104ns ± 4%     104ns ± 5%     ~     (p=0.231 n=71+75)
SemaWorkNonblock-48          128ns ± 4%     129ns ± 6%   +1.51%  (p=0.000 n=63+75)
SemaWorkBlock-48             129ns ± 8%     130ns ± 7%     ~     (p=0.072 n=75+74)
RWMutexUncontended-48       2.35ns ± 1%    2.35ns ± 0%     ~     (p=0.144 n=70+55)
RWMutexWrite100-48           139ns ±18%     141ns ±21%     ~     (p=0.071 n=75+73)
RWMutexWrite10-48            145ns ± 9%     145ns ± 8%     ~     (p=0.553 n=75+75)
RWMutexWorkWrite100-48       297ns ±13%     297ns ±15%     ~     (p=0.519 n=75+74)
RWMutexWorkWrite10-48        588ns ± 7%     585ns ± 5%     ~     (p=0.173 n=73+70)
WaitGroupUncontended-48     0.87ns ± 0%    0.87ns ± 0%     ~     (all equal)
WaitGroupAddDone-48         63.2ns ± 4%    62.7ns ± 4%   -0.82%  (p=0.027 n=72+75)
WaitGroupAddDoneWork-48      109ns ± 5%     109ns ± 4%     ~     (p=0.233 n=75+75)
WaitGroupWait-48            0.17ns ± 0%    0.16ns ±16%   -8.55%  (p=0.000 n=56+75)
WaitGroupWaitWork-48        1.78ns ± 1%    2.08ns ± 5%  +16.92%  (p=0.000 n=74+70)
WaitGroupActuallyWait-48    52.0ns ± 3%    50.6ns ± 5%   -2.70%  (p=0.000 n=71+69)

https://perf.golang.org/search?q=upload:20170215.1

Change-Id: Ia29a8bd006c089e401ec4297c3038cca656bcd0a
Reviewed-on: https://go-review.googlesource.com/37103
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang locked and limited conversation to collaborators Feb 15, 2018