Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: O(n^2) performance using map 'pop' pattern #8412

Closed
randall77 opened this issue Jul 23, 2014 · 14 comments
Closed

runtime: O(n^2) performance using map 'pop' pattern #8412

randall77 opened this issue Jul 23, 2014 · 14 comments
Milestone

Comments

@randall77
Copy link
Contributor

The idiom of using an iterator to get a single element, then deleting the single
element, then repeating leads to O(n^2) performance.  We should fix this, perhaps by
remembering the first nonempty bucket.

func benchmarkMap(b *testing.B, doPop bool, n int) {
    var m map[int]bool
    pop := func() (int, bool) {
        for k := range m {
            delete(m, k)
            return k, true
        }
        return 0, false
    }

    for i := 0; i < b.N; i++ {
        m = make(map[int]bool)
        for k := 0; k < n; k++ {
            m[5*k+3] = true
        }
        if !doPop {
            continue
        }

        for {
            _, ok := pop()
            if !ok {
                break
            }
        }
    }
}

func BenchmarkPop10(b *testing.B)     { benchmarkMap(b, true, 10) }
func BenchmarkPop100(b *testing.B)    { benchmarkMap(b, true, 100) }
func BenchmarkPop1000(b *testing.B)   { benchmarkMap(b, true, 1000) }
func BenchmarkPop10000(b *testing.B)  { benchmarkMap(b, true, 10000) }
func BenchmarkPop100000(b *testing.B) { benchmarkMap(b, true, 100000) }
@randall77
Copy link
Contributor Author

Comment 1:

Labels changed: added release-go1.4.

@ianlancetaylor
Copy link
Member

Comment 3:

Labels changed: added repo-main.

@randall77
Copy link
Contributor Author

Comment 4:

So as I see it there are two options.
One is to go back to starting at a random bucket (or some hybrid, like starting at a
random offset if there is 1 bucket and a random bucket otherwise).  This gives an O(n lg
n) total running time for the pop idiom.  It's not great, but better than O(n^2).
The other is to remember the bucket of the last item returned during iteration.  We can
start the next iteration at that bucket.  This would give us O(n) for the pop idiom. 
The challenge is that iteration is a read operation, so multiple simultaneous iterations
can be happening at once.  As a result, maintaining the last bucket during iteration
would require atomic memory accesses (or maybe just a naked write would be sufficient?).
 I know how we can make space in the map header for this extra item.

@dvyukov
Copy link
Member

dvyukov commented Jul 25, 2014

Comment 5:

> As a result, maintaining the last bucket during iteration would require atomic memory
accesses (or maybe just a naked write would be sufficient?). 
Even if we use naked write, it still can be extremely expensive.

@ianlancetaylor
Copy link
Member

Comment 6:

But I think only in the relatively uncommon case of multiple
goroutines iterating over a map simultaneously.
Another approach for this use case might be to remember the bucket not
on map iteration, but on delete.

@gopherbot
Copy link
Contributor

Comment 7:

CL https://golang.org/cl/112700043 mentions this issue.

@josharian
Copy link
Contributor

Comment 8:

> Another approach for this use case might be to remember the bucket not on map
iteration, but on delete.
I'm not sure how to tie the delete to the iterator state, but here's a variant on that
idea.
We could store the lowest non-empty bucket on inserts, deletes, and creation, and read
it once at the beginning of iteration. This will make some deletes a bit more expensive,
but those should be rare and non-compounding. The lowest non-empty bucket could perhaps
also be used later as a cue to trigger map downsizing.
This approach could also help keep the iteration code simple--just run from first to
bucketCnt instead of 0 to bucketCnt--depending on what guarantees we make about
(non-concurrent) mutation during iteration.
On that note, here's a fun pop quiz. What do these output (modulo print order)? To what
extent is this behavior covered under the Go 1 compatibility guarantee?
http://play.golang.org/p/Y3nmsF3B2m
http://play.golang.org/p/MZcJKa9_hR
http://play.golang.org/p/5THESM0Pna

Status changed to Accepted.

@randall77
Copy link
Contributor Author

Comment 9:

I think Ian's idea of just recording the last bucket deleted from works, at least for
the pop idiom.  Just start each iterator at the last delete bucket.  I can't shake the
feeling that it is too specific to the pop idiom, though.  It seems counterintuitive
that you should start iterating at a bucket you might have just emptied.
There is a problem with storing the lowest non-empty bucket.  You can have a repeated
pattern of insert/delete that costs O(n) per insert/delete pair.  (Bucket 0 and bucket
n-1 have an element.  Delete the one in bucket 0, then reinsert it.)
You could store a conservative approximation to the lowest non-empty bucket, though
(invariant = all buckets below the remembered bucket are empty).  At first I thought
that would just get you back to where you have to update such a field on iteration.  But
maybe just bumping it up one bucket every delete would be enough?  On insert take min of
insert bucket and remembered bucket.  On delete check remembered bucket.  If it is
empty, increase remembered bucket # by 1.
I suspect there would still be patterns of insert, delete, and iteration that would show
bad behavior.  But maybe they wouldn't be common in practice.
Or maybe we're overthinking this.  Just starting at a random bucket gets us O(n lg n)
which might be good enough.  It's certainly the easiest to implement.

@randall77
Copy link
Contributor Author

Comment 10:

Go makes no guarantee about iteration order, or whether an item inserted during
iteration will show up later in that iteration or not.  I think that wiggle room is
enough to explain the behavior of your examples, and show that it is within the Go 1
compatibility guarantee.
They are weird, though.

@josharian
Copy link
Contributor

Comment 11:

> Or maybe we're overthinking this.  Just starting at a random bucket gets us O(n lg n)
which might be good enough.  It's certainly the easiest to implement.
+1

@dvyukov
Copy link
Member

dvyukov commented Jul 26, 2014

Comment 12:

Just please don't update map on reads. I don't think that concurrent iteration is
absolutely uncommon.

@randall77
Copy link
Contributor Author

Comment 13:

This issue was updated by revision 668a55a.

LGTM=iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/112700043

@gopherbot
Copy link
Contributor

Comment 14:

CL https://golang.org/cl/141270043 mentions this issue.

@randall77
Copy link
Contributor Author

Comment 15:

This issue was closed by revision 55c458e.

Status changed to Fixed.

@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018

Verified

This commit was signed with the committer’s verified signature.
W-A-James Warren James
This change brings the iter/delete pattern down to O(n lgn) from O(n^2).

Fixes golang#8412.

before:
BenchmarkMapPop100	   50000	     32498 ns/op
BenchmarkMapPop1000	     500	   3244851 ns/op
BenchmarkMapPop10000	       5	 270276855 ns/op

after:
BenchmarkMapPop100	  100000	     16169 ns/op
BenchmarkMapPop1000	    5000	    300416 ns/op
BenchmarkMapPop10000	     300	   5990814 ns/op

LGTM=iant
R=golang-codereviews, iant, khr
CC=golang-codereviews
https://golang.org/cl/141270043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018

Verified

This commit was signed with the committer’s verified signature.
W-A-James Warren James
This change brings the iter/delete pattern down to O(n lgn) from O(n^2).

Fixes golang#8412.

before:
BenchmarkMapPop100	   50000	     32498 ns/op
BenchmarkMapPop1000	     500	   3244851 ns/op
BenchmarkMapPop10000	       5	 270276855 ns/op

after:
BenchmarkMapPop100	  100000	     16169 ns/op
BenchmarkMapPop1000	    5000	    300416 ns/op
BenchmarkMapPop10000	     300	   5990814 ns/op

LGTM=iant
R=golang-codereviews, iant, khr
CC=golang-codereviews
https://golang.org/cl/141270043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 30, 2018

Verified

This commit was signed with the committer’s verified signature.
W-A-James Warren James
This change brings the iter/delete pattern down to O(n lgn) from O(n^2).

Fixes golang#8412.

before:
BenchmarkMapPop100	   50000	     32498 ns/op
BenchmarkMapPop1000	     500	   3244851 ns/op
BenchmarkMapPop10000	       5	 270276855 ns/op

after:
BenchmarkMapPop100	  100000	     16169 ns/op
BenchmarkMapPop1000	    5000	    300416 ns/op
BenchmarkMapPop10000	     300	   5990814 ns/op

LGTM=iant
R=golang-codereviews, iant, khr
CC=golang-codereviews
https://golang.org/cl/141270043
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants