Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

container/heap: optimize heap to reduce number of comparisons during heapify and heap pop #21407

Open
rajathagasthya opened this issue Aug 11, 2017 · 7 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@rajathagasthya
Copy link
Contributor

rajathagasthya commented Aug 11, 2017

This is an optimization specified in Knuth Volume 3 (Exercise 5.2, problem 18 I believe) where down() is called on an element in a certain position (during heap pop or init). It states that it's beneficial to move that element all the way to the leaf and then move up to its appropriate place instead of breaking out as soon as we find a position where the item is <= both of its children.

Python does this and the comment about this method in its heapq module describes it better than I can articulate (also includes comparison numbers).

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory.  Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify     Compares needed by 1000 heappops
# --------------------------     --------------------------------
# 1837 cut to 1663               14996 cut to 8680
# 1855 cut to 1659               14966 cut to 8678
# 1847 cut to 1660               15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares:  heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares):  list.sort() is (unsurprisingly!) more efficient
# for sorting.

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

Is there any interest in doing this optimization? I'm fairly new to Go, but I'd love to contribute with this if it's something you'd like to see. Thanks!

@cespare cespare changed the title heap: Optimize heap to reduce number of comparisons during heapify and heap pop container/heap: optimize heap to reduce number of comparisons during heapify and heap pop Aug 11, 2017
@josharian
Copy link
Contributor

cc @griesemer

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Aug 11, 2017
@randall77
Copy link
Contributor

Sounds reasonable. You'll need some benchmarks to justify your change.
Such a change may make the Pop() order change for keys which are unordered. That's getting close to "breaking the go1 compatibility guarantee" territory, although I think it is probably ok.

@griesemer
Copy link
Contributor

What @randall77 said. Feel free to give it a shot. Please be aware that this is not a high-priority issue so it may take a while to get this reviewed thoroughly.

@rajathagasthya
Copy link
Contributor Author

Great, thanks! I'll put a review up in Gerrit soon and try out some benchmarking.

@gopherbot
Copy link

Change https://golang.org/cl/56070 mentions this issue: WIP: container/heap: Optimize heap to reduce comparisons

@ALTree ALTree added Performance NeedsFix The path to resolution is known, but the work has not been done. labels Sep 22, 2018
@Windsooon
Copy link

@rajathagasthya are you still working on it?

@rajathagasthya
Copy link
Contributor Author

@Windsooon Yes, PR needs review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

8 participants