Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: replace span freelist with free bitmap #8894

Closed
dvyukov opened this issue Oct 7, 2014 · 3 comments
Closed

runtime: replace span freelist with free bitmap #8894

dvyukov opened this issue Oct 7, 2014 · 3 comments

Comments

@dvyukov
Copy link
Member

dvyukov commented Oct 7, 2014

The slowest parts of mallocgc and sweeping are freelist manipulation.
The idea is to replace the freelist with bitmap of free blocks. For spans with objects
>= 128 bytes, we allocate an additional word in MSpan for this; for spans < 128
bytes, we reserve the necessary amount of words at the end of the span.

Here is a prototype CL:
https://golang.org/cl/113640043/

This has several advantages:
1. We always allocate in memory order.
2. We don't touch MSpan nor memory blocks themselves in mallocgc (freemap is cached in
MCache).
3. We don't touch blocks at all in MSpan_Sweep (only mark bits and free bitmap).
4. We don't work with linked lists in MSpan_Sweep.
5. We don't walk freelist in MSpan_Sweep to mark already free objects (bitmap set is
idempotent).
@randall77
Copy link
Contributor

Comment 1:

Now that the garbage collector is fully precise, we can drop free lists before GC
instead of walking and marking them during sweep.  That will eliminate #5 and get us #1
also.
I'm not sure #2 is actually an advantage.  We want to touch the objects we're
allocating, the caller is going to want them in the cache.  #3 could actually be
helpful, though.
I'm not convinced linked list operations are really that slow.  Popping something off
the head of the list is just two loads and a store.

@dvyukov
Copy link
Member Author

dvyukov commented Oct 8, 2014

Comment 2:

To answer this question we simply need to benchmark. But I don't have time for it right
now.
But I can say that the freelist operation is the slowest part of mallocgc.
Regarding caching, yes, caller wants the block in cache. But there is significant
difference between async store that goes into write buffer causing prefetch into cache
and a load of uncached data with a subsequent control dependency on the value. See eg:
https://golang.org/cl/100230043/diff/80001/src/pkg/runtime/mgc0.c

@randall77
Copy link
Contributor

We now use bitmaps for allocation.

@golang golang locked and limited conversation to collaborators Apr 1, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants