Tuesday, November 17, 2009

Very, Very, Very Fast 2.5-Phase Compacting

I skipped from Boehm/Demers to a (fast) variant of Cheney. Cheney has one serious drawback: it is a copying collector, so it needs a second semi-space and wastes half the memory. I guess I might go for a compacting algorithm for multicore machines. Here is a minor variant on compacting.

I assume all nodes only refer to older nodes. For simplicity, I also assume that records only hold references. Nodes don't point into the interior of other nodes either.

Assume each node has a meta-field storing the size of the node, plus an empty field used during compaction.

A running example:


A: B: C: D: E:
[4,#|E,B] ..n.. [3,#|C] [4,#|D,E] ..m.. [3,#|E] ..k.. [2,#|]


For simplicity, the nodes only refer to other nodes. A is the node [4,#|E,B]: the first two fields are meta-information, the latter fields are the actual content. A has size 4, and its second field is not used yet. The content holds two fields, which are references to the nodes E and B. ..n.. denotes a region of size n filled with garbage nodes which will be reclaimed.
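
In C, such a node could be represented as a row of machine words, something along these lines. This is a rough sketch: the names, the word-sized cells, and parking a mark bit in the low bit of the size field are my own choices; only the two meta-fields are given.

#include <stdint.h>

typedef uintptr_t cell;            /* one heap cell = one machine word          */

/* A node [size,#|refs...] occupies 'size' consecutive cells:
   cell 0 holds the size (low bit borrowed as a mark bit),
   cell 1 is the '#' scratch field used during compaction,
   cells 2..size-1 hold references to older nodes, i.e. higher addresses. */
#define SIZE(p)     ((p)[0] & ~(cell)1)
#define MARKED(p)   ((p)[0] & 1)
#define MARK(p)     ((p)[0] |= 1)
#define UNMARK(p)   ((p)[0] &= ~(cell)1)
#define OFFSET(p)   ((p)[1])
#define REFS(p)     (SIZE(p) - 2)
#define REF(p, i)   ((p)[2 + (i)])  /* a cell holding a node's address           */

Node A from the example, [4,#|E,B], is then simply the four cells { 4, scratch, address of E, address of B }.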

The collection algorithm:


  1. Nodes are processed from left to right; the first node is marked live (it acts as the root). When a marked live node is processed, its children are marked live, and the empty space preceding it, i.e. the sum of the sizes of all unmarked nodes seen so far, is stored in its second meta-field. For each unmarked node, we just add its size to the running total of unused space and advance to the next node. (A sketch in C of all three phases follows this list.)


    A: B: C: D: E:
    [4,0|E,B] ..n.. [3,n|C] [4,n|D,E] ..m.. [3,n+m|E] ..k.. [2,n+m+k|]

  2. Slide the live nodes to the left, again processing from left to right. For each reference field, look up in the referenced node's meta-field how far that node will be moved and recalculate the correct new pointer.

    So, halfway through, after processing A and B, the heap will look like:


    A: B: C: D: E:
    [4,0|E-n-m-k,B-n] [3,n|C-n] ..n.. [4,n|D,E] ..m.. [3,n+m|E] ..k.. [2,n+m+k|]

  3. Major disadvantage: it slides to the left, whereas we want to compact to the right. That can be solved with an extra half phase which moves the compacted memory n+m+k to the right (and adds n+m+k to every pointer, giving the final picture below).

    A: B: C: D: E:
    ..n+m+k.. [4,0|E,B+m+k] [3,n|C+m+k] [4,n|D+k,E] [3,n+m|E] [2,n+m+k|]

    Hmm...


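Spelled out in C, on top of the cell type and accessors sketched above, the three phases could look roughly like this. It is only a sketch: it assumes every reference field holds a valid node address, it takes the first node as the only root, and it ignores allocation entirely.

#include <stdint.h>
#include <string.h>

/* Phase 1: mark live nodes left to right and store, per live node, the
   amount of free space to its left. Returns the total free space in cells. */
static cell phase1(cell *heap, cell *end)
{
    cell free_cells = 0;
    MARK(heap);                                 /* the first node is the root */
    for (cell *p = heap; p < end; p += SIZE(p)) {
        if (MARKED(p)) {
            OFFSET(p) = free_cells;
            for (cell i = 0; i < REFS(p); i++)
                MARK((cell *)REF(p, i));        /* children sit to the right  */
        } else {
            free_cells += SIZE(p);
        }
    }
    return free_cells;
}

/* Phase 2: slide the live nodes to the left over the recorded free space.
   A reference is rewritten by the offset stored in the referenced node,
   which still sits, unmoved, at its old (higher) address. */
static cell *phase2(cell *heap, cell *end)
{
    cell *dst = heap;
    cell *p = heap;
    while (p < end) {
        cell n = SIZE(p);
        if (MARKED(p)) {
            UNMARK(p);
            for (cell i = 0; i < REFS(p); i++) {
                cell *t = (cell *)REF(p, i);
                REF(p, i) = (cell)(t - OFFSET(t));
            }
            memmove(dst, p, n * sizeof(cell));
            dst += n;
        }
        p += n;
    }
    return dst;                                 /* end of the live block      */
}

/* The 'half' phase: the live block now sits flush left but should sit flush
   right, so move it up by the total free space and bump every reference too. */
static void shift_right(cell *heap, cell *live_end, cell free_cells)
{
    for (cell *p = heap; p < live_end; p += SIZE(p))
        for (cell i = 0; i < REFS(p); i++)
            REF(p, i) = (cell)((cell *)REF(p, i) + free_cells);
    memmove(heap + free_cells, heap, (live_end - heap) * sizeof(cell));
}

cell *collect(cell *heap, cell *end)
{
    cell free_cells = phase1(heap, end);
    cell *live_end  = phase2(heap, end);
    shift_right(heap, live_end, free_cells);
    return heap + free_cells;     /* everything below this is free again      */
}

The unidirectional heap is what makes the second phase work in a single pass: when a node is moved, everything it points to still sits untouched at a higher address, carrying its own displacement in its scratch field.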

Room for improvement? The correct final offsets could be calculated in the second phase, leaving a simple copy for the last phase. Nodes to be reclaimed could also be merged in the first phase, by setting the length field of the head node to the sum of the sizes of all to-be-reclaimed nodes following it.
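
For the first of those, the fix-up in the second phase can already aim references at their final, shifted-right addresses: a node with off free cells to its left ends up free - off cells to the right of where it currently stands. Roughly, with the same disclaimers as the sketch above:

/* Phase 2 variant: compute final (shifted-right) addresses immediately,
   so nothing is left to fix in the last phase. */
static cell *phase2_final(cell *heap, cell *end, cell free_cells)
{
    cell *dst = heap;
    cell *p = heap;
    while (p < end) {
        cell n = SIZE(p);
        if (MARKED(p)) {
            UNMARK(p);
            for (cell i = 0; i < REFS(p); i++) {
                cell *t = (cell *)REF(p, i);
                /* t ends up (free_cells - OFFSET(t)) cells further right */
                REF(p, i) = (cell)(t + (free_cells - OFFSET(t)));
            }
            memmove(dst, p, n * sizeof(cell));
            dst += n;
        }
        p += n;
    }
    return dst;
}

/* The last 'half' phase then degenerates to a single copy:
   memmove(heap + free_cells, heap, (live_end - heap) * sizeof(cell));      */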

Benefits: ok-ish performance is expected since it only has 'two and a half' phases, and reasonable cache behaviour if most nodes are relatively close. It also keeps the age order intact, which should give better locality than a Cheney collector.

Additional notes:

A disadvantage seems to be that we visit all nodes in the heap during a linear scan, instead of only the live nodes as in a normal mark phase. But on modern CPUs that should be a minor overhead: for each unmarked node the pointer is just advanced by its size, the nodes are likely in the cache anyway, which gives very good locality, and nothing is written to memory. The algorithm uses two linear scans, where other algorithms might maintain some extra data structure so they don't have to visit all nodes. The second phase could be made faster by storing forwarding pointers to the next live node, or by pushing the live nodes on a stack; then again, I am not sure that is worth the effort. (A slight adaptation: in the first phase, let each dead node 'eat' the next dead node by setting its size to the sum of both nodes.)
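
The 'eating' adaptation would slot into the first phase along these lines; it merely generalises the parenthesis above from one dead neighbour to all of them, and it is again only a sketch on top of the helpers further up:

/* Phase 1 with dead-node eating: a dead node absorbs every dead node
   directly after it, so later scans hop over the whole gap in one step. */
static cell phase1_eat(cell *heap, cell *end)
{
    cell free_cells = 0;
    MARK(heap);                                 /* the first node is the root */
    for (cell *p = heap; p < end; p += SIZE(p)) {
        if (MARKED(p)) {
            OFFSET(p) = free_cells;
            for (cell i = 0; i < REFS(p); i++)
                MARK((cell *)REF(p, i));
        } else {
            cell *q = p + SIZE(p);
            while (q < end && !MARKED(q)) {     /* eat the next dead node     */
                p[0] += SIZE(q);                /* p is dead: no mark bit set */
                q += SIZE(q);
            }
            free_cells += SIZE(p);
        }
    }
    return free_cells;
}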

For a functional language we expect a large number of dead objects, hence the last phase, a simple copy of a small number of live objects, should be relatively cheap. It's a time/space trade-off: the expected running time is probably worse than Cheney's, since the whole heap is visited (twice), but it halves the memory needed.

Then again, it's my first try at compacting. Maybe I should pick up a good book?

Something itches in the back of my mind, an old compiler building joke: "Nothing beats free lists."

11/19/2009: (Checked it against 'A Compacting Garbage Collector for Unidirectional Heaps (1998)' by Kent Boortz and Dan Sahlin, which I just scanned for its basic complexity. It has roughly the same overall complexity, but this algorithm is way simpler, so I guess I am safe.)
