Jeff's Technology Blog: Mark-and-Sweep Garbage Collection

Mark-and-Sweep is an algorithm for garbage collection. Lets start with the concept of a root. A root is a starting point for searches. Root's are usually static variables and stack variables. Roots point to objects in the heap. Those objects can then point to other objects on the heap. As you create new objects, you keep track of how much memory you are using. If, during your next allocation, you need more space than is available, you invoke the Mark-and-Sweep algorithm.

First, the world is stopped. Then, the algorithm starts from the roots and traverses the object tree (imagine a depth-first-search through your objects). Every time the algorithm visits an object, it "marks" it. This usually means flagging a bit, but it could mean other things that I will get to later. After marking the object, it continues searching for more objects to mark. If an object is already marked, it does not traverse that leaf anymore, this allowing circular object references.

Once the mark phase is done, the sweep phase is initiated. During the sweep phase, all objects that were previously marked are moved and rearranged so that all the objects are now sitting next to each other on the heap. This sweep operation will just override any blocks of memory that were not "marked". After the sweep phase, all objects that are no longer referenced are just gone.

Many critics of mark-and-sweep point out that the "stop the world" of the garbage collection can take a long time. This can make an application "feel" unresponsive since the app is literally not doing anything during the GC cycle, because it would be bad to change pointers while doing a depth first search. In older versions of Java, the GC cycle did consume a lot of time, especially for large apps. Why would use use such an algorithm then?

The main advantage of mark-and-sweep is that it enables you to do other things faster that you can't do with other memory management solutions. Specifically, allocating objects is really fast. In most other memory management solutions, you have to search for an open spot in memory. This is a linear search every time you allocate an object. With mark-and-sweep, you maintain a pointer to the first free memory block. To allocate memory, you just use the free memory pointer that you currently have. You then save a new free memory pointer that is just after the block you just allocated. That is constant time allocation! The penalty is periodic GCs that are linear to the size of your heap. Those GC's can be managed, however. I'll talk about that in a future blog post.

Advantages

Fast object/memory allocation
Memory limits

Mark-and-sweep is implemented using malloc/free under the hood. With this abstraction layer, you can actually set a limit on how much memory your process can use

Locality

There are two important steps that increase locality

New objects are created next to each other in the heap
Objects that survive a GC are put next to each other on the heap

By having objects next to each other in the heap, you minimize cache misses

Disadvantages

Long GC cycles give the appearance of slowness

This can be mitigated

Excessive use of memory

Since GC cycles don't kick off until you run out of memory, you tend to use more memory
Heap sizes increase when you run out of memory. If that increase was due to a temporary spike, your heap size never decreases

Tends to be abused

Developers who have never needed to do their own memory management tend to abuse automated memory management systems, so they tend to use more memory

Jeff's Technology Blog

JS Ext

Thursday, November 15, 2012

Mark-and-Sweep Garbage Collection

No comments:

Post a Comment