JS Ext

Thursday, November 15, 2012

Mark-and-Sweep Garbage Collection

Mark-and-Sweep is an algorithm for garbage collection.  Lets start with the concept of a root.  A root is a starting point for searches.  Root's are usually static variables and stack variables.  Roots point to objects in the heap.  Those objects can then point to other objects on the heap.  As you create new objects, you keep track of how much memory you are using.  If, during your next allocation, you need more space than is available, you invoke the Mark-and-Sweep algorithm.

First, the world is stopped.  Then, the algorithm starts from the roots and traverses the object tree (imagine a depth-first-search through your objects).  Every time the algorithm visits an object, it "marks" it.  This usually means flagging a bit, but it could mean other things that I will get to later.  After marking the object, it continues searching for more objects to mark.  If an object is already marked, it does not traverse that leaf anymore, this allowing circular object references.

Once the mark phase is done, the sweep phase is initiated.  During the sweep phase, all objects that were previously marked are moved and rearranged so that all the objects are now sitting next to each other on the heap.  This sweep operation will just override any blocks of memory that were not "marked".  After the sweep phase, all objects that are no longer referenced are just gone.

Many critics of mark-and-sweep point out that the "stop the world" of the garbage collection can take a long time.  This can make an application "feel" unresponsive since the app is literally not doing anything during the GC cycle, because it would be bad to change pointers while doing a depth first search.  In older versions of Java, the GC cycle did consume a lot of time, especially for large apps.  Why would use use such an algorithm then?

The main advantage of mark-and-sweep is that it enables you to do other things faster that you can't do with other memory management solutions.  Specifically, allocating objects is really fast.  In most other memory management solutions, you have to search for an open spot in memory.  This is a linear search every time you allocate an object.  With mark-and-sweep, you maintain a pointer to the first free memory block.  To allocate memory, you just use the free memory pointer that you currently have.  You then save a new free memory pointer that is just after the block you just allocated.  That is constant time allocation!  The penalty is periodic GCs that are linear to the size of your heap.  Those GC's can be managed, however.  I'll talk about that in a future blog post.

Advantages

  1. Fast object/memory allocation
  2. Memory limits
    • Mark-and-sweep is implemented using malloc/free under the hood.  With this abstraction layer, you can actually set a limit on how much memory your process can use
  3. Locality
    • There are two important steps that increase locality
      • New objects are created next to each other in the heap
      • Objects that survive a GC are put next to each other on the heap
    • By having objects next to each other in the heap, you minimize cache misses
Disadvantages
  1. Long GC cycles give the appearance of slowness
    • This can be mitigated
  2. Excessive use of memory
    • Since GC cycles don't kick off until you run out of memory, you tend to use more memory
    • Heap sizes increase when you run out of memory.  If that increase was due to a temporary spike, your heap size never decreases
  3. Tends to be abused
    • Developers who have never needed to do their own memory management tend to abuse automated memory management systems, so they tend to use more memory

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.