JS Ext

Thursday, December 12, 2013

Thread Safety: Beware of Java's Hashtable, Vector and Collections.synchronize*()

I am shocked at how bad people are at multithreaded programming.  I took one class in college on the topic, and that was all I needed.  That one class gave me far more knowledge than most programmers I know.  This isn't a statement about how good I am....this is a statement about the poor state of programmers today.

One of the first mistakes I see is just throwing a Collections.synchronize*() call around a data structure or using the synchronized data structures.  While very handy, these wrappers give a false sense of security.  Consider the following code:

public class MySet {
 private final List list = Collections.synchronizeList( new ArrayList( 10 ) );
 private final Map map = new Hashtable();
 public void add( MyObject obj ) {
  if ( !map.containsKey( obj.getId() ) ) {
   list.add( obj );
   map.put( obj.getId(), obj );
  }
 }
}


This class makes use of a synchornized wrapper and a synchronized data structure (Hashtable).  Therefore, access to the internals of 'list' and 'map' are protected and thread-safe.  You know what isn't threadsafe?  MySet!  Imagine two threads running at the same time that are both calling add().  Below is a sequence of what line each thread is on.  You will see Thread B stall 3 times waiting for Thread A to finish with the easy-to-use wrappers.

A: MySet.add()
B: MySet.add()
A: if ( !map.containsKey( obj.getId() ) ) {
B: if ( !map.containsKey( obj.getId() ) ) { // B must wait for A to finish with 'map' before it starts
A: list.add( obj );
B: list.add( obj ); // B must wait for A to finish with 'list' before it starts
A: map.put( obj.getId(), obj );
B: map.put( obj.getId(), obj ); // B must wait for A to finish with 'map' before it starts

At the end of the execution, 'list' will have 2 entries in it while 'map' will only have 1.  Congradulations!  You have know written a race condition.  On top of that, your add() method now has 3 critical sections!

The correct way to write the synchronization should look closer to this: (I say closer because it doesn't HAVE to look like this)

public class MySet {
 private final List list = new new ArrayList( 10 );
 private final Map map = new HashMap( 64 );
 private final Object lock = new Object(); // I will explain this in a future post

 public void add( MyObject obj ) {
  synchronize( lock ) { // lock only once, do your business, then get out
   if ( !map.containsKey( obj.getId() ) ) {
    list.add( obj );
    map.put( obj.getId(), obj );
   }
  }
 }
}

In the updated code, we create a singe lock that protects both data structures.  Then, we use the non-thread safe versions of the data structures.  In this case, Thread B must wait for Thread A to completely finish the add() method before it can do the first check.  This code correct.  Correct in this context means there is no order that Thread's A and B can run in that would violate the internal invarience of the object.  It is impossible for 'list' and 'map' to get out of sync with each other.  It gets even better.  Since there is only one critical section, the code actually runs faster!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.