JS Ext

Tuesday, July 2, 2013

Performance Hash Functions....So You Think

In a previous post, I discussed a team that had some performance issues.  Their JVM was having performance problems again.  This time, they couldn't replicate the problem in their DEV environment.  They decided to come to me for help.  The JMX Attach api was disabled on this JVM, so I had to do a kill -3 to get a stack trace.

Once I inspected the stack trace, I noticed that a bunch of the threads were around the same code.  I pulled up the code.  I started going through the usual performance suspects.  Spoiler alert: the answer ended up being simple.  There was a race condition.  Two threads wrote into a HashMap, corrupting the LinkedList in one of the buckets.  The threads that hit that bucket were actually hung in an infinite loop, taking up all the CPU on the box.  At the time, I didn't know it was a race condition and I was looking for bad performance patterns.  I found a few and decided to write a few posts about it.

The first thing that I found weird was an overly complex hashcode() implementation.  There was a HashMap< Package, PackageInfo > map.  The Package class contained both identifying information (app, release and package name) as well as some meta information (last time the package was used, highest sibling number of the package).  Given the split between id and meta information, this class was probably a poor candidate for being a HashMap key, but I digress.  The implementation for Package.hashcode() was comical to me.  The developer was taking information from the id portion of the object and "joining" it with the meta information portion.  He started by generating a unique number based off of the first member variable.  For the next member variable, he generated another unique number and performed some operation on that number against the first number.  He did the same thing for every member variable, switching up the math operation in between.

At first, I thought this hashcode() implementation might be the problem.  The HashMap in question was a global cache object.  The threads that were hung were in a double nested tight loop.  He was performing a search.  The search was pretty poor, but I can cover that in another post.  I didn't know if the hashcode() or if the poor search was the problem, until I noticed that the calling object was a singleton.

It is easy to criticize and not offer any alternatives.  To me, the Package class should have only contained identifying information.  I say this because the object was being used as a key to a Map.  If you think about it, the implementations of the hashcode() and equals() methods are ambiguous in this scenario.  If you are using the class as a key, shouldn't hashcode() only operate on the identifying member variables.  This doesn't match with equals(), since that should only return true of the objects are truly the "the same".  This means the meta information should be considered in the equals() implementation.  If the package class only contained id member variables, then the implementations become trivial.  Consider making a toString() method that just concatenates the app, release and package with a separator between each value (like a space).  hashcode() simply becomes toString().hashcode() and equals() becomes toString().equals().  Fast and simple.


No comments:

Post a Comment

Note: Only a member of this blog may post a comment.