Jeff's Technology Blog

Friday, September 28, 2012

Statistics in programming

A lot of companies like to gather metrics on what they do. One of the common metrics that computer programs capture is the average time a process takes, but they rarely calculate the standard deviation.

For those of you who don't know what a standard deviation (SD) is, it is a measure of how much your data deviates from the expected value (average value). Standard deviation and expected value are the statistical terms for describing the bell curve that represents your data. A low standard deviation means your sample data is very close to the center. A high standard deviation means your data varies heavily. In the context of process runtime, a high standard deviation could mean your process is not very repeatable or reliable. One way to improve a process is to reduce its standard deviation. Knowing the standard deviation makes it easier to identify the process executions that deviate heavily from the expected value. These outliers tend to be the executions that cause the most problems. The Standard deviation Wikipedia page has some graphs that explain it further, along with more background information.

The most common method of calculating the SD is: $$\sqrt{E(x^2)-E(x)^2}$$. E(y) is the average value of y. Since you need to calculate $$E(x^2)$$, I have heard people say that you need to know every value of x to calculate the SD of x. I actually had a developer use that as an excuse for why he couldn't calculate the SD in real time. For the $$E(x)^2$$ term, you can keep a running summation of x and a count, then divide the summed x by the count to get the average. That raises the question: why can't you do the same trick with $$x^2$$? In reality, you can keep a running count, a running sum of your values and a running sum of the squares of your values. With these three pieces of information, you can calculate the SD and average of your dataset.

// example SD calculation
// we have an Enumeration.  We don't know all
// the values at once
final Enumeration< Double > e = getEnumeration();

// declare our 3 summation variables
double sum_value = 0.0;
double sum_value_2 = 0.0;
int count = 0;

// iterate, only knowing a single 'value' at one time
while ( e.hasMoreElements() )
{
    final double value = e.nextElement();

    // accumulate the count, the sum and the sum of squares
    count++;
    sum_value += value;
    sum_value_2 += value * value;
}

// calculate the standard deviation: sqrt( E(x^2) - E(x)^2 )
final double avg = sum_value / count;
final double avg_2 = sum_value_2 / count;
// the average must be squared before subtracting
final double variance = avg_2 - avg * avg;
final double sd = Math.sqrt( variance );
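
One caveat with the running-sums approach: when the values are large and the spread is small, subtracting two nearly equal numbers can lose precision. A commonly used alternative is Welford's online algorithm, which is not the method used above but also needs only one value at a time. The following is a minimal sketch; the class name RunningStats and its method names are made up for this example.

```java
// Hypothetical helper class: Welford's online algorithm for the
// running mean and population standard deviation.
final class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Feed one value at a time; no history is kept.
    void add(final double value) {
        count++;
        final double delta = value - mean;
        mean += delta / count;
        // uses the old mean (delta) and the updated mean together
        m2 += delta * (value - mean);
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    // Population variance; NaN when no values have been added.
    double variance() {
        return count > 0 ? m2 / count : Double.NaN;
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }
}
```

Calling add() once per measurement and reading standardDeviation() at any point gives the population SD of everything seen so far. For the values 2, 4, 4, 4, 5, 5, 7, 9 that is a mean of 5 and an SD of 2, up to floating-point rounding.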

Thursday, September 27, 2012

Google groups performance

I have been researching running Android x86 in Xen. The Android x86 mailing lists are hosted on Google Groups. This has been driving me crazy. The web page performance is terrible. I have tried in Firefox and Chrome. The performance on my tablet is bad as well. On top of that, the arrow keys don't work. I had to use the mouse to scroll down. I found the website to be a very inefficient method for giving out information.

Wednesday, September 26, 2012

Simplexmlrpc

Simplexmlrpc is a C library I wrote for my multimedia frontend. It is a simple Xmlrpc client and server. I wrote it because I was unhappy with the various Xmlrpc C/C++ implementations. Most were pretty complex and didn't make it easy to wrap them within a SpiderMonkey JavaScript object. I wanted a library that had simple reference counting garbage collection. I also wanted to override the transport mechanism. I didn't want to be tied down to the http client and server that the library used.

Simplexmlrpc is written as a single C++ file, but it exposes a C interface. This allows the library to be directly embedded into another project, or to be packaged up as an external .so or .dll file. I declare a public abstract type of simplexmlrpc_value that is reference counted. Under the hood, the type points to a class. The simplexmlrpc_create* functions create the appropriate instance of the class. This structure closely matches SpiderMonkey's jsval structure. In my multimedia frontend, I was able to write xmlrpc2jsval and jsval2xmlrpc functions that recursively translate between the two libraries. This allowed me to write JavaScript client code that calls JavaScript server code (on another computer) without having to manually translate every argument. The serialization of the JSON arguments is automatic.

Another feature that I needed was the ability to use a custom transport. In my multimedia frontend, I have a master server and multiple slave servers. The communication between the servers is xmlrpc. A lot of the code is shared between the implementations, though. In many ways, the master also operates as a slave. Any communication between a slave and the master also happens between the master and itself. What is the point of making an http call out and back into itself? The simplexmlrpc_generateCall () function allows you to get the xmlrpc client xml and pass it directly to the simplexmlrpc_processMessageFree () function. I created a wrapper function inside of the multimedia software code. That wrapper allows for url aliases. One of the aliases is master://. Any xmlrpc calls going to that url will automatically go to the master server. If the client is currently running on a slave or the user interface, then the url gets rewritten to the url of the master server. If the client is currently running inside of the master, then the http transport is completely bypassed.
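
The alias routing described above can be sketched roughly as follows. This is not Simplexmlrpc's C API and not the actual wrapper from the multimedia frontend; it is a Java illustration of the idea only, and every name in it (AliasRouter, localHandler, http, the example URLs) is hypothetical.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the master:// alias; none of these names
// come from Simplexmlrpc itself.
final class AliasRouter {
    static final String MASTER_ALIAS = "master://";

    private final boolean runningOnMaster;
    private final String masterBaseUrl; // assumed form: "http://host:port/"
    // request xml -> response xml, handled inside this process
    private final Function<String, String> localHandler;
    // (url, request xml) -> response xml, sent over http
    private final BiFunction<String, String, String> http;

    AliasRouter(final boolean runningOnMaster,
                final String masterBaseUrl,
                final Function<String, String> localHandler,
                final BiFunction<String, String, String> http) {
        this.runningOnMaster = runningOnMaster;
        this.masterBaseUrl = masterBaseUrl;
        this.localHandler = localHandler;
        this.http = http;
    }

    String call(final String url, final String requestXml) {
        String target = url;
        if (url.startsWith(MASTER_ALIAS)) {
            if (runningOnMaster) {
                // already on the master: skip the http round trip
                return localHandler.apply(requestXml);
            }
            // on a slave or the user interface: rewrite to the real master url
            target = masterBaseUrl + url.substring(MASTER_ALIAS.length());
        }
        return http.apply(target, requestXml);
    }
}
```

The point of passing both handlers in is that the calling code never needs to know where it is running; it always calls master:// and the router picks the transport.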

Simplexmlrpc also supports security levels. The idea with security levels is you can have a different set of methods show up based on authentication and authorization. You can have a set of public methods. Then you can have a set of methods for regular users. Then you can have a set of methods for admin users.

The library also provides direct integrations with curl and the Mongoose HTTP server. The Mongoose api that is used is for an older version, however. A few features of the xmlrpc spec have not been implemented yet. Simplexmlrpc is missing support for the date/time and base64 types.

Overall, my biased opinion is that Simplexmlrpc is a simple yet powerful implementation of the xmlrpc standard.

Tuesday, September 25, 2012

Waiting for the iPhone 5 noise to settle

There have been lots of reports of problems with the iPhone 5. Apple touted this phone as a new revolution in mobile computing. With so many Apple fans buying the new phone, it is obvious that some of what Apple is saying is true. If the phone is so revolutionary, then why are there so many issues with it? Because it is new and revolutionary. Any big change is going to have issues. The tech media will blast the new technology relentlessly for a period of time. Then, something new will come out. Everyone will forget about the problems that were being reported.

What Apple really needs is time. They will continue to fix the issues as fast as they can. Apple will fix the maps. There will be a software update to fix the static on the keyboard. WiFi will get better. Eventually, the problems will go away and the iPhone 5 will be a great product. The most prototypical example of this media pattern is Windows XP. When Windows XP first came out, it was heavily blasted. We complained about the look and feel. People complained about compatibility. Eventually, things calmed down and Windows XP turned into one of the best versions of the Windows platform.

Ubuntu Unity Interface

Nobody likes change. Especially with user interfaces. Facebook suffers this every few months when they make a change. Ubuntu's Unity interface has been making waves for a while, but I had very little exposure to it, since my Linux distro of choice is still Gentoo. First, I'll start off with a little background on why I had a late start with Unity. The recent hard disk crash of my desktop has caused me to revisit the distro I use on my desktop. I use my laptop far more than my desktop at this point. I have a server in my living room that provides all of my services. The desktop is really for a few special tasks that require a larger screen. Those of you familiar with Gentoo will know that you have to keep your install up to date. If you don't, then the next upgrade you do will be very painful. Since I don't use the desktop as much as I used to, I don't want to worry about the Gentoo upgrade maintenance. Also, I don't need the horsepower or bleeding edge software of Gentoo.

Now on to Unity. I understand the desire to optimize for smaller screens, but this computer has a 19" screen. My first thought (like everyone else) was "Where is the start menu!" The launcher reminded me of a rotated OS X launcher. I never really liked the OS X launcher, but it's usable. I don't mind having to type in a program name to start it, since I'm a terminal guy. People who know me know that I tend to have dozens of xterms open. Typing in a program name to search for it feels kind of natural to me. Since there are only a handful of apps that I use on a day-to-day basis on that computer, I don't mind the limited space for pinning on the launcher. Like I said before, I like xterms! Having the ability to click on the Xterm pin to see all my xterms to figure out the one I want to switch to was a nice feature.

Overall, Unity is a nice look. I still prefer the more traditional start menu and launcher in IceWM and XFCE. Given the fact that I don't use that computer very much, I think I will keep Unity around.

Monday, September 24, 2012

Firefox 15 is really slow

I use both Firefox and Chrome on my Linux laptop. Some websites are so poorly written, that even on Chrome they run really slow. That is where the NoScript Add-On for Firefox comes in handy. I use Firefox for most webpages, since NoScript blocks tons of crap. I go to Chrome for watching Flash video or when I want a more responsive interface.

I noticed something when I upgraded to Firefox 15, however. The entire browser started running really slow. There is a significant lag between when I tell it to perform an action and when the action actually gets performed. As an example, opening a new tab takes about 3/4 of a second now. Page up/Page down can take up to a second. Switching between tabs seems to take over a second. I did a quick search and there is a support ticket that seems to match my issue. The Mozilla people seem to think it is operating system related. They said to check things like Windows Firewall and see what anti-virus programs are running. Really? That sounds similar to "did you turn it off and back on again". I know it's not the operating system since Chrome didn't suffer a performance problem when Firefox upgraded. Firefox has been my favorite browser for years. What is going on with Firefox these days?

Friday, September 21, 2012

Maven snapshots vs milestones

Maven versions come in two flavors: snapshots and milestones. Snapshots have a version but end in -SNAPSHOT or -LATEST. Everything else falls into the milestone category. Milestones are unmodifiable. Snapshots can be changed.

When a new snapshot is created, the -SNAPSHOT gets replaced with a unique timestamp with a counter. Parts of Maven support depending directly on the "locked snapshot" version, but some Maven mojos don't support that. Pom files that depend on snapshots do not get updated to the locked snapshot. This has the major disadvantage that artifacts that contain snapshots are not reproducible. Pom files that only depend on milestones are reproducible, but converting a pom file from a snapshot to a milestone can be quite painful.
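
To make the difference concrete, here is a rough pom fragment showing the same dependency referenced both ways. The groupId, artifactId, version number and timestamp are all made up for illustration; only the shape of the version strings matters.

```xml
<!-- Floating snapshot: resolves to whatever build is newest at build time,
     so two builds of the same pom can pull in different code. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>bottom-lib</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

<!-- "Locked snapshot": -SNAPSHOT replaced by the timestamp and counter
     of one specific deployed build. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>bottom-lib</artifactId>
  <version>1.0-20120921.143000-1</version>
</dependency>
```

Only one of the two would appear in a real pom. The second form is what a pom would need to contain for the build to be repeatable without first cutting a milestone.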

Let's assume you have a large dependency tree of pom files. Let's say 5 pom files deep. If the bottom of the tree needs to be updated, then you have to update 4 pom files to point to the snapshot versions of the dependencies. Once you perform all the builds, you can test your code. Let's assume everything tested correctly. It's time to install the code into production. Hold on a second! You can't install an artifact into production that isn't reproducible! You have to milestone the bottom artifact. Then you have to update the pom file that depends on it to use the milestone version of the dependency. Then you have to milestone that pom. You have to continually do this all the way up the dependency tree. That is kind of a pain when your code has already been tested!

In my opinion, milestones should also be pointers. These pointers should be immutable. Pom files should be updated to point to the version that any symlinks point to during the deploy process to the remote repository. Maven also needs better support for locked snapshots. These changes should make every snapshot build reproducible.

JS Ext