Jeff's Technology Blog: April 2014

Sunday, April 20, 2014

XP is close to dead (but why is it still alive?)

Now that Windows XP is officially out of support, various websites are reporting the current Windows XP share. I have decided to share my stats. Currently, 5.7% of the Windows visitors to my blog run Windows XP. That is not a tiny amount. It is more than my Windows Vista readers (3.16%). This raises the question: why is it so high?

1) Timing

Windows XP was released in 2001. This was around the time that non-technical people were getting computers. Services like @Home were providing broadband internet for the masses. This means the operating system that those masses of people were running was Windows XP. Just about every computer you purchased at the time was running Windows XP. For many of the current internet users, Windows XP was their first operating system.

2) Backwards compatibility

This is something that affects business users more than consumers. In the workplace, your computers often have specialty programs that MUST work. This means upgrading has a cost that is much larger than the licensing cost for the new version of Windows. There is a massive testing effort. Every native program must be tested by QA engineers before a rollout. To make things worse, many programs need to be heavily modified to work in newer versions of Windows. This is not always cheap, or even possible.

3) People dislike change

This is probably the biggest reason. Because XP was the first version that most people used, people know how to use it. They know the ins and outs of the OS. They know how to update drivers, open Task Manager, browse the network and change the desktop wallpaper. Users have spent years getting used to these administrative interfaces. Microsoft has a habit of completely rewriting all of those interfaces for every major Windows upgrade. Users don't want to relearn how to do all of those tasks.

Tuesday, April 15, 2014

The Screenshot Saga: Episode 2 - Background All the Things!

We got the fix pack and did a quick regression test. Everything seemed like it still worked. I had told my project manager that it would take me some time to integrate the server flush fix. You see, the method we call to flush to the server already worked in one way....in the foreground. Any professional library developer would know that you need to maintain backwards compatibility. Therefore, I assumed that they would provide a new method for the background version. I was wrong. They changed the existing method. Our issue (the freeze after taking a screenshot) was gone. I checked everything in, let Jenkins build the APK and told our tester that it was ready to test.

We were near the end of our release cycle, so it took a few days for the tester to actually test the screenshot functionality. I was in for a nasty surprise though. The tester reported that the screenshot functionality wasn't working! I fired up Logcat to figure out what was going wrong. The "fix" pack did a lot more than just fix our one issue. While it fixed a bunch of other issues that other clients had complained about, it also reduced the logging to Logcat. After 2 days of looking into it, I finally disassembled their Jar file. I used the debugger to step line by line into their code. That is when I discovered the first problem.

The very first thing the library does is send the killswitch request to see if the library should be enabled. That used to be done in the foreground, delaying the start of the application. One of this company's clients complained, so they put the killswitch request in the background. The problem is that the library initialization takes two steps: start() and enable(). start() fires off the killswitch request and enable() uses the killswitch response to determine if the library should start up. Since start() now runs in the background (and returns immediately), enable() has a high likelihood of failing. If the server is running fast that day, then enable() will work. That is why my quick regression worked when I was first given the fix pack. While looking at the source code for enable(), I noticed that it checks an internal boolean to see if enable() already ran successfully. This means I could call enable() again right before I take the screenshot. It is a hack, but at least I'm not calling undocumented APIs, like I did in my previous hack.
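For anyone who wants to see that race without the vendor's code, here is a minimal sketch in plain Java. Everything in it is hypothetical: ToyLibrary, its fields and the CountDownLatch standing in for the server are my own stand-ins, not the library's real classes. It only models the behavior described above: start() returns before the killswitch answer is in, an early enable() fails, and a repeated enable() succeeds once the answer has arrived.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

final class KillSwitchRaceDemo {

    /** Toy model of the two-step startup; every name here is hypothetical. */
    static final class ToyLibrary {
        private volatile boolean killSwitchSaysOn; // set when the "server" answers
        private volatile boolean enabled;          // set once enable() succeeds

        /** Fires the killswitch request in the background and returns at once. */
        CompletableFuture<Void> start(CountDownLatch serverAnswers) {
            return CompletableFuture.runAsync(() -> {
                try {
                    serverAnswers.await(); // simulated network latency
                    killSwitchSaysOn = true;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        /** Succeeds only once the answer is in; repeat calls are harmless. */
        boolean enable() {
            if (!enabled && killSwitchSaysOn) {
                enabled = true;
            }
            return enabled;
        }
    }

    /** Returns { result of the early enable(), result of the retried enable() }. */
    static boolean[] run() {
        ToyLibrary library = new ToyLibrary();
        CountDownLatch serverAnswers = new CountDownLatch(1);

        CompletableFuture<Void> request = library.start(serverAnswers);
        boolean early = library.enable();   // answer not in yet, so this fails

        serverAnswers.countDown();          // the "server" finally responds
        request.join();                     // wait for the background request
        boolean retried = library.enable(); // the retry right before the screenshot

        return new boolean[] {early, retried};
    }

    public static void main(String[] args) {
        boolean[] results = run();
        System.out.println("early enable():   " + results[0]);
        System.out.println("retried enable(): " + results[1]);
    }
}
```

In the sketch the latch makes the failure happen every time. In the real app it depended on how fast the server answered, which is exactly why a quick regression can pass.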

The next issue was that not all the screenshots were being sent to the server. This one drove me nuts. It took me a while before I decided to look at the source code for takeScreenShot(). I was appalled. What used to be a foreground action was now in the background! You see, another client complained about something being slow. This time it was the screenshot code. The library developers did what they always do: they shoved it in the background. The problem was that I call the flush code right after I take a screenshot. Yay race condition! There was no guarantee that the screenshot was done being taken before we invoked the manual flush. Once again, this was something that would work part of the time, which is why it passed our quick regression.
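Here is a rough sketch of that second race, again in plain Java with made-up names (ToyRecorder and the latch are mine, and the real library did not hand back anything like the future used here). The first flush runs while the capture is still in progress and sends nothing. The second flush is chained onto the capture's completion, which is one way an API could make the ordering explicit once the capture moves to the background.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

final class ScreenshotFlushRaceDemo {

    /** Toy stand-in for the library; names and behavior are assumptions. */
    static final class ToyRecorder {
        private final List<String> pending = new ArrayList<>();

        /** Captures in the background; the latch simulates a slow capture. */
        CompletableFuture<Void> takeScreenShot(String name, CountDownLatch captureMayFinish) {
            return CompletableFuture.runAsync(() -> {
                try {
                    captureMayFinish.await();
                    synchronized (pending) {
                        pending.add(name);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        /** "Sends" whatever is pending and returns how many items went out. */
        int flush() {
            synchronized (pending) {
                int sent = pending.size();
                pending.clear();
                return sent;
            }
        }
    }

    /** Returns { items sent by the racy flush, items sent by the ordered flush }. */
    static int[] run() {
        ToyRecorder recorder = new ToyRecorder();
        CountDownLatch captureMayFinish = new CountDownLatch(1);

        CompletableFuture<Void> capture =
                recorder.takeScreenShot("transaction", captureMayFinish);
        int sentTooEarly = recorder.flush(); // capture still running: nothing to send

        // Ordered version: flush only after the capture has completed.
        CompletableFuture<Integer> orderedFlush =
                capture.thenApply(ignored -> recorder.flush());
        captureMayFinish.countDown();
        int sentInOrder = orderedFlush.join();

        return new int[] {sentTooEarly, sentInOrder};
    }

    public static void main(String[] args) {
        int[] sent = run();
        System.out.println("racy flush sent:    " + sent[0]);
        System.out.println("ordered flush sent: " + sent[1]);
    }
}
```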

It is obvious that the developers that own this library don't know anything about multithreading. You can't just throw everything in the background as soon as someone complains about code running in the foreground. It requires thought and planning. There are ramifications.

The next thing that irritates me is the fact that some features are barely supported. If a company provides half-assed support for a feature, then it should say so up front. Don't let us get invested in a feature that is just going to cause us more problems. In this case, it was the manual flush. My company wants to guarantee that the server gets the screenshot. They don't want to gamble on the user keeping our app open for another 60 seconds. If your app/library implements a feature in a shitty way, then you really don't support that feature. Let us know that up front and move on.

Sunday, April 13, 2014

The Screenshot Saga: Episode 1 - Customer Support Attacks

I recently had the (dis)pleasure of adding a 3rd party library to my company's Android app. While this library had a few features, we only needed two of them: the ability to take screenshots (of our app) and the ability to turn that functionality off in the event of a catastrophic failure. The idea was to take pictures of certain transactions in the event that one of our customers wanted to dispute a transaction (I said $100, not $1000). The developers had exactly no say in the picking of the library. We were not even allowed to implement it ourselves. We had to use this library.

The company that wrote this library owes me a nice bottle of single malt scotch. Customer service was a nightmare. For starters, the killswitch didn't work. It eventually came out that this was my company's fault, but it took some....extreme....measures to figure this out. The library fires off an HTTP GET and looks for a response. The developer who wrote the JSP used the JSP Designer view, not the text editor view. Therefore, the JSP didn't return the needed 0 or 1. It returned a <html><body>1</body></html>. A little bit of logging would have helped on that one.
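To show what "a little bit of logging" could look like, here is a small sketch of a strict parser for the killswitch response. The class name is hypothetical, and which value means "on" is my assumption (I treat 1 as enabled); the point is only that anything other than a bare 0 or 1 gets reported instead of being silently ignored.

```java
import java.util.Optional;

/** Hypothetical helper: strictly interprets a killswitch response body. */
final class KillSwitchResponse {
    private KillSwitchResponse() {}

    /**
     * Returns true for "1", false for "0" (assumed meaning: 1 = enabled),
     * and empty for anything else so the caller can log the unexpected body.
     */
    static Optional<Boolean> parse(String body) {
        if (body == null) {
            return Optional.empty();
        }
        switch (body.trim()) {
            case "1":
                return Optional.of(Boolean.TRUE);
            case "0":
                return Optional.of(Boolean.FALSE);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String body = "<html><body>1</body></html>";
        Optional<Boolean> enabled = parse(body);
        if (!enabled.isPresent()) {
            // One log line like this would have saved a lot of digging.
            System.err.println("Unexpected killswitch response: " + body);
        }
    }
}
```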

The next issue was the fact that we couldn't take a screenshot of the entire scrollable area of the page. It didn't matter what View you passed in. The library would get the root View of the page and take a screenshot of that. The company kept giving us some blanket statement about how only our "developers know when the entire view is visible". I was not happy about their support staff's veiled insult towards me. I eventually disassembled their Jar file. I tracked down the one line where they inadvertently took the screenshot of the root view, not the passed in view. I got so frustrated with their support staff, I actually pulled up their source code on a WebEx!! They still didn't believe me. That issue was fixed in a new release of the library.

The next issue was sending the screenshot to the server. Taking the screenshot doesn't automatically send the screenshot to the server. You have to either wait for the flush interval or use a manual flush. My company opted to use the manual flush while most other clients used the interval. It turns out that so few people actually use the manual flush that nobody noticed that it runs in the "foreground". There was a noticeable freeze in the app. I disassembled their code and it turns out that they call new FlushTask().execute().get(); For those of you not familiar with Android development, what that means is the flush is run in the background, but we wait for the flush to finish....essentially negating the fact that the flush is running in the background. My first instinct was to run it in my own AsyncTask, but for some reason, the FlushTask actually used a thread local variable, causing it not to work when invoked from a background thread. Why would they do this!!! Luckily FlushTask was public, so I could invoke it directly. Technical support was no help. First, they told me to just use my own AsyncTask. Can you believe we pay them money for this? Finally, they told us that another company had complained about the same thing and that their next fix pack was going to fix the issue.
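AsyncTask needs Android to run, so here is a rough plain-Java analogue of the same mistake using an ExecutorService. The class name, the stand-in flush and its 300 ms delay are all made up for illustration. Calling get() on the returned Future right after submitting has the same effect as execute().get(): the work happens on another thread, but the caller sits and waits for it anyway.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BlockingFlushDemo {
    static final long FLUSH_MILLIS = 300; // made-up duration for the stand-in flush

    /** Stand-in for the library's flush work. */
    static Callable<String> flushTask() {
        return () -> {
            Thread.sleep(FLUSH_MILLIS);
            return "flushed";
        };
    }

    /** Returns how long the calling thread was held up, in milliseconds. */
    static long callerWaitMillis(boolean waitForResult) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Future<String> pending = pool.submit(flushTask());
            if (waitForResult) {
                pending.get(); // same effect as execute().get(): the caller blocks here
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (Exception e) {
            throw new IllegalStateException("flush failed", e);
        } finally {
            pool.shutdown(); // already-submitted work still runs to completion
        }
    }

    public static void main(String[] args) {
        System.out.println("submit().get(): caller waited " + callerWaitMillis(true) + " ms");
        System.out.println("submit() only:  caller waited " + callerWaitMillis(false) + " ms");
    }
}
```

On a UI thread, that wait is the freeze the user sees.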

Our saga continues. We got the next fix pack and it was a mess. Tune in next time!
