My company recently decided that it wanted to support Kindles in the Amazon app store. The business didn't give us a lot of time; they set a pretty aggressive deadline. We proposed waiting for our next release and uploading to iTunes, Google Play and Amazon all at the same time. They rejected the idea, instead telling us to modify our previous release (which was already in the iTunes and Google Play stores) and upload it to Amazon a month before our next release was supposed to launch. We did a decent amount of testing on various 2nd and 3rd Gen Kindles and found a few issues. Most of them were less about Kindle and more about the fact that we had never really developed for Android tablets. We fixed the low-hanging fruit. Then we saw a rather large commit by another team that concerned us.
Although my team owns our company app, there are two teams that develop code for it. This separation is similar to the JeffBank example I have given in previous posts. One group owns one part of the business (let's say checking) while the other group owns another part of the business (let's say mortgages). The developers for the other business unit changed all of their WebViews to Amazon WebViews. This concerned us for multiple reasons. First, we wanted to maintain one codebase. We didn't want to have an Amazon branch and a Google Play branch. Also, we had historically had lots of issues with WebViews. They require a lot of testing. Changing all of your WebViews this late in the game was dangerous, to say the least. The developer assured us that the Amazon WebView is an abstraction layer: it delegates back to the Android WebView if the real Amazon WebView implementation doesn't exist. They didn't have the same fear around WebViews that we did.
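For readers who haven't seen this kind of abstraction layer, here is a minimal sketch of the delegation idea the developer described. It is plain Java with made-up names (`PageView`, `VendorPageView`, `PlatformPageView`, `PageViewFactory`); it is not the Amazon WebView API and not our actual code, just an illustration of "use the vendor implementation if it is available, otherwise fall back to the platform one."

```java
import java.util.function.Supplier;

// Hypothetical names throughout: this is an illustration of the fallback
// pattern, not the Amazon WebView API and not the code from our app.
interface PageView {
    String engineName();
}

// Stands in for the stock Android WebView.
final class PlatformPageView implements PageView {
    @Override
    public String engineName() {
        return "platform";
    }
}

// Stands in for the vendor-specific (Kindle-only) implementation.
final class VendorPageView implements PageView {
    @Override
    public String engineName() {
        return "vendor";
    }
}

final class PageViewFactory {
    private PageViewFactory() {
    }

    // Tries the vendor implementation first. If it is missing (null) or
    // fails to initialize (throws), falls back to the platform view.
    static PageView create(Supplier<PageView> vendorSupplier) {
        try {
            PageView vendor = vendorSupplier.get();
            if (vendor != null) {
                return vendor;
            }
        } catch (RuntimeException | LinkageError e) {
            // Vendor library unavailable or broken on this device.
        }
        return new PlatformPageView();
    }
}
```

The catch is that a fallback like this only protects you when the vendor implementation is absent or fails at the point you guard. It says nothing about how the vendor implementation behaves after an OS update, which is exactly the kind of thing that needs testing on real devices.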
We launched on the Amazon store and all was fine.
Fast forward two weeks. One of my friends (we used to work on the same team) called me up. He was a pilot tester for Kindle. He is a developer, but he wasn't involved with the Kindle rollout. When I had him test, he found lots of minor issues and brought them to our attention. He called because the app had stopped working on his Kindle. I found it weird that it had been working and then all of a sudden stopped. He was very busy, so we set up some time a week and a half later to diagnose the problem. That is when my team started getting phone calls.
We were getting negative feedback in the Amazon store. Finally, Amazon sent us an email claiming our app was crashing. They provided a stack trace, which pointed to the WebView code that the other team wrote. My team inspected the code, comparing it to the documentation. According to the documentation, the code should NEVER have worked. We should always have been crashing. We brought this up to the other team and they confirmed that the code was wrong and that it was already fixed in the next release (due to unrelated refactoring). Their reaction was basically "meh."
We retested on all of our Kindles, however, and we couldn't reproduce the problem. Management started dropping the hammer. Unhappy emails were being sent. I called up my buddy and he offered to sacrifice a lunch so we could work things out. My coworker and I went to my buddy's building and installed a debuggable version of the build that was sent to the Amazon store. His crash was the same as the one reported by Amazon. We "fixed" the WebView code and it worked. We tested a build of our next release and it worked. We had verified the problem, but we didn't know why it had started happening all of a sudden.
Since our next release would fix it anyway and we were two weeks away from launching that release, our business decided that we shouldn't upload a fixed version to the Amazon store. Doing so would have bumped all of our version numbers, which would have messed with our version checks. That would have required making changes to our next release, which was in super-lockdown.
My buddy let us borrow his Kindle. We looked at our logs and identified the last time he had successfully logged in. We Googled his Amazon OS version number and discovered that the version was released a few days after his last successful login. It turns out he got a software update and that broke our app. This is why the failures seemed to snowball: as people upgraded to the latest Amazon OS, our app started to crash for them.
Now, the root cause was technically our fault: the code that initialized the Amazon WebView library was ours, and it was wrong. The problem was still frustrating for multiple reasons. First, Amazon never gave us a heads-up about the upgrade. They didn't give us an opportunity to test. A Google search didn't even turn up an official announcement about the upgrade. We found out about it because of some shady website that offered the Amazon OS upgrade as a sideloading download. The second irritating thing was the fact that we were using the Amazon WebView at all. I Googled around, and I couldn't find anyone else who was using this library. I have no idea why the other team decided to use it. I couldn't find any information on how to support it or on its quirks related to cookies. Finally, this was the other team's code. We spent a lot of time diagnosing the problem. It is a shame when other teams don't take ownership.