After going over the types of parsers, I went into some of the history of using parsers on our team. There was a batch process that took about 20 minutes to run. It frustrated the people who used it, because if you gave it a bad argument, it would take 15 minutes to tell you that you had passed in the wrong arguments! People complained constantly to the developer, but he said there was nothing that could be done. They came to me asking for my opinion, and I told them there was no reason the process should take 20 minutes.
Finally, the developer decided to talk to me. Instead of asking for advice on making the process faster, he reiterated that he couldn't make it faster. The problem was one of the XML files that needed to be parsed. It was a large XML file with a complicated structure. He said it took 5 minutes just for the XPath parser to return the top of the object tree. From there, he ran dozens of XPath queries against the large tree. In his mind, the performance problem wasn't in the code that he wrote, so he couldn't do anything about it. It wasn't his code that was slow!
That is when I brought up different parser types. He complained that DOM was too hard for him to understand and that he didn't think there would be that much of a performance improvement. I explained the advantages and disadvantages of all three types of parsers. He had never heard of stream parsers before. The conversation ended with him looking into DOM parsing, but he wouldn't promise anything, since he thought it was a waste of time to make the change.
Out of personal frustration, since I was both a user of the batch process and a developer who felt I could do better, I wrote a proof of concept that used SAX instead of DOM or XPath. The funny part was that it only took me 4 hours to write a proof of concept that did the entire 20-minute process in about 500ms. That is about half a second! When word got around to the developer, he rushed his DOM changes in. After migrating from XPath to DOM, the process time went from 20 minutes to 2 minutes. That is still much longer than my proof-of-concept 500ms, but fast enough that people were relatively happy.
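For anyone who hasn't seen a stream parser, here is a minimal sketch of the SAX style in Python. This is not the proof of concept from the story; the `amount` element, the `AmountSumHandler` class, and the `sum_amounts` function are all made up for illustration. The point is only that the handler reacts to events as the parser reads the document, so no object tree is ever built and nothing is queried afterwards.

```python
import xml.sax


class AmountSumHandler(xml.sax.ContentHandler):
    """Sums the text of every <amount> element in one streaming pass.

    The element name "amount" is a hypothetical example, not taken
    from the real batch process.
    """

    def __init__(self):
        super().__init__()
        self.total = 0.0
        self._in_amount = False
        self._chunks = []

    def startElement(self, name, attrs):
        # Called when the parser reaches an opening tag.
        if name == "amount":
            self._in_amount = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect the chunks.
        if self._in_amount:
            self._chunks.append(content)

    def endElement(self, name):
        # Called at the closing tag; the element's text is complete here.
        if name == "amount":
            self.total += float("".join(self._chunks))
            self._in_amount = False


def sum_amounts(xml_bytes):
    """Parse the XML bytes once and return the sum of all <amount> values."""
    handler = AmountSumHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.total
```

Calling `sum_amounts` on a document with several `<amount>` elements walks the input once and keeps only a running total in memory, which is why this style tends to hold up on large files where building the whole tree first does not.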