JS Ext

Monday, April 29, 2013

Automation Overrides

One of my specialties is automation.  In most positions I have held, I have written automation layers on top of existing manual processes.  One of the biggest lessons I have learned about automation is that it doesn't always work.  Even if it works 99% of the time, it still fails the other 1% of the time.  This is why I feel every step of an automated process should also be executable manually.

Far too many times, I have worked with automated systems that completely take over the process.  They don't allow someone to manually execute a step.  This means your automated system must be foolproof.  Just like every system that has ever existed, those systems weren't foolproof.  That has left me sitting on calls with various people trying to fix the automation, and it has happened far too frequently.

So how do you make an automated system that doesn't completely take over?  The easiest way is to use the command line!  It's funny how many people keep declaring the command line dead.  Take each step in your process and make it its own command.  This command can be a shell script or it could be a Java process, but it should be callable from the command line.  Those commands usually have inputs and outputs.  The inputs can usually be translated to command line arguments.  Outputs can either be written to stdout or go to a well-defined output file.  I usually pass in an argument that specifies the filename for the parseable output.  Once each step has its own command, then you can start grouping steps together.  It is usually pretty easy to combine all the commands of a group into a single script.
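As a rough sketch, here is what one step might look like as a standalone shell script.  The script name, arguments, and output format are all made up for illustration:

    #!/bin/bash
    # load-records.sh -- hypothetical example of one step as its own command.
    # Inputs come in as arguments; parseable output goes to a caller-chosen file.
    set -e

    INPUT_FILE="$1"    # the batch of records this step should process
    OUTPUT_FILE="$2"   # where to write the parseable output

    if [ -z "$INPUT_FILE" ] || [ -z "$OUTPUT_FILE" ]; then
        echo "usage: load-records.sh <input-file> <output-file>" >&2
        exit 1
    fi

    # The real work would go here; this stub just counts the input records.
    record_count=$(wc -l < "$INPUT_FILE")
    echo "records_loaded=$record_count" > "$OUTPUT_FILE"

Because the step is just a command, the automation and a human at a shell invoke it exactly the same way, with the same arguments.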

If the process doesn't branch very heavily, you can create a master script on top of all the nested groups.  This master script can be called from multiple sources: a cron-based system, a webapp, another process, or manually via the command line.  What is important here is that if cron and your webapp go down, you can still execute the process!  Also, process failures are traditionally not all-or-nothing.  Usually one step in the process fails.  If you have adequate logging, you can determine which step failed, fix it, then continue executing the remaining sub-commands in that group.  Once that group is done, execute the next group and so on up the chain.  If you are lucky, you can just re-execute the entire process.
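A minimal sketch of such a master script, assuming three made-up group scripts and a log location:

    #!/bin/bash
    # run-process.sh -- hypothetical master script that chains the grouped steps.
    set -e    # stop at the first group that fails

    LOG=/var/log/my-process.log

    {
        echo "=== extract group ==="
        ./extract-group.sh

        echo "=== transform group ==="
        ./transform-group.sh

        echo "=== load group ==="
        ./load-group.sh
    } >> "$LOG" 2>&1

cron, a webapp, another process, or a person at a shell can all call run-process.sh.  With set -e, the script stops at the failing group, and the log shows which one, so you can fix that step, finish the group by hand, and then run the remaining groups.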

There are a few things you might have noticed here.  First, you have a clearly defined process for recovering from a failure.  This is important in IT, because processes do fail, and you need to be able to recover.  Second, you can still limp along in the event of a failure in the automation system.  Processing might be significantly delayed and your capacity might be severely diminished, but if a high-ranking person needs their record processed, you can do that.  Finally, you might have noticed that I made no mention of a commercial system.  You can build a pretty decent automation system with just plain old scripting and programming languages.  There are limitations, but you get a lot of flexibility.  If you do buy a commercial system, what you want is something that can give you metrics and help you determine where a process failed.  You do NOT want to use its orchestration features.  Those features tend (though not always) to prevent you from manually executing parts of your process.

In the end, make sure you can handle a process failure.  They do happen.  Don't wait to think about an automation override until you are on a production call because the automation isn't working and there is no way to manually execute a step in the process.
