Telltale Signs of a Robust Process
- Published: October 27, 2015
- Written by Peter Schooff
In part two of this excerpt from the podcast, Keith Swenson of Fujitsu North America and Peter Schooff, Managing Editor of BPM.com, discuss how to know if a process is robust.
PETER: What are some of the telltale signs that you know that a system is robust?
KEITH: Well, the system has to be designed from the beginning to handle problems. Again, a programmer is used to working in an idealized environment. They will write a program and they expect to eliminate all the bugs, and then they will not have to do any debugging while it's running. The program is out there and it just runs. With a distributed system, a service-oriented architecture system or a BPM system, since you cannot count on distributed systems being in consistent states at all times, there is always going to be the possibility that an error will occur. So, even at run time, you have to have the ability to expose these errors that happen, and you have to have some ability to inspect them and to try to figure out what went wrong and try to recover from it. This becomes a run-time feature, not a design-time feature or a debug-time feature; while it's running, you want to surface these to the users and let them know. Another controversial thing is that often programmers want to hide all the errors so that users never see errors. If you hide the errors, then people keep on doing the same things over and over, causing more and more errors. By exposing the problems as they come up, you allow the person to resolve the problem. They can't resolve it if they don't know about it, so it's controversial.
PETER: It's also important to your idea of failing fast, which means let's move past the failure.
KEITH: That's right, let the person know what is going on. Now maybe what happened is that it was unable to contact the system; that system's offline, and that person may know that the system came back online, so they just press the button and say do it again; go again. But it may be something more difficult than that, in which case they may need to get the system administrator or a programmer or something like that to resolve the problem.
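As a rough illustration of the point Keith makes here, the following minimal Python sketch keeps a step's error visible instead of hiding it, so a person can read it and press "go again". Everything in it is an assumption made for illustration, not something from the podcast: the `Step` class, the `contact_accounting` function, and the simulated outage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One process step whose failures are recorded, not swallowed (hypothetical)."""
    name: str
    action: Callable[[], str]
    result: Optional[str] = None
    error: Optional[str] = None

    def run(self) -> bool:
        """Run the action; on failure keep the error so a user can inspect it."""
        try:
            self.result = self.action()
            self.error = None
            return True
        except Exception as exc:  # surfaced to the user rather than hidden
            self.error = f"{type(exc).__name__}: {exc}"
            return False


# Hypothetical remote system that is offline on the first call only.
calls = {"count": 0}


def contact_accounting() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("accounting system is offline")
    return "funds confirmed"


step = Step("check funds", contact_accounting)
first_ok = step.run()      # fails: the remote system is offline
first_error = step.error   # this is what the user would be shown
second_ok = step.run()     # the user presses "go again" once it is back
```

If the second attempt also failed, the recorded error would be the thing to hand to a system administrator or programmer.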
PETER: Gotcha. Now, I see this is very important, robust BPM and robust processes. So if you want a listener to have one or two takeaways from this podcast, what would they be?
KEITH: OK, so the first is that the point of BPM is to allow a business user to design a process as if it were running in an idealized system. So you want to be able to say first we check with accounting to make sure you have enough money in the account, and second, you check with this system to make sure that there's... so you draw this business process that has all of the operations to be done, in the right order, as simply as possible, as cleanly as possible, so you can see the logic behind the business process. The mistake is thinking that that is going to exactly define the system that you then run. You should not get confused between your business process and your system architecture. What you need to do is take that business process and give it to someone who then understands where the pools of reliability are, which steps... you may have three or four steps that are all done in one center so they can be reliably done, but then you may have some steps that are done on different systems, and you need to include the provision that errors may occur on those. So what I have found is that... remember that HR system I spoke about earlier, where we had a master process running six different processes across different systems? Instead of making that opaque, instead of making a start button and then everything else is like magic, it's better to take a dashboard-type approach where you lift out those six remote processes and you have some sort of lights on them, red/green lights, that say whether that process has completed or not. By surfacing what's happening underneath, you're giving people the information that they need to deal with problems when they occur. If a light there is red, a process failed for some reason, they can then go look at it and see the error there and possibly restart the process. So, less opaqueness, less magic, more dashboard, here's what's happened, and make it a little bit more manual, but in the end, this ends up being more robust.
Now to implement that, your system needs to have this sort of introspection capability, so you need to be able to call over to a remote system and say, "handle this task." But you also need to be able to call over to that system and say, "are you done yet, have you completed, are you still running, have you hit an error, if you have hit an error, what is that error?" So I think that for these robust distributed systems, this ability to ask those kinds of questions of the remote systems and to bring it back and to display it in this dashboard for the user, these are the elements that are necessary to make robust distributed processes.
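The introspection and dashboard ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RemoteSystem`, `TaskState`, `dashboard`, the system names, and the task ID are all hypothetical, the remote systems are simulated in memory rather than called over a network, and the "AMBER" light for a still-running task is an addition beyond the red/green lights mentioned in the conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskState:
    status: Status
    error: Optional[str] = None


class RemoteSystem:
    """In-memory stand-in for a remote system (hypothetical, no real network)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tasks: Dict[str, TaskState] = {}

    def handle_task(self, task_id: str) -> None:
        """'Handle this task': accept the work and mark it as running."""
        self._tasks[task_id] = TaskState(Status.RUNNING)

    def query(self, task_id: str) -> TaskState:
        """'Are you done yet, and if you hit an error, what is that error?'"""
        return self._tasks[task_id]

    # The two methods below only simulate the remote work ending.
    def finish(self, task_id: str) -> None:
        self._tasks[task_id] = TaskState(Status.COMPLETED)

    def fail(self, task_id: str, error: str) -> None:
        self._tasks[task_id] = TaskState(Status.FAILED, error)


def dashboard(systems: List[RemoteSystem], task_id: str) -> List[str]:
    """Build one status line per remote process instead of hiding them all."""
    lights = {
        Status.COMPLETED: "GREEN",
        Status.FAILED: "RED",
        Status.RUNNING: "AMBER",  # assumption: not mentioned in the podcast
    }
    rows = []
    for system in systems:
        state = system.query(task_id)
        row = f"{system.name}: {lights[state.status]}"
        if state.error:
            row += f" ({state.error})"
        rows.append(row)
    return rows


# Hypothetical master process that fans out to two remote systems.
payroll = RemoteSystem("payroll")
badge = RemoteSystem("badge access")
for remote in (payroll, badge):
    remote.handle_task("onboard-42")

payroll.finish("onboard-42")
badge.fail("onboard-42", "directory server unreachable")

rows = dashboard([payroll, badge], "onboard-42")
for row in rows:
    print(row)
```

The design choice worth noting is that `query` returns the error text along with the status, so the dashboard can show the person what went wrong rather than just that something did.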