Tuesday, October 20, 2009

To err or not to err

Same old question: should we throw an exception or return an error code? In a long-running process, what design principles should we follow to handle the rare, the expected, and the unexpected?

I've been struggling with this problem for about 5 years now (less time than I've been writing code).

There are many ways to analyze this, but one way is to look at our goals and the costs of achieving them.

Okay, so let's drop the act and face the facts: there will be errors, there will be cases you haven't thought of, and Murphy's law is enforced in most industry settings.


The goals are very important. They are to:

1.) run, and do what the software is meant to do.
2.) communicate an expected error from one piece of code to another. (program continues correctly)
3.) communicate an unexpected error from one piece of code to another. (fail, or continue gracefully)
4.) communicate to the operator that the program needs immediate attention.
5.) communicate to the operator the total number of problems (fatal or not) the program has encountered over its entire lifetime.
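
To make these goals a bit more concrete, here is a minimal sketch of one possible split, not a recommendation: expected errors travel as return values, unexpected ones as exceptions caught at a single boundary. Every name in it (Result, parse_quantity, alert_operator, process, problem_counts) is made up for illustration.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("worker")
problem_counts = Counter()  # goal 5: running tally for the process lifetime


@dataclass
class Result:
    value: Optional[int] = None
    error: Optional[str] = None  # goal 2: an expected error, carried as a value


def parse_quantity(text):
    """Bad input is expected, so it is returned as an error, not raised."""
    try:
        return Result(value=int(text))
    except ValueError:
        problem_counts["expected"] += 1
        return Result(error=f"not a number: {text!r}")


def alert_operator(message):
    """Hypothetical hook for goal 4; in this sketch it only logs at CRITICAL."""
    log.critical(message)


def process(items, handler):
    """Goal 1: keep running. Goal 3: catch the unexpected at one boundary."""
    results = []
    for item in items:
        try:
            results.append(handler(item))
        except Exception as exc:  # nobody planned for this one
            problem_counts["unexpected"] += 1
            alert_operator(f"unexpected failure on {item!r}: {exc}")
    return results
```

Calling process(["3", "x", None], parse_quantity) would give one value, one expected error (the "x"), and one unexpected error (int(None) raises TypeError, which parse_quantity never planned for). The loop survives all three, and problem_counts holds the totals for the operator.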



The costs of these error handling modes are:
1.) immediate CPU cycles to generate the error message, log it to disk, etc.
2.) network/storage costs to store the error.
3.) human attention: the on-call person receives a message marked "PRODUCTION ERROR!"
4.) machine/human resources required to analyze the log of errors and notable conditions.
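
One way to keep costs 1 to 3 in check, sketched under my own assumptions: count every problem cheaply in memory, and page a human at most once per interval per kind of problem. ThrottledReporter and its parameters are hypothetical names, and the 300-second default is an arbitrary placeholder.

```python
import time
from collections import Counter


class ThrottledReporter:
    """Counts every problem, but pages at most once per interval per key."""

    def __init__(self, min_interval_s=300.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock          # injectable so the throttle can be tested
        self.counts = Counter()     # cheap in-memory tally of all problems
        self.last_paged = {}        # key -> time of the last page sent
        self.pages = []             # stand-in for messages sent to on-call

    def report(self, key, page=False):
        """Record one problem; return True only if a page was actually sent."""
        self.counts[key] += 1
        if not page:
            return False
        now = self.clock()
        last = self.last_paged.get(key)
        if last is not None and now - last < self.min_interval_s:
            return False            # paged recently for this key: stay quiet
        self.last_paged[key] = now
        self.pages.append(key)
        return True
```

The counts stay exact while the human only hears about each kind of problem once per interval. The tradeoff is that a burst of failures looks the same in the pager as a single one, so the counts still need to be looked at.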


Considering these aspects may help me to design the error handling system.
