Scientific Debugging, or How to Debug Difficult Bugs

While reading Zen and the Art of Motorcycle Maintenance, I came across a passage in which the author discusses the scientific method and illustrates it with an example of how a mechanic could use it to diagnose a tough motorcycle maintenance problem.

While I had never applied a formal debugging process based on the scientific method, I had done something similar informally countless times. Debugging often works this way: you form a hypothesis, test it, and iterate.

I decided that the next time I got stuck on a very hard-to-diagnose bug, I would try an approach based on the scientific method.

After applying it successfully several times, I can report that I am pleased with the results. I do not recommend such a formal process for run-of-the-mill bugs, since it is considerably more time-consuming. It should be reserved for hard-to-reproduce or very complex bugs.

The scientific debugging process

The first thing to remember when going this route is to write everything down. Writing everything down forces you to state precise hypotheses, which sharpens your thinking.
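If you want that written record to stay consistently structured, a small data structure can hold it. The following is a minimal sketch in Python, assuming a plain-text journal is all you need; the names `DebugJournal`, `Hypothesis`, `Experiment` and `render` are hypothetical and chosen only for illustration. It renders the journal in the Problem / Observations / Hypothesis / Experiment / Results layout used in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    description: str
    result: str = ""  # filled in once the experiment has been run


@dataclass
class Hypothesis:
    statement: str
    experiments: list[Experiment] = field(default_factory=list)


@dataclass
class DebugJournal:
    problem: str
    observations: list[str] = field(default_factory=list)
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def render(self) -> str:
        """Render the journal as plain text, one labelled entry per line."""
        lines = [f"Problem: {self.problem}", ""]
        for obs in self.observations:
            lines.append(f"Observations: {obs}")
        for i, hyp in enumerate(self.hypotheses, start=1):
            lines.append("")
            lines.append(f"Hypothesis {i}: {hyp.statement}")
            for j, exp in enumerate(hyp.experiments):
                # A lone experiment is numbered "1"; several become "2.a", "2.b", ...
                label = str(i) if len(hyp.experiments) == 1 else f"{i}.{chr(ord('a') + j)}"
                lines.append(f"Experiment {label}: {exp.description}")
                lines.append(f"Results: {exp.result}")
        return "\n".join(lines)


# Usage: record one hypothesis and the experiment run against it.
journal = DebugJournal(problem="An error can occur when saving a customer's shopping cart.")
journal.observations.append(
    "The problem has been reported on the production server with a non-empty shopping cart."
)
h = Hypothesis("The problem is confined to the production server.")
h.experiments.append(Experiment(
    "Try to reproduce the error on the staging server with a non-empty shopping cart.",
    "I was not able to reproduce the error. The hypothesis is not refuted.",
))
journal.hypotheses.append(h)
print(journal.render())
```

Keeping results as free text rather than a boolean leaves room to record exactly what was observed, which matters when a result refutes nothing.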

You start by stating the problem. It is important not to assume anything and to stay as unbiased as possible about its cause. Avoid going into specifics, as this may lead you into a dead end right from the start.

For example, the following problem definition assumes the bug is in the database and will lead you to form hypotheses around that assumption:

There is a bug when saving a customer’s shopping cart products in the database.

If you aren’t absolutely sure the problem is related to the database, you are better off with something more generic like the following problem definition.

An error can occur when saving a customer’s shopping cart.

You can then formulate a hypothesis about the bug being related to the database.

***

What follows is then a series of hypotheses, proposed experiments and results.

The hypotheses need to be falsifiable, meaning it must be possible to devise a test that would prove them wrong. This is not the same as the hypotheses being false; it only means that a test could be devised that would prove them false if they were.

Finally, rather than trying to prove a hypothesis true, it is preferable to try to prove it false. This guards against confirmation bias.
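The falsification mindset can be sketched in code. In this minimal Python example, an experiment is a function that returns True when it observes something the hypothesis says cannot happen, so the only possible outcomes are "refuted" and "not refuted", never "confirmed". The `save_cart` function is a hypothetical toy stand-in for the real system, and its failure behaviour is assumed purely so the sketch runs.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    REFUTED = "refuted"
    NOT_REFUTED = "not refuted"  # consistent with the evidence so far, not proven


def run_experiment(contradicts_hypothesis: Callable[[], bool]) -> Outcome:
    """Run an experiment designed to falsify a hypothesis.

    `contradicts_hypothesis` returns True if it observed something the
    hypothesis says cannot happen.
    """
    return Outcome.REFUTED if contradicts_hypothesis() else Outcome.NOT_REFUTED


def save_cart(items: list[str]) -> None:
    """Hypothetical toy stand-in for the real save; assumed to fail on non-empty carts."""
    if items:
        raise RuntimeError("save failed")


def save_fails(items: list[str]) -> bool:
    """Return True if saving the given cart raises any exception."""
    try:
        save_cart(items)
    except Exception:
        return True
    return False


# Hypothesis: "The problem only happens with a non-empty shopping cart."
# The refuting experiment saves an EMPTY cart; a failure there would contradict it.
outcome = run_experiment(lambda: save_fails([]))
print(f"Empty-cart experiment: hypothesis {outcome.value}")
```

The experiment deliberately targets the case the hypothesis rules out (an empty cart). A failure there would disprove the hypothesis, while a successful save merely leaves it standing.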

Here is a partial example of what a fictitious scientific debugging session could look like:

Problem: An error can occur when saving a customer’s shopping cart.

Observations: The problem has been reported on the production server with a non-empty shopping cart.

Hypothesis 1: I can’t reproduce the problem on the production server.
Experiment 1: Try to save a customer’s non-empty shopping cart on the production server.
Results: An exception is thrown in the server-side code. The hypothesis is refuted.

Hypothesis 2: The problem is confined to the production server.

Experiment 2.a: Try to reproduce the error on the staging server with a non-empty shopping cart.
Results: I was not able to reproduce the error on the staging server. The hypothesis is not refuted.

Experiment 2.b: Try to reproduce the error on a development environment with a non-empty shopping cart.
Results: I was not able to reproduce the error on a development environment. The hypothesis is not refuted.

Hypothesis 3: The problem only happens with a non-empty shopping cart.
