Computer Science is badly misnamed. For the most part, I don't see
how it's a science at all. If you look at the curriculum of a typical
undergraduate CS program, about half of it looks like a branch of
mathematics. There are classes on abstractions like automata,
algorithms, and computational complexity — areas in which many
important results were worked out by mathematicians long before the
first computers were built. Interspersed with the mathematics are
classes on software engineering; classes about how to build things,
like operating systems, compilers, and networks. There aren't many
labs, because the field isn't too big on experiments. How many people
in CS departments even consider themselves scientists?
But there's an important component of software engineering that does
qualify as a science: debugging computer programs. When you find a
bug in your program, your mental model of the program has been
invalidated. The program didn't behave as you expected. Your task in
debugging is to find out why. Or, put another way, to find out how
your program actually behaves — just because you see unexpected
results doesn't mean that you understand exactly what your program did
to produce them. Experiment has invalidated your old theory, and
you're about to perform additional experiments to help you find a new
one. Science!
Debugging is particularly interesting because it's such a simple
example of science. The relevant experiments can be performed rapidly
and without expensive equipment. The measurements are precise. The
problems can be simple, or very complex. It seems like an ideal model
for investigating how science works. If you want to learn something
about effective scientific technique, look at the neat little
microcosm of debugging technique.
For example, some people are very bad at debugging. These people
typically "guess and check": they guess at what must be going wrong,
"fix" their assumed problem, and then check to see whether the
program's behavior improves. Sometimes you get lucky this way, but if
there are a lot of possibilities, you can spend a long time guessing.
You might even forget what you've already tried and end up testing
potential solutions more than once. Or, worse, you could think you've
tried something you haven't.
This is poor scientific method, but it's also the form of the
scientific method that you were probably taught in grade school. It
has all the components of a science fair project: problem, hypothesis,
experiment, and conclusion. But if you've ever watched someone debug
this way, you know what a disaster it is.
The problem is that science as described above is only a
filter. Really, all it describes are the inputs and outputs of
experimentation. Given a situation you don't understand (problem) and
a guess about what is going on (hypothesis), you perform some
procedure (experiment) to determine whether your guess is plausible
(conclusion). This explains how to distinguish the plausible guesses
from the implausible ones, but doesn't provide any guidance about
which guesses you should make, or in what order.
In contrast, here's how effective software engineers debug, in my
experience:
- They find a very general open question about the problem. They don't jump to conclusions about exactly what might be going wrong, like "should the loop in function f in module Foo start at 1 rather than 0?". Instead, they start with more general questions, like "which modules are involved?".
- They come up with an experiment that will give as definitive an
answer as is practical. Sometimes this is a potential fix, but often
it's some more direct type of data-gathering, like tracing,
assertions, or inspection at a breakpoint in a debugger.
- They perform the experiment.
- They repeat the above steps, narrowing down their questions until
they have an accurate idea of what must be going wrong.
What's going on here? The middle steps could be part of the same
flailing experimental process that even the poorest programmers use.
The important parts are really steps 1 and 4, the identification of
open questions that cover a broad range of possibilities, and the
continual narrowing of those questions as you hone in on the answer.
While the "guess and check" method resembled a science fair project,
this method more closely resembles John R. Platt's description of what
he called "strong inference" in his famous speech on effective
scientific technique, The New Baconians:
- Devising alternative hypotheses;
- Devising a crucial experiment (or several of them), with alternative possible outcomes, each of which will, as nearly is possible, exclude one or more of the hypotheses;
- Carrying out the experiment so as to get a clean result;
- Recycling the procedure, making subhypotheses or sequential hypotheses to refine the possibilities that remain, and so on.
Where the scientific method you were taught in grade school is a
filter, strong inference is a search. The idea goes back to
Francis Bacon, who described science as tree traversal using the metaphor of "fingerposts which are set up where roads part, to indicate the several directions". Using the
filter of experimentation to distinguish valid explanations from
invalid ones, strong inference describes a method of ordering the space of possible
explanations, and sort of binary-searching (actually, n-ary-searching)
your way down to the correct explanation.
Platt didn't talk about starting with general open questions, although
he got close with his comment about "subhypotheses". Part of the
reason Platt's speech was so popular is that effective scientific
technique isn't easy to pin down. Like the algorithm you use when
searching through the phone book, debugging and other science may be
easier to do than it is to precisely specify. (Hint: searching the
phone book is sort of an interpolation
search.) But debugging at least provides more convenient guiding
examples than the slower, more complex sciences.
So how does this effective debugging procedure work, in detail? The
fundamental technique is to arrange the search space into a tree based
on the generality of the assertions each hypothesis makes. Every
hypothesis makes at least one assertion, which can be viewed as a
boolean question: is the assertion true, or false? Your tree branches
for each of those possibilities. A more general hypothesis like "the
problem is in function f" appears higher in your search tree than one
that makes more specific assertions, like "the problem is in function
f, and f's local variable foo can be null". Then you traverse the
tree, doing an experiment at each level to select the correct branch.
One variant on this process has been automated in git
bisect. When you know that a bug used to be absent from your
code, you can start with a general hypothesis that the bug was
introduced somewhere in a large range of versions, and narrow that
hypothesis down to smaller ranges and eventually a specific version.
Given a test case and an old revision that doesn't contain the bug,
you can literally binary search your way to the exact revision in
which the bug was introduced. From there, you may need to do
additional searching to get down to the specific problem.
Sometimes a single experiment can resolve more than one binary
question at a time, so it's useful to look at this process as an n-ary
search through a tree, rather than just a binary search. At any given
level of the tree, you might have branches for more than two
alternative hypotheses.
Similarly, sometimes you run into a number of equally-general
hypotheses. For example, if you already know that the problem must be
in exactly one of three functions, it might not be clear which one to
test first. One useful ordering for equally-general hypotheses is by
the probability of successful identification of a plausible branch.
If you know that functions f and g are old and have
been subject to a fair amount of testing, but function h is
much newer, you might test h first. The idea of arranging
multiple levels of a search tree according to that principle reminds
me of Huffman
coding; you organize your tree so that the most probable
hypotheses can be investigated in the least time.
Of course, questions aren't all created equal, and there isn't always
an easy experiment to resolve every question. You might find that the
obtainable evidence supports more than one hypothesis at some level in
your tree. This happens all the time in multithreaded programming,
where complex race conditions and deadlocks can be extremely difficult
to reproduce. If you can't devise an experiment to determine which of
two branches is correct, you have to take both branches, and hope one
of them is invalidated further down. You can do this one branch at a
time: if the branch you choose first is invalidated, you backtrack to
focus on the correct branch. This means that in science, having
multiple camps in support of divergent theories is perhaps a good
thing: it's parallel processing! On the other hand, there's something
to be said for the idea that a single scientist should be able to find
more than one incomplete explanation plausible simultaneously.
Worse than merely taking the wrong branch first is what happens if you
fail to consider all the possible explanations at some level in your
tree. You could wind up following down some initially plausible but
ultimately incorrect path without even being aware of the alternative.
In debugging, when you realize you've done this, it's not too
dramatic. Maybe you slap yourself on the forehead. But in more
complex subjects like physics or chemistry, major backtracks to
previously unimagined possibilities are what Thomas Kuhn called
"paradigm shifts" — these are upsets like special relativity,
and the abandonment of the phlogiston theory of combustion.
The fact that Kuhn wrote about paradigm shift in the 1960s, hundreds
of years after Francis Bacon wrote about conditional inductive trees,
shows how our understanding of scientific technique has lagged behind
our ability to use it. It might make sense to use debugging as a
model for teaching science. Students could develop methodological
intuition on a simplified model, rather than trying to accumulate it
over years of much slower work with a harder problem. Of course,
fledgling scientists would have to learn some computer programming to
be able to play with debugging tasks. There's a fair argument that
tomorrow's scientists would do well to pick up a little bit of
computer programming anyway. But, simplified computational models of
physics or chemistry could provide just as nice a model as debugging.
In fact, in computational biology, serious scientific experiments are
already being done on computer models of physics, rather than the real
thing! Why not start with something simpler, for educational
purposes?
Update: I originally asserted that Francis Bacon, "using modern CS terminology three hundred years ahead of time, called [strong inference] a 'conditional inductive tree'". I was confused — those were Platt's words, not Bacon's.