Out of curiosity, I recently started reading the haskell-prime
mailing list. Haskell' is
the interim name for the language standard that will supersede
Haskell98, much as C++0x is the interim name for the successor to
C++03. There's been some discussion on the list about where to break
backward compatibility with Haskell98, and which features are worth
the trouble. It's interesting to see a language just starting to have
to deal with these issues. On one side, you've got researchers and
language designers angling to improve the language as much as possible
before a new standard is nailed down. On the other are mainly
industry users, pushing for less change and more backward
compatibility, so they don't have to spend time and money upgrading
their codebases or get stuck working with old, unsupported tools.
The influence of backward compatibility on software is hard to
overestimate. Windows Vista is still binary-compatible with many
MS-DOS programs dating back to 1981, and the DOS API was in turn
meant to make it easy to port CP/M programs from the late ’70s.
Meanwhile, Mac OS X and Linux are both Unix variants, with some core
API features dating back to the early ’70s.
The situation with processor instruction sets is similar: the
processors in today's PCs (and Macs!) are backward-compatible with
Intel's original 8086, which itself was designed so that assembly for
the 8080 (a descendant of the 8008, the first 8-bit microprocessor,
released in 1972) could be converted automatically.
This means that there are 30-year-old programs which would require
very little modification to run on today's operating systems. And
there's no reason to expect that in another 30 years we won't still be
using systems with an unbroken 60-year-long chain of backward
compatibility.
It's not that these technologies were so ingenious that we
haven't managed to think of anything better in the intervening
decades. Rather, when the quality of a software interface gets good
enough to become a widespread standard, the value of any given
improvement on that interface is dwarfed by the value of
compatibility. Progress in any direction that would require a break
in compatibility slows dramatically. The bar for de facto standards
isn't "great", but merely "good enough".
What this means is that an increasing number of design features in the
software systems we use every day exist only for historical
reasons. That's the terrible legacy of legacy code. The crushing
gravity of installed bases eventually pulls even the best-designed
systems down into a mire of hard-to-learn, hard-to-use arcana.
A lot of programming languages from the ’90s are feeling that
pressure today, and the result is a number of planned
backward-incompatible major revisions, including Python
3000, Ruby
2, and Perl 6. I'm going
to go out on a limb and claim that those languages have more users and
larger existing codebases than Haskell does. If they can make
backward-incompatible changes just to clean house, surely a primarily
research-oriented language like Haskell can.
Don't get me wrong: there are good
reasons to maintain backward compatibility in a wide variety of
situations. But if you don't have those reasons, why in the world
would you subject your programming language to the mangling, bloating
influence of backward compatibility? While backward compatibility is
a great default position when you don't have any improvements to make,
giving up too much to maintain compatibility is bad for everyone.
The question is, how does the cost to existing users compare to the
value to all future users? If your language is still growing
in popularity, the number of future users can easily exceed the number
of current users by orders of magnitude. If you don't break
compatibility to fix a problem, you're hurting all the users who will
have to live with the problem for who knows how long, in order to
avoid hurting the few who will have to upgrade now. And if you don't
fix it, someone else will, in a new language, so your users get stuck
in COBOL-support hell eventually anyway. That's if we're lucky. If
we're not lucky, your language will become so popular that the value
of compatibility will outweigh the value of better languages, and your
language will be a drag on progress. It's practically a moral
imperative to fix it while you still can.
C++ is an excellent example of a language that has valued backward
compatibility over simplicity and consistency. It's a popular,
practical tool, but few people consider it beautiful, easy to learn,
or easy to work with. As Bjarne Stroustrup put
it:
I consider it reasonably obvious that in the absence of compatibility
requirements, a new language on this model can be much smaller,
simpler, more expressive, and more amenable to tool use than C++,
without loss of performance or restriction of application domains. How
much smaller? Say 10% of the size of C++ in definition and similar in
front-end compiler size. In the "Retrospective" chapter of D&E, I
expressed that idea as "Inside C++, there is a much smaller and
cleaner language struggling to get out". Most of the simplification
would come from generalization, from eliminating the mess of
special cases that makes C++ so hard to handle, rather than restriction
or moving work from compile time to run time.
What Stroustrup originally said was that "Within C++, there is a much
smaller and cleaner language struggling to get out," which "would
[...] have been an unimportant cult language." I'm not sure I agree
with that last part. Java, for example, is a syntactically similar
language whose designers did decide to give up compatibility with C in
order to achieve greater simplicity and consistency. Even though Java
is virtual-machine-based and unsuitable for a wide variety of systems
programming tasks, its popularity has by some metrics exceeded that of C++.
What has compatibility bought C++? Java shows that it wasn't a
requirement for popularity. And despite being largely
backward-compatible with C, C++ has had difficulty
supplanting it for many tasks. Indeed, there's more than one place
where a break with backward compatibility might have simplified and
improved C++ at little cost. The modern
C++ style that has become prominent in recent years bears little
relation to the way C++ was written in the ’90s. How much value
is there, really, in being able to compile both dialects with the same
compilers?
So that's my position: backward compatibility at the expense of
simplification is only appropriate when you can't gain acceptance any
other way. If a given backward-incompatible improvement isn't going
to cause a fork in the language, its value probably outweighs the
value of compatibility.