Sunday, September 10, 2006

Close-block Delimiter Symbols Considered Helpful

One of the first things one notices about Python is that it has "significant whitespace"; in particular, its syntax relies on indentation to delimit blocks of code. Most languages make indentation optional, but a few, like Python and Haskell, require consistent indentation. Some people immediately hate this feature, and some people immediately love it. My own initial reaction was that this was not a good idea, but I didn't feel strongly about it, and I gave it a chance.

Now I do feel strongly about it.

Over time I have come to feel that this is the worst single mistake in the design of the Python language. What's particularly bad about this mistake is that it might be propagated into future languages -- it could conceivably irritate programmers like me for decades to come. So this is my attempt to explain why Python should not be imitated in this respect.

First, let me separate out two aspects of this feature. The first is required indentation. Required indentation is pretty much okay with me. I don't think it's particularly wise to make your language interact badly with the tabs-vs-spaces quagmire, but who knows? Maybe a language feature like this, in a popular language, could finally end that holy war. Anyway, that's not what I'm talking about here. The second aspect, the one I'm interested in, is doing without a close-block delimiter symbol. (An example of block delimiter symbols, if that phrase isn't self-explanatory: Java's block delimiter symbols are "{" and "}".) Note that this is entirely unrelated to the first aspect -- a language could require consistent indentation just like Python does, and also have an explicit close-block delimiter.

Oh, you say, but that would be redundant. How horrible and ugly! Well, contrary to popular opinion, Python is not entirely without block delimiter symbols. In fact, every Python block must begin with the open-block delimiter, ":", like this:

if a == b:
    do_stuff()

Compare this to the equivalent Ruby code, which has a close-block delimiter, "end", but no explicit open-block delimiter:

if a == b
    do_stuff()
end

So there is no ideological purity involved in Python's abandonment of the close-block delimiter; it already has a redundant block delimiter, it just has the wrong one! Why the "wrong" one? Because, perhaps unlike open-block delimiters, close-block delimiter symbols are useful. They're so useful that even Python can't get along without one: it has a keyword that it uses a lot like a restricted close-block delimiter symbol for a few special cases where it just can't get around the need for such a thing! It's called "pass", and it's used in these cases:

# I don't care whether this works or not
try:
    this_might_fail()
except:
    pass

# make this method do nothing in this subclass
def override_method():
    pass

Again, some Ruby for comparison:

# make this method do nothing in this subclass
def override_method()
end

Okay, so technically, pass isn't a close-block delimiter, it's a no-op. Other lines can follow it at the same indentation level. But it's required for otherwise-empty blocks, and there's no reason for it to exist other than to make clear the existence of such a block, and in practice it is used very much like a close-block delimiter. The reasoning here is interesting. When a block contains lines of code, Python's designers presumably feel that the meaning of the code is clear without a close-block delimiter. But when a block contains no code, it's somehow too weird; one "def" statement directly before another, or an "except" with unindented lines after it would be too confusing. So an indication that the block is empty and over is desirable, even though it's not strictly necessary in order for the program to be parsable.
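
Both halves of that claim can be checked in a few lines. This is only a sketch: the names compiles, EMPTY_BLOCK, FILLED_BLOCK, and handle are made up for illustration. It shows that Python as it stands rejects an otherwise-empty block, and that "pass" really is a plain no-op that other statements can follow at the same indentation level.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# An empty block: one "def" directly before another.
EMPTY_BLOCK = "def first():\ndef second():\n    pass\n"

# The same code with "pass" filling the otherwise-empty block.
FILLED_BLOCK = "def first():\n    pass\ndef second():\n    pass\n"


def handle(value):
    """Show that statements may follow `pass` inside the same block."""
    log = []
    try:
        int(value)
    except ValueError:
        pass                   # does nothing at all
        log.append("ignored")  # still part of the except block
    log.append("done")         # back at function level
    return log


print(compiles(EMPTY_BLOCK))   # the empty block is rejected
print(compiles(FILLED_BLOCK))  # the block containing only "pass" is accepted
print(handle("not a number"))
```

In everyday code "pass" is almost always the last and only line of its block, which is why it ends up reading like a closing marker.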

I also find it interesting that the case with open-block delimiters is not so weird -- in languages which lack them, as Ruby does, the only consequence is that while some blocks start with something like "if a == b", others start with a more generic symbol, like "begin". Many languages do that anyway, even when they have an open-block delimiter, as with the C/C++/Java "do {} while();".

Speaking of which, you may have noticed that Python lacks a do-while loop. Why? I think it's because it would be awkward:

do:
    some_stuff()
while a > b
foo()

Because of the lack of a close-block delimiter symbol, "while a > b" above looks like an ordinary Python statement, or the start of a new while loop, after the end of the block. Tellingly, the proposal for adding a do-while loop to Python (PEP 315) suggests that it be integrated with the existing while loop, so that the "while" clause could open a new block, sort of like the way if/else and try/except work. The proposed syntax for the example I gave above would employ "pass" to visually close the block, like this:

do:
    some_stuff()
while a > b:
    pass
foo()

(This combination of "do-while" and "while" into a single looping construct is actually very appealing for other reasons -- I encourage you to read the reasoning in the PEP to see why more languages should have a construct like Python's proposed do-while-else.)
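
For comparison, here is a sketch of the usual way to get do-while behavior in Python as it exists today: an infinite loop with the test moved to the bottom of the body. The function name countdown and the list trace are hypothetical stand-ins for some_stuff() and are not part of any proposal.

```python
def countdown(a, b):
    """Run the body at least once, then repeat while a > b."""
    trace = []
    while True:
        trace.append(a)  # stands in for some_stuff()
        a -= 1
        if not (a > b):  # the "while a > b" test, checked after the body
            break
    return trace


print(countdown(3, 1))  # body runs twice
print(countdown(0, 5))  # body still runs once although 0 > 5 is false
```

The loop condition ends up buried inside the block as an "if ... break", which is exactly the information a trailing "while a > b" line would have carried.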

Aside from simple readability, there are some important reasons why close-block delimiter symbols are useful. Let me point out what a big deal that is. The whole question about how to do block-delimiting is usually about readability. The questions are things like "does {} look better than begin/end?", and "is (foo bar) clearer than <foo>bar</foo>?", and "which lines should the delimiters go on?" -- questions about what the code looks like. When you use indentation as your only close-block delimiter, though, there are some problems that go beyond readability. Here are the ones I'm aware of:

  • Many technologies are bad at preserving indentation in text. Pasting Java or Ruby code into a chat message or web page (without <pre>) might make it look uglier than you'd like, but pasting Python code into the same places is likely to destroy information, rendering the code uninterpretable or just subtly incorrect. I see this happen pretty frequently.

  • When moving a section of code from one nesting level to another, or changing the indentation scheme of a block of code to match the local convention, close-block delimiter symbols preserve the information about where blocks exist in the moving code section. Indentation is very bad at this, because its addition or removal can be done in a non-atomic way, affecting some lines before others. If you indent by hand, it's easy to lose track, as you move down through the pasted section, of the intended containing block for the next line of code. If you get this wrong in a language like Java, your code may look confusing, but it will behave correctly. In a language like Python, your code can quietly do the wrong thing! (This has actually happened to me.) Getting this right without undue mental effort requires special editor features, but even those can't always save you. My editor happens to have those features, and I still occasionally get bitten by this when someone else's editor interprets tab characters differently from mine, or when a nesting-level change is not a simple copy-and-paste operation (for example, when an enclosing conditional block is being removed, or when two-space indentation is being converted to four-space indentation).
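
Both failure modes are easy to reproduce. The sketch below uses made-up names (flatten, compiles, SNIPPET, total_inside, total_outside). First it strips leading whitespace the way an indentation-hostile medium might, which leaves code that no longer compiles. Then it shows two functions that differ only in the indentation of a single line and return different results without any error.

```python
def flatten(source):
    """Drop leading whitespace from every line, as a careless paste might."""
    return "\n".join(line.lstrip() for line in source.splitlines()) + "\n"


def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


SNIPPET = "for x in items:\n    if x > 0:\n        total += x\n"


def total_inside(items):
    total = 0
    for x in items:
        if x > 0:
            total += x
            total *= 2  # doubling happens only for positive items
    return total


def total_outside(items):
    total = 0
    for x in items:
        if x > 0:
            total += x
        total *= 2      # one level shallower: doubling happens for every item
    return total


print(compiles(SNIPPET), compiles(flatten(SNIPPET)))
print(total_inside([1, -1]), total_outside([1, -1]))
```

The flattened snippet at least fails loudly. The second case is the worse one: both functions are valid Python, so nothing flags the slip.
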

Sure, these are not such horrible problems. They don't irritate me most days. But they do irritate me sometimes, and there's no reason why I should ever have to tolerate them! I get nothing in return. Redundant close-block delimiter symbols have no associated cost, and some tangible benefits. Please, when designing a new language -- even a language that requires consistent indentation -- include a close-block delimiter symbol!

3 comments:

kragen said...

I don't agree that redundant block-closing delimiters have "no associated cost". They add ink to the program without adding information. In general, adding uninformative ink to the program means it takes more effort to read the informative ink.

There may be times when it's worth it, of course; the benefits (integrity protection for block-closers, as you suggest, or easier learnability, as with Python's ":") may be greater than the cost. But the cost to readability is real.

Chris Okasaki has some anecdotal evidence that mandatory indentation without block closers helps novice programmers quite a bit. "No other single factor I've run across has greater significance," he says. He devotes three paragraphs to the hypothesis that it's actually the missing block-closing delimiters that make the difference, not just the mandatory indentation.

(Most of his arguments actually seem that they would apply to Lisps, where the block-closing delimiters just get stuck onto the last line of code they enclose.)

kragen said...

My last line was pretty unclear. I meant that most of his arguments that closing delimiters are actively harmful would not apply to closing delimiters stuck on the end of the last line of their contents, instead of put on a line by themselves. Instead, they would suggest that sticking closing delimiters onto the end of said last line should be just as good as not having them at all.

István Albert said...

I want to point out that in Python the colon that marks the beginning of a block is there because usability testing indicated that people like having a visual feature that indicates that a block is coming. The parser could easily work without the colon.

Just an observation. Not too many languages include certain syntax based on usability testing ... rather than one person's opinion on what is more readable.