Saturday, October 7, 2006

Code Blocks and C-like Lambda Expressions

A while back, I started using Ruby for some personal software projects. I was immediately impressed with its "code block" syntax. It's a fairly natural Algol-like syntax for lambda expressions, and is used to great effect in the language. I'd like to advocate wider adoption of this kind of syntax, and suggest some improvements.

Briefly, any Ruby method can take a special "code block" argument, which the method invokes through the yield keyword (yield behaves much like a call to an anonymous function):

def twice()
    yield()
    yield()
end

The magic yield function is then specified using the code block syntax, when twice is called:

twice() do
    puts "hello"
end

Output:

hello
hello

(Ruby also allows the more C-like {} delimiters in place of do/end. I generally prefer braces, but do/end matches the Algol-like syntax of Ruby's built-in constructs, so I use it in Ruby code.)

Like regular function declarations, code blocks can take parameters:

def with(a, b)
    yield(a, b)
end

with(7, 3) do |x, y|
    print x+y
end

Output:

10
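
The block can also be captured as an ordinary object, which makes the connection to lambda expressions more explicit. A minimal sketch: the method name with_explicit is made up for illustration, while &block and Proc#call are standard Ruby.

```ruby
# Same idea as with(), but the block is bound to a named parameter
# instead of being reached through yield.
def with_explicit(a, b, &block)
    block.call(a, b)
end

sum = with_explicit(7, 3) do |x, y|
    x + y
end

puts sum
```

Here the block is just a value named block, which is roughly how the rest of this post treats code blocks.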

Ruby uses this for all kinds of familiar things from both procedural and functional languages. Here are some examples:

# a foreach loop over a list
[1, 2, 3].each do |x|
    print x
end

# mapping
squares = [1, 2, 3].map do |x|
    x*x  # note the handy auto-return here
end

# like what some languages call "using" or "with"
File::open('foo.txt') do |fd|
    # the file will automatically be closed when we exit this scope
    print fd.read()
end
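
The "using" pattern is easy to build yourself with yield and ensure. The following is only a sketch under my own assumptions: FakeResource and with_resource are hypothetical names, and this is not how File::open is actually implemented.

```ruby
# Hypothetical stand-in for a resource such as a file handle.
class FakeResource
    attr_reader :closed

    def initialize
        @closed = false
    end

    def close
        @closed = true
    end
end

# Hands the resource to the block, then closes it even if the block raises.
def with_resource(resource)
    yield(resource)
ensure
    resource.close
end

res = FakeResource.new
with_resource(res) do |r|
    puts "closed inside the block? #{r.closed}"
end
puts "closed afterwards? #{res.closed}"
```

The ensure clause runs whether the block finishes normally or raises, which is what makes the cleanup automatic.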

First of all, I think this is a great syntax for lambda expressions in Algol- and C-like languages. I prefer the C-like variant in general, so I'll use that from now on. Also, in imitation of C/C++/Java/etc function declaration, I would pull the |x, y| signature definitions out in front of the {}s. (I think I like the vertical bars better than overloading (x, y) to handle both invocation and declaration the way C does. Recruiting || for delimiters strikes me as an okay idea, but there are some good reasons to prefer distinct symbols for open and close -- you could also overload (), [], or even <>.) So I think the standard syntax for (lambda (x y) code) in C-like languages should be something like |x, y| { code }. This has the fun consequence that function declaration could be done like this:

foo = |x, y| {
    do_stuff(x*y)
}
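
For comparison, Ruby can already bind a lambda to a variable, though the parameters sit inside the braces and calling it needs an explicit .call. A small sketch, in which do_stuff is a placeholder that simply returns its argument:

```ruby
# Placeholder for whatever do_stuff is meant to do.
def do_stuff(n)
    n
end

# Ruby's existing spelling: the signature goes inside the braces.
foo = lambda { |x, y|
    do_stuff(x * y)
}

puts foo.call(7, 3)
```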

Or, for a statically-typed language, maybe something like:

int func foo = |int x, float y| {
    do_stuff(x*y)
}

Some of the things Ruby does with its code blocks are built-in constructs (like foreach and using) in other Algol-like languages. This got me thinking: it would be nice to have a language in which all the control structures were expressible as functions that take code blocks. In Ruby (using the {} syntax, and setting aside that if is a reserved word) you could write an "if" function and call it like this:

if (a > b) {
    do_stuff()
}

But not like this, because a Ruby method call can take only one code block:

if (a > b) {
    do_stuff()
} else {
    other_stuff()
}

Ruby's code block syntax is based on a similar idea in Smalltalk, which is powerful enough to express if/else (although not in C-like syntax):

(a > b)
ifTrue: [ doStuff ]
ifFalse: [ otherStuff ].

The key concept employed in the Smalltalk code above is just named parameters. There's no reason named parameters couldn't work with the C-like code block syntax -- instead of setting the magic "yield" variable the way Ruby does, just allow function-type parameters to be declared, say, at the end of the parameter list, and allow parameters to be specified by name in much the same way Ruby and Python already do. It would be cute to write a language in which "if" could be implemented like this:

if = |cond, block, else=nil| {
    # using logic operators for control flow is ugly
    (cond and (block() or true)) or (else and (else() or false))
}

and called like this:

if (cond) {
    do_stuff()
} else {
    other_stuff()
}
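
Something close to this can be approximated in Ruby by passing lambdas as named parameters. A rough sketch with made-up names (my_if, then_do, else_do), chosen because if, then and else are reserved words:

```ruby
# An if/else built from a method and two lambdas.
# The branch is chosen by a hash lookup rather than a built-in conditional.
def my_if(cond, then_do:, else_do: -> {})
    branch = { true => then_do, false => else_do }[!!cond]
    branch.call
end

a, b = 5, 3
result = my_if(a > b,
    then_do: -> { "a is bigger" },
    else_do: -> { "b is at least as big" })
puts result
```

The call site is noisier than the syntax proposed above, but it shows that named function-type parameters are enough to express both branches.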

One interesting consequence of a syntax like that is that it can't express the standard while (cond) { code } construct, because the condition doesn't behave like an ordinary function argument: evaluating it once at call time isn't enough, since it has to be re-evaluated each time the loop restarts. Instead, you'd get something like while {cond} { code }, which I find I prefer. () and {} would have clear and separate meanings (evaluate now versus evaluate later), so we'd be clearing up a situation that sometimes surprises beginners by using the correct brackets for the semantics!
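
The same point can be tried out in Ruby by passing the condition as a lambda. This is a sketch only: my_while is a hypothetical name, and it leans on Ruby's own loop method, which is itself an ordinary method that takes a block.

```ruby
# The condition is a lambda so that it can be re-evaluated on every pass.
def my_while(cond)
    loop do
        break unless cond.call
        yield
    end
end

i = 0
my_while(-> { i < 3 }) do
    puts i
    i += 1
end
```

Writing my_while(i < 3) instead would evaluate the comparison once, before the method is even entered, which is exactly the problem described above.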