21 March 2010

First-class functions make printf-debugging obsolete

tl;dr: Instead of looking through lines of text in an error log, store the variable bindings of your error cases and debug in a REPL in the context of your error.

One of the most productive features of a language is a read-eval-print loop: type in an expression, and see its value. A standard RDBMS has one for SQL, and a standard browser has one for Javascript. Good languages like Ruby, Python, Haskell and Lisp each have one. They're incredibly useful for exploratory programming.

If you're playing with library functions, or doing a join on a new set of tables, or trying to do complex Ajax calls, you'll have an easier time if you see the results of your code immediately. The faster the results come back, the most you keep your train of thought, and the easier it is to debug your code, and write more code based on it until you've solved your problem.

In an error case, you'd ideally like to be in that read-eval-print loop, poking around to see where your assumptions didn't hold true, but often times there's only a few print statements logging a few values. This is one way of getting back into that read-eval-print loop in Ruby.

As a motivating example, let's say you're trying to get the average number of letters per vowel in English words. Assuming that /usr/share/dict/words is sufficient, we can loop over every word, count the characters and count the vowels, divide, and then do the average.

As a first guess, let's try

words = File.readlines("/usr/share/dict/words")
words.map {|word| word.length / word.count("aeiouy") }.average

and bam, a divide-by-zero error. Ruby doesn't tell us the local variable bindings, what "word" was set to when it broke. It'd be great if, instead of doing

words.map {|word| print word; word.length / word.count("aeiouy") }.average

we could make a read-eval-print loop that operated in the context of the function. We already have the print part, and for the read part we can pass in a string, so all we need to do is evaluate a string in the context of the function, and store that closure over the function context outside its scope. The function eval evaluates a string, we can make a function via lambda, and we can store our function into a global variable, as follows:

words.map {|word| $repl = lambda {|string| eval string }; word.length / word.count("aeiouy") }.average

and now, when the divide-by-zero error happens, $repl is a function that evaluates its argument in the loop context of the failing words, with all of its local variables set. In my case,

>>$repl.call("word")
=> "A\n"

Voila! We didn't account for uppercase letters, and there's also some trailing whitespace coming through. Map each word to its lowercase, whitespace-stripped version, and try again:

words=File.readlines("/usr/share/dict/words").map {|word| word.strip.downcase }
words.map {|word| $repl = lambda {|string| eval string}; word.length / word.count("aeiouy") }.average

and this time

>> $repl.call("word")
=> "b"

we see another case to watch out for.

But why should we junk up our mapping function? Is there some way to package up variable bindings in a library? One of Ruby's core classes, actually, is Binding. An instance of Binding is basically the lookup table that the interpreter uses to look up the value of a variable, which you can then pass to eval. You can get the current binding at any point by calling

Kernel.binding

and you can call .binding on a block or a lambda, to get their bindings. Assuming the file "save_bindings.rb" had the following code:

$bindings = {}
def save_binding(key, &block)
$bindings[key] = block.binding
block.call
end
def use_binding_to_eval(key, string)
b = $bindings[key]
b ? eval(string,b) : "invalid key"
end

then we could do something refreshing like

require "save_bindings"
words.map {|word| save_binding("counting") { word.length / word.count("aeiouy") } }.average

so instead of doing a printf, we call save_binding, and then pass in a block that contains the code we want to run. So save_binding stores the binding in a hash table based on a key (the same key every time), and runs the code, and then, when our divide-by-zero gets thrown, we can see what the problem is by calling

>> use_binding_to_eval("counting", "word")
=> "b"

or, equally easily, anything more complex:

>> use_binding_to_eval("counting", "word.length")
=> 1
>> use_binding_to_eval("counting", "word == 'b'")
=> true
>> use_binding_to_eval("counting", "word.upcase!")
=> "B"
>> use_binding_to_eval("counting", "word")
=> "B"

and the environment variables all point to live values, as we can see with the destructively-updating String#upcase!.

Imagine if this were in a webapp. Take the following Sinatra code, in "letters-per-vowel.rb":

require "rubygems"
require "sinatra"

get("/letters-per-vowel") {
word = params["word"]
(word.length / word.count("aeiouy")).to_s
}

and run it in production mode to avoid Sinatra's helpful error messages:

ruby letters-per-vowel.rb -e production -p 4567

and go to localhost:4567/letters-per-vowel?word=aeiou or whatever vowelful word you want, and you get back what you expect, the string value of the numerical value of the division of the length of the word divided by the number of vowels. But what to do when you hit localhost:4567/letters-per-vowel?word=grrr or something like that? Use "save_bindings.rb" from earlier and make yourself a browser-based read-eval-print loop:

require "rubygems"
require "sinatra"
require "save_bindings"

SECRET = "password1"

get("/letters-per-vowel") {
word = params["word"]
save_binding("LPV") {
(word.length / word.count("aeiouy")).to_s
}
}

get("/debug") {
use_binding_to_eval(params["key"], params["string"]).to_s if params["secret"] == SECRET
}

We need a secret, to avoid people evaluating arbitrary code on the server. And now, if you hit localhost:4567/letters-per-vowel?word=grrr you'll still get that error message, but then go to localhost:4567/debug?secret=password1&key=LPV&string=request.env['HTTP_USER_AGENT'] and you can see the user agent from the environment of the request that caused the error, or go to localhost:4567/debug?secret=password1&key=LPV&string=word.size to see how long the word was. It's a bit of a security hole, but you could probably get even more exciting results if you put it behind the slick front-end of http://tryruby.org.

Check it out at http://github.com/lsb/save-bindings and let me know what you think!

Lee