Skip to content

Programming

These are the stories that have been posted to the Programming category.

Programmable Text Editing on Windows with EmEditor


Published to Invisible Blocks by Dan Bernier February 21, 2007 18:04

Windows lacks a decent programmable text editor.

I know there’s Emacs and Vim, but I want something that looks good, and is easy to learn. If it’s free or cheap, that’s good, too. Based on screencasts I’ve seen, I’d say I want something like TextMate, but I’ve never actually used it, so that’s just a guess.

It has to be programmable. I spend lots of time working with text, and I want an editor that lets me write code to process my text. I want to take a list of values, and generate SQL from it. I want to auto-format HTML. I want to insert custom code templates, based on keywords I type.

Really, I want something like Ruby’s text-mashing power, built into my editor. If I could send a buffer’s contents through some code, and replace the buffer with the script’s output, that’d be something. This would let me automate probably 99% of the repeatable things I do. It doesn’t have to be Ruby, but that’d be nice.

EmEditor’s Programmable Macros

I found EmEditor, which gets me pretty close. It has a 30-day trial, and it’s $40 to buy. It has programmable macros, with a decent set of variables that give you access to the buffers, editors, and all menu items. The macros are run through the Windows Script Host (WSH), which is a little scary, but it means that you can write your macros in JavaScript. Actually, since WSH is language-agnostic, you can write your macros in any language with a WSH interface. Including Ruby. Or Haskell, or Lua, which makes it easier for me to experiment with them now. And of course there’s Python and Perl. I think this is pretty nifty.

The Downside

Since it’s built on WSH, you don’t have any means of including other files. Nothing like Ruby’s require, Java’s import, C#’s using. This means no library re-use, no utility functions. No RedCloth for blog posting, no REXML for auto-formatting XML, no Hpricot for digging through generated HTML. How about a nice, clean interface to WSH’s hairy objects? I’d love to write one! (Well, I’d love to have one, so I’d be willing to write it.) But you couldn’t use it, if I did. I think this is a pretty significant problem. You have to start from scratch with each script. Someone please tell me I’m missing something here.

None of this is stopping me from rolling up my sleeves yet. Here are some tasks I’ve automated so far (you can get some of the .js files from my Google Pages…I’ll put up more when I have a minute):

  • Replacing “smart quotes” and other funny characters with their plain-text equivalents. Great for copying code and data out of MS Office products.
  • Delete all copies of whatever text is selected. When I’m cleaning up generated HTML, I remove lots of code (<td>, <tr>, class="foo"), so I can see the parts I care about. Just highlight, run, poof. All gone.
  • Make the word I just typed into an opened-and-closed HTML tag, with the cursor in between. I have this hooked up to Ctrl-space, and Shift-Ctrl-space adds in a newline and tab indent. “b” ctrl-space produces <b>|</b>, and the cursor is where the | is. “table” shift-ctrl-space “tr” shift-ctrl-space “td” ctrl-space. Much faster.
  • Run some JavaScript on each line, and replace the line with the code’s return value. You can use this for chewing on data, cleaning up formatting, calculating values, or whatever. If you have to change many lines of similar C# or Java, you can chew each line up, and rewrite the code with this. Powerful, easy to make a mess with, and all that. This is probably my favorite.

Things I haven’t gotten around to yet:

  • Tidying XML or HTML
  • Customizeable templates. These would work like my HTML-tag-izer, where you type a word, then Ctrl-space, and your word is replaced with a template of code. Eclipse has this, and TextMate comes with a whole mess of really nicely defined ones for Ruby.

Maybe someday, I’ll have learned emacs, and I’ll look back on this post and laugh at myself. “Windows Script Host?? What was wrong with me?” In the meantime, I have something that helps me along, lets me have fun automating things, and lets me explore other languages.

How to Design A Domain Specific Language, with David Pollak


Published to Invisible Blocks by Dan Bernier February 22, 2007 02:08

I just finished watching David Pollack’s presentation at Google, How to Design A Domain Specific Language. It’s only an hour, and it’s got some interesting ideas. A nice jog, good for your mental health.

His main theme is bringing computing closer to the business users. Computers exist to help us solve problems, but most people can’t program, so we need all these programmers to translate for the business people, but the translation is often imperfect, so we go back and forth like couriers between the business and the machine. Don’t we have better things to be doing? This idea of letting business users program often riles programmers who fear being automated out of a job, but I think there’s enough hard stuff out there for anyone who’s up for cracking some hard nuts. I say, let’s make the drudge work easier, so we can get on with the real work.

The essence of a DSL is building the semantics of the business into a mini-language, so the business people can read the code. [Sure, they could probably write it themselves, but most don’t want to.] And if they can read the code, they can sign off on the code. Why translate specs from business-speak into Java? Why not make the code so clear, it is the spec?

Pollack points out that if you try this, you have to convince the business that the translation works. I’ve seen this before…when your code is readable by the business, they’ll forget that it’s code. “How do we know the software does what the requirements say?” Because the requirements basically are the software. Once they grok it, though, you can almost remove yourself from the equation. Pollack says of his DSL SiteMap:

We were able to go through with…the business users themselves, to figure out [the navigation rules] for each page…We never had a failure where the business user didn’t get something they were expecting to get. We had to demonstrate early…that the implementation of the code matched the semantics of the DSL. Once we proved that, the code reviews went really really quick.

That’s a great place to be.

BTW, thanks to Peter Cooper for sharing that video. He also shared Code Generation With Ruby with Jack Herrington…Herrington’s book Code Generation in Action has been on my wishlist ever since I read the sample chapters Manning let me download, so I’m really looking forward to the video.

Functional Programming in JavaScript and Ruby


Published to Invisible Blocks by Dan Bernier February 23, 2007 13:40

[UPDATE: I’ve been lucky enough to have some commenters who know much more about functional programming than I do. There’s some good reading in the comments, and you especially should read them before using this stuff in production.]

I’ve been playing with functional JavaScript tonight. It’s not the greatest of OO-functional hybrid languages, but it’s good to supplant my fledgling functional skills with something besides Ruby. Plus, I’m not the only one doing it, so I can compare notes. And who doesn’t love JavaScript, right? …right?

Before I get much farther, I should thank Adam Turoff for his post What’s Wrong with the For Loop. It gets at the root of the why we’d use a functional programming language instead of an OO or procedural one, and (bonus!) it helped me grok Ruby’s inject method (it’s the same as my new friend fold). Go read his post, it’s good for you. And if you like it, we recommend the 1981 SICP lectures. Robusto!

Here we go!

Introductions to functional programming usually discuss three methods: map, find, and fold. The names aren’t always the same, but those are the three amigos of functional programming. They all do something different to arrays (or lists, or collections, or whatever you want to call them):

  • find does what it says. You give it some criteria, it returns all the matching elements. The criteria is expressed as a function, a little bit of code that says “yes, I want this one,” or “no, skip it.” If we’re picking out the red gumballs, you could say (in Ruby) gumballs.find_all { |ball| ball.isRed? } find is also known as filter. [Ruby only calls it find_all because it uses find for returning just the first match.] Po-tay-to, po-tah-to.
  • fold takes each element, and “folds” it into a running total. Think of summation—you’re folding each new number into a running tally. In Haskell (and, it seems, many functional languages), it comes in two varietes, fold_left and fold_right, which just mean “start from the left” and “start from the right”. Ruby calls it inject, which confuses some people (like me).
  • map is the mathiest of them (don’t be scared, it’s easy!). In English, we’d say “change each item like this, and give me the new results.” Let’s down-case a list of words. “Change each word to lowercase, and give me the downcased words.” words.map { |word| word.downcase }. The name “map” comes from functions in math (surprise), where you map each input to one output. The whole y = f(x) thing…f is for “function”. Ruby also calls this collect.

Now, none of these comes predefined in JavaScript. How can this be? Well, all the JavaScript engines out there are in browsers, and we know that “when browser elephants battle, it is the JavaScript grass that suffers.” The browser developers are busy with other things. But this oversight means we have some easy examples to write. It’s boring to re-implement existing stuff, just for the exercise. So here we go—I’ll compare examples in JavaScript and Ruby.

A quick word about JavaScript’s open classes

JavaScript has open classes, just like Ruby. In other words, you can re-open a class definition, give it a method, and then start using that method on instances of that class. JavaScript’s OO is prototype-based, not class-based, so it looks a little weird:

Array.prototype.first = function() {
    return this[0];
}
[1, 2, 3].first(); >> 1

Array.prototype.rest = function() {
    return this.slice(1);
}
[1, 2, 3].rest(); >> [2, 3]

Array.prototype.isEmpty = function() {
    return this.length == 0;
}
[].isEmpty();    >> true
[1].isEmpty();    >> false

“For the prototypical Array, the Array from which all other Arrays spring forth, create a property called ‘first’, which shall be this function, which returns the first element of the Array…” Just a Biblical way of defining a method for a class.

Open classes is good news, because it’ll make our job of adding map, find, and fold possible, since they’ll be going into the Array class.

Find

find is the easiest, so we’ll start there. Let’s take a look:

Array.prototype.find = function(isAMatch) {
    var matches = [];
    for (i = 0; i < this.length; i++) {
        if (isAMatch(this[i])) {
            matches.push(this[i]);
        }
    }
    return matches;
}

find is a function, and it takes the function isAMatch as a parameter. One of the hallmarks of the functional programming style is that functions are first-class citizens in the language: they can be treated like any other variable, so they can be passed around as parameters, and returned from other functions. [Closures and anonymous functions are also major features.] isAMatch is a function that tells you whether we should include any given element of the Array. Use find like this:

var evens = [1, 2, 3, 4, 5].find(
    function(i) {
        return i % 2 == 0;
    }
);

function isOdd(i) {
    return i % 2 != 0;
}
var odds = [1, 2, 3, 4, 5].find(isOdd);

The first example uses an in-line, anonymous function; the second uses a named function. Both work fine, but here is where we start to see JavaScript’s awkward syntax get in the way. Consider the Ruby version:

# find_all comes pre-defined in Array, via the Enumerable module,
# but this is what it would look like...
class Array
    def find_all
        matches = []
        self.each do |item|        # Ruby says 'self', not 'this'.
            if yield(item)         # yield says "Call the block we were given"
                matches << item    # Stuff item into matches
            end
        end
        return matches
    end
end

evens = [1, 2, 3, 4, 5].find_all { |i|
    i % 2 == 0
}
def isOdd(i)
    i % 2 != 0
end
odds = [1, 2, 3, 4, 5].find_all { |i|
    isOdd(i)
}

In Ruby, every expression returns a value, so return disappears. And Ruby’s blocks mean we don’t need to wrap our match condition inside function(i) { ... }. But Ruby’s find_all won’t take a reference to a method as a parameter:

def isOdd(i)
    i % 2 != 0
end
odds = [1, 2, 3, 4, 5].find_all(isOdd) >> `isOdd': wrong number of arguments (0 for 1) (ArgumentError)

Once you’ve defined a function in JavaScript, you can pass it around by name, like any other variable, but you need that function(i) { ... } syntax around your test. In Ruby, find_all takes a block instead of parameters, so you can’t pass a reference. Given how nice blocks are in Ruby, I guess this can be forgiven, but it’s a little weird.

Fold

Now we’ll get into the recursive guys. find is a pain to implement recursively in JavaScript, so I did it iteratively, but recursion works well for fold and map. Since recursion seems to be more idiomatic in functional languages, we’ll use it here.

I took two shots at fold in JavaScript (I’m learning). Here they are:

Array.prototype.fold_recursive = function(combineFunc, base) {
    if (this.isEmpty()) {      // I added isEmpty, first, and rest, as above
        return base;
    }
    else {
        return combineFunc(
            this.first(),
            this.rest().fold_recursive(combineFunc, base));
    }
}
Array.prototype.fold_iterative = function(combineFunc, tally) {
    if (this.isEmpty()) {
        return tally;
    }
    else {
        return this.rest().fold_iterative(
            combineFunc,
            combineFunc(this.first(),
                        tally));
    }
}

The first is straightforward recursion. If the list is empty, we’re done, so return the base value (zero, if we’re adding). Otherwise, combine the first value with whatever the rest of the items fold up to. [If you have trouble writing recursively, this is the canonical pattern you should probably follow: if done, do base case; else, do this element, then the rest of them.]

The second is a little different. You’ve got a starting base, the tally so far, and a combineFunc that folds each value into the tally. If the list is empty, we’re done, so return the tally. Otherwise, fold the first item into the tally for the rest of the list.

It the first, you hang around, waiting for the answer to the rest of the problem, before you add in your part of the sum. In the second, you push your part of the sum down into the next round of processing, so you don’t have to remember anything. When the answer comes back from the recursive call, it’ll already have your part of the sum in it.

[The first one is a “linear recursive process”, the second is “linear iterative process”, even though both are recursive procedures. If your interest is piqued, check out this SICP page, but it’s not needed for this article. For the rest of this post, I’ll be using the linear recursive version, because it’s conceptually clearer. Thanks to mfp for keeping me honest.]

Here’s how fold is used:

function add(a, b) { return a + b; }  // A handy adding function

Array.prototype.sum_recursive = function() {
    return this.fold_recursive(add, 0);
}
[1, 2, 3].sum_recursive();  >> 6

To sum the elements of an array, we want to fold the numbers together by add-ing them, starting at 0. Easy, isn’t it?

Here are two Ruby versions, based on fold_recursive…one that takes a function (called a “procedure object” in Ruby) as a parameter, one that takes a block:

class Array
    def rest        # We'll define rest, to keep the examples similar
        self[1..-1]
    end

    def fold_w_proc(combineFunc, base)
        if self.empty?
            base
        else
            combineFunc.call(    # "func.call(args)" instead of JavaScript's "func(args)"
                self.first,
                self.rest.fold_w_proc(
                    combineFunc,
                    base)
            )
        end
    end
    def fold_w_block(base, &combineFunc)
        if self.empty?
            base
        else
            combineFunc.call(
                self.first,
                self.rest.fold_w_block(
                    base,
                    &combineFunc)
            )
        end
    end

    def sum_w_proc
        fold_w_proc(lambda { |a, b| a + b }, 0)
    end
    def sum_w_block
        fold_w_block(0) { |a, b| a + b }
    end
end

[1, 2, 3].sum_w_proc    >> 6
[1, 2, 3].sum_w_block    >> 6

Notice how similar fold_w_block and fold_w_proc are to the JavaScript fold_recursive. The thing that differentiates fold_w_block and fold_w_proc is how they’re used. The definitions themselves are nearly identical, except for the order of the parameters. Look at sum_w_proc and sum_w_blocksum_w_block is a bit clearer, isn’t it? But if you use blocks, you lose the ability to pass a function reference as a parameter.

Map

map looks a lot like fold.

Array.prototype.map = function(mapFunc) {
    if (this.isEmpty()) {
        return this;
    }
    else {
        return [mapFunc(this.first())].concat(
                this.rest().map(mapFunc));
    }
}

function cube(i) { return i * i * i; }

[1, 2, 3].map(function(i) { return i * i; });  >> [1, 4, 9]
[1, 2, 3].map(cube);  >> [1, 8, 18]

Again, it’s basic recursion…if we’re out of items, return the empty list (ie, this). Otherwise, make an array of the first mapped item, and concatenate it with the rest of the mapped items. Again, with JavaScript, you can pass your function in-line, or by reference.

Here’s a recursive definition of map in Ruby:

class Array
    def map(&mapFunc)
        if self.empty?
            self
        else
            [mapFunc.call(self.first)] + self.rest.map(&mapFunc)
        end
    end
end

[1, 2, 3].map { |i| i * i }   >> [1, 4, 9]

As before, calling map with a block certainly looks nicer than by passing in functions, but you lose the ability to define a function somewhere, and pass it in by name, like we did with cube in JavaScript.

Wrap-up

If you want to explore functional programming, both JavaScript and Ruby are good candidates. They both have their goods and bads. Up to now, I’ve only used Ruby, but JavaScript certainly has its benefits, and exposure to both balances out my understanding.

I hope that, if you were curious about functional programming before, this was a helpful read. If you’re a functional programmer, and I got something wrong, please let me know…I’m still learning. For instance, I now better understand how recursive processes differ from linear recursive processes.

Regular Expressions into lambdas: swapper, stripper, scanner


Published to Invisible Blocks by Dan Bernier February 16, 2008 00:59

Here’s some light Friday fun… When working in ruby with regular expressions on Arrays of Strings, I get so tired of doing this:


my_array.map! { |item|
    item.gsub(/delete this/, '')
}

Can’t we just give the regex to the array, and let it figure it out? How about this:


class Regexp
   def stripper
      lambda { |s| s.gsub(self, '') }
   end
end

my_array.map!(&/delete this/.stripper)

We can generalize it to use any string as the replacement:


class Regexp
   def swapper(new_s)
      lambda { |s| s.gsub(self, new_s) }
   end

   def stripper
      swapper ''
   end
end

my_array.map!(&/delete this/.stripper)
my_array.map!(&/replace this/.swapper('with this'))

Which gets me thinking…what if we want to extract text by a regexp?


class Regexp
   def scanner
      lambda { |s| s.scan(self) }
   end
end

my_array.map(&/[find pattern]/.scanner)

What else can we do with regular expressions?

As a bonus, this translates nicely into JavaScript, too:


RegExp.prototype.swapper = function(new_s) {
   re = this
   return function(s) {
      return s.replace(re, new_s)
   }
}

RegExp.prototype.stripper = function() {
   return this.swapper('')
}

// Don't forget those closing parens...
my_array.map(/delete this/.stripper())
my_array.map(/replace this/.swapper('with this'))

Hartford Ruby Brigade starts with a tour of Ruby Facets


Published to Invisible Blocks by Dan Bernier March 18, 2008 02:36

As Rob Bazinet has said, the Hartford Ruby Brigade is having its first meeting on March 24. You can get all the details from his post. Come join us! There’s even a book raffle.

I’ll be giving the first presentation, a tour of Ruby Facets. Facets is a pretty huge library (even after they moved some really neat parts into their own projects), and it’s crazy to think we could cover it all in one night. I’ll quickly touch on the simple features, just to let you know they’re there, and I’ll spend more time with some of the interesting parts. If you’re stuck, it’s a good chance Facets has what you need; the trick is knowing it’s there, and where to look — I want to point out enough of Facets to help you with that.

I’ll also start a Tour of Facets series here, starting with this post. I’m aiming for two to four posts a month, and will cover everything in the presentation, and then some. So, on with the tour…

compare_on and equate_on

Remember the first time you saw attr_reader and attr_writer? These tiny helpers got me excited about ruby, not just because they meant less typing and DRY-er code, but because they meant I could make helpers to generate methods, too, if only I could think of a reason to do it.

Facets has a great example of why you’d want to do that: compare_on and equate_on.

Most ruby programmers know you can make your objects sortable by defining <=>, the spaceship method, on them. Typically, you wind up delegating to some attribute:


class Person
   attr_reader :fname, :lname
   def initialize(fname, lname)
      @fname, @lname = fname, lname
   end
   def <=>(person)
      @lname.<=>(person.lname)
   end
end

Facets adds compare_on, which generates the spaceship method for you, based on that attribute. Not only that, but you can compare_on multiple fields, and it handles the hairy logic for you automatically:


require 'facets/compare_on'

class Person
   attr_reader :fname, :lname
   def initialize(fname, lname)
      @fname, @lname = fname, lname
   end
   compare_on :lname, :fname
end

people = []
people.push Person.new('Adam', 'Smith')
people.push Person.new('John', 'Adams')

people.sort #=> John Adams, Adam Smith

Correctly implementing the spaceship operator isn’t too hard, but object equality gets tricky in any language. Facets helps you here by implementing ruby’s main three equality methods for you: ==, eql?, and hash.


require 'facets/compare_on'

class Person
   attr_reader :fname, :lname
   def initialize(fname, lname)
      @fname, @lname = fname, lname
   end
   equate_on :lname, :fname
end

a_pres = Person.new('John', 'Adams')
another_pres = Person.new('John', 'Adams')
[a_pres].include?(another_pres) #=> true

Again, you can equate on multiple attributes (fname and lname), and it handles all the details for you. Hope to see you at the Hartford Ruby Brigade!

A Faster, Cheaper Fibonacci Definition


Published to Invisible Blocks by Dan Bernier March 22, 2008 13:43

The Fibonacci sequence is one of the best-known number sequences:

  1. 1
  2. 1
  3. 2
  4. 3
  5. 5
  6. 8
  7. 13
  8. 21…

Each Fibonacci number above 1 is defined as the sum of the previous two Fibonacci numbers:

F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2)

Just for fun, here’s another way to specify the Fibonacci sequence. It takes fewer calculations, especially for large numbers. The math is basic algebra substitutions. This could be old news — if you’ve seen this before, I’d love to hear from you.

Deriving the new Fibonacci definition

Before we begin, let’s notate F(n) as Fn, and F(n-1) as Fn1 — it’ll make everything tidier.

That Fn1 + Fn2 part, to me, always begged for substitution. Fn1 should equal Fn2 + Fn3, right? What if we re-write Fn in terms of Fn1’s definition? What might that lead to?

Fn = Fn1 + Fn2
Fn1 = Fn2 + Fn3
Fn = Fn2 + Fn2 + Fn3
Fn = 2Fn2 + Fn3, as long as n > 2.

Pretty basic: substitute the definition for Fn1 back into the definition for Fn, and simplify. We can keep going:

Fn2 = Fn3 + Fn4
Fn = 2Fn2 + Fn3, from above
Fn = (2Fn3 + 2Fn4) + Fn3
Fn = 3Fn3 + 2Fn4, for n > 3

…and then:

Fn3 = Fn4 + Fn5
Fn = (3Fn4 + 3Fn5) + 2Fn4
Fn = 5Fn4 + 3Fn5, for n > 4

…and again:

Fn4 = Fn5 + Fn6
Fn = (5Fn5 + 5Fn6) + 3Fn5
Fn = 8Fn5 + 5Fn6, for n > 5

…just one more time:

Fn5 = Fn6 + Fn7
Fn = 8Fn6 + 8Fn7 + 5Fn6
Fn = 13Fn6 + 8Fn7, for n > 6

See the pattern? Look at the coefficients:

Fn = 2Fn2 + 1Fn3, for n > 2
Fn = 3Fn3 + 2Fn4, for n > 3
Fn = 5Fn4 + 3Fn5, for n > 4
Fn = 8Fn5 + 5Fn6, for n > 5
Fn = 13Fn6 + 8Fn7, for n > 6

They’re the Fibonacci sequence starting up. Do the math, and you’ll see that the next few steps follow along. What if we replace the coefficients with their Fibonacci indexes? If F(6) = 8 and F(7) = 13, we can rewrite 13Fn6 + 8Fn7 as F(7) Fn6 + F(6) Fn7. Let’s carry that out:

Fn = F(3) Fn2 + F(2) Fn3, for n > 2
Fn = F(4) Fn3 + F(3) Fn4, for n > 3
Fn = F(5) Fn4 + F(4) Fn5, for n > 4
Fn = F(6) Fn5 + F(5) Fn6, for n > 5
Fn = F(7) Fn6 + F(6) Fn7, for n > 6

Let’s quickly verify this. We know from the original definition that F(10) = 55, so let’s see whether these new versions agree. (I’ll only test the first and last versions, to save space, but you can verify as many as you like.)

F(10) = 55 = F(3) Fn2 + F(2) Fn3
F(10) = 55 = F(3) F(8) + F(2) F(7)
F(10) = 55 = 2 * 21 + 1 * 13

F(10) = 55 = F(7) Fn6 + F(6) Fn7
F(10) = 55 = F(7) F(4) + F(6) F(3)
F(10) = 55 = 13 * 3 + 8 * 2

They both pass. More generally:

Fn = F(x) Fny + F(y) Fnx, for n > x, and y = x - 1

It’s not terribly wordier than the original definition, Fn = Fn1 + Fn2, for n > 2.

Putting this to use

A nice property of this new version is that it lets us skip some steps. If we’re calculating F(1000) with the traditional definition, we have to calculate each Fibonacci along the way; but now, we can set x = 500, and skip down to the neighborhood of F(500):

F(1000) = F(500) * F(501) + F(499) * F(500)

We can continue to skip down to about n/2, decreasing the amount of calculations we need to do.

Just for fun, I implemented both fib_orig and fib_new in ruby: here’s the file. I memoized the methods, for two reasons:

  1. It’s clearly much faster, and it’s simple to do.
  2. It lets us see exactly which Fibonacci numbers were calculated.

I put the two methods in a test case, with four test methods.

The first test ensures that the new equation matches the old. Unfortunately, I could only reach 6000 with the fib_orig, before I ran out of stack space.

The second test benchmarks the two versions. It reports the memo array size (6000 for fib_orig, as expected, and 40 for fib_new — 99.3% fewer calculations).

When fib_orig ran out of stack space so quickly, I wondered how far I could take the new version (which should recurse many fewer times). So in the third test, I benchmarked it with progressively bigger numbers. It starts to slow down around the millionth Fibonacci number: it completes in about 12 seconds on my machine. I suspect it’s spending the extra time array-seeking at that point, since the array gets pretty sparse — the last few non-null array indexes are: 125001 249999 250000 250001 499999 500000 500001 1000000. Maybe I’ll try a hash…

The fourth test is a bit of extra-credit, and a sanity check. Fn / F(n+1) approaches the golden ratio, 1.61803398874…, as N approaches infinity. So I calculated F(1401) / F(1400) with fib_new, and it’s accurate to 15 decimal points, rounding the last one, which seems to be the limit of precision on my WinXP machine. I tried using higher Fibonacci numbers, but was warned that I was exceeding the range of ruby’s floating point numbers. Here’s the output of that test:

 actual golden ratio: 1.6180339887498948482
approx. golden ratio: 1.618033988749895
precision-level test: 0.333333333333333

So it seems the new approach is correct, faster, uses less space, and is still pretty elegant. Who knows whether this will ever come in handy, but at least it was fun to do.

F(n) = F(x) F(n-y) + F(y) F(n-x), for n > x, and y = x - 1

Ruby Facets: Symbol.to_proc, Class.to_proc


Published to Invisible Blocks by Dan Bernier March 28, 2008 16:52

One pretty well-know idiom in Ruby, and Facets, is Symbol.to_proc. It lets you turn these:


[1, 2, 3].map { |num| num.next }  #=> [2, 3, 4]

%w[alpha beta gamma].map { |word| word.upcase }
#=> ["ALPHA", "BETA", "GAMMA"]

…into these:


[1, 2, 3].map(&:next)
%w[alpha beta gamma].map(&:upcase)

It’s a nice little trick, though it’s not to everyone’s taste. If you’re already comfortable with Symbol.to_proc, you can skip down to the Class.to_proc section. But if you’re not, it’s worth a minute of your attention to learn it. Read on…

How it’s done

When a method takes a block, you can call yield, to run the block.


def with_a_block(a_param)
    yield
end
with_a_block('param') {
    puts 'in the block'
}

Or, you can name the block as the last parameter to the method, and put an ampersand in front of it. The ampersand makes ruby convert the block to a procedure, by calling to_proc on it. (So any object with a to_proc method can work this way, if you want.) This example works just like the last one:


def named_block(a_param, &blk)
    blk.call
end
named_block('my_param') {
    puts 'in the named block'
}

Symbol’s to_proc method creates a procedure that takes one argument, and sends the symbol to it. Sending a symbol to an object is the same as calling a method on it: object.send(:method) works the same as object.method. In the earlier upcase example, each word is passed to a procedure that calls upcase on it, giving us a list of uppercased strings.


&:upcase
# becomes...
lambda { |obj|
    obj.send(:upcase)
}
# or...
lambda { |obj|
    obj.upcase
}

Class.to_proc

So Symbol.to_proc creates a function that takes an argument, and calls that method on it. Class.to_proc creates a function that passes its argument to its constructor, yielding an instance of itself. This is a welcome addition to the to_proc family.


require 'facets'

class Person
    def initialize(name)
        @name = name
    end
end
names = %w[mickey minney goofy]
characters = names.map(&Person)

puts characters.inspect

&Person
# becomes...
lambda { |obj|
    Person.new(obj)
}

Why it’s nice

  • It’s fewer characters — it semantically compresses your code.
  • It lets you think, makes you think, on a higher level. You think about operations on your data, rather than handling one item at a time. It raises your level of thinking.
  • It works with first-class functions, which are worth understanding. They give you new ways to elegantly solve some problems (well, new to some audiences). They’re not fringe anymore — they’ve been in C# since v2.0.

Why We Abstract, and What To Do When We Can’t


Published to Invisible Blocks by Dan Bernier April 05, 2008 22:34

Whenever you see yourself writing the same thing down more than once, there’s something wrong and you shouldn’t be doing it, and the reason is not because it’s a waste of time to write something down more than once. It’s because there’s some idea here, a very simple idea, which has to do with the Sigma notation…not depending upon what it is I’m adding up. And I would like to be able to always…divide the things up into as many pieces as I can, each of which I understand separately. I would like to understand the way of adding things up, independently of what it is I’m adding up.

- Gerald Sussman, SICP Lecture 2a, “Higher-order Procedures” (emphasis added)

The purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

- Edsger W. Dijkstra, The Humble Programmer

What Larry Wall said about Perl holds true: “When you say something in a small language, it comes out big. When you say something in a big language, it comes out small.” The same is true for English. The reason that biologist Ernst Haeckel could say “Ontogeny recapitulates phylogeny” in only three words was that he had these powerful words with highly specific meanings at his disposal. We allow inner complexity of the language because it enables us to shift the complexity away from the individual utterance.

- Hal Fulton, The Ruby Way, Introduction (emphasis added)

Programming is our thoughts, and with better ways to express them, we can spend more time thinking them, and less time expressing them.

3 + 3 + 3 + 3 + 3 + 3 is hard…hard to read (how many threes?), hard to get right (I lost count!), hard to reason about (piles of operations!). 3 x 6 is easy, once you learn multiplication. This is a good trade-off. We should look for ways to add abstractions, new semantic levels, to our programs.

If you’re doing the same thing twice, stop, and look for the common idea. Peel the idea away from the context, from the details. Grasp the idea, and then use it over and over. As a bonus, you’ll type less, re-use code, and debug less.

“But I can’t find ways to do that!”

When you look at similar bits of code, and can’t find a good way to remove the duplication, you’re hitting the limits of either your language, or your knowledge.

Programming languages put up very real walls, they force you down their paths, often by leaving out features. A language without recursion puts up a wall in front of recursive solutions; a language without first-class functions makes it tough to write higher-order functions. Language limitations are the cause of Greenspun’s Tenth Rule.

Sometimes, the language is not the problem. Sometimes you just can’t find your way through. This is why you read Refactoring, and Design Patterns, but really, this is why you learn other programming languages. Think about the right way to factor the problem.

If you can’t remove the duplication, you need to work around your language, or learn some new tricks.

Trading Space for Speed: Memoizing with Ruby Facets


Published to Invisible Blocks by Dan Bernier April 14, 2008 11:37

Recently, I talked about a faster, cheaper way to calculate Fibonacci numbers. One of the optimizations I made was to remember the value of each Fibonacci number: since F(7) is always 13, instead of recalculating it each time N=7, we can stuff 7 -> 13 into a look-up table for future reference. The function builds up a cheat-sheet, to avoid doing the re-work. It remembers.

This is called memoization, and it’s a nice way to trade memory for performance. But it only works when the function always returns the same answer for a given set of arguments — otherwise it’s first-in wins, forever. This property of a function, returning the same answer for the same args, is called referential transparency.

A Sample Implementation

There are lots of ways you could memoize a function. Hash tables are a natural choice, since they map a key to a value, just like functions map arguments to a value. Even if you implement it differently, a hash table is a good working model for memoization.

Let’s briefly consider factorials. The regular version:


class Unmemoized
    def factorial(n)
        puts n
        if n < 1
            1
        else
            n * factorial(n-1)
        end
    end
end

unmemoized = Unmemoized.new

5.downto(1) { |i| puts "\t#{unmemoized.factorial(i)}" }

…and the memoized version:


class Memoized
    attr_reader :factorial_memo
    def initialize
        @factorial_memo = {}
    end

    def factorial(n)
        puts n
        unless @factorial_memo.has_key? n
            if n < 1
                @factorial_memo[n] = 1
            else
                @factorial_memo[n] = n * factorial(n-1)
            end
        end

        @factorial_memo[n]
    end
end

memoized = Memoized.new

5.downto(1) { |i| puts "\t#{memoized.factorial(i)}" }

puts memoized.factorial_memo.inspect

Printing the hashtable is especially telling: {5=>120, 0=>1, 1=>1, 2=>2, 3=>6, 4=>24} It reads like a look-up table for factorials.

Memoization in Facets

As relatively easy as that example is, it has its drawbacks: we need to track our previous results in a separate variable, the memoization code is mixed up with the actual calculation (the part we care about), we can’t easily use it with other functions, and the pattern only works for functions of one argument. Facets makes memoization trivial, and removes all these issues.


require 'facets/memoize'

class FacetsMemoized
    def factorial(n)
        puts n
        if n < 1
            1
        else
            n * factorial(n-1)
        end
    end

    memoize :factorial # <= HINT
end

facets_memoized = FacetsMemoized.new

5.downto(1) { |i| puts "\t#{facets_memoized.factorial(i)}" }

In case you missed it, this is just like Unmemoized above, except we added line 13, memoize :factorial…that’s it. Just like attr_reader and friends, you can pass a list of symbols to memoize, and it’ll work on functions with any number of arguments:


require 'facets/memoize'

class MemoizedMath
    def add(n, m)
        n + m
    end
    def mult(n, m)
        n * m
    end
    memoize :add, :mult
end

When You Might Use Memoization, and What to Avoid

There are a number of places where this is useful: calculating a value by successive approximation, finding the path to the root node in an immutable tree structure, finding the Nth number in a recursively-defined series, even simple derived values (like ‘abc’.upcase). In general, a function is a good candidate if it only looks at its arguments (no global, class, or member variables, no files or databases) — especially if those arguments are immutable.

Relying on side-effects (printing to standard out, writing to a database or file, or updating a variable) in memoized methods is a bad idea: they’ll only happen the first time your method is called with those arguments, which is probably not what you intend. (Unless you’re printing the arguments to illustrate how memoizing works.) On the other hand, relying on side-effects is generally a bad idea anyway. Even if you don’t use a functional programming language, you can still benefit from minimizing state changes.

Further Reading

If memoization sounds interesting to you, you might like Oliver Steele’s article about memoizing JavaScript functions. If you’re curious about immutability, you might like this Joshua Bloch interview. If you’re interested in functional programming, there are worse places to start than the excellent Structure and Interpretation of Computer Programs. And of course, there’s more where that came from, in Ruby Facets.

Passing by reference, and dog leashes


Published to Invisible Blocks by Dan Bernier April 29, 2008 17:38

Pass-by-reference and pass-by-value are pretty confusing when you start learning to code. When I first saw them, I know I ignored the distinction (until I got tired of my code not doing what I expected). Throwing collections into the mix just makes it worse.

Today, though, we stumbled on a pretty decent analogy for passing-by-reference: a reference to an object is like a leash to a dog. Let’s take our dog Dagwood for a walk.


Dog dagwood = new Dog("Dagwood");

new Dog() creates, of course, a new Dog object. Dog dagwood creates a reference that can point to any Dog object — it’s really the leash, but we name our references for what they point to, rather than what they are: a reference, a handle, a leash. The equals sign takes the leash, and hooks it to Dagwood’s collar. Now we can take Dagwood for a walk.


dagwood.walk();

To tell Dagwood it’s time to walk, we tug on the leash. He feels the tug, and gets the message, so he starts following us. We come to a busy road, and wait for the crossing signal, but Dagwood’s oblivious, and tries to cross anyway.


dagwood.halt();

Since we’re stopped, he feels the tug of the leash again, gets the message, and stops. We’re sending messages to Dagwood through his leash. In OO terms, sending a message to an object means calling one of its methods. We’re calling methods on our Dagwood through our reference to him, through the leash.

Storing a reference in an array

In the park, we find a snack shop. We’re getting hungry, but the snack shop doesn’t let dogs inside. Luckily, there’s a chain link fence, and in our eyes, a chain link fence is nothing but a big row of places for us to attach a dog leash. We tie a spare leash to the end of the fence, and attach it to Dagwood’s collar.


Dog[] fence = new Dog[10]; // only room for 10 dogs
fence[0] = dagwood;

What’s happening here in OO terms is that our reference to Dagwood, our leash, is copied into the zeroth slot on the fence. It’s not our leash, but it’s one just like it. So now there are two leashes on Dagwood: one in our hand, and one on the fence. We’ll take our leash off Dagwood, since we can’t very well hold it while we’re in the store.


dagwood = null;

Don’t worry, he’s fine…he’s still tied to the fence, by that other leash. Let’s go buy cashews.

When we come out of the store, we want to re-attach our leash to Dagwood.


dagwood = fence[0];

Now let’s untie him from the fence, and head over to the lake.


fence[0] = null;

Passing by reference

Passing references to methods works in much the same way. Dagwood got kind of stinky, swimming in the lake, so let’s bring him to the groomer for a bath.


DogGroomer.shampoo(dagwood);

When you pass a reference to a method, your reference is copied into that method. Again, it’s like a new leash, one just like ours, springs into the groomer’s hand — now Dagwood’s attached to us, and the groomer. He gets fidgety when he’s getting bathed, so it’s just as well.

From the groomer’s perspective, it might look like this:


void shampoo(Dog doggie) {
    wet(doggie);
    apply(shampoo, doggie);
    rinse(doggie);
    towelDry(doggie);
}

The groomer doesn’t care what Dagwood’s name is, she just keeps calling him “doggie.” That’s ok, she must see a lot of dogs during the day…names aren’t that important to her. The interesting thing is, even though it’s the groomer who’s shampooing our dog, since we still have a leash on him, we can observe him getting cleaner.

When she’s done, the procedure ends, the method returns, and her leash to Dagwood disappears. Which is fine, because he’s stopped fidgeting, now that he’s dry.

Garbage collection

We head back home through the park. Dagwood’s itching to run around, but we’re tired, so we just unleash him. Hopefully we can find him before it gets dark…


dagwood = null;

Unfortunately, the dog catcher spots him running around without a leash, which is illegal in these parts — a stray dog will hang around forever, eating up resources. The dog catcher carries off our poor Dagwood, and destroys him. We take it in stride, and try to keep the whole circle of life allocation-deallocation in mind.

So…

So that’s how references work. It’s why code like this (C#) will ensure the balloon bouquet has at least one balloon that says “Happy Birthday!”:


List<Balloon> balloons = GetBalloons();
Balloon printed = balloons.Find(Balloon.IsPrinted);
if (printed == null) {
   printed = new Balloon();
   printed.PrintMessage("Happy Birthday!");
   balloons.Add(printed);
}
return balloons;