
Chas Emerick

These are the stories that have been posted in the Chas Emerick category.

Sane web development with Compojure, Jetty, and Maven


Published to Muck and Brass – Chas Emerick by Chas Emerick January 08, 2010 13:12

News Item added by Chas Emerick

I find myself slipping back into web development in the new year. I've known this was coming for some time, so I've had a fair chance to carefully choose my weapons:

What has really tied this all together is Maven (and a couple of plugins for it), which has enabled me to fill in a couple of gaps in what is otherwise the most pleasant web development environment I've ever used (where Pylons was the prior champ, FWIW).

The biggest gap is in automatic application reloading/redeployment – in concrete terms, when I save a Clojure source file, my application should be reloaded nearly immediately, thereby avoiding any code-build-deploy cycle. To be precise, this capability is built into Jetty (as it is in many other Java-based app servers). The question is, how to most readily utilize it.

I came across this post by Jim Downing, which describes how to set up a Maven project for a Compojure application, enabling development-mode app reloading using the maven-jetty-plugin (the formatting on that post appears to have degraded since it was published; you can check out the project described in the post here). This certainly appears to fit the bill; unfortunately, the setup that Jim describes there doesn't quite work for me – when I save a source file, the application is automatically redeployed, but no changes are picked up.

Thankfully, the fix is easy. Below is the relevant section of my pom.xml, configuring maven-jetty-plugin to add my Clojure source root as an extra classpath element. This allows Clojure, running in the jetty application server, to find and load any Clojure source files that are newer than their AOT-compiled counterparts in the usual target/classes directory (note the webAppConfig/extraClasspath elements):

<plugin>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>maven-jetty-plugin</artifactId>
    <version>6.1.15</version>
    <configuration>
        <contextPath>/</contextPath>
        <webAppConfig>
            <extraClasspath>src/main/clojure</extraClasspath>
        </webAppConfig>
        <scanIntervalSeconds>5</scanIntervalSeconds>
        <connectors>
            <connector implementation="org.mortbay.jetty.nio.SelectChannelConnector">
                <port>8080</port>
                <maxIdleTime>60000</maxIdleTime>
            </connector>
        </connectors>
        <scanTargetPatterns>
            <scanTargetPattern>
                <directory>src/main/clojure</directory>
                <includes>
                    <include>**/*.clj</include>
                </includes>
            </scanTargetPattern>
        </scanTargetPatterns>
    </configuration>
</plugin>

With that, I'm just a mvn jetty:run away (or, really, a single click away in NetBeans) from having a development process identical to paster serve --reload, with the added benefit of Clojurey goodness.

♫The more you know...♬♪

(Apologies to those who aren't familiar with American pop culture.)

If you want to compile Clojure code, do me a favor and use clojure-maven-plugin. (And really, if you're involved in a project of any size or importance, you should be compiling, if only to avoid forcing Clojure to generate bytecode at runtime, which will slow down the sort of rapid development enabled by automatic app redeployment as described above.) The post I reference above manually invokes the Clojure compiler using ant's exec task, but that was what you had to do back in July 2009. clojure-maven-plugin is a great piece of kit, and additionally serves as a perfect gateway drug to Maven, which, despite the controversy and my own quibbles with various aspects of it, will eventually save your bacon in any larger project.

All my methods take 316 arguments, and I like it that way


Published to Muck and Brass – Chas Emerick by Chas Emerick December 31, 2009 11:53

News Item edited by Chas Emerick

Of course, I'm not so daft as to say that, but:

If you use an imperative programming language that provides for mutable state, that's what you are saying.

For some background, I read this article yesterday, which contains this choice passage (emphasis mine):

Imagine you've implemented a large program in a purely functional way. All the data is properly threaded in and out of functions, and there are no truly destructive updates to speak of. Now pick the two lowest-level and most isolated functions in the entire codebase. They're used all over the place, but are never called from the same modules. Now make these dependent on each other: function A behaves differently depending on the number of times function B has been called and vice-versa.

In C, this is easy! It can be done quickly and cleanly by adding some global variables. In purely functional code, this is somewhere between a major rearchitecting of the data flow and hopeless.

A comment on proggit very concisely summed up just how crazy the above passage is:

Considering that one of the majors reasons to use FP is so that you don't have such inter-dependencies, it's odd to point that out as an issue.

The whole problem with imperative programming is that state gets threaded everywhere, and you can't look at any function individually and know how it will behave. I won't even go into problems associated with concurrency, where state becomes incredibly difficult to reason about if you allow that sort of thing.

I really appreciated the notion of imperative programming "threading state everywhere". Let's drive the point home, though.

Hey, I'm just the messenger

Consider a method you might see in any Java application (I oh-so-love the jvm, so I get to pick on Java), but the same sort of thing applies in C, C++, C#, python, ruby, perl, et al.:

public void doSomething (String arg1, int arg2, FooBar arg3) throws IOException;

Simple enough, right? Hey, we're programming, life is good. But, what if you saw a signature like this:

public void doSomething (String arg1, int arg2, FooBar arg3, .....,
                         String arg316) throws IOException;

316 arguments to a method (which isn't actually possible on the JVM, where a method is limited to 255 parameters, but bear with me)? "That's absurd!", you'd say. The problem, of course, is that the 3-arg doSomething actually has far more arguments than its signature implies:

The behaviour of every function in a mutable, imperative environment is dependent upon the state of all of the other (variables|attributes|bindings|whatever) in your program at the time the function is invoked.

So, if you have 313 other variables in your program, that 3-arg doSomething is functionally (ha!) operating over 316 arguments.
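
To make that concrete outside of Java, here is a minimal Clojure sketch of the same effect. The names (discount, price-with-hidden-arg, price) are hypothetical and exist only for illustration: the first function looks like it takes one argument, but a mutable atom acts as a second, invisible one.

```clojure
;; Hypothetical names throughout; an illustration, not code from any real project.
(def discount (atom 0))   ; mutable state that appears in no signature

(defn price-with-hidden-arg
  "Looks like a 1-arg function, but its result also depends on `discount`."
  [amount]
  (- amount @discount))

(defn price
  "Every input is in the signature, so equal arguments give equal results."
  [amount discount]
  (- amount discount))

(price-with-hidden-arg 100)   ; => 100
(reset! discount 25)          ; somebody, somewhere, changes the hidden argument
(price-with-hidden-arg 100)   ; => 75, same call, different answer
(price 100 25)                ; => 75, no matter what else has happened
```

Multiply that one atom by every mutable variable reachable from the function and you arrive at the 316-argument signature.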

Would you ever intentionally write a method signature that takes 316 arguments? Would you use any library that contained such a function signature? No? Then why are you using tools that force such craziness upon you?

Postscript

Of course, there is a place for mutable, imperative programming. The fellow who wrote the blog post to which I linked above appears to work on games, one of the few places where one could unapologetically use an imperative programming language with mutable state. Update: Looks like the state-of-the-art in game programming is heading towards FP languages more than I thought. Thanks to this comment, here's a LtU thread, with slides, about the guys who wrote Gears of War and the Unreal engine recommending FP as the future of game development.

However, we need to collectively get past encouraging other software developers – the vast majority of whom do not have the particular requirements of game, systems, or embedded development – to inflict the pain of imperative languages and mutable state upon themselves, especially given the concurrency challenges that lie ahead (never mind the general problems such environments present, as I argue above). The languages are ready, the runtimes are widespread...let's stop doing it wrong.

Mavenization of NetBeans Platform projects


Published to Muck and Brass – Chas Emerick by Chas Emerick December 28, 2009 21:24

News Item added by Chas Emerick

Over the past month, I've been gradually porting all of our projects' builds from Ant to Maven. Everything's gone swimmingly, especially given the excellent clojure-maven-plugin, which allowed me to cleave off all of our comparatively complicated ant scripts for building and testing Clojure code. One part that did require some work was the porting of the builds associated with our NetBeans Platform-based applications – so, I thought I'd post a couple of hints to help others over the rough spots.

A plug for NetBeans

We've had a good deal of success in using the NetBeans Platform recently (often referred to as the NB RCP). It provides a metric ton of fairly high-quality plumbing for thick-client applications, and definitely saved our asses in a couple of key areas insofar as we've been able to reuse large pieces of the Platform, essentially unchanged, to meet critical new requirements. Of course, that's why we chose to use it in the first place.

Extemporaneous and Lengthy Background

To be clear, the rough spots in question aren't associated with the actual Mavenization of the NetBeans Platform-based projects – that's a relatively straightforward affair, with archetypes available in the NetBeans IDE to get one started, and very well-documented goals available, all provided by the NBM Maven Plugin. Given an existing ant-based build process, I found the actual porting of the build fairly straightforward.

The dicey part had to do with having a set of Platform artifacts available to build against. Under the ant-based build regime, it was common for those building on top of the NB RCP to keep a set of RCP artifacts available in every build environment. This was always a pain (for potentially-obvious reasons that I don't really want to get into now), and the general non-composability of the ant-based build process drove NB RCP users (and the Platform developers themselves) to extreme lengths of hacking to get stuff working properly. (BTW, just so everyone knows, I'm not picking on Fabrizio here – he's just the one who appears to have pushed the envelope more than anyone else vis-à-vis improving the composability of the ant-based RCP build process.)

One great thing about the NBM Maven Plugin is that it cuts this knot quite elegantly, making it possible to treat NetBeans Modules (NBMs) as first-class citizens within the maven world. So, if you have a maven repository that contains NBMs (like this one hosted by the NetBeans folks themselves), you can readily add NBM dependencies just like you would jar dependencies from maven central:

<dependency>
   <groupId>org.netbeans.api</groupId>
   <artifactId>org-openide-nodes</artifactId>
   <version>${netbeans.version}</version>
</dependency>

...and the NBM plugin will take care of using those NBM dependencies as appropriate:

  • injecting the NBMs' associated jars into the project's compile classpath
  • adding the NBMs as runtime dependencies of whatever NBM(s) your project/application produces
  • adding the NBMs to the (optional) "update site" associated with your NB RCP application (making remote updating of that application in the field trivial)

And, to complete the cycle, the nbm-maven-plugin provides a nbm packaging type, so that you can build NBMs independently, deploy them as you'd expect, and then compose them without any ceremony into however many NB RCP applications you'd like. No suite-chaining, no special platform or cluster artifacts in every build environment, nothing at all different from what one is used to in any other jvm/maven environment.

The Rough Spot

All of the above works flawlessly (at least it has for me in my ~month of usage). The key prerequisite though, is having access to a repository that contains the Platform NBMs that you'd like to use. The repository that I linked to above does not track NetBeans releases in lockstep (e.g. at the time of this posting, the http://bits.netbeans.org/maven2 repo has NBMs from NetBeans v6.5 and v6.7, but not v6.7.1, or the recently-released v6.8). The solution is to populate your own maven repository with those NBM artifacts.

Deploying NetBeans Platform artifacts to your own repository

This might have been a tedious process, were it not for another handy goal from the NBM Maven Plugin, populate-repository, which will push all of the artifacts produced by a NetBeans Platform build (the NBMs themselves, their sources, javadoc, and appropriate non-NetBeans dependency metadata) into your own maven repository.

There's a fair bit of configuration and setup that goes into this though. A HOWTO is provided by the nbm-maven-plugin project, but there are a number of things that it leaves unspoken. So, here's a dump of what I did to successfully populate a Nexus maven repo with a full set of NetBeans Platform artifacts:

  1. Pull the NetBeans Platform sources from the associated hg repo (I used the release68 repo, as we're targeting v6.8 of the NB RCP now). It appears that populating your repo with NB RCP artifacts from a binary download is possible, but then you'll not have the associated javadoc, source artifacts, etc.
  2. Build the entire project – I'm sure it's possible to restrict the build to certain clusters, but I don't see any reason to optimize this process since doing so only saves a little bit of disk.
    1. You must set your JAVA_HOME environment variable to point to a Sun JDK, especially in linux environments that often come with non-Sun JDKs (I'm looking at you, Ubuntu, with your cute gcj JDK). Not doing this will result in very strange compilation errors.
    2. You must set your ANT_OPTS environment variable to specify a higher-than-default maximum heap (export ANT_OPTS=-Xmx1024m worked for me).
    3. Within the top-level of your NetBeans Platform source checkout, run ant; ant nbms build-source-zips build-javadoc – this will build everything you care about in order to populate your maven repo.
  3. You want the NBMs in your repository to have appropriate dependency relationships with third-party artifacts, right? Achieving this is easy if you have Nexus:
    1. unzip sonatype-work/nexus/storage/central/.index/nexus-maven-repository-index.zip somewhere (I used /tmp/nexus-index).
    2. set the nexusIndexDirectory property in the last step to the path where you unzipped central's index; the nbm-maven-plugin will search that Lucene index to find dependencies referred to within the Platform's NBMs
  4. set MAVEN_OPTS to specify a higher-than-default maximum heap (export MAVEN_OPTS=-Xmx512m worked for me). I'm not sure why this would be required, but I got OutOfMemoryErrors with max heap set to anything less than 512MB. Perhaps searching the maven central repo index is what pushed allocation so high.
  5. Make sure you don't have a pom.xml in your current directory. Bad things will happen.
  6. Decide on a version number for the deployed artifacts, and use it as the value of the forcedVersion property. I used RELEASE68 to go along with the pattern established at http://bits.netbeans.org/maven2; 6.8 makes more sense to me, but if/when the NetBeans maven repo comes up to date with the NetBeans release schedule, sticking with their convention will allow us to use that authoritative repository with no changes to our projects.
  7. Assuming you're deploying to a release repository, make absolutely sure that you've (temporarily) enabled redeployment for that repository! nbm-maven-plugin deploys some NBMs multiple times (presumably while traversing various dependency graphs), and not enabling redeployment will result in errors (400 errors from Nexus, specifically – I can't say what might happen with different repository managers).
  8. Now for the big finish:
    mvn org.codehaus.mojo:nbm-maven-plugin:3.1:populate-repository -DforcedVersion=RELEASE68 -DnetbeansInstallDirectory=nbbuild/netbeans -DnetbeansSourcesDirectory=nbbuild/build/source-zips -DnexusIndexDirectory=/tmp/nexus-index -DnetbeansJavadocDirectory=nbbuild/build/javadoc -DnetbeansNbmDirectory=nbbuild/nbms -DdeployUrl=<nexus_repo_url> -DskipLocalInstall=true

Whew! Let that sucker run for a while, and you should be left with a maven repository fully populated with NetBeans Platform artifacts.

String Interpolation in Clojure


Published to Muck and Brass – Chas Emerick by Chas Emerick December 04, 2009 21:26

News Item edited by Chas Emerick

It's strange how some days or weeks have running themes. One theme for me this week programming-wise has been string interpolation:

  • I mentioned it in the #clojure channel on freenode earlier this week (sounds like Rich Hickey isn't a fan of the concept in general, yet),
  • Miles and I talked about it some in connection with the Clojure templating system he's been working on (plug: after recording another episode of the Strictly Professional podcast),
  • and just this morning, I noticed a post by Vassil Dichev about how one might implement string interpolation in Scala

I've become weary of format of late, and all of the other formats out there aren't any more pleasant – variadic (and even keyword or named-argument) string replacement is just a dull tool compared to real interpolation.

The Scala implementation post was the last straw for me, especially because (with all due respect to Vassil, as he's doing very well with the materials he has at his disposal) it showcases so many of the aspects of Scala that I came to dislike in the course of using it for a year or so: the tortured syntax; the rope, nay, the barbed wire that is implicit conversions; the bear trap of traits.

A Clojure Implementation

OK, enough flame-bait. What I'm really here to do is show how easy it is to add string interpolation to Clojure, and how simple its implementation is:

(ns commons.clojure.strint
 (:use [clojure.contrib.duck-streams :only (slurp*)]))

(defn- silent-read
  [s]
  (try
    (let [r (-> s java.io.StringReader. java.io.PushbackReader.)]
      [(read r) (slurp* r)])
    (catch Exception e))) ; this indicates an invalid form -- s is just string data

(defn- interpolate
  ([s atom?]
    (lazy-seq
      (if-let [[form rest] (silent-read (subs s (if atom? 2 1)))]
        (cons form (interpolate (if atom? (subs rest 1) rest)))
        (cons (subs s 0 2) (interpolate (subs s 2))))))
  ([#^String s]
    (let [start (max (.indexOf s "~{") (.indexOf s "~("))]
      (if (== start -1)
        [s]
        (lazy-seq (cons
                    (subs s 0 start)
                    (interpolate (subs s start) (= \{ (.charAt s (inc start))))))))))

(defmacro <<
  [string]
  `(str ~@(interpolate string)))

Don't mind the namespace – that's just where we put extensions to Clojure-the-language. The public macro << (named as an homage to heredocs) takes a single string argument, and emits a str invocation that concatenates the string data and evaluated expressions contained within that argument.
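
To see what that means in practice, here is roughly what two << calls expand into. The expansions are traced by hand from the interpolate function above rather than captured from a REPL, so treat them as a sketch; n is the same var used in the examples that follow.

```clojure
(def n 99)

;; (<< "There's ~{n} bottles of beer on the wall...")
;; becomes, at macroexpansion time, an ordinary call to str:
(str "There's " n " bottles of beer on the wall...")
;; => "There's 99 bottles of beer on the wall..."

;; (<< "There's ~(dec n) bottles of beer on the wall...")
;; splices the whole form in as one of str's arguments:
(str "There's " (dec n) " bottles of beer on the wall...")
;; => "There's 98 bottles of beer on the wall..."
```

The string is split into literal chunks and forms once, when the macro is expanded; only the str call is left to run later.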

Example Usage

First, let's get a value we can refer to:

commons.clojure.strint=> (def n 99)

You can do simple value replacement:

commons.clojure.strint=> (<< "There's ~{n} bottles of beer on the wall...")
"There's 99 bottles of beer on the wall..."

And evaluate arbitrary code:

commons.clojure.strint=> (<< "There's ~(dec n) bottles of beer on the wall...")
"There's 98 bottles of beer on the wall..."
commons.clojure.strint=> (<< "There's ~(seq (range n 90 -1))
                              bottles of beer on the wall...")
"There's (99 98 97 96 95 94 93 92 91) bottles of beer on the wall..."

You can use any functions or macros you have available in your Clojure environment:

commons.clojure.strint=> (defn- some-function [] {:name "Chas" :zip-code "01060"})
#'commons.clojure.strint/some-function
commons.clojure.strint=> (<< "My name is ~(:name (some-function)), it's nice to meet you.")
"My name is Chas, it's nice to meet you."

...including interop with Java methods:

commons.clojure.strint=> (<< "You have approximately ~(.intValue 5.5) minutes left.")
"You have approximately 5 minutes left."

Caveats

First, let's say what's wrong with this implementation compared to, say, Ruby's string interpolation (I may be missing other points, I'm no Ruby hacker):

  1. Strings cannot be used within interpolated expressions; e.g. this will cause a straightforward parse exception:
    commons.clojure.strint=> (<< "~(str n "another string")")
    #<CompilerException java.lang.IllegalArgumentException:
         Wrong number of args passed to: strint$-LT--LT-
    

    The Clojure reader sees this as providing three arguments to the << macro. Being able to use strings within interpolated expressions would require a "native" Clojure reader macro for interpolated strings, or the ability to define reader macros in "userspace" (Clojure's read table cannot currently be modified from Clojure code, which is an intentional design decision).

    Update: pmjordan mentioned on hackernews that you can get around this by escaping the nested strings, like so:

    commons.clojure.strint=> (<< "~(str n \" another string\")")
    "99 another string"
    

    Very true, and very useful in a pinch, but I would definitely consider it to be a wart (and an issue that is insurmountable from Clojure userland right now).

  2. Heredocs aren't available. That's a far more general shortcoming compared to other languages, but is still related to string interpolation. This is significantly mitigated by the fact that Clojure strings are multiline already, but it would be nice in some circumstances to be able to specify a block of text using different delimiters for one-off templating, etc.
  3. Lazy sequences need to be made strict in order for them to print as they do at a REPL (thus the additional seq invocation around (range n 90 -1) in the example above).

Advantages

I'm sure a lot of people will look at this implementation and say, "so what?". Well, it's got a lot going for it:

  1. Simple implementation. Unless you've got a Pavlovian aversion to parentheses (but are somehow immune to piles of braces?), it's very comprehensible.
  2. It's user-land code. Many languages would require a compiler extension or modifications to the language core to pull this off.
  3. The interpolation happens at compile-time! The only processing that occurs at runtime is the concatenation of the chunks of each string, but all of the string and expression parsing happens before your code using the << macro would hit a customer's server or desktop. This is decidedly in contrast with the Scala interpolation implementation, where all of the string parsing is done at runtime; to my knowledge, doing anything else would require a compiler plugin there.
  4. It's fully composable with all other Clojure code. There's no restriction on where you can use the << macro, and no restriction on what Clojure (or Java!) code you can include in interpolation expressions.
  5. There's no magic. Many languages make it very easy to inject magical – as in, opaque – behaviour into your code. The Scala interpolation implementation is no different – to get that special behaviour out of a String, one must call a magical method i in order to rope in the machinery around the InterpolatedString implicit conversion. On the other hand, all of the effects and actors involved in the << macro are local, and its semantics and calling conventions are exactly the same as any other Clojure macro.

Exhale...

So, hopefully that puts string interpolation behind me. I'd love to see something like this become a reader macro in Clojure someday (maybe in conjunction with heredoc support), but in the meantime, this will make a lot of one-off templating jobs a whole lot easier in Clojure compared to using the usual variadic string replacement methods that are otherwise available.

Be mindful of Clojure's binding


Published to Muck and Brass – Chas Emerick by Chas Emerick November 05, 2009 10:09

News Item edited by Chas Emerick

Clojure's binding form is amazingly useful, but as with any very long length of rope, you can hang yourself in a cinch with it. So, let's review a couple of traps you should be aware of, both of which I've personally fallen into while using binding.

Binding is thread-local

This is super-simple, and it's the first thing that one learns upon encountering binding for the first time, but you can get bitten by sloppily thinking that an established binding will migrate to another thread, or by not understanding the concurrency semantics of a function you're calling within your binding form. Consider:

user=> (def *foo* 5)
#'user/*foo*
user=> (defn adder
         [param]
         (+ *foo* param))
#'user/adder
user=> (binding [*foo* 10]
         (doseq [v (pmap adder (repeat 3 5))]
           (println v)))
10
10
10
nil

So, we have a var *foo* holding a default value, and a function adder that just adds its argument to the current thread-local value of *foo*, returning the result. This is obviously just illustrative; you can assume that adder is a function call into an opaque library you're using that takes some arguments and perhaps pulls some configuration or other data from the values bound into some var it specifies as being part of its API.

The problem here is that adder is being invoked in threads other than the thread that is establishing the binding on *foo*; therefore, the value of *foo* within adder is always the default, 5.

The lesson? Bindings do not migrate across thread boundaries. One of the great things about Clojure is you can "do concurrency" using a variety of easy-to-use primitives (e.g. pmap is absolutely the cat's nuts, in that it's a dead-simple way to almost-transparently parallelize computation over a dataset). The ironic downside to that is that whereas thread boundaries are painfully obvious in other languages because of all the ceremony one needs to go through to get results, things like pmap have so little ceremony that it's easy to forget the basics.

One solution to the problem illustrated above would be to change the implementation of adder so that it's explicitly capturing the bound value of *foo*, and returning a new function that does the adding using that binding:

user=> (defn make-adder
         []
         (let [foo-value *foo*]
           #(+ foo-value %)))
#'user/make-adder
user=> (binding [*foo* 10]
         (doseq [v (pmap (make-adder) (repeat 3 5))]
           (println v)))
15
15
15
nil

Parenthetically, it's very much worth noting that all of the wonderful ref/transaction machinery in Clojure is implemented using thread-local bindings. That means that if you try to pmap a function across some set of refs in the course of a transaction (or otherwise attempt to poke at refs in a concurrent environment), things will go very wrong for you. There are ways around this, but they (last I checked) involve manually copying the thread-local bindings associated with any running transaction across thread boundaries – in general, it's not worth the hassle.
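
Coming back to plain vars: for code you control, one alternative to capturing values by hand is bound-fn, which wraps a function so that it runs with the bindings that were in effect when it was created. The sketch below is an illustration under stated assumptions rather than something from the transcripts above: it targets a newer Clojure (1.3 and later require the ^:dynamic marker on rebindable vars), it uses a raw java.lang.Thread to make the thread boundary explicit, and the helper name call-on-new-thread is hypothetical.

```clojure
(def ^:dynamic *foo* 5)

(defn call-on-new-thread
  "Hypothetical helper: runs the no-arg function f on a fresh thread
  and blocks until its result is available."
  [f]
  (let [result (promise)]
    (.start (Thread. ^Runnable (fn [] (deliver result (f)))))
    @result))

(binding [*foo* 10]
  [(call-on-new-thread (fn [] *foo*))         ; plain fn: sees the root value
   (call-on-new-thread (bound-fn [] *foo*))]) ; bound-fn: carries the binding along
;; => [5 10]
```

This only helps when you are the one creating the function that crosses threads; an opaque library function like adder still needs its caller to arrange for the binding to be visible.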

Lazy seqs often escape the scope of binding forms, so capture the value of any bound vars you care about explicitly

As wonderful as lazy sequences are, how and when they dereference bound vars isn't always obvious, and is entirely dependent upon how and when those lazy sequences are used/materialized. Consider, assuming *foo* is bound to 5 by default as in our first example:

user=> (defn some-fn
         []
         (lazy-seq [*foo*]))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(5)

What's going on here? The lazy-seq macro returns a lazy sequence, which will evaluate the sequence-producing form provided to it on demand – in this case, after the binding form has returned, therefore ensuring that *foo* has reverted to its default value.

This may become clearer with this example:

user=> (binding [*foo* 10]
         (doall (some-fn)))
(10)

doall forces the full evaluation of a lazy sequence. In this case, because that evaluation is performed within the binding form, *foo* still has its bound value, and the returned sequence contains the value we expect.

These are obviously simplistic examples; the real-world scenario that this applies to is where you might be writing a library, and part of that library's public API is a set of bindable vars that callers can use to configure the behaviour of the library's functions, etc. This is super-useful, especially for libraries where there are a ton of knobs and levers: rather than forcing callers to provide a configuration object on every function call (and therefore forcing you to thread that configuration through all helper functions, etc), using bindings for such things allows callers to only change the defaults they care about, and allows you to code the implementation of the library in a straightforward way.

The lesson? If you are going to use bound values of vars, you need to make sure you capture those bindings before returning any lazy seqs that use those bound values. Aside from using doall as shown above (which defeats the point of using lazy seqs), the solution looks a lot like the make-adder function from the first section (notice a trend?):

user=> (defn some-fn
         []
         (let [foo-val *foo*]
           (lazy-seq [foo-val])))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(10)

Notice that some-fn is now explicitly capturing the bound value of the *foo* var; this ensures that, regardless of when and where or on which thread the lazy seq is materialized, the values it contains are what were bound by the caller of some-fn. This is almost always what you want to have happen.
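
The same trap applies to any function that returns a lazy sequence, map included. As a minimal sketch (written against a newer Clojure than the transcripts above, hence ^:dynamic, and with hypothetical names *offset*, shift-late, and shift-captured), compare a function that reads the var while the sequence is being realized with one that captures the value up front:

```clojure
(def ^:dynamic *offset* 5)

(defn shift-late
  "Reads *offset* whenever elements are realized."
  [xs]
  (map #(+ *offset* %) xs))

(defn shift-captured
  "Reads *offset* once, when called, and closes over that value."
  [xs]
  (let [offset *offset*]
    (map #(+ offset %) xs)))

;; Both seqs are created inside the binding but realized only after it has exited.
(def late     (binding [*offset* 10] (shift-late [1 2 3])))
(def captured (binding [*offset* 10] (shift-captured [1 2 3])))

(vec late)     ; => [6 7 8], the root value of 5 was used
(vec captured) ; => [11 12 13], the bound value of 10 was used
```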

Too many do not fully realize the degree of flexibility that vars and binding provide to the capable programmer. As is often the case though, power comes with responsibility, and whether one is writing libraries, using them, or casually using binding in localized ways in application code, it needs to be handled with care.

Reducing purchase anxiety is a feature


Published to Muck and Brass – Chas Emerick by Chas Emerick October 23, 2009 17:05

News Item added by Chas Emerick

Talk to anyone outside of the software world, and you'll quickly realize that one of the most gut-wrenching, anxiety-inducing acts is buying software. Even if one has evaluated the product in question top to bottom, past experience of bugs, botched updates, missing features, and outright failures and crashes has tempered any enthusiasm or confidence that might be felt when the time comes to pull out the credit card or write the purchase order.

Of course, the blame for this lies squarely with the software industry itself – the failures in software quality are well known, both discrete instances as well as in aggregate. Those of us whose business and livelihood are tied to the sale of software (whether sent out the door or delivered as a service) must do whatever we can to reverse this zeitgeist.

Given that, we've decided to adopt a very simple, no-nonsense "Satisfaction Guaranteed" policy for PDFTextStream. Hopefully this will help take the anxiety out of someone's day, somewhere.

This isn't a new idea, of course. Lots of software companies have had guarantees of some sort or another for ages, but I think my first encounter with the concept as a business owner was Joel Spolsky's post from a couple of years ago:

I think that our customers are nice because they’re not worried. They’re not worried because we have a ridiculously liberal return policy: “We don’t want your money if you’re not amazingly happy.”

Joel raised the issue again on a recent StackOverflow podcast, which prompted me to think about our own approach...

What do we do about unhappy customers?

To be honest, our customers are pretty happy. Of course, we occasionally receive a bug report, but we generally knock out patches within a couple of days, and sometimes faster. In the 5 years we've been selling PDFTextStream, we've never had a single request for a refund. Part of that is offering up a very liberal evaluation version, but I'd like to think it's because what we sell does the job it's meant to do very well.

Given that, I've never thought to make a big stink about a refund policy – it just never came up. But hearing Joel and Jeff talk about the ire that they felt towards various companies that refused to issue refunds when they weren't happy with something motivated me to make our de facto policy explicit. Thus, the new "Satisfaction Guaranteed" statement.

Part II: the Open Source Influence

An elephant in the room is the influence of open source software on customers' attitudes towards buying software, and the assessment of risk that goes along with it. As more and more users of technology (just to spread the net as widely as possible) are exposed and become accustomed to the value associated with open source software (which, in simple terms, is generally high because of its zero or near-zero price), it increases pressure on commercial vendors (like us) to up our game along the same vector.

But, the impact of open source software on pricing is a pretty stale story. The real impact is derivative, in that a zero or near-zero price means that the apparent risk associated with using open source software is zero or near-zero. The promise of proprietary, commercial software is that, if it does what the vendor claims (whatever that is), then that software will deliver benefits far in excess of its cost and far in excess of the aggregate benefit provided by the open source alternatives, even given the price differential.

The problem is that a lot of people only turn towards commercial options as a last resort because of the aforementioned historical failures of the software industry vis-à-vis quality: the apparent risk of commercial options is higher than that associated with open source options, simply because the latter's super-low price is a psychological antidote to any anxiety about quality issues. So, there's a flight towards low-priced options, rather than a thorough search for optimal solutions. Injecting an explicit guarantee of performance and reliability (like our new "Satisfaction Guaranteed" policy) might be enough to tip the relative apparent risk in favor of the commercial option – or, at the very least, minimize the imbalance so that it's more likely that price won't dominate other factors (which are potentially more relevant to overall benefits).

Of course, this can only work if one's product is actually better than the open source alternatives, and by a good stretch to boot so as to compensate for the price differential. In any case, it's a win-win for the formerly-anxious software user and buyer: they should feel like they have more choice overall, and therefore have a better chance of discovering and adopting the best solution for any given problem, regardless of software licenses and distribution models.

Activity is not Progress (or, 'Did you really need to shave that yak')


Published to Muck and Brass – Chas Emerick by Chas Emerick October 08, 2009 16:47

Anyone who is accountable for any sufficiently complex objective constantly has their focus pulled away from that larger goal by a thousand different fiddly tasks. Christened yak shaving some time ago by a fellow at the MIT Media Lab, the concept has become a favorite shorthand in various programming and software development circles. I only heard of it this year, but it's helped to coalesce my thinking about focused work and the relationship between activity and progress.

In particular, I think it's helpful to occasionally check one's activity using what I'd call "root objective analysis".

Many people in technical fields are familiar with root cause analysis, where a problem or failure is analyzed in such a way as to determine its root cause. There are lots of flavors of root cause analysis, with Five Whys being popular among programmers due to the Joel Effect and probably some loose association between Five Whys and the lean development/startup methodologies that are all the rage these days.

In contrast, root objective analysis runs in the "opposite direction", so to speak: for any given activity, you trace the likely causal link between the activity you're engaged in and the progress you want to make. In short: "Is what you're doing right now getting you closer to your end goal?" 1 If you do this right, or at all, you'll go down fewer dead-ends, waste less time, and prioritize the yaks you do shave so that you get to your desired end state sooner rather than later.

There's obviously a lot of fuzziness in any kind of speculative analysis like this; if there weren't, then project management would always bring jobs in on time and within budget. However, if your work often leads you far afield of your "main line" of focus, then asking yourself the question above from time to time may help you to ensure that every yak shaving you engage in is necessary, as opposed to a distraction caused by confusing activity for progress.

And Now for Something Completely Different

A yak shaving that is near and dear to my heart is the fable of the software developer and the PDF documents (not surprising, since we talk to a lot of developers who have worked with lots of PDF documents). There are many variations, but the most extreme goes something like this:

  1. Joe the developer needs to get some chunk of data into his company's database (maybe it's financial data, maybe he's working with excerpts of academic journal articles – such details are mostly irrelevant)
  2. The data is only available in PDF documents, and there's a lot of them. Thousands, perhaps millions of chunks of data in as many different PDF documents.
  3. Joe's first thought is that he needs to build a function to extract text from these PDFs so that he can get at the data he needs.  But, after...
    • reading the 1,000+ page PDF specification,
    • adding support for the 8 different versions of the spec,
    • adding support for a half-dozen encryption protocols, and
    • adding support for extracting Chinese (or Japanese, or Korean, or Icelandic with its lovely ð ("eth") character) along with the embedded fonts that go along with it
  4. ...Joe now has spent nearly a year building a one-off PDF text extraction library that (again, depending on the version of the fable) fails on 24% of the documents his company needs to access, and still doesn't run fast enough to finish in the batch window he has to work with.

Seriously, scout's honor, I've heard this story at least 5 times...and each time right before or right after the developer/company in question purchased PDFTextStream to replace their homebrew PDF library. That, my friends, is activity without progress, yak shaving at its most epic.

Java is dead, but you'll learn to love it


Published to Muck and Brass – Chas Emerick by Chas Emerick October 01, 2009 14:22

A favorite hobby-horse among various programming-related communities is to talk about why "Java is dead", and further, that programmers working in the Java ecosystem should really look for greener pastures elsewhere.  You see these sorts of posts pop up on proggit, for example, often enough for it to get old.  That's a lot of hot air, with plenty blowing in the other direction from various folks that have been pushing hard for significant improvements and changes to Java. Both sides are wrong, though, because as a result of its success and a series of historical accidents:

Java-the-language is dead.
Get over it, and realize that because of that fact, you'll probably come to depend upon Java more than you ever thought possible.

The JVM is probably one of the most vibrant platforms for developing new programming languages there is, in part because of the status of Java-the-language.

First, let's settle the premise. In comments on one of his recent blog posts, Joe Darcy, one of the fellows who heads up Sun's management of the JVM and JDK (I'm not sure of his exact title and portfolio), said a couple of key things about the never-ending saga regarding closures in Java:

There are millions upon millions of Java developers who would have to learn about closures if they were added in the platform.

...there is far from unanimity in the Java community on the underlying choice of whether or not closures would be an appropriate language change for Java at this time.

OK, there it is, closures are never going to be added to the Java language.  Done, and done.  And if closures aren't going in, then you can surely bet that other things aren't going to make it, either.  To further make the point, Joe commented on an earlier blog post of his here 2 , saying in reference to a question about why the Java standard libraries don't slough off deprecated APIs:

To date, we have valued continued binary compatibility with code calling the deprecated elements more than cleaning up the API.

This sort of stuff pisses a lot of people off, and leads others to propose mildly absurd things IMO, like forking the Java language into "stable" and "experimental" versions. This is a lot of wasted effort.


It seems that Sun decided long ago, through pressure from its customers and developers, that compatibility is more important than innovating at the language level. With that, managing Java and the JDK became more an exercise in stewardship than anything else. The quotes above from an authoritative source are proof-positive that this is the case.

That may make the Java language dead with regard to features, but it's hardly useless – it's simply transitioned to be the stable "systems language" for the JVM that a large swath of programmers (who Sun likely correctly identifies as being uninterested in things like closures, syntactic improvements, etc. etc.) happen to use for applications as well.

Trading off "progress" for stability bestows upon Java at least two characteristics that are shared by other systems languages:

  • screaming into the void about how improvements and changes should be made yesterday is generally pointless and irrelevant
  • knowing that the language is essentially fixed for years to come means that it fades into the background as a very useful artifact for those that want to build on top of a system with well-known characteristics

A side effect of this is that the JVM is a very fertile spot for new(er) languages, where language implementers don't have to worry about their building blocks being taken away or changed radically from year to year 3 . At the same time, the JVM itself has been getting tweaked and tuned heavily under the covers to support non-Java languages, not the least of which is Sun's JavaFX, their entry into the post-Java JVM language fray 4 . So, you want your fork of Java that pushes boundaries? They are many and plentiful, so go choose one, already.

The upshot of all this is that it's more likely than not that over the course of the coming years, your life (and quite likely your professional life as well, if you're involved in software) will come to rely upon Java, the JVM behind it, and many different other language stacks built on one or both of those technologies.

Of course, interop between these languages is a concern: only APIs matching Java's binary signatures are accessible by all languages, there's no standard interface for closures, there's no standard (sane) numeric tower, etc. etc. These things are frustrating if one happens to be working in a polyglot environment, but I've no doubt that necessity will draw the larger players in the JVM language space together to establish certain baselines to ensure interoperability.

In the end, we might have all been better off if the current state of affairs had arrived years ago. A steady drip, drip, drip of Java language improvements serves only to keep developers tied around what is functionally a frozen language, and away from superior alternatives (on the same JVM platform!) if they're so inclined to look up from their work. Since the state of play vis-à-vis Java-the-language is clear, maybe those that care so deeply about programming language productivity, innovation, and progress can set about enjoying the advantages of the future that Java has ensured for us all.

Working with git submodules recursively


Published to Muck and Brass – Chas Emerick by Chas Emerick September 28, 2009 15:45

Git submodules are a relatively decent way to compose multiple source trees together, but they definitely fall short in a number of areas (which others have discussed at length elsewhere).  One thing that immediately irritated me was that there is no way to recursively update, commit, push, etc., across all of one's project's submodules.  This is something I ran into immediately upon moving to git from svn some months back, and it almost scared me away from git (we used a lot of svn:externals, and now a lot of git submodules).

Thankfully, the raw materials are there in git to work around this.  (I've since noticed a bunch of other attempts to do similar things, but they all seem way more complicated than my approach...maybe it's the perl? ;-))

Here's the script we use for operating over git submodules recursively:

git-submodule-recur.sh
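#!/bin/sh
# Run the given git command here, then recurse into every submodule.

case "$1" in
        # Special case: "init" expands to "submodule update --init".
        "init") CMD="submodule update --init" ;;
        # Anything else is passed to git as-is.
        *) CMD="$*" ;;
esac

# $CMD is left unquoted so it splits into separate git arguments
# (an argument that itself contains spaces is split as well).
git $CMD
# Re-invoke this script ("$0") inside each submodule, which repeats the
# process for that submodule's own submodules.
git submodule foreach "$0" $CMD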

Throw that into your $PATH (I trim the .sh), chmod +x, and git submodules become pretty pleasant to work with.  All this is doing is applying whatever arguments you would otherwise provide to git within each submodule, and their submodules, etc., all the way down.  The one special invocation, git-submodule-recur init, just executes git submodule update --init in all submodules.
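
To see what the script would hand to git for a given invocation, the argument handling can be pulled out into a small function. This is only a sketch: recur_args is a hypothetical name that is not part of the script above, and it prints the arguments rather than running git.

```shell
#!/bin/sh
# recur_args is a hypothetical helper used only for illustration; it mirrors
# the case statement in git-submodule-recur but prints instead of calling git.
recur_args() {
    case "$1" in
        "init") echo "submodule update --init" ;;  # the one special invocation
        *) echo "$*" ;;                            # everything else passes through
    esac
}

recur_args init      # prints: submodule update --init
recur_args status    # prints: status
recur_args push      # prints: push
```

Whatever this prints is what the real script runs first in the current repository and then again, via git submodule foreach, inside each submodule.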

So, want to get the status of your current working directory and all submodules? Run git-submodule-recur status. Want to commit all modifications in cwd and all submodules? Run git-submodule-recur commit -a -m "some comment". Want to push all commits? Run git-submodule-recur push. You get the picture.

Note
Starting in git 1.6.5, git submodule will grow a --recursive option for the foreach, update and status commands. That's very helpful for the most common actions (and critical for building projects that have submodules in CI containers like hudson), but git-submodule-recur definitely still has a place IMO, especially for pushing.
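
For comparison, the built-in route in git 1.6.5 and later might look something like the following. The function names and the GIT variable are assumptions made up for this sketch; GIT defaults to the real git and can be set to echo for a dry run that only prints the commands.

```shell
#!/bin/sh
# GIT is a hypothetical knob for this sketch: set GIT=echo to print the
# commands instead of running them.
GIT=${GIT:-git}

# Initialize and update every submodule, including nested ones.
recursive_init() { $GIT submodule update --init --recursive; }

# Show the checked-out commit of every submodule, including nested ones.
recursive_status() { $GIT submodule status --recursive; }

# Run an arbitrary command (here: git push) in each submodule, recursively.
recursive_push() { $GIT submodule foreach --recursive git push; }
```

Note that foreach only visits submodules, so the top-level repository still needs its own git push; git-submodule-recur covers both in one invocation.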

This script has saved me a *ton* of typing over the past months.  Hopefully, it finds a good home elsewhere, too.

Edited 2009/09/28
I tweaked the git-submodule-recur script to quote the path to the script ("$0" instead of $0); this became necessary when I dropped the script into C:\Program Files\Git\bin in our Windows-hosted Hudson environment.

Snowtide Informatics Welcomes Ben Fry (of Processing fame) to Northampton


Published to Muck and Brass – Chas Emerick by Chas Emerick April 28, 2009 13:03

Next Tuesday, the 5th of May @ 6:30PM, Snowtide Informatics and Atalasoft will be hosting Ben Fry, creator of the Processing programming language and environment and author of Visualizing Data from O’Reilly, at Snowtide’s offices in Northampton, MA.

(This hasn’t been a secret or anything (for good reason!), but I thought I’d put out an announcement post.)

Dr. Fry will be presenting “Computational Information Design” – a mix of his work in visualization and coding plus a quick introduction to the Processing language and environment.  Processing has had a huge impact on the field of data visualization, and Dr. Fry’s presentation will no doubt be enlightening for anyone who engages in data visualization at any level.

There will be refreshments.  There’s a Google Maps link on this page if you need directions; please note that the presentation will be held in the second-floor conference room, Suite 234.

Small afterthought: the three avid readers of my blog may recall that a similar event was held a year ago, when we hosted Rich Hickey, creator of the Clojure language.  I think we (meaning Snowtide, Atalasoft, the Western Mass. Developer’s Group, et al.) have a pretty unique combination in this area of outrageously talented people with a collectively broad set of experience and specialties, and a relatively intimate environment where ideas or presentations can be fully fleshed out with lively feedback from everyone involved.  I think there’s some potential to build this foundation up into something very worthwhile; perhaps a regular flow of software wizards to give talks, show off their newest ideas, and recruit evangelists (zealots? ;-)).  Something to think about anyway…