
Paul B

These are the stories that have been posted by Paul B.

A different kind of URL shortener


Published to E-Scribe News: a programmer's blog by Paul B August 30, 2010 00:08

Today I'm launching my first Google App Engine site. While I built it largely to play with GAE, it is also useful in its own right (I like to think so anyway). It does two different things:

Link shortening without redirection. Put in a godawful long Amazon link and get back a shorter Amazon link. Works with eBay and a few others too. I welcome recipes for other sites. (For the programmers in the audience, which is most of you -- yes, the processing is via regular expressions.)

It does some basic checks to confirm that the shortened URL returns the same page as the original one.
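To give a feel for the regular-expression approach, here is a minimal sketch in Python. It is not the site's actual recipe: the pattern, the function name shorten_amazon, and the assumption that an Amazon product link carries a 10-character ID after "/dp/" or "/gp/product/" are all illustrative guesses.

```python
import re
from urllib.parse import urlsplit

# Hypothetical recipe: assume the product ID is the 10-character token
# that follows "/dp/" or "/gp/product/" in the URL path.
AMAZON_ID = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def shorten_amazon(url):
    """Return a shorter link to the same product, or the URL unchanged."""
    parts = urlsplit(url)
    # Only touch links whose host looks like an Amazon domain.
    if parts.hostname is None or "amazon." not in parts.hostname:
        return url
    match = AMAZON_ID.search(parts.path)
    if match is None:
        return url
    # Keep the original scheme and host; drop the title, ref tags, and query.
    return f"{parts.scheme}://{parts.netloc}/dp/{match.group(1)}"
```

A link that doesn't match the recipe comes back untouched, which is the safe default for a tool like this.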

Link expansion. Put in a link from a URL shortening/redirection service, e.g. bit.ly, and see where it redirects to. Works with a slew of popular link-shorteners, including the house brands goo.gl and nyti.ms.

Some of the shortening services do offer a way to see the link target before you visit it, but they're all different; this presents a simple unified interface to that feature.
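Link expansion can be sketched as "ask for the page, but don't follow the redirect; just read where it points." The Python below is one way to do that, not the code behind the site. The function names, the injectable fetch parameter, and the hop limit are assumptions made for illustration.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_redirect(url):
    """HEAD the URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def expand(url, fetch=fetch_redirect, max_hops=5):
    """Follow redirects by hand, up to max_hops, and return the final URL."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url
```

Passing in the fetch function makes it easy to try the logic against canned responses before pointing it at a real shortener.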

There's a bookmarklet too. If you have someone in your online life who frequently bombards you with, say, mile-long eBay links, tell them about it.

http://urlworks.appspot.com/

The syncbox


Published to E-Scribe News: a programmer's blog by Paul B June 13, 2010 19:10

I move between a couple different computers regularly: my old 12" PowerBook and the 15" MacBook Pro my job provides me with. Like all multi-computer users I periodically bump up against the challenges of what files (and versions) are where, especially when there's work in progress.

To further complicate things, I also have an extra laptop running Ubuntu. And sometimes I just SSH to my web server from somebody else's machine.

I spent a while thinking about solutions. Some people keep a "master" home directory on a server, using rsync to pull new copies (or freshen old copies) on machines where they work. Being an rsync fan, I tried this approach. After my first accidental rsync --delete casualty, though, I started thinking about ways to preserve history.

That's when the ideal solution hit me (making a big resonant "DUH" sound): distributed version control. Perfect synchronization: check. Multi-platform clients: check. Full history: check.

I created a Mercurial repository on my web server, then cloned it out to the two laptops.

For stuff that needs to be secure, I decided that simple command-line encryption was the answer (hence this tweet from a while back with a Blowfish encrypt/decrypt one-liner). And I use SSH for transport, so even the plaintext stuff is safe from in-transit snooping.

I call the synced directory "syncbox". It contains a little script for keeping things in sync. It amounts to these steps:

hg addremove
hg commit -m "Update"
hg push
hg fetch
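For anyone wanting to script those four steps, here is a rough Python sketch. It is not the actual syncbox script. It assumes the fetch extension is enabled (hg fetch is not a core command), and it relies on my understanding that hg commit and hg push exit with status 1 when there is nothing to commit or push. The names STEPS, run_hg, and sync are made up.

```python
import subprocess

# The four steps listed above, in order.
STEPS = [
    ["hg", "addremove"],
    ["hg", "commit", "-m", "Update"],
    ["hg", "push"],
    ["hg", "fetch"],  # assumes the fetch extension is enabled in hgrc
]

# Assumption: these subcommands exit with 1 when there is simply nothing to do.
NOTHING_TO_DO_OK = {"commit", "push"}


def run_hg(cmd, cwd):
    """Run one hg command in the repository directory; return its exit code."""
    return subprocess.call(cmd, cwd=cwd)


def sync(repo_dir, run=run_hg):
    """Run each step in order; return the command that failed, or None."""
    for cmd in STEPS:
        code = run(cmd, repo_dir)
        if code == 1 and cmd[1] in NOTHING_TO_DO_OK:
            continue
        if code != 0:
            return cmd
    return None
```

Stopping at the first real failure matters most for the push step: if it fails, fetching and merging on top of unpushed work is better done by hand.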

Ironically, after having set all this up, I got an invite to try Dropbox, a nifty-looking service that offers many of the same benefits and many other features besides (e.g. desktop OS integration, selective file sharing, browser-based access option). About all I can tout for advantages of my approach are: 1) unlimited history (Dropbox gives you 30 days), 2) no additional fees if I exceed 2G of storage, and 3) I control it completely.

Branching and merging in real life


Published to E-Scribe News: a programmer's blog by Paul B November 08, 2009 03:18

At work I still mostly use Subversion for version control. Its main selling points: stable, performs as expected, integrates nicely with Trac, holds all our old stuff (legacy inertia).

Note that "pain-free branching and merging" is not on that list. (And don't give me the old "branching is cheap in svn!" line. It's not about the branching, it's about the merging.) A couple years ago I started also using Mercurial and plan to eventually replace svn with it entirely. The aspect of Mercurial that made my life better recently is its support for branching and merging.

The scenario: an important internal web app (in use all day every school day) needed some significant changes on a short timetable. Normally I'd work on the app thus: edit the staging copy, commit, update the live copy. I didn't want to take that approach here. I knew that during the development window there might arise unrelated urgent change requests; I wanted to keep the new code isolated during development, but also deploy and track those unrelated urgent changes. Branching seemed like the right approach.

I could have made a full clone of the app (hg clone mainrepo newrepo). However, handling environment dependencies (web server, PythonPath, database) would have added time and fussiness to the job, and time was in short supply. So, using Mercurial's named-branches feature, I made a new branch (hg branch newstuff) right inside the fully-functional staging copy of the app. That way I was able to develop and test as usual, secure that my unproven work-in-progress was not "polluting" the current app's revision history.

To handle "unrelated urgent changes" as mentioned above, I'd:

  1. Commit any current work on the "newstuff" branch
  2. Switch to the main branch (hg update -r default)
  3. Make the urgent change, test, commit, update the live copy
  4. Switch back to the new branch (hg update -r newstuff)

It took me a couple tries to understand how branch-switching worked, but it's simple: you really are updating your working directory to a new revision, it just happens to be a revision stored in a different branch from the current one.

It was fun looking at the graph (via HgWeb) and seeing my two parallel branches with their individual commits.

The moment of truth came at the end of the day Friday, when it was time to merge the tested and complete "newstuff" code with the current live codebase. It was dead simple, and effectively instantaneous. Condensed version: hg update -r default; hg merge -r newstuff; hg ci -m "merged new stuff". Followed by: update live copy and let out a big sigh.

Summer Spam


Published to E-Scribe News: a programmer's blog by Paul B July 24, 2009 03:44

Spam is occupying more than its customary share of my attention in recent weeks. I've long had a morbid fascination with sleazy human communication (hence Purportal.com). That makes the always-relentless stream of spam, though not exactly welcome, at least interesting.

Spam volume also seems to have increased during this period. The number of spam attempts my mail server rejects per day had been steady at around 3,000 for months. Now it's back up around 5,000 or 6,000.

I run my own mail server and fight spam via greylisting, blacklisting, and other strict technical rules. This setup rejects 99+% of the spam aimed at the domains I host, but some still gets through to me. Never enough to displace real mail, but enough to keep my little hobby-interest alive. Here are some of the spam highlights of my summer so far:

  • After one too many identical HTML spams, I took the rare step of adding a custom rule to my mail server config. I started rejecting all mail with "Content-Type: text/html; charset=us-ascii". In this age of Unicode, that's turned out to be a pretty safe bet. Lots of rejections and no known false positives.

  • I received a weird email about money via Craigslist. It looked like a response to an ad -- one I'd never seen before, and certainly hadn't placed. Naturally my first thought was that the Craigslist bit was all a ruse, but a look at the message headers showed it was real: it had been sent via Craigslist in response to an ad with my email address attached. In other words, someone had created a Craigslist ad (copied verbatim from a legit ad) just to send spam to me via Craigslist's email forwarding feature.

  • I spent a few minutes trying to convince emusic.com (via email) of the fact that since I received spam at an email address that I had invented purely for use with their service, and which had never been used for anything else, this meant that somebody had poached their list from inside. They are still thinking about this silently.

  • I encountered a new form of referrer-spam. Remember referrer spam? Spammers would put their URLs in the HTTP_REFERER header when hitting blogs and other websites that had dynamically generated lists of "top referrers", then the spammers' sites would show up in those lists. Well, this week I saw an inscrutable but surely related anomaly in the headers of some requests made to one of my sites (which I was looking at for other reasons, not spam-hunting). This HTTP_REFERER header was a giant comma-delimited list of approximately 10 or 15 URLs.
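The Content-Type rule from the first item in the list above can be expressed in a few lines. This is a Python sketch of the test itself, not mail server configuration (the post doesn't show the actual rule). The function name should_reject is made up, and the sketch only looks at the top-level message header.

```python
from email import message_from_string


def should_reject(raw_message):
    """True if the message declares itself as HTML in the us-ascii charset."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns None when no charset parameter is given.
    charset = (msg.get_content_charset() or "").lower()
    return msg.get_content_type() == "text/html" and charset == "us-ascii"
```

Mail that declares HTML with any other charset, or that arrives as multipart with an HTML part inside, passes this check untouched.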

And finally, there was the phishing message I received today. It was a fake eBay notice, with the usual "click here to resolve the dispute" links. Those links were supposed to take the victim to a fake eBay page the scammers had set up (where the victim would type in all sorts of exploitable personal information). Looking at the message's raw source, I noticed something very odd -- the pages they were trying to link to were on an FTP server in Russia. Even weirder and better, the link code contained their FTP username and password! A minute later I was logged into their FTP server, looking at the one file there: the fake eBay page.
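For readers who haven't seen credentials embedded in a link: the URL syntax allows a "user:password@" part in front of the host name. The Python below pulls one apart. The link is a made-up placeholder, not the one from the actual message.

```python
from urllib.parse import urlsplit

# Placeholder values for illustration only.
link = "ftp://someuser:s3cret@ftp.example.com/ebay.html"

parts = urlsplit(link)
# Everything needed to log in is sitting right there in the link.
print(parts.scheme, parts.username, parts.password, parts.hostname, parts.path)
```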

This was a darkly humorous reminder that the international spam-and-scam business is, from what I can see, a refuge for IT people (or wannabes) with poor skills and poorer ethics. So by this point I was kind of feeling bad for the incompetent underling who had put this thing together for his terrible boss.

However, I didn't let my compassion interfere with my sense of justice and fun. I replaced their fake eBay page with my own content, a much simpler message in plain text: "We are scammers."

SPF-enabled spam domains


Published to E-Scribe News: a programmer's blog by Paul B June 03, 2009 22:22

Among the many anti-spam measures on my mail server -- which help me reject 5000 spam attempts per day -- is SPF. SPF allows domain name owners to specify which mail servers are allowed to send their mail. That makes it an excellent way to detect address forgeries, a favorite spammer tool.
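To make that concrete: an SPF record is a DNS TXT record such as "v=spf1 ip4:192.0.2.0/24 -all", which says "mail for this domain comes from this address range and nowhere else." The Python sketch below checks a sender's IP against the ip4 entries only. It is a deliberately tiny slice of SPF: real evaluation also handles DNS lookups, include, a, mx, and the other qualifiers. The function name and the sample addresses are made up.

```python
import ipaddress


def ip4_allowed(spf_record, sender_ip):
    """True if sender_ip falls inside any ip4: range listed in the record."""
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        # A leading "+" is the default "pass" qualifier; other qualifiers
        # ("-", "~", "?") are ignored in this simplified sketch.
        mechanism = term.lstrip("+")
        if mechanism.lower().startswith("ip4:"):
            if ip in ipaddress.ip_network(mechanism[4:], strict=False):
                return True
    return False
```

A forged message sent from a server outside the listed ranges fails this check, which is exactly the signal a receiving mail server is after.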

One of the early questions raised about SPF was: won't spammers just buy their own domains and set up their own SPF records that say it's all OK? You can read the answer in the SPF FAQ, but the short version is: Yes, they will, but it won't give them a free pass.

That's because if spammers register a domain, publish SPF records for it, and send spam, they've identified that domain as one intended to be used for spam. Very good blacklist fodder.

With that in mind, here's a list of about 50 domain names that have recently been used to send me spam. All of these have published SPF records, and all the spam I received was from servers approved by those SPF records.

In other words, as far as I can tell, these are domains that exist primarily, if not purely, to send spam.

Update -- Here are the latest as of 2009-07-24: alg.com barrewardonline.com blueheavenbooks.com boudy.com eautocentral.com export2000.ro outpost.mm302.com qeentreeforlife.com ronaldvnash.com sistemas.com.ar smartserv.net solorpowernowme.com spig-int.com synergynetfour.com topproducerhelp.com truehouseinfo.com truelifeproducts.com unafraidrewardonline.com weathersearchontheweb.com

If for some reason a perfectly innocent non-spammy domain of yours has made it into this list, please let me know. (You might have to use my contact form, since I've already blacklisted all these domains!)

Chess via iPod


Published to E-Scribe News: a programmer's blog by Paul B May 11, 2009 20:53

I'm still loving my iPod touch. It's really a great little handheld computer. I'm able to do almost everything I need with the stock apps, but there are a couple free third-party apps that have earned a permanent place on it. One is the game Chess With Friends from NewToy.

This is a version of what is also known as "postal" or "correspondence" chess. You make a move and send it to your opponent; your opponent makes a move and sends it back to you. (In this version, the CWF app rather than your mail carrier is the middleman.) You can pick somebody out of your address book, or ask the CWF app to find you a random opponent. Nice touches include in-game chat, step-by-step replays, and optional email or SMS notifications.

The human angle is what makes it fun. Most chess players have been periodically disheartened by computer opponents that beat humans (those who play at a mortal level like I do, anyway) coldly, soundly, and rapidly. The variety of human players that the CWF random-opponent feature delivers is a welcome change.

You get to pick your own screen name. People who know you can search by this name if they like, so it serves a useful purpose in addition to being a nametag. It also is occasionally the source of some amusement, as in the screen capture included here from the end of a recent game.

Aesthetics and computation


Published to E-Scribe News: a programmer's blog by Paul B May 06, 2009 01:33

This evening, the Western Mass. Developers Group was treated to a talk by Ben Fry of Processing fame. It was excellent and inspiring. Having not much prior exposure to Processing or his work, I left hungry for more. (The title of this post is taken from the name of the group at the MIT Media Lab where Fry did his PhD work.)

I liked the graphical-REPL flavor of his live demos. Surprisingly, the feeling reminded me of being a kid flipping through Alan Kay's article about the Xerox Alto in Scientific American 30 years ago.

He gave a fun tour of creations by Processing users, with various highlights along the way including magazine cover art, a Superbowl ad, a scene from Minority Report, and the work by Robert Hodgin that was picked up by Apple for the iTunes 8 visualizer. Along the way he was conscientious about giving his co-conspirator Casey Reas (not in attendance) his share of the credit.

Turnout was good, by our small-town standards: a full room, 25 people or so. Many had come out of the woodwork from local colleges (notably Smith and UMass). O'Reilly gave us a few copies of his book, which we had a drawing for at the end.

I found his work to be a heady mix of technical acuity, aesthetic commitment, and pragmatism. And I liked his dry sense of humor -- jokes that many non-technical audiences probably wouldn't have even known were jokes.

His work is especially interesting to me because I've straddled the design/engineering line most of my professional life.

At the end I asked him about this cross-disciplinary world of his, and whether he had observations about qualities that were good predictors of success. He thought for a moment. His answer, which included mention of a Harvard class he taught to a mix of art/literature/CS/etc. majors, began with one clear word: "Curiosity."

Hello from BarCampBoston


Published to E-Scribe News: a programmer's blog by Paul B April 25, 2009 14:18

Greetings from Boston -- specifically, BarCampBoston. My first "unconference". Nerds galore.

The format is (mostly) half-hour talks from attendees on whatever subjects interest them -- as long as other attendees have also expressed interest. It's all tracked on a big board in the lobby. So far I've been in discussions involving localization, designing for technophobes, cloud computing, physics simulation in games, and Lisp. The level of interactivity is high -- as is the collective expertise brought by the participants.

This is taking place in MIT's Stata Center, a wild-looking Frank Gehry creation that clearly houses a lot of fun regular old MIT stuff in addition to transient visitors like us. Walking down a side hall during lunch I peered through the glass of a closed door into a large office containing a pile of what looked like robots.

Update: In BarCamp spirit, I gave a talk. The title was "Moving from PHP to Django". Much like some of my earlier blog posts on the subject, I talked about how to make the transition from PHP to Python/Django smoother and more enjoyable. At the end, I noted that I had two copies of our book on hand. I made a special offer: buy one (at a great discount price, in fact) and I would donate the proceeds to BarCamp. I had two takers and made my donation immediately after. Thanks, guys!

robots.txt via Django, in one line


Published to E-Scribe News: a programmer's blog by Paul B April 25, 2009 11:13

A significant difference between developing Django sites and using static-HTML-based approaches (among which I count PHP and the like) is that static files, aka "media", live in a dedicated spot.

Sometimes you need a piece of static content to be available at a specific URL outside your media root. robots.txt for example. This can be done in pure Django (i.e. without even touching your Apache configuration), and is especially nice if your robots.txt content is short. The example below serves a basic "keep out" configuration.

At the top of your root URLconf, add this import:

from django.http import HttpResponse

and below, among your list of URL patterns, add:

(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /*", mimetype="text/plain"))

The lambda r bit is a concise way of creating a function object which accepts (and discards) the HttpRequest object that Django provides to all views. The "mimetype" setting (aka "content_type" in Django 1.0) is important too, because robots don't like text/html.
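For comparison, here is the same idea with the lambda spelled out as a named view, written as a URLconf fragment for a more recent Django. Treat it as a sketch under assumptions: it uses django.urls.path and the content_type argument, since newer Django releases dropped the mimetype argument; the view name robots_txt is made up; and it writes the rule as "Disallow: /", the plain form of "keep out of everything".

```python
# urls.py (root URLconf) -- a fragment, meant to live inside a Django project
from django.http import HttpResponse
from django.urls import path


def robots_txt(request):
    # The request is accepted and ignored, just like the lambda's "r".
    return HttpResponse("User-agent: *\nDisallow: /\n", content_type="text/plain")


urlpatterns = [
    path("robots.txt", robots_txt),
]
```

The named view is longer than the one-liner but gives you somewhere to grow if the robots.txt content ever needs to vary by site or environment.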

So there you have it -- a classic one-line (plus an import) robots.txt solution.

Django LogEntry to the rescue


Published to E-Scribe News: a programmer's blog by Paul B February 14, 2009 00:47

If you use Django's admin application, you're familiar with its "Recent Actions" sidebar. It gives a simple summary of your latest edits, including clickable links to the relevant objects (not any ones you deleted, naturally, but ones you added or changed).

It's probably not something you look at very often, unless you do such intensive work in the admin that you lose track of things.

Django stores that log data (via the admin's LogEntry model) for all admin users, a fact which has caused me to repeatedly daydream about writing a custom view or two to display it. In other words, I'd like to let superusers browse all object editing history. Because sometimes you need to answer questions like "When was that changed?" and/or "Who changed it?"

Today at work, a question arose about some data that was deleted via the admin several months ago. It didn't need recovering, we just needed a record of its deletion. An audit trail.

LogEntry to the rescue! Via manage.py shell and manage.py dbshell I was able to do some quick spelunking and get exactly the records we needed.
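The kind of query involved looks roughly like the sketch below. It is not the session from that day. To keep it runnable anywhere, it builds a cut-down stand-in for the admin's log table (django_admin_log) in an in-memory SQLite database. The sample rows are invented, and several real columns (content_type_id, object_id, change_message) are left out.

```python
import sqlite3

DELETION = 3  # the admin records 1 = addition, 2 = change, 3 = deletion

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE django_admin_log (
        id INTEGER PRIMARY KEY,
        action_time TEXT,
        user_id INTEGER,
        object_repr TEXT,
        action_flag INTEGER
    )
    """
)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO django_admin_log (action_time, user_id, object_repr, action_flag) "
    "VALUES (?, ?, ?, ?)",
    [
        ("2008-10-02 09:15:00", 1, "Course: Algebra I", 2),
        ("2008-10-03 14:40:00", 2, "Course: Algebra I", 3),
    ],
)


def deletions(conn):
    """Return (when, who, what) for every deletion recorded by the admin."""
    return conn.execute(
        "SELECT action_time, user_id, object_repr FROM django_admin_log "
        "WHERE action_flag = ? ORDER BY action_time",
        (DELETION,),
    ).fetchall()
```

From manage.py shell, the rough ORM equivalent is LogEntry.objects.filter(action_flag=DELETION), with both names imported from django.contrib.admin.models.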

It was a very positive experience. I love being able to answer questions that begin, "Paul, is there any way to..." with: "Yes!" After this, I may even be a little bit closer to writing that code I've been daydreaming about.