Stop spam. Read books.

Friday, August 3rd, 2007

Despite the meager audience this blog enjoys and the infrequency with which it is updated, I continue to get plenty of comment spam. It’s especially strange considering that I have moderation turned on, meaning that no spam message will ever get posted live. But I guess, from the spammer’s point of view, it’s cheaper to spam first and ask questions later—or just to spam, not ask questions, and hope something sticks.

But, as with every challenge I encounter these days as a self-employed web wonk, I decided to turn this problem into a teachable moment. As in, Let’s Learn about Captchas!

Here’s what I already knew:

  • Captchas are those elements on comment forms that ask you to interpret a few characters of scribbled text.
  • They often look like they were written by a serial killer.
  • They are tedious.
  • They are often unnecessary, as there are simpler, less cryptic ways to tell if you are a spambot.

After some research, here’s what I learned:

  • Captcha is actually CAPTCHA, an acronym.
  • What CAPTCHA stands for isn’t important, as the words were clearly defined after the acronym, and the fit is less than snug. The important point is that the T stands for Turing.
  • Actually, the T stands for Turing Test, which goes to show you what I mean.
  • The folks at Carnegie Mellon’s School of Computer Science, who devised CAPTCHAs in the first place, have come up with reCAPTCHA, something so ingeniously cool I found myself thanking the spammers for giving me reason to use it.

In a nutshell, reCAPTCHA tests your humanness by asking you to retype two words. One of the words is a control whose answer the system already knows, and it’s against this word that the system tests your entry to see if you’ve got a brain. The second word looks exactly the same, but it comes from a different place: a repository of text scanned in via OCR, as part of a massive book digitization project. OCR technology is not perfect, and for very old books, the text is often illegible. By harnessing human brainpower in very small, distributed chunks, reCAPTCHA has random blog users interpret those words for the project, adding value without requiring massive amounts of extra resources.
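For the code-minded, here is a rough Python sketch of the two-word idea as I understand it. It is not reCAPTCHA's actual code or API: the function names, the in-memory `votes` store, and the three-vote threshold are all made up for illustration. The point is just that the control word decides whether you pass, and your guess at the unknown word only counts if you do.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: for each scanned word, count every reading seen.
votes = defaultdict(Counter)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def check_submission(control_answer, control_guess, unknown_id, unknown_guess):
    """Pass or fail the user on the control word alone.

    The guess for the unknown (OCR) word is recorded only when the
    control word was typed correctly.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common reading once it has min_votes, else None.

    min_votes=3 is an arbitrary threshold chosen for this sketch.
    """
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Feed it enough humans who agree on the same reading, and `consensus` starts returning a transcription for the scanned word instead of `None`.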

The thing that makes reCAPTCHA so brilliant is the very same concept behind the water-pumping playground and SETI@Home—100% recycled energy. Capture some energy that’s already being used, and funnel it into the powers of good.

I’ve installed the reCAPTCHA WordPress plugin on this here blog; I don’t have much to contribute, traffic-wise, but hopefully those numbers will increase over time. Big kudos to Carnegie Mellon for designing and developing this project. I’m embarrassed to admit I had a chance to attend your school for undergraduate studies, but passed you up for a sexier campus with a more even boy-girl ratio. Clearly you’re much cooler than I realized back then. I hope we can still be friends.