CAPTCHA

You’ve probably seen these annoying boxes on many websites, asking you to type two words to help “stop spam.” The words are often barely legible, and they are not always in English. Sometimes they are bent out of shape; other times they are obscured by another random image. Every time I come across one of these visual challenges I cringe, not because of the extra effort of typing two words, but because my success rate is extremely low! Not sure what I mean? See the image above.

What is it?
These visual challenges are called CAPTCHAs, short for Completely Automated Public Turing test to tell Computers and Humans Apart. In plain English, these boxes are used to check that you are a human visitor and not a web robot. If you scroll all the way to the bottom of this blog post, you’ll see a CAPTCHA box. As annoying as they are, they do serve a good purpose.

The benefit of CAPTCHA
To better understand the benefits of CAPTCHA, I’m going to digress a bit. Google (and other brave companies) have taken on the mammoth task of digitizing the world’s libraries. The books being digitized span centuries, languages and continents. Many of them are faded or damaged and, unless captured digitally, would eventually be lost to time. However, simply photographing every page would serve little purpose if the text were not readable and, more importantly, searchable. Seeing an opportunity in this, researchers at Carnegie Mellon University decided to put CAPTCHAs to work digitizing these books. They created a new program called reCAPTCHA.

reCAPTCHA has two goals: to keep preventing spam, and to help digitize these scanned books. Every time you type these words into the box, you are helping to preserve works that may be hundreds or even thousands of years old.

The reCAPTCHA program has been so effective that it has already helped digitize 20 years’ worth of New York Times archives. Computers can process information at a rapid rate, but their accuracy is far less impressive when it comes to interpreting scanned text. The image below shows the difference between what gets scanned (the first line) and what the computer “reads” using optical character recognition, or OCR (the second line). Words that OCR cannot read reliably are exactly the kind that reCAPTCHA asks people to type.

I hope you now have a better appreciation of how a technology designed to stop spam can also help preserve literature that would otherwise fade into history.

Love CAPTCHA? Hate CAPTCHA? Let me know in the comments!

Image 1 Credit

Image 2 Credit