
To Err is Human; to pick up errors is human too

We’ve just bought something online with a credit card, or made a comment on a news story, or gone to vote on whether public transport in Brisbane is getting better or worse (answer: better but more expensive) – and up pop a few badly shaped letters and numbers to retype, because the last stage in this computer transaction consists of proving we are human beings.

At the dawn of the computer age, in 1950, the British mathematician Alan Turing looked at the possibility of artificial intelligence.  Can computers think?  And how do we know whether we are talking to a computer, or to a human being?  He proposed a way to test this, which he called the ‘imitation game’ and which is known today as the ‘Turing test’.  The Turing test we most commonly encounter today is the CAPTCHA, an acronym for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’.

So far, humans are holding the line.  We are better at face recognition – though I confess I find it fascinating and occasionally rather scary when my iMac crunches through a newly imported set of photos and ‘recognizes’ people in them, even when some of these photos are of painted portraits!

Humans are better at reading poor-quality printed text, too, and are certainly better at dealing with handwriting.  Yet there’s no doubt that computers are transforming the way we research history these days.  Searchable digital sources mean that it’s possible to go to original newspapers to find the most unlikely references, and digitization has given us access to huge databases of material.

This has led to an interesting collaboration between artificial and human intelligence through CAPTCHA, described in some detail in a recent New York Times article.

‘Digitization is normally a three-stage process: create a photographic image of the text, also known as a bitmap; encode the text in a compact, easily handled and searchable form using optical character recognition software, commonly called O.C.R.; and, finally, correct the mistakes.’

Only human beings can reliably identify and correct these mistakes.  There are several ways of doing this.  You can pay proofreaders, or recruit volunteers to do it, but Google is now mainly using the co-opted labour of individuals like us, each reading and typing out a few swirly letters on the screen.  There are about 200 million such transactions each day.  In this way, we are all helping to make printed works accessible and searchable.  And we didn’t even know it!


The National Library of Australia, on the other hand, has relied very successfully on volunteers to correct its digitized newspaper collection.

The collection is freely available to everyone, and anyone can also set up a username and password and log in as a recognized reader.  Once logged in, I can correct the text, using my superior human intelligence to read the digitized image in ways that the computer, as yet (thank God!), can’t.  It’s a weirdly addictive process, and thousands of volunteers are at work, gradually improving the text.  Like a wiki, it’s a cooperative project, where people give their time freely, without any reward other than the satisfaction of working for the general good.

However, the future probably lies with the software approach.  reCAPTCHA was designed at Carnegie Mellon University by a team led by Luis von Ahn.  It doesn’t work for everything.  According to Dr von Ahn, ‘nobody reads handwriting anymore’.

Sigh.  Reading handwriting is an essential skill for any historian, and no, it’s not easy, especially as we move further and further back into the past.  But reading handwriting brings one so much closer to the thought processes of the original writer.  I’ve read letters by the same person across the decades, moving from quill to steel nib, their writing eventually growing shaky with age.  It adds a richness of texture that is missing in the digitized version, grateful though I am for how easily accessible that version is.

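But it will be a while before computers can cope with crossed letters, in which the writer, to save paper and postage, filled a page and then turned it sideways to write across the existing lines, like this one: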