Using reCAPTCHA with Perl


You all know it: spam is a PITA. If you run a blog, readers won't want to wade through all that junk for long. It also hurts if you collect user registrations or other data: you have to filter what you collect, and since spam keeps changing, so must the rules your software uses to tell spam from legitimate input.

CAPTCHAs have been something of a solution for some years. By requiring you to type a word shown in a picture, they are supposed to tell whether you're a human or a piece of software. The first CAPTCHAs were simple and easy to read, but soon OCR software was able to read them too, so they became more and more complex, featuring noise, distorted words, fancy colours, etc. The goal when designing a CAPTCHA is simple to state: it should be unreadable by a machine and as easy as possible for a human to read, as you surely don't want anybody to be locked out of your web site just because they can't figure out a few letters in an image.

Since it's a bit of a pain to keep changing and improving your CAPTCHA images in order to keep spammers out, I recently began exploring reCAPTCHA.

This article is about reCAPTCHA and its usage with Perl, both directly with Captcha::reCAPTCHA and through HTML::FormFu.

reCAPTCHA is a web service, and it has some really interesting advantages: it's easier to set up (you don't need to deal with generating the letters and the image, storing them in the session or in some other place, etc.); it's automatically improved by the admins of the servers if it gets broken; it features an audio challenge for visually impaired users; there's an automatic IP-based banning system; it's easy enough to read for humans and difficult enough to parse for OCR software. Oh, and did I mention it? It's free as in free beer. Plus, there's another big feature, which has nothing to do with spam: by using it, you help to digitize books which are not readable by OCR. The mechanism for doing this is simple: the two words you see on the CAPTCHA (see image above) both come from scanned text, but only one of them has already been recognized by the system. The known word is the actual test, which the user must solve in order to go on; the answer for the other word helps digitize books. Of course, the user doesn't know which word is already known and which isn't. Genius, huh?
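The two-word trick described above can be sketched in a few lines of Perl. This is only an illustration of the idea, not how the reCAPTCHA servers are actually implemented: the `grade_answer` and `consensus` functions, the `%votes` tally and the vote threshold are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical tally of guesses for words the OCR could not read.
my %votes;

# Grade one answer: only the known ("control") word decides whether
# the user passes; the guess for the unknown word is stored as a vote.
sub grade_answer {
    my ( $control_word, $unknown_id, $typed_control, $typed_unknown ) = @_;

    # A wrong control word means we trust neither answer.
    return 0 unless lc($typed_control) eq lc($control_word);

    $votes{$unknown_id}{ lc($typed_unknown) }++;
    return 1;
}

# Return the most popular guess for an unknown word once it has
# collected at least $min_votes identical answers, undef otherwise.
sub consensus {
    my ( $unknown_id, $min_votes ) = @_;
    my $guesses = $votes{$unknown_id} or return undef;

    my ($best) = sort { $guesses->{$b} <=> $guesses->{$a} || $a cmp $b }
        keys %$guesses;
    return $guesses->{$best} >= $min_votes ? $best : undef;
}
```

Since the user can't tell the two words apart, the only way to pass reliably is to type both of them correctly, which is what makes the votes for the unknown word worth collecting.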

Now to the whole point of this article... how can this service be easily used with our favourite language, Perl? Thanks to Andy Armstrong's Captcha::reCAPTCHA CPAN module, it's trivial. Once you have registered your domain on the web site and got your public and private keys, you're ready to set the thing up.

On the action which creates your web form, you need to write nothing more than something like:


    use Captcha::reCAPTCHA;

    my $rc = Captcha::reCAPTCHA->new();

    print 'HTML form...';
    print $rc->get_html('PUBLIC_KEY');
    print 'End of HTML form...';

Well, most likely you won't build your web pages in such a crude way, but will rather use something like Template Toolkit. Anyhow, you get the concept: $rc->get_html() returns the HTML code, which you can put into your page wherever you want the reCAPTCHA widget to appear.
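With Template Toolkit, for instance, the widget HTML is just one more variable passed to the template. Here is a minimal sketch; the template text, the `render_form` helper and the `action` and `captcha_html` variable names are all made up for this example:

```perl
use strict;
use warnings;
use Template;

# Hypothetical page template; [% captcha_html %] marks the widget's place.
my $page = <<'EOT';
<form method="post" action="[% action %]">
  <input type="text" name="comment" />
  [% captcha_html %]
  <input type="submit" value="Send" />
</form>
EOT

# Render the form. Template Toolkit does not escape variables by
# default, so the widget HTML is inserted as-is.
sub render_form {
    my ( $action, $captcha_html ) = @_;

    my $tt     = Template->new();
    my $output = '';
    $tt->process(
        \$page,
        { action => $action, captcha_html => $captcha_html },
        \$output,
    ) or die $tt->error();

    return $output;
}
```

In a real application you would call something like `render_form( '/comment', $rc->get_html('PUBLIC_KEY') )`.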

The verification, to be performed on the action which receives your form (if different from the one which created it), is just a bit more complex:


    use CGI::Simple;
    use Captcha::reCAPTCHA;

    my $q = CGI::Simple->new();
    my $rc = Captcha::reCAPTCHA->new();

    # Verify submission
    my $result = $rc->check_answer(
        'PRIVATE_KEY', $q->remote_addr,
        $q->param('recaptcha_challenge_field'),
        $q->param('recaptcha_response_field'),
    );

    if ( $result->{is_valid} ) {
        # Process the form
    } else {
        # Redisplay the form, with reCAPTCHA error
    }
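When the check fails, the result hash also carries an `error` code, and according to the Captcha::reCAPTCHA documentation `get_html()` accepts that code as an optional second argument, so that the redisplayed widget can tell the user what went wrong. A minimal sketch of that flow follows; the `captcha_step` helper and its return convention are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: $rc is a Captcha::reCAPTCHA object and $q a
# CGI::Simple query. Returns ( 1, undef ) on success, or
# ( 0, $widget_html ) where the widget HTML carries the error code.
sub captcha_step {
    my ( $rc, $q, $public_key, $private_key ) = @_;

    # scalar() keeps a missing parameter from collapsing to an empty
    # list and shifting the remaining arguments.
    my $result = $rc->check_answer(
        $private_key, $q->remote_addr,
        scalar $q->param('recaptcha_challenge_field'),
        scalar $q->param('recaptcha_response_field'),
    );

    return ( 1, undef ) if $result->{is_valid};

    # Hand the error code back to the widget for redisplay.
    return ( 0, $rc->get_html( $public_key, $result->{error} ) );
}
```

The caller then either processes the form or prints the page again with the returned widget HTML in place of the original one.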

This is damn easy, but it could still prove a bit inelegant to hack it into an HTML::FormFu form. FormFu is a highly recommended form generation and validation framework maintained by Carl Franks (most Catalyst users manage their forms with it).

Luckily, FormFu provides an element dedicated to reCAPTCHA, which wraps the calls to Andy Armstrong's module.


    # Yes folks, I like to write elements in Perl instead of YAML ;-)
    $fieldset->{elements} = [
        # ...elements...
        {
            type        => 'reCAPTCHA',
            name        => 'recaptcha',
            public_key  => $self->{recaptcha_public_key},
            private_key => $self->{recaptcha_private_key},
        },
        # ...elements...
    ];

The above code does all the magic: it adds the reCAPTCHA widget to your form, with a constraint attached to it which makes sure the user typed the correct words in. The handling of the field on the submitted form is transparent to the programmer, as you just leave the standard:


    if ( $form->submitted_and_valid ) {
        # Process the form
    }

untouched, as reCAPTCHA errors are treated just like another invalid input.

1 Comment

reCAPTCHA was also our first choice, the implementation was
trivial (we use Perl/Mason). Adding blackbox tests (test driven
development...) required tweaks of course ;)

Unlike other services you have the problem of translation: we use
it for an Italian site, for example, and all help texts are in English
and cannot be replaced easily. No user ever complained though.

marc tobias

About this Entry

This page contains a single entry by Michele Beltrame published on September 22, 2008 10:39 AM.
