I coded a simple JSON web service which converts any HTML input to PDF (using the excellent wkhtmltopdf software). You can supply either the HTML code itself or a URI to fetch it from.

Using the powerful Mojolicious web framework (Mojolicious::Lite is enough for this application, actually), the whole thing takes roughly 100 lines of code:

#!/usr/bin/perl

# Pulls in strict and unicode_strings, but this
# program doesn't require perl 5.14 to work
use v5.14;

use Mojolicious::Lite;
use Path::Class;
use File::Temp;
use Mojo::UserAgent;
use MIME::Base64;

my $config = {
    wkh     => '/usr/local/bin/wkhtmltopdf',
    tmpdir  => '/tmp',
    auth    => 'maitai',
};

post '/' => sub {
    my $self = shift;

    my $args = $self->req->json;

    # Handle obvious error cases
    return $self->mkerror('invalid-JSON-content')
        if ref $args ne 'HASH';
    return $self->mkerror('invalid-auth-information')
        if ( $args->{auth} // '' ) ne $config->{auth};

    # Clients can pass us HTML content or a URI where to fetch it
    if ( !defined $args->{html} ) {
        return $self->mkerror('no-html-nor-uri')
            if !$args->{uri};

        # Fetch the page
        my $ua = Mojo::UserAgent->new();
        my $tx = $ua->get($args->{uri});
        # In current Mojolicious releases $tx->error returns a hash
        # reference (message, code) and $tx->success is gone
        if ( my $err = $tx->error ) {
            return $self->mkerror("fetch-page-error: $err->{message}");
        }
        $args->{html} = $tx->res->body;
    }

    my $html_file = $self->make_html_file( $args->{html} );

    my $pdf_fn = $html_file->filename;
    $pdf_fn =~ s/\.html\z/.pdf/xms;

    # Build the command line
    my $hcmd = $self->build_wkh_command($args);
    $hcmd .= ' ' . $html_file->filename . " $pdf_fn";

    # Create the PDF file
    my $output = `$hcmd`; # TODO: better error handling
    return $self->mkerror('pdf-generation-failed')
        if !-s $pdf_fn;

    # Read the output and return it
    my $pdf_file = Path::Class::File->new($pdf_fn);
    my $pdf = $pdf_file->slurp();

    # Unlink the PDF file
    $pdf_file->remove();

    return $self->render(json => {
        status  => 'ok',
        pdf     => encode_base64($pdf),
    });
};

helper build_wkh_command => sub {
    my ($self, $args) = @_;

    # Default page size is A4; labels need a smaller one, so clients can override it
    my $page_size = '--page-size ' . ($args->{page_size} || 'a4');

    # Custom page size will override the previous
    if ( defined $args->{page_width} && defined $args->{page_height} ) {
        $page_size = "--page-width $args->{page_width}"
            . " --page-height $args->{page_height} ";
    }

    # Build wkhtmltopdf command line
    my $hcmd = $config->{wkh} ." --encoding \"utf-8\" $page_size ";
    $hcmd .= "--margin-top $args->{top_margin}mm "
        if defined $args->{top_margin};
    $hcmd .= "--margin-left $args->{left_margin}mm "
        if defined $args->{left_margin};
    $hcmd .= "--margin-bottom $args->{bottom_margin}mm "
        if defined $args->{bottom_margin};
    $hcmd .= "--margin-right $args->{right_margin}mm "
        if defined $args->{right_margin};
    $hcmd .= "--orientation $args->{orientation} "
        if defined $args->{orientation};

    return $hcmd;
};

helper make_html_file => sub {
    my ($self, $html) = @_;

    my $htmlf = File::Temp->new(
        DIR     => $config->{tmpdir},
        SUFFIX  => '.html',
        UNLINK  => 1,
    );
    binmode $htmlf, ':encoding(UTF-8)';
    print $htmlf $html;

    # Flush the buffer so wkhtmltopdf sees the complete file on disk
    $htmlf->flush;

    return $htmlf;
};

helper mkerror => sub {
    my ($self, $error) = @_;

    return $self->render(json => {
        status  => 'error',
        error   => $error,
    });
};

app->start;

Once you have this set up (via CGI, FastCGI, morbo, starman or whatever you like best), you just need to POST your data as JSON. It could be something like:

{
    "auth"    : "maitai",
    "html"    : "<html><head><meta charset=UTF-8></head><body>Ciao!</body></html>"
}

or:

{
    "auth"    : "maitai",
    "uri"     : "http://www.skm.to/"
}

and you get a JSON response such as this:

{
    "status"  : "ok",
    "pdf"     : "pdf_data_base64_encoded"
}

The PDF data is base64 encoded in order to be safely transferred without risking corruption because of character set encoding/decoding.
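
On the client side, the round trip could look roughly like the sketch below. It only uses modules shipped with Perl (HTTP::Tiny, JSON::PP, MIME::Base64). The service URL, the script name and the function names html_to_pdf and pdf_from_response are assumptions made up for this example, not part of the service itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Tiny;
use JSON::PP qw(encode_json decode_json);
use MIME::Base64 qw(decode_base64);

# Assumption: the service listens here (morbo's default port)
my $SERVICE_URL = 'http://localhost:3000/';
my $AUTH        = 'maitai';

# Decode the service's JSON reply into raw PDF bytes; die on an error reply
sub pdf_from_response {
    my ($json_text) = @_;

    my $reply = decode_json($json_text);
    die "service error: $reply->{error}\n"
        if $reply->{status} ne 'ok';

    return decode_base64( $reply->{pdf} );
}

# POST the HTML to the service and return the PDF bytes
sub html_to_pdf {
    my ($html) = @_;

    my $res = HTTP::Tiny->new->post(
        $SERVICE_URL,
        {
            headers => { 'Content-Type' => 'application/json' },
            content => encode_json( { auth => $AUTH, html => $html } ),
        },
    );
    die "HTTP error: $res->{status} $res->{reason}\n"
        if !$res->{success};

    return pdf_from_response( $res->{content} );
}

# Usage: perl client.pl page.html page.pdf
if ( @ARGV == 2 ) {
    my ( $in, $out ) = @ARGV;

    open my $ifh, '<:encoding(UTF-8)', $in or die "Cannot read $in: $!\n";
    my $html = do { local $/; <$ifh> };
    close $ifh;

    # The PDF is binary, so write it in raw mode
    open my $ofh, '>:raw', $out or die "Cannot write $out: $!\n";
    print {$ofh} html_to_pdf($html);
    close $ofh or die "Cannot close $out: $!\n";
}
```

Keeping the decoding step in its own function makes it easy to check without a running service.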

Some notes:

  • There are some configuration options (page size, …): take a look at the source code.
  • Not all wkhtmltopdf features are implemented, but it’s very easy to extend the software.
  • The authentication system is just an example; it should be way more robust.
  • Arguments should really be checked for safety: they are interpolated into a shell command, so security issues could arise (well, provided the authentication system is broken first).
  • Error handling should be improved.
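
As a rough idea of how the last two points could be tackled, here is a sketch that validates each option against a whitelist pattern and returns the options as a list, so the command can be run without going through a shell. The function names build_wkh_args and run_wkh are hypothetical, and the accepted patterns (plain numbers for margins, an optional mm/cm/in unit for custom page sizes) are my assumptions about what clients send, not something wkhtmltopdf dictates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# wkhtmltopdf accepts these two orientation values
my %ALLOWED_ORIENTATION = map { $_ => 1 } qw(Portrait Landscape);

# Return the wkhtmltopdf options as a list; die on any suspicious value
sub build_wkh_args {
    my ($args) = @_;
    my @opts = ( '--encoding', 'utf-8' );

    if ( defined $args->{page_width} && defined $args->{page_height} ) {
        # Assumed format: a number with an optional unit
        for my $key (qw(page_width page_height)) {
            die "invalid-$key\n"
                if $args->{$key} !~ /\A\d+(?:\.\d+)?(?:mm|cm|in)?\z/;
        }
        push @opts, '--page-width',  $args->{page_width},
                    '--page-height', $args->{page_height};
    }
    else {
        my $size = $args->{page_size} // 'a4';
        die "invalid-page_size\n" if $size !~ /\A[A-Za-z0-9]{1,12}\z/;
        push @opts, '--page-size', $size;
    }

    for my $side (qw(top left bottom right)) {
        my $value = $args->{"${side}_margin"};
        next if !defined $value;
        die "invalid-${side}_margin\n" if $value !~ /\A\d+(?:\.\d+)?\z/;
        push @opts, "--margin-$side", "${value}mm";
    }

    if ( defined( my $orientation = $args->{orientation} ) ) {
        my $normalized = ucfirst lc $orientation;
        die "invalid-orientation\n" if !$ALLOWED_ORIENTATION{$normalized};
        push @opts, '--orientation', $normalized;
    }

    return @opts;
}

# Run the converter; system() with a list never invokes a shell
sub run_wkh {
    my ( $wkh, $html_fn, $pdf_fn, @opts ) = @_;

    my $rc = system( $wkh, @opts, $html_fn, $pdf_fn );
    die "wkhtmltopdf-failed\n" if $rc != 0;

    return;
}
```

In the route handler, both calls could be wrapped in an eval so that any die message is passed on to mkerror. Whether a non-zero exit status should always count as a failure is a design choice; checking that the PDF file exists and is not empty may be a useful complement.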

Why did I do this? Basically, I have some web apps hosted on a managed FreeBSD server, where compiling wkhtmltopdf does not work very well (and there are a lot of prerequisites, anyway). This way I can “outsource” PDF generation easily.