Converting HTML to PDF with a JSON web service (Mojolicious + wkhtmltopdf)

I coded a simple JSON web service that converts any HTML input to PDF (using the excellent wkhtmltopdf software). You can supply either the HTML code itself or a URI to fetch it from.

With the powerful Mojolicious web framework (Mojolicious::Lite is actually enough for this application), the whole thing takes just about 100 lines of code:

#!/usr/bin/perl

# Pulls in strict and unicode_strings, but this
# program doesn't require perl 5.14 to work
use v5.14;

use Mojolicious::Lite;
use Path::Class;
use File::Temp;
use Mojo::UserAgent;
use MIME::Base64;

my $config = {
    wkh     => '/usr/local/bin/wkhtmltopdf',
    tmpdir  => '/tmp',
    auth    => 'maitai',
};

post '/' => sub {
    my $self = shift;
    
    my $args = $self->req->json;

    # Handle obvious error cases
    return $self->mkerror('invalid-JSON-content')
        if !defined $args;
    return $self->mkerror('invalid-auth-information')
        if $args->{auth} ne $config->{auth};
        
    # Clients can pass us HTML content or a URI to fetch it from
    if ( !defined $args->{html} ) {
        return $self->mkerror('no-html-nor-uri')
            if !$args->{uri};
        
        # Fetch the page
        my $ua = Mojo::UserAgent->new();
        my $tx = $ua->get($args->{uri});
        my $res = $tx->success;
        if (!$res) {
            my ($msg, $code) = $tx->error;
            return $self->mkerror("fetch-page-error: $msg");
        }
        $args->{html} = $res->body;
    }

    my $html_file = $self->make_html_file( $args->{html} );

    my $pdf_fn = $html_file->filename;
    $pdf_fn =~ s/\.html/.pdf/xms;

    # Build the command line
    my $hcmd = $self->build_wkh_command($args);
    $hcmd .= ' ' . $html_file->filename . " $pdf_fn";

    # Create the PDF file
    my $output = `$hcmd`; # TODO: error handling

    # Read the output and return it
    my $pdf_file = Path::Class::File->new($pdf_fn);
    my $pdf = $pdf_file->slurp();

    # Unlink the PDF file
    $pdf_file->remove();

    return $self->render_json({
        status  => 'ok',
        pdf     => encode_base64($pdf),
    });
};

helper build_wkh_command => sub {
    my ($self, $args) = @_;

    # The usual page size is A4, but labels need a smaller one, so it is configurable
    my $page_size = '--page-size ' . ($args->{page_size} || 'a4');

    # Custom page size will override the previous
    if ( defined $args->{page_width} && defined $args->{page_height} ) {
        $page_size = "--page-width $args->{page_width}"
            . " --page-height $args->{page_height} ";
    }

    # Build the wkhtmltopdf command line
    my $hcmd = $config->{wkh} ." --encoding \"utf-8\" $page_size ";
    $hcmd .= "--margin-top $args->{top_margin}mm "
        if defined $args->{top_margin};
    $hcmd .= "--margin-left $args->{left_margin}mm "
        if defined $args->{left_margin};
    $hcmd .= "--margin-bottom $args->{bottom_margin}mm "
        if defined $args->{bottom_margin};
    $hcmd .= "--margin-right $args->{right_margin}mm "
        if defined $args->{right_margin};
    $hcmd .= "--orientation $args->{orientation} "
        if defined $args->{orientation};

    return $hcmd;
};

helper make_html_file => sub {
    my ($self, $html) = @_;

    my $htmlf = File::Temp->new(
        DIR     => $config->{tmpdir},
        SUFFIX  => '.html',
        UNLINK  => 1,
    );
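    binmode $htmlf, ':encoding(UTF-8)';
    print $htmlf $html;

    # Flush explicitly so wkhtmltopdf reads the complete file; Perl normally
    # flushes before forking, but this avoids relying on that
    $htmlf->flush;

    return $htmlf;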
};

helper mkerror => sub {
    my ($self, $error) = @_;
    
    return $self->render_json({
        status  => 'error',
        error   => $error,
    });
};

app->start;
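
The TODO on the backtick call could be addressed along these lines. This is only a minimal sketch, not part of the service above: `run_or_die` is a hypothetical helper name, and the running perl binary (`$^X`) stands in for wkhtmltopdf so the snippet works on its own. It uses the list form of `system`, which bypasses the shell, and inspects the wait status that `system` returns.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (no shell involved)
# and die with a short message unless it exits with status 0.
sub run_or_die {
    my (@cmd) = @_;

    my $rc = system(@cmd);

    die "cannot run $cmd[0]: $!\n"
        if $rc == -1;
    die "$cmd[0] killed by signal " . ( $rc & 127 ) . "\n"
        if $rc & 127;
    die "$cmd[0] exited with status " . ( $rc >> 8 ) . "\n"
        if $rc >> 8;

    return 1;
}

# $^X is the running perl binary, used here as a stand-in for wkhtmltopdf
run_or_die( $^X, '-e', 'exit 0' );

my $ok = eval { run_or_die( $^X, '-e', 'exit 3' ); 1 };
print "failure detected: $@" if !$ok;
```

In the request handler, the call would presumably be wrapped in `eval` in the same way, returning `$self->mkerror(...)` when it fails instead of slurping a PDF file that may not exist.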

Once you have the service set up (via CGI, FastCGI, morbo, starman or whatever you like best), you just need to POST your data as JSON. The request could look something like this:

{
    "auth"    : "maitai",
    "html"    : "<html><head><meta charset=UTF-8></head><body>Ciao!</body></html>"
}

or:

{
    "auth"    : "maitai",
    "uri"     : "http://www.skm.to/"
}

and you get a JSON response such as this:

{
    "status"  : "ok",
    "pdf"     : "pdf_data_base64_encoded"
}

The PDF data is base64 encoded in order to be safely transferred without risking corruption because of character set encoding/decoding.
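
On the client side, getting the PDF back means decoding the JSON and then the base64 payload. Here is a minimal sketch using only core modules, assuming the response body has already been received as a string; `extract_pdf` is a hypothetical helper name, and the response below is simulated rather than fetched from a running service.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical helper: turn the service's JSON response into raw PDF bytes,
# dying if the service reported an error.
sub extract_pdf {
    my ($json_text) = @_;

    my $res = decode_json($json_text);

    die 'service error: ' . ( $res->{error} // 'unknown' ) . "\n"
        if ( $res->{status} // '' ) ne 'ok';

    return decode_base64( $res->{pdf} );
}

# Simulated response, standing in for what the service would send back
my $fake_response = encode_json({
    status => 'ok',
    pdf    => encode_base64('%PDF-1.4 fake bytes'),
});

my $pdf = extract_pdf($fake_response);
print length($pdf), " bytes decoded\n";
```

When writing the decoded bytes to disk, the file handle should be put in binary mode (`binmode`), so that no encoding layer alters the data.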

Some notes:

  • There are some configuration options (page size, margins, orientation): take a look at the source code.
  • Not all wkhtmltopdf features are implemented, but it's very easy to extend the software.
  • The authentication system is just an example; it should be far more robust.
  • Arguments should really be checked for safety, because they end up on a shell command line; otherwise security issues could arise (well, provided the authentication system is broken first).
  • Error handling should be improved.
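
As for checking the arguments, one possible approach is a whitelist with one pattern per argument that the service interpolates into the command line. This is just a sketch: the patterns are my assumptions about what reasonable values look like (they are not taken from the wkhtmltopdf documentation), and `first_unsafe_arg` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed patterns: plain numbers for margins, a number with an optional
# unit for custom page dimensions, a bare word for named page sizes.
my $number    = qr/\A\d+(?:\.\d+)?\z/;
my $dimension = qr/\A\d+(?:\.\d+)?(?:mm|cm|in)?\z/;

my %allowed = (
    page_size     => qr/\A[A-Za-z0-9]+\z/,
    page_width    => $dimension,
    page_height   => $dimension,
    top_margin    => $number,
    left_margin   => $number,
    bottom_margin => $number,
    right_margin  => $number,
    orientation   => qr/\A(?:portrait|landscape)\z/i,
);

# Returns the name of the first offending argument, or undef if all
# supplied arguments match their pattern.
sub first_unsafe_arg {
    my ($args) = @_;

    for my $name ( sort keys %allowed ) {
        next if !defined $args->{$name};
        return $name if $args->{$name} !~ $allowed{$name};
    }

    return;
}

my $bad = first_unsafe_arg({ page_size => 'a4', top_margin => '10; rm -rf /' });
print "rejected argument: $bad\n" if defined $bad;
```

The handler could call something like this right after the authentication check and return `$self->mkerror(...)` when an argument is rejected.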

Why did I do this? Basically, I have some web apps hosted on a managed FreeBSD server, where compiling wkhtmltopdf does not work very well (and there are a lot of prerequisites, anyway). This way I can "outsource" PDF generation easily.

4 Comments

The thing that pops out at me is that you are returning the bytes of the PDF file directly in a JSON string in the response. I can see problems coming from this, as a JSON string is expected to be UTF-8 encoded. For binary transfer, you should probably encode the PDF bytes as something like base64.

Hello!

You're right. It actually worked like a charm anyway, but it's probably unsafe.

Fixed!

Thanks,
Michele.

Hi,
This script doesn't want to work, for me... :-(

My setup is:
ubuntu 11.10
wkhtmltopdf-0.11.0_rc1-static-i386

I see it does produce a correct PDF (on /tmp), but I can't get it on my web page. This is my page:

$(document).ready(function() {
    $.post(
        "http://linrs:3000",
        JSON.stringify(
            {
                "auth": "maitai",
                "html": "Ciao!",
                //"uri": "http://www.portovenere.biz",
            }
        ),
        function(data, textStatus) {
            console.log("status: " + textStatus);
            console.log("data: " + data);
        }
    );
});

In the firebug console I get:

POST http://localhost:3000/ 200 OK 883ms

always with an empty RESPONSE...

What am I missing?

Thanks in advance for your attention!

Hello marcosolari!

> This script doesn't want to work, for me... :-(

The problem is that you are calling it cross-site, so XMLHttpRequest won't work (it's cross-site even if only the port changes, i.e. you have the web page on http://linrs and the script on http://linrs:3000).

To solve this you can either have a URL of the app serve your page, or use the "script" dataType in $.ajax and then handle it accordingly.

Cheers,
M.

About this Entry

This page contains a single entry by Michele Beltrame published on December 22, 2011 9:57 AM.
