One hosting service I have contains several domains. In order to understand which of these web sites was the most trafficked, I needed to analyze the logs a bit. There are several Apache log analysis solutions out there, but frankly I just needed some basic information, and didn’t want software bloated with features, 99% of which I didn’t need.

Perl came to the rescue, and in about 15 minutes I wrote a working script.

What I needed to know

Basically, I wanted to know the number of hits and the bytes transferred for each of the web sites, in order to rank them.

The Apache log format is by now a de facto standard (Apache’s combined log format), and a line looks something like this:

- - [16/Dec/2021:00:01:16 -0500] "GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"

Since this is a shared Apache server, my log also has an extra field at the beginning, holding the domain name of the site being requested:

- - [16/Dec/2021:00:01:16 -0500] "GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
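To see how each field of such a line maps onto the variables captured in the script, here is a minimal sketch that runs a single line through the same regular expression. The site name www.example.com and the client IP 192.0.2.10 are hypothetical placeholders, and parse_line is an illustrative helper, not part of the original script.

```perl
use strict;
use warnings;

# Same pattern as the script: site name, client IP, identd, user,
# timestamp, request line, status, bytes, referer, user agent.
my $LOG_RE = qr/^(\S+) (\S+) (\S+) (\S+) \[(.+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;

# Hypothetical helper: parse one vhost-prefixed line into a hash ref,
# or return undef if the line doesn't match the expected format.
sub parse_line {
    my ($line) = @_;
    my @f = $line =~ $LOG_RE or return undef;
    my %rec;
    @rec{qw(site ip ident user when request status bytes referer agent)} = @f;
    return \%rec;
}

# Hypothetical site name and client IP; the rest is the sample line above.
my $sample = 'www.example.com 192.0.2.10 - - [16/Dec/2021:00:01:16 -0500] '
    . '"GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 "" '
    . '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"';

my $rec = parse_line($sample);
printf "%s %s %s %s\n", @{$rec}{qw(site status bytes request)};
```

Both the timestamp and the request line contain spaces, which is why the pattern captures them between their delimiters ([...] and "...") rather than with \S+.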

The script

The script is meant to be fed a log on its standard input, for instance:

# tail -1000 access_log |

Here goes the script, with some comments where needed.

use strict;
use warnings;
use feature 'say';    # needed for say()

use Number::Bytes::Human qw/format_bytes/;
use Text::Table;

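my $sites    = {};   # per-site counters, keyed by site name
my $totbytes = 0;    # bytes sent for 200 OK hits, across all sites

while (my $row = <STDIN>) {
    chomp $row;

    # Regex which parses a line of the Apache log (with the site name first).
    my (
       $sitename, $clientip,  $rfc1413, $username,
       $when,     $reqstring, $status,  $bytesout,
       $referer,  $useragent
    ) = $row =~ /^(\S+) (\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/;

    # Skip lines which don't match the expected format.
    next unless defined $sitename;

    $sites->{$sitename} //= {
       hits     => 0,
       okhits   => 0,
       bytesout => 0,
    };
    $sites->{$sitename}->{hits}++;

    # Don't care about byte count on non-OK hits. Maybe we should add that as well, but... no.
    next if $status != 200;
    $sites->{$sitename}->{okhits}++;

    # Apache logs "-" instead of 0 when no body bytes were sent.
    $bytesout = 0 if $bytesout eq '-';

    $sites->{$sitename}->{bytesout} += $bytesout;
    $totbytes += $bytesout;
}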

# Create a nicely-formatted table
my $tb = Text::Table->new('Site', 'OK Hits', 'Transfer Out');

# Only display the first 10 entries, sorted by number of 200 OK hits
my $ii = 0;
for my $sk (
    sort { $sites->{$b}->{okhits} <=> $sites->{$a}->{okhits} } keys %$sites
) {
    my $sv   = $sites->{$sk};
    my $tout = format_bytes($sv->{bytesout});
    $tb->add($sk, $sv->{okhits}, $tout);
    last if ++$ii == 10;
}

say $tb;

# We also want the totals, why not?
my $totbytes_h = format_bytes($totbytes);
say "TOTAL TRANSFER: $totbytes_h";

We use a couple of nice Perl modules here: Number::Bytes::Human, which automatically converts byte counts into something easier to read (such as 2.6G), and especially Text::Table, which produces a nicely formatted table on the console.
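Under the hood, turning a byte count into something like 2.6G comes down to dividing by 1024 until the number is small enough, then appending a unit letter. Here is a simplified core-Perl sketch of that idea, useful for illustration or on systems where installing modules isn’t an option. human_bytes is a hypothetical helper, and because it simply rounds to one decimal place, its output can differ slightly from Number::Bytes::Human’s.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a byte count by powers of 1024 and
# append a unit letter (K, M, G, ...). Rounds to one decimal place.
sub human_bytes {
    my ($bytes) = @_;
    my @units = ('', qw(K M G T P));
    my $i = 0;
    while ($bytes >= 1024 && $i < $#units) {
        $bytes /= 1024;
        $i++;
    }
    # Unscaled values are printed as plain integers.
    return $i == 0 ? sprintf('%d', $bytes) : sprintf('%.1f%s', $bytes, $units[$i]);
}

print human_bytes($_), "\n" for 512, 6696, 3_758_096_384;
```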

Here’s the output:

Site                             OK Hits Transfer Out
                                 128360  2.6G
                                 108854  3.5G
                                 98164   4.2G
                                 59108   6.6G
                                 35417   560M
                                 29424   520M
                                 20547   623M
                                 12633   403M
                                 11330   292M
                                 10030   1.4M

Text::Table also supports borders for cells and other niceties, which I didn’t use.

This quick and dirty solution is very easy to adapt and improve, for example to show the IPs which hit the server most, grouped by web site.
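As an example of such an adaptation, here is a sketch that counts hits per client IP for each site and keeps only the busiest few. It assumes the same vhost-prefixed log format and looks only at the first two fields (site name and client IP). top_ips_by_site and $TOP_N are hypothetical names, and the demo lines use made-up sites and documentation IPs. To use it on a real log, pass \*STDIN instead of the in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical setting: how many IPs to show per site.
my $TOP_N = 3;

# Count hits per client IP for each site, then keep the $top_n busiest IPs.
# Returns { site => [ [ip, hits], ... ] }, busiest first.
sub top_ips_by_site {
    my ($fh, $top_n) = @_;
    my %ips;    # $ips{$site}{$ip} = number of hits
    while (my $row = <$fh>) {
        # Only the first two fields matter here: site name and client IP.
        my ($site, $ip) = $row =~ /^(\S+) (\S+) / or next;
        $ips{$site}{$ip}++;
    }

    my %top;
    for my $site (keys %ips) {
        my $counts = $ips{$site};
        # Most hits first; ties broken by IP so the output is stable.
        my @sorted = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
        splice @sorted, $top_n if @sorted > $top_n;
        $top{$site} = [ map { [ $_, $counts->{$_} ] } @sorted ];
    }
    return \%top;
}

# Demo on a few made-up lines (hypothetical sites, documentation IPs).
my $demo = <<'LOG';
a.example 192.0.2.1 - - [16/Dec/2021:00:01:16 -0500] "GET / HTTP/1.1" 200 100 "" "-"
a.example 192.0.2.1 - - [16/Dec/2021:00:01:17 -0500] "GET /a HTTP/1.1" 200 100 "" "-"
a.example 192.0.2.2 - - [16/Dec/2021:00:01:18 -0500] "GET / HTTP/1.1" 200 100 "" "-"
b.example 198.51.100.7 - - [16/Dec/2021:00:01:19 -0500] "GET / HTTP/1.1" 200 100 "" "-"
LOG

open my $demo_fh, '<', \$demo or die "Cannot open in-memory log: $!";
my $top = top_ips_by_site($demo_fh, $TOP_N);
close $demo_fh;

for my $site (sort keys %$top) {
    print "$site\n";
    printf "  %-15s %d\n", @$_ for @{ $top->{$site} };
}
```

Sorting sites by name and breaking ties by IP keeps the report identical between runs, since Perl’s hash order is randomized.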