Mittwoch, 13. November 2019

Improve md5 calculation -- an unexpected journey

In a project it was necessary to calculate the md5 checksums of files as fast as possible. Under Perl5 there is the module Digest::MD5.

The suggested way to use this module is not the fastest. The reason is that the method addfile() does not use the buffer optimally.

In the following I have tested all possible variants: the suggested addfile approach, the buffer optimized, the File::Map based and the system call to 'md5sum' variant:


#!/usr/bin/env perl 
# bench to check how fast is memory mapped access

use strict;
use warnings;
use utf8;
use Benchmark qw(:all) ;
use File::Map qw( map_file);
use Digest::MD5;
use File::Slurp;

sub md5offile_mapped {
    my $fn = shift;
    map_file my $data, $fn, '<';
    my $md5obj = Digest::MD5->new;
    $md5obj->add($data);
    return $md5obj->hexdigest;
}

sub md5offile_orig {
    my $fn = shift;
    my $fh;
    open($fh, '<', $fn) || die ("Can't open '$fn', $!");
    binmode($fh);
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks)
    = stat $fh;
    my $buffer;
    my $md5obj = Digest::MD5->new;
    while (read($fh, $buffer, $blksize)) {
        $md5obj->add($buffer);
    }
    close $fh || die ("could not close file '$fn', $!");
    return $md5obj->hexdigest;
}

sub md5offile_addfile {
    my $fn = shift;
    my $fh;
    open($fh, '<', $fn) || die ("Can't open '$fn', $!");
    binmode($fh);
    my $md5obj = Digest::MD5->new;
    $md5obj->addfile( $fh );
    close $fh || die ("could not close file '$fn', $!");
    return $md5obj->hexdigest;
}

sub md5offile_md5file {
    my $fn = shift;
    return system("md5sum $fn >/dev/null 2>&1");
}


my $file = shift @ARGV;
read_file($file); # to warm cache

timethese(500, {
        'memory_mapped' => sub{ md5offile_mapped( $file ); },
        'original'      => sub{ md5offile_orig( $file ); },
        'add_file'      => sub{ md5offile_addfile( $file ); },
        'system'        => sub{ md5offile_md5file( $file ); },
;}
);




The variant "memory-mapped" is about 10% faster than the others. Here a result for checksumming a DNG-file with size of 13MB on a NVME device:

Benchmark: timing 500 iterations of add_file, memory_mapped, original, system...
     add_file: 12 wallclock secs (10.68 usr +  1.09 sys = 11.77 CPU) @ 42.48/s (n=500)
memory_mapped: 10 wallclock secs (10.43 usr +  0.23 sys = 10.66 CPU) @ 46.90/s (n=500)
     original: 12 wallclock secs (10.98 usr +  0.95 sys = 11.93 CPU) @ 41.91/s (n=500)
       system: 16 wallclock secs ( 0.13 usr 0.27 sys + 13.97 cusr  1.34 csys = 15.71 CPU) @ 31.83/s (n=500)


Unfortunately there is a problem with large files. The Digest::MD5 probably calculates the values wrong for scalars >1GB (see https://rt.cpan.org/Public/Bug/Display.html?id=123185). In this case, the memory mapped approach should not be used.

Dienstag, 14. Mai 2019

Asciiart via Perl komprimieren

Als kleine Fingerübung habe ich ff. Perl-Script geschrieben, welches zur "Komprimierung" von AsciiArt verwendet werden kann.

Sei die Vorlage:

            /| |\
        ,  ( `-´ )
       /(  _\   /____      []
      |  >(__|9|_____)     ||
       )(    | |  ___  ___ ||
      -==-   | | / _ \/ _ \||
             |8|| (_) |(_) ||
            /   \\___/\___(%%)
           ( ,-. )        |''|
            \| |/         ||||
                          `--'


So erzeugt das Script folgenden Perl-Code:

#!/usr/bin/perl
use v5.10;
say ;
say " "x12,"/| |\\";
say " "x8,",  ( `-´ )";
say " "x7,"/(  _\\   /","_"x4," "x6,"[]";
say " "x6,"|  >(__|9|","_"x5,")"," "x5,"||";
say " "x7,")("," "x4,"| |  ___  ___ ||";
say " "x6,"-==-   | | / _ \\/ _ \\||";
say " "x13,"|8|| (_) |(_) ||";
say " "x12,"/   \\\\___/\\___(%%)";
say " "x11,"( ,-. )"," "x8,"|''|";
say " "x12,"\\| |/"," "x9,"|"x4;
say " "x26,"`--'";
say ;
say ;
say ;1;


Alternativ kann der Fuzzy-Mode eingeschaltet werden, der ff. Code erzeugt:

#!/usr/bin/perl
use v5.10;
used in fuzzy mode
say "\n"," "x12,"/| |\\\n"," "x8,","," "x2,"( `-´ )\n"," "x7,"/("," "x2,"_\\"," "x3,"/","_"x4," "x6,"[]\n"," "x6,"|"," "x2,">(","_"x2,"|9|","_"x5,")"," "x5,"|"x2,"\n"," "x7,")("," "x4,"| |"," "x2,"_"x3," "x2,"_"x3," ","|"x2,"\n"," "x6,"-","="x2,"-"," "x3,"| | / _ \\/ _ \\","|"x2,"\n"," "x13,"|8","|"x2," (_) |(_) ","|"x2,"\n"," "x12,"/"," "x3,"\\"x2,"_"x3,"/\\","_"x3,"(","%"x2,")\n"," "x11,"( ,-. )"," "x8,"|","'"x2,"|\n"," "x12,"\\| |/"," "x9,"|"x4,"\n"," "x26,"`","-"x2,"'\n","\n","\n",;1;


Hier das Script:
#!/usr/bin/env perl 
#===============================================================================
#
#         FILE: compress_asciiart.pl
#
#        USAGE: ./compress_asciiart.pl  
#
#  DESCRIPTION: compresses asciiart (or any text) to small perlcode using RLE
#
#      OPTIONS: -f (enables fuzzing)
# REQUIREMENTS: ---
#         BUGS: ---
#        NOTES: ---
#       AUTHOR: Andreas Romeyke
# ORGANIZATION: 
#      VERSION: 1.0
#      CREATED: 08.04.2019
#     REVISION: ---
#===============================================================================

use strict;
use warnings;
use utf8;

print "#!/usr/bin/perl\n";
print "use v5.10;\n";
my $fuzzy=0;
my $runlength=3;
if (defined $ARGV[0] && $ARGV[0] eq "-f") {
    print STDERR "used in fuzzy mode\n";
    $runlength=1;
    $fuzzy=1;
}

sub subs {
    my $s1 = shift;
    my $s2 = shift;
    my $l = length($s2)+1;
    $s1=~s/\\/\\\\/g;
    $s1=~s/"/\\"/g;
    $s2=~s/\\/\\\\/g;
    $s2=~s/"/\\"/g;
    my $v;
    if ($l > $runlength) {
        return 
        qq(").
        qq($s1).
        "\"x$l,";
    } else {
        return
        qq(").
        qq($s1$s2).
        qq(",)
        ;
    }
}
print "say ";
foreach my $line () {
    chomp $line;
    $line=~s/(.)(\1*)/my $l=length($2);subs($1, $2);/eg;
    if ($fuzzy) {
        $line =~ s/$/\"\\n\",/;
    } else {
        $line =~ s/$/;\nsay /;
    }

    while ($line=~s/"([^"]+)","([^"]+)",/"$1$2",/g) {};
    $line=~s/,;\n/;\n/g;
    print $line;
}
print ";1;";
1;




Wer noch weitere Vorschläge hat, wie man noch eleganteren Code erzeugen kann, immer her damit :)

Montag, 28. Mai 2018

Vim, Ersetzen von "foo" durch Dateinamen der zu bearbeitenden Datei

Manchmal, zB. bei der Erstellung von C++-Klassen möchte man bestimmte Textstellen durch Texte ersetzen, die aus dem Dateinamen der Datei gewonnen werden.

Will man zB. in der Datei "datei.txt" alle Vorkommen von "foo" durch "datei" ersetzen, gelingt dies in Vim im Kommandomodus so:

%s/foo/\=expand('%:r:t')/

Mit "\=" weist man Vim an, nicht direkt den Ersetzungstext zu verwenden, sondern ein Vim-Kommando auszuführen.

"expand" liest hier den Dateinamen aus "%" aus, wendet Prefix- und Suffix-Magie (":r:t") an und gibt den Ersetzungsstring zurück.