PERL







Need Perl ?

http://www.perl.com/pub



Perl looks like C and UNIX shell. Somebody who knows well C and UNIX shell will have no problem at all to learn Perl. Perl is a specialized tool to manipulate text (files, strings).

It is hard to compare Perl and C because Perl is "interpreted" => A "program" written in Perl is very slow (compared to the same program written in C).

But, if you don't need a very fast program to manipulate text, you should write it in perl. In Perl there is no memory allocation (C malloc) and all the useful "functions" are already there.

Perl is very well adapted to write CGI scripts:
  • CGI scripts mainly manipulate text.
  • Most CGI scripts don't need to run fast (C is not necessary).
  • A CGI "program" is easier and faster to write that a C program.
  • All the functions you need for CGI are already (well) written.
  • A Perl script is portable.
And, because CGI scripts can call "extern programs", it is possible to write a C program for a specific task that requires high performances.

In C it is possible to find (or to write) libraries specifically written for CGI development. But because C is tricky (it's easy to forget something :), C development takes much longer (to write and to test).

Perl can also be used to replace the UNIX utilities sed and awk.

Good links:





Printing



print "\nline 1\n\n";


line 1

print <<End_Of_String;
line 1 - item 1
line2 - item 2
line3 - item 3
End_Of_String

line 1 - item 1
line2 - item 2
line3 - item 3

print "\n\Special charactere - \"like in C\"\n\n";

Special charactere - "like in C"

print qq!The funny "qq" method ... not bad\n\n!;

The funny "qq" method ... not bad

$var=2;
$var_name="var";

printf ("\n%s = %d\n", $var_name, $var);
print ("String formatting ... like in C ;)\n\n");

var = 2
String formatting ... like in C ;)

$var=2;
$var_name="var";
$line = sprintf ("\n%s = %d\n\n", $var_name, $var);
printf ("%s", $line);
print ("\n\"sprintf()\" like in C ... ;)\n\n");

var = 2

"sprintf()" like in C ... ;)

$line = "$var_name = $var";
print "\n==> $line (pretty handy)\n";

==> var = 2 (pretty handy)

$line = '$var_name = $var';
print "\n$line ... like the UNIX shell\n\n";

$var_name = $var ... like the UNIX shell


Numbers



$i = 1;
$i++;
$i=$i+2;
print "i=$i ... C and UNIX shell mix\n\n";

$j=3;
printf ("total = %d\n\n", $i+$j);

i=4 ... C and UNIX shell mix

total = 7

$val = 3;
if ($val == 3)
{
  print "val is equal to 3\n";
}

val is equal to 3


Arrays


Number of elemensts


@tab = ("item1", "item2", "item3");
print "first element = $tab[0]\n";
$n = int @tab;
print "number of item: $n\n";

Index of the last element


$n = $#tab;
print "last element is index: $n";


first element = item1
number of item: 3
last element is index: 2

@tab = ("item1", "item2", "item3");
$tab[3] = "New_item";
print "New array: @tab\n\n";
$tab[0] = "item0";
print "New array: @tab\n\n";

New array: item1 item2 item3 New_item

New array: item0 item2 item3 New_item

@tab = (item0, item2, item3, New_item);
splice (@tab, 2, 2, "ITEM1", "ITEM2");
print "New array: @tab\n\n";

splice (@tab, 2, 2);
print "New array: @tab\n\n";

New array: item0 item2 ITEM1 ITEM2

New array: item0 item2

Note: (item0, item2, item3, New_item)
                       ^ ^
                      2-1 2-2

@tab = (item0, item2);
push (@tab, "item3");
print "New array: @tab\n\n";

New array: item0 item2 item3

@tab = (item0, item2, item3);
$item = pop (@tab);
print "item = $item\n";
print "New array: @tab\n\n";

item = item3
New array: item0 item2

@tab = (item0, item2);
unshift (@tab, "ITEM0");
print "New array: @tab\n\n";

New array: ITEM0 item0 item2

@tab = (ITEM0, item0, item2);
$item = shift (@tab);
print "item = $item\n";
print "New array: @tab\n\n";

item = ITEM0
New array: item0 item2
Array of arrays of arrays ...


use strict;

###########################
## 2 by 2 array ##
###########################

my @tab = (
            [ ('a1', 'a2') ],
            [ ('b1', 'b2') ]
          );

my $i;
my $j;
my $k;

print "2*2 array\n\n";

foreach $i (@tab) { foreach $j (@{$i}) { print $j . "\n"; } }

print "\n\n---\n\n";

for ($i=0; $i<int @tab; $i++)
{
  print "$i\n";

  for ($j=0; $j<int @{$tab[$i]}; $j++) { print "\t$j -> " . $tab[$i]->[$j] . "\n"; }
}

###########################
## 2 by 3 array ##
###########################

@tab = (
         [ ( [ ('a1', 'a2') ], [ ('b1', 'b2') ] ) ],
         [ ( [ ('c1', 'c2') ], [ ('d1', 'd2') ] ) ]
       );

print "\n\n2*3 array\n\n";

foreach $i (@tab)
{
  foreach $j (@{$i})
  {
    foreach $k (@{$j}) { print $k . "\n"; }
  }
}

print "\n\n---\n\n";

for ($i=0; $i<int @tab; $i++)
{
  print "$i\n";

  for ($j=0; $j<int @{$tab[$i]}; $j++)
  {
    print "\t$j\n";
    for ($k=0; $k<int @{$tab[$i]->[$j]}; $k++)
    {
      print "\t\t" . $tab[$i]->[$j]->[$k] . "\n";
    }
  }
}


2*2 array

a1
a2
b1
b2


---

0
        0 -> a1
        1 -> a2
1
        0 -> b1
        1 -> b2


2*3 array

a1
a2
b1
b2
c1
c2
d1
d2


---

0
        0
                a1
                a2
        1
                b1
                b2
1
        0
                c1
                c2
        1
                d1
                d2
Note about the use of "foreach"


use strict;

my @array = ('A', 'B', 'C');
my $elem;

foreach $elem (@array) { $elem = "new $elem"; }

foreach $elem (@array) { print "$elem\n"; }


new A
new B
new C

Please note that the array's content changed.



associative arrays


Syntax: %Array_Name = (key0, value0, key1, value1, ..., keyN, valueN)
$Array_Name('keyI') = valueI
Note: "$" means "value".



%array = ('index1', 'toto', 'index2', 'titi', 'index3', 'fin');
printf ("index1 --> %s\n", $array{'index1'} );
printf ("index2 --> %s\n", $array{'index2'} );
$var = index3;
printf ("index3 --> %s\n\n", $array{$var} );

index1 --> toto
index2 --> titi
index3 --> fin
Getting all keys and values from a hash table


%array = ('index1', 'toto', 'index2', 'titi', 'index3', 'fin');
@All_Keys = keys (%array);
print "array keys = @All_Keys \n";
@All_Values = values (%array);
print "array values = @All_Values \n\n";


array keys = index1 index2 index3
array values = toto titi fin

%array = ('index1', 'toto', 'index2', 'titi', 'index3', 'fin');
$array{'index2'} = "new_value !";
@All_Values = values (%array);
print "array values = @All_Values \n\n";

array values = toto new_value ! fin

%array = (toto, "new_value !", fin);
delete ($array{'index1'});
@All_Values = values (%array);
print "array values = @All_Values \n\n";

array values = new_value ! fin
Testing if a key exists


%h = ( 'key1', 'value1', 'key2', 'value2' );

foreach $key (keys %h)
{
  print "\n$key -> $h{$key}";
}

if (exists($h{'key1'}))
{ print "\nkey key1 exists in h"; }
else
{ print "\nkey key1 does not exist in h"; }

if (exists($h{'key3'}))
{ print "\nkey key3 exists in h"; }
else
{ print "\nkey key3 does not exist in h"; }


key1 -> value1
key2 -> value2
key key1 exists in h
key key3 does not exist in h[dbeurive@sun-dev /tmp]$ more t2.pl
Testing if a key has a value


key1 -> value1
key2 -> value2
key3 -> Not defined![dbeurive@sun-dev /tmp]$ more t3.pl
%h = ( 'key1', 'value1', 'key2', 'value2', 'key3' );

foreach $key (keys %h)
{
  if (defined($h{$key})) { print "\n$key -> $h{$key}"; }
  else { print "\n$key -> Not defined!"; }
}


key1 -> value1
key2 -> value2
key3 -> Not defined!
A hash of arrays

Sometimes you need to reference arrays. To do that you reference the references to the arrays.


use strict;

#####################################################################
# This function is used to interface with the user. It takes a refe-
# -rence to a hash table. This hash table contains references to ar-
# -rays.
#
# o The key of the hash table is the variable name.
# o The first element of the array is the default value for the
# variable.
# o The second element of the array is a reference to the variable.
#####################################################################

sub print_choices
{
  my ($conf) = @_;
  my $key; ## Key used to walk through the hash table
  my $value_name; ## Name of the variable
  my $value_val; ## Default value for the variable
  my $value_ref; ## Reference to the variable
  my $answer; ## Answer read from the standart input

  foreach $key (keys %{$conf})
  {
    #You extract the data from the hash reference

    $value_name = $key;
    $value_val = $conf->{"$key"}->[0];
    $value_ref = $conf->{"$key"}->[1];

    while (1)
    {
      while (1)
      {
        print STDOUT "${value_name} = \"${value_val}\" Is is OK for you ? (yes/no) ";
        $answer = <STDIN>;
        chomp ($answer);

        if (($answer eq "yes") || ($answer eq "no")) { last; }
        print STDOUT "Invalid answer, valid choices are \"yes\" or \"no\"\n\n";
      }

      if ($answer eq "yes") {
                              $$value_ref = $value_val;
                              last;
                            }

      print STDOUT "You answered \"no\", please enter the correct value for ${value_name}: ";
      $answer = <STDIN>;
      chomp ($answer);
      $$value_ref = $answer;
      print STDOUT "\n";
      last;
    }
  }
}



my $hostname;
my $login;
my $password;

#You create a hash of array

my %h = (
          'Host name' => [ ('Default host' , \$hostname) ],
          'Login' => [ ('Default login' , \$login) ],
          'Password' => [ ('Default password' , \$password) ]
        );

# Or you could say:
#
# $h{'Host name'} = [ ('Default host' , \$hostname) ];
# $h{'Login'} = [ ('Default login' , \$login) ];
# $h{'Password'} = [ ('Default password' , \$password) ];

#You pass the reference to the hash as argument for the function print_choices()

print_choices (\%h);

print "hostname = $hostname\n";
print "login = $login\n";
print "password = $password\n";
print "\n";


Login = "Default login" Is is OK for you ? (yes/no) yes
Password = "Default password" Is is OK for you ? (yes/no) no
You answered "no", please enter the correct value for Password: toto

Host name = "Default host" Is is OK for you ? (yes/no) yes
hostname = Default host
login = Default login
password = toto
Hash ot hashes of hashes...
use strict;

###################################
## Hash of hashes ##
###################################

my %h = (
          'key1' => { ## Not bracket, this is a brace
                      'subkey1' => 'val11',
                      'subkey2' => 'val12'
                    },

          'key2' => {
                      'subkey1' => 'val21',
                      'subkey2' => 'val22'
                    }
        );

# Or you could say:
#
# $h{'key1'}{'subkey1'} = 'val11';
# $h{'key1'}{'subkey2'} = 'val12';
# $h{'key2'}{'subkey1'} = 'val21';
# $h{'key2'}{'subkey2'} = 'val22';

my $key1;
my $key2;
my $key3;

###################################
## Fisrt solution ##
###################################

print "Hash of hashes\n\n";

foreach $key1 (keys %h)
{
  print STDOUT "$key1\n";
  print STDOUT "\t" . $h{$key1}{'subkey1'} . "\n";
  print STDOUT "\t" . $h{$key1}{'subkey2'} . "\n";
}

###################################
## Second solution ##
###################################

print "\n\n---\n\n";

foreach $key1 (keys %h)
{
  print STDOUT "$key1\n";

  foreach $key2 (keys %{$h{$key1}}) { print STDOUT "\t" . $h{$key1}{$key2} . "\n"; }
}

###################################
## Hash of hashes of hashes ##
###################################

 %h = (
        'key1' => {
                    'subkey1' => { 'k1' => 'v11', 'k2' => 'v12' },
                    'subkey2' => { 'k1' => 'v21', 'k2' => 'v22' }
                  },

        'key2' => {
                    'subkey1' => { 'k1' => 'w11', 'k2' => 'w12' },
                    'subkey2' => { 'k1' => 'w21', 'k2' => 'w22' }
                  }
      );

###################################
## Fisrt solution ##
###################################

print "\n\nHash of hashes of hashes\n\n";

foreach $key1 (keys %h)
{
  print STDOUT "$key1\n";
  print STDOUT "\t" . $h{$key1}{'subkey1'}{'k1'} . "\n";
  print STDOUT "\t" . $h{$key1}{'subkey1'}{'k2'} . "\n";
  print STDOUT "\t" . $h{$key1}{'subkey2'}{'k1'} . "\n";
  print STDOUT "\t" . $h{$key1}{'subkey2'}{'k2'} . "\n";
}

print "\n\n---\n\n";

###################################
## Second solution ##
###################################

foreach $key1 (keys %h)
{
  print STDOUT "$key1\n";

  foreach $key2 (keys %{$h{$key1}})
  {
    print STDOUT "\t$key2\n";

    foreach $key3 (keys %{$h{$key1}{$key2}})
    { print STDOUT "\t\t$key3 -> " . $h{$key1}{$key2}{$key3} . "\n"; }

  }
}


Hash of hashes

key1
        val11
        val12
key2
        val21
        val22


---

key1
        val11
        val12
key2
        val21
        val22


Hash of hashes of hashes

key1
        v11
        v12
        v21
        v22
key2
        w11
        w12
        w21
        w22

---

key1
        subkey1
                k1 -> v11
                k2 -> v12
        subkey2
                k1 -> v21
                k2 -> v22
key2
        subkey1
                k1 -> w11
                k2 -> w12
        subkey2
                k1 -> w21
                k2 -> w22
Sorting hash by keys or values


use strict;

my %h = (
          'kz' => '2',
          'km' => '1',
          'ka' => '3'
        );

my $key;

print "Size of the hash: " . int (keys %h) . "\n\n";

print "Hash in (keys) alphabetical order:\n\n";

foreach $key (sort (keys %h))
{
  print "\t" . $key . ' -> ' . $h{$key} . "\n";
}

print "\nHash in (values) alphabetical order:\n\n";

foreach $key ( sort{$h{$a} cmp $h{$b}} (keys %h) )
{
  print "\t" . $key . ' -> ' . $h{$key} . "\n";
}



Size of the hash: 3

Hash in (keys) alphabetical order:

        ka -> 3
        km -> 1
        kz -> 2

Hash in (values) alphabetical order:

        km -> 1
        kz -> 2
        ka -> 3


Strings manipulation



$line="le dernier caractere (A) va disparaitre A";
$last=chop($line);
print "$line\n\n";
printf ("size of line: %d\n\n", length($line));

le dernier caractere (A) va disparaitre

size of line: 40

$line="le dernier caractere (A) va disparaitre";
$extr = substr ($line, 3, 7);
print "[$extr] begining byte_number\n\n";

[dernier] begining byte_number

Note: "le dernier caractere (A) va disparaitre"
      ^ ^ ^
      0 3 10=3+7

$val = "ex";
print "\nval = $val\n";

if ($val ne "")
{
  print "val is not empty\n";
}

if ($val eq "ex")
{
  print "OK, val is equal to ex\n";
}

val = ex
val is not empty
OK, val is equal to ex

$texte = "ceci&est&un&essais&de&texte";
@tab = split (/&/, $texte);
print "Now tab is equal to: @tab\n\n";

Now tab is equal to: ceci est un essais de texte

@tab = ("ceci", "es", "un", "essais", "de", "texte");
$union = join (" - ", @tab);
print "The tab union is: $union\n\n";

The tab union is: ceci - est - un - essais - de - texte
Converting a string to lowercase


$var = 'TITi';
$var =~ tr/[A-Z]/[a-z]/;
print "$var\n\n";
      

titi


Regular expressions


Testing for a pattern


$string = "ABCD_EFGH_IJKL_MNOP";

if ($string =~ m/CD_+EF/) { print "\n\"$string\" matches the reg expr"; }
else { print "\n\"$string\" does not matches the reg expr"; }


"ABCD_EFGH_IJKL_MNOP" matches the reg expr
Find patterns


use strict;

my $string = "2002-01-14 INFO [message]";
my $date;
my $level;

if ($string =~ m/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([A-Z]+) \[[a-zA-Z0-9 ]+\]/)
{
   my $date = $1;
   my $level = $2;

   print STDOUT "Date = $date\n";
   print STDOUT "Level = $level\n";
}


Date = 2002-01-14
Level = INFO
Search


$string = "1:2:3ABCD_EFGH_IJKL_MNOP";


$resul = $string;
$resul =~ s/^([0-9]:[0-9]:[0-9]).*([0-9]{3}).*$/$1 and $2/;

print "\n\"$string\" => \"$resul\"\n";


"1:2:3ABCD_EFGH_IJKL_123MNOP" => "1:2:3 and 123"
Removing leading and ending spaces


$string = " toto ";
$string =~ s/^\s*([^\s]+)\s*/$1/;
print "\n\"$string\"\n";


"toto"
Replacing tags by values

Given a hash table "h{'TAG'}=value", the function replace_all() replaces all tags by its value into a given text.


sub split_string
{
  my ($string, $delimitor, $end) = @_;
  my @list = ();
  my @lines = ();
  my $tag = '';
  my $pre_tag = '';
  my $tag_index = 0;


  @{$end} = ();
  @lines = split (/\n/, $string);

  foreach $string (@lines)
  {
        ##
        ## See Greedy and ungreedy matches
        ##

    while ($string =~ m/(${delimitor}[a-zA-Z0-9]+?${delimitor})/)
    {
      $tag = $1; # don't try to use s/.../$1 !!!
      $tag_index = index($string, $tag);
      $pre_tag = substr($string, 0, $tag_index);

      push (@list, $pre_tag, $tag);
      push (@{$end}, '', '');
      $string = substr($string, $tag_index + length($tag));
    }

    push (@list, $string);
    push (@{$end}, "\n");
  }

  return @list;
}

# text: text that contains tags to replace.
# tags: Perl reference to a hash table that contains the values associated with each tag.
# delimitor: tag's delimitor (ex: '__' => tag is '__TAG__').

sub replace_all
{
  my ($text, $tags, $delimitor) = @_;

  my @tokens = ();
  my @end = ();
  my $key = '';
  my $value = '';
  my $i = 0;


  @tokens = split_string ($text, $delimitor, \@end);

  for ($i=0; $i<(int @tokens); $i++)
  {
    foreach $key (keys %{$tags})
    {
      $value = $tags->{$key};
      $key = $delimitor . $key . $delimitor;
      if ($tokens[$i] eq $key) { $tokens[$i] = $value; last; }
    }
  }

  $text = '';
  for ($i=0; $i<int @tokens; $i++) { $text .= $tokens[$i] . $end[$i]; }

  return $text;
}

# Example

# -------------------------------------------------
# Load text to replace
# -------------------------------------------------

my $text = '';
while (<STDIN>) { $text .= $_; }

# -------------------------------------------------
# Cut text into tokens
# -------------------------------------------------

if ($ARGV[0] eq 'split')
{
  my @tab = ();
  my @end = ();
  my $elem = '';
  my $i = 0;

  @tab = split_string($text, '__', \@end);

  print (STDOUT "\n\n\n------------------------\n\n\n");
  foreach $elem (@tab) { print (STDOUT "\nELEMENT $i: [$elem]\n"); $i++; }
  print (STDOUT "\n\n\n------------------------\n\n\n");
}

# -------------------------------------------------
# Replace tags by values
# -------------------------------------------------

if ($ARGV[0] eq 'replace')
{
  my $done = '';
  my $key = '';
  my %tags = (
               'TAG1' => '[ceci est le tag TAG1]',
               'SCRIPT' => '[ceci est le tag SCRIPT]'
             );

  $done = replace_all ($text, \%tags, '__');


  print (STDOUT "\n\n\n------------------------\n\n\n");
  print (STDOUT "\nORIGINAL TEST: [$text]\n");
  print (STDOUT "\n\n\n------------------------\n\n\n");
  print (STDOUT "\nTAGS / VALUE:\n\n");
  foreach $key (keys %tags) { print (STDOUT "\n $key, " . $tags{$key}); }
  print (STDOUT "\n\n\n");
  print (STDOUT "\n\n\n------------------------\n\n\n");
  print (STDOUT "\nREPLACED TEXT: [$done]\n\n\n");
  print (STDOUT "\n\n\n------------------------\n\n\n");
}

Given the sample text:
ceci est un exemple simple __TAG1__FAUX_TAG__ de remplacement.
Pour le tests numero __TAG1____TAG1____TAG2__ lancer le script __SCRIPT__.
Fin du test.

The test program prints:


$ perl test.pl split < mail.body



------------------------



ELEMENT 0: [ceci est un exemple simple ]

ELEMENT 1: [__TAG1__]

ELEMENT 2: [FAUX_TAG__ de remplacement.]

ELEMENT 3: [Pour le tests numero ]

ELEMENT 4: [__TAG1__]

ELEMENT 5: []

ELEMENT 6: [__TAG1__]

ELEMENT 7: []

ELEMENT 8: [__TAG2__]

ELEMENT 9: [ lancer le script ]

ELEMENT 10: [__SCRIPT__]

ELEMENT 11: [.]

ELEMENT 12: [Fin du test.]



------------------------


$ perl test.pl replace < mail.body



------------------------



ORIGINAL TEST: [ceci est un exemple simple __TAG1__FAUX_TAG__ de remplacement.
Pour le tests numero __TAG1____TAG1____TAG2__ lancer le script __SCRIPT__.
Fin du test.
]



------------------------



TAGS / VALUE:


  SCRIPT, [ceci est le tag SCRIPT]
  TAG1, [ceci est le tag TAG1]





------------------------



REPLACED TEXT: [ceci est un exemple simple [ceci est le tag TAG1]FAUX_TAG__ de remplacement.
Pour le tests numero [ceci est le tag TAG1][ceci est le tag TAG1]__TAG2__ lancer le script [ceci est le tag SCRIPT].
Fin du test.
]





------------------------

Greedy and ungreedy matches

By default, a regular expression matches the longest string of characters that matches the expression's rules. This is called "greedy matching".

Sometimes you may want to match the shortest string of characters. This is called "ungreedy matching".

The following code illustrates the use of "greedy and ungreedy matches":


use strict;

my $string = "__TOTO__TITI__TATA__";
my $delimitor = "__";
my $match;

if ($string =~ m/(${delimitor}[_a-zA-Z0-9]+${delimitor})/)
{
  $match = $1;
  print STDOUT "Greedy match = [$match]\n";
}

if ($string =~ m/(${delimitor}[_a-zA-Z0-9]+?${delimitor})/)
{
  $match = $1;
  print STDOUT "Ungreedy match = [$match]\n";
}


Greedy match = [__TOTO__TITI__TATA__]
Ungreedy match = [__TOTO__]
Matching a succession of the same sequence


use strict;

my $string;
my $res1;
my $res2;
my $res3;
my $res4;
my $res5;

if (int @ARGV != 1)
{
  print "Usage: perl test.pl <string>\n";
  exit 1;
}

$string = $ARGV[0];
$res1 = $string;
$res2 = $string;
$res3 = $string;
$res4 = $string;
$res5 = $string;

$res1 =~ s/(.)\1/$1/g;
print 's/(.)\1/$1/g ' . " => $res1\n\n";

$res2 =~ s/(.)\1\1/$1/g;
print 's/(.)\1\1/$1/g ' . " => $res2\n\n";

$res3 =~ s/(.)\1+/$1/g;
print 's/(.)\1+/$1/g ' . " => $res3\n\n";

$res4 =~ s/(.)\1([ABC])\2/$1$2/g;
print 's/(.)\1([ABC])\2/$1$2/g ' . " => $res4\n\n";

$res5 =~ s/(.)\1+([ABC])\2+/$1$2/g;
print 's/(.)\1+([ABC])\2+/$1$2/g ' . " => $res4\n\n";

exit 0;


perl test.pl 1111AAAA2222BBBB
s/(.)\1/$1/g => 11AA22BB

s/(.)\1\1/$1/g => 11AA22BB

s/(.)\1+/$1/g => 1A2B

s/(.)\1([ABC])\2/$1$2/g => 111AAA222BBB

s/(.)\1+([ABC])\2+/$1$2/g => 111AAA222BBB
Repeating patterns


use strict;


## the expression '\1' represents the first pattern (between '(' ')').
## the expression '\2' represents the second pattern (between '(' ')').
## ...

## '\1' means: the first pattern 1 times (not 3 times)
## '\1{1}' means: the first pattern 2 times (not 1 times)
## '\1{2}' means: the first pattern 3 times (not 2 times)
## '\1{3}' means: the first pattern 4 times (not 3 times)



my $tel = $ARGV[0];
my $type = 'unknown';

while (1)
{

  ####
  #### derniers chiffres egaux (0xxxxx1111)
  ####

  if ($tel =~ m/^0[0-9]{5}([0-9])\1{3}$/)
  {
    $type = '0xxxxx1111';
    last;
  }

  ####
  #### Double quadruplette (xx10241024)
  ####

  if ($tel =~ m/^[0-9]{2}([0-9]{4})\1{1}$/)
  {
    $type = 'xx10241024';
    last;
  }

  ####
  #### 2 centaines en fin de numéro (xxxx100200)
  ####

  if ($tel =~ m/^[0-9]{4}[1-9]00[1-9]00$/)
  {
    $type = 'xxxx100200';
    last;
  }

  ####
  #### 2 paires (xxxxxx1212)
  ####

  if ($tel =~ m/^[0-9]{6}([0-9]{2})\1{1}$/)
  {
    $type = 'xxxxxx1212';
    last;
  }

  ####
  #### Double triplette finale (xxxx720720)
  ####

  if ($tel =~ m/^[0-9]{4}([0-9]{3})\1{1}$/)
  {
    $type = 'xxxx720720';
    last;
  }

  ####
  #### 3 derniers chiffres égaux (xxxxxxx111)
  ####

  if ($tel =~ m/^[0-9]{7}([1-9])\1{2}$/)
  {
    $type = 'xxxxxxx111';
    last;
  }

  ####
  #### 2 doubles (xxxxxx1122)
  ####

  if ($tel =~ m/^[0-9]{6}([0-9])\1{1}([0-9])\2{1}$/)
  {
    $type = 'xxxxxx1122';
    last;
  }

  ####
  #### Les 4 remiers chiffres sont égaux aux 4 derniers (0154xx0154)
  ####

  if ($tel =~ m/^([0-9]{4})[0-9]{2}\1$/)
  {
    $type = '0154xx0154';
    last;
  }

  ####
  #### Les années 2000 (xxxxx2002)
  ####

  if ($tel =~ m/^0[0-9]{5}200[0-9]{1}$/)
  {
    $type = '0xxxxx2002';
    last;
  }

  ####
  #### 3 derniers chiffres égal a zéro (0xxxxx9000)
  ####

  if ($tel =~ m/^0[0-9]{5}[1-9]000$/)
  {
    $type = '0xxxxx9000';
    last;
  }

  ####
  #### Les années 1995 à 1999 (0xxxxx1995)
  ####

  if ($tel =~ m/^0[0-9]{5}199[5-9]{1}$/)
  {
    $type = '0xxxxx1995';
    last;
  }

  last;
}

print "Type = $tel est de type <$type>\n";


perl regexp.pl 0123452211
Type = 0123452211 est de type <xxxxxx1122>
Matches' contents


use strict;

my $text = 'Before the match "cat" and after the match. Before the match "dog" and after.';

print 'm/"cat"/:' . "\n";
if ($text =~ m/"cat"/)
{
  print "\t\$` = [$`]\n";
  print "\t\$& = [$&]\n";
  print "\t\$' = [$']\n";
}
print "\n\n";

print 'm/("cat").*("dog")/:' . "\n";
if ($text =~ m/("cat").*("dog")/)
{
  print "\t\$` = [$`]\n";
  print "\t\$& = [$&]\n";
  print "\t\$' = [$']\n";
  print "\tStart of \$& = " . $-[0] . "\n";
  print "\tEnd of \$& = " . $+[0] . "\n";

  print "\n";
  print "\t\$1 = [$1]\n";
  print "\tStart of \$2 = " . $-[1] . "\n";
  print "\tEnd of \$2 = " . $+[1] . "\n";

  print "\n";
  print "\t\$2 = [$2]\n";
  print "\tStart of \$2 = " . $-[2] . "\n";
  print "\tEnd of \$2 = " . $+[2] . "\n";
}
print "\n\n";


m/"cat"/:
        $` = [Before the match ]
        $& = ["cat"]
        $' = [ and after the match. Before the match "dog" and after.]


m/("cat").*("dog")/:
        $` = [Before the match ]
        $& = ["cat" and after the match. Before the match "dog"]
        $' = [ and after.]
        Start of $& = 17
        End of $& = 66

        $1 = ["cat"]
        Start of $2 = 17
        End of $2 = 22

        $2 = ["dog"]
        Start of $2 = 61
        End of $2 = 66


Loops, tests, ...



@tab = ("pomme", "poire", "fruit");
$n = 0;
foreach $i (@tab)
{
  print "$n => $i\n";
  $n += 1;
  $n++;
}

0 => pomme
2 => poire
4 => fruit

for ($i=0; $i<5; $i++)
{
  print "=> $i\n";
}

=> 0
=> 1
=> 2
=> 3
=> 4

$i = 10;
$j = 15;

if (($i == 10) && ($j == 15)) { print "\nTEST and"; }
if (($i == 10) || ($j == 12)) { print "\nTEST or\n\n"; }

TEST and
TEST or
This code illustrates some strange things about Perl programming. We can explain it of course, but it looks fishy. If you want to avoid these problems, make sure to use the Perl option "-w".


# -------------------------------------------------------
# Testing the operator 'defined' and 'if'
# -------------------------------------------------------

use strict;


# -------------------------------------------------------
# Variable not initialized
# -------------------------------------------------------

my $var;
if (defined($var)) { print STDOUT "if (defined(\$var not initialized)) returns TRUE\n"; }
   else
   { print STDOUT "defined(\$var not initialized)) returns FALSE\n"; }

if ($var) { print STDOUT "if (\$var not initialized) returns TRUE\n"; }
   else
   { print STDOUT "if (\$var not initialized) returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Variable set to 'undef'
# -------------------------------------------------------

my $var = undef;
if (defined($var)) { print STDOUT "if (defined(undef)) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(undef)) returns FALSE\n"; }

if ($var) { print STDOUT "if (undef) returns TRUE\n"; }
   else
   { print STDOUT "if (undef) returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Empty string
# -------------------------------------------------------

$var = '';
if (defined($var)) { print STDOUT "if (defined('')) returns TRUE\n"; }
   else
   { print STDOUT "if (defined('')) returns FALSE\n"; }

if ($var) { print STDOUT "if ('') returns TRUE\n"; }
   else
   { print STDOUT "if ('') returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Empty string
# -------------------------------------------------------

$var = "";
if (defined($var)) { print STDOUT "if (defined(\"\")) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(\"\")) returns FALSE\n"; }

if ($var) { print STDOUT "if (\"\") returns TRUE\n"; }
   else
   { print STDOUT "if (\"\") returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Non empty string
# -------------------------------------------------------

$var = "\n0";
if (defined($var)) { print STDOUT "if (defined(\"\\n0\")) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(\"\\n0\")) returns FALSE\n"; }

if ($var) { print STDOUT "if (\"\\n0\") returns TRUE\n"; }
   else
   { print STDOUT "if (\"\\n0\") returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Non empty string
# -------------------------------------------------------

$var = '\n0';
if (defined($var)) { print STDOUT "if (defined('\\n0')) returns TRUE\n"; }
   else
   { print STDOUT "if (defined('\\n0')) returns FALSE\n"; }

if ($var) { print STDOUT "if ('\\n0') returns TRUE\n"; }
   else
   { print STDOUT "if ('\\n0') returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Space
# -------------------------------------------------------

$var = ' ';
if (defined($var)) { print STDOUT "if (defined(' ')) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(' ')) returns FALSE\n"; }

if ($var) { print STDOUT "if (' ') returns TRUE\n"; }
   else
   { print STDOUT "if (' ') returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Space
# -------------------------------------------------------

$var = " ";
if (defined($var)) { print STDOUT "if (defined(\" \")) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(\" \")) returns FALSE\n"; }

if ($var) { print STDOUT "if (\" \") returns TRUE\n"; }
   else
   { print STDOUT "if (\" \") returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Zero value
# -------------------------------------------------------

$var = 0;
if (defined($var)) { print STDOUT "if (defined(0)) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(0)) returns FALSE\n"; }

if ($var) { print STDOUT "if (0) returns TRUE\n"; }
   else
   { print STDOUT "if (0) returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Empty array
# -------------------------------------------------------

my @tab = ();
if (defined(@tab)) { print STDOUT "if (defined(empty array)) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(empty array)) returns FALSE\n"; }

if (@tab) { print STDOUT "if (empty array) returns TRUE\n"; }
   else
   { print STDOUT "if (empty array) returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Empty hash
# -------------------------------------------------------

my %hash = ();
if (defined(%hash)) { print STDOUT "if (defined(empty hash)) returns TRUE\n"; }
   else
   { print STDOUT "if (defined(empty hash)) returns FALSE\n"; }

if (%hash) { print STDOUT "if (empty hash) returns TRUE\n"; }
   else
   { print STDOUT "if (empty hash) returns FALSE\n"; }

print STDOUT "\n\n";

# -------------------------------------------------------
# Testing 'undef' against other values
# -------------------------------------------------------

$var = 0;
        if ($var == undef) { print STDOUT "0 == undef\n"; }
        else
        { print STDOUT "0 != undef\n"; }
        print STDOUT "\n\n";

$var = '';
        if ($var == undef) { print STDOUT "'' == undef\n"; }
        else
        { print STDOUT "'' != undef\n"; }
        print STDOUT "\n\n";

$var = "";
        if ($var == undef) { print STDOUT "\"\" == undef\n"; }
        else
        { print STDOUT "\"\" != undef\n"; }
        print STDOUT "\n\n";

$var = ' ';
        if ($var == undef) { print STDOUT "' ' == undef\n"; }
        else
        { print STDOUT "' ' != undef\n"; }
        print STDOUT "\n\n";

$var = " ";
        if ($var == undef) { print STDOUT "\" \" == undef\n"; }
        else
        { print STDOUT "\" \" != undef\n"; }
        print STDOUT "\n\n";

$var = '0';
        if ($var == undef) { print STDOUT "'0' == undef\n"; }
        else
        { print STDOUT "'0' != undef\n"; }
        print STDOUT "\n\n";

$var = "0";
        if ($var == undef) { print STDOUT "\"0\" == undef\n"; }
        else
        { print STDOUT "\"0\" != undef\n"; }
        print STDOUT "\n\n";

$var = '00';
        if ($var == undef) { print STDOUT "'00' == undef\n"; }
        else
        { print STDOUT "'00' != undef\n"; }
        print STDOUT "\n\n";

$var = "00";
        if ($var == undef) { print STDOUT "\"00\" == undef\n"; }
        else
        { print STDOUT "\"00\" != undef\n"; }
        print STDOUT "\n\n";

$var = '\n';
        if ($var == undef) { print STDOUT "'\\n' == undef\n"; }
        else
        { print STDOUT "'\\n' != undef\n"; }
        print STDOUT "\n\n";

$var = "\n";
        if ($var == undef) { print STDOUT "\"\\n\" == undef\n"; }
        else
        { print STDOUT "\"\\n\" != undef\n"; }
        print STDOUT "\n\n";

$var = '\n0';
        if ($var == undef) { print STDOUT "'\\n0' == undef\n"; }
        else
        { print STDOUT "'\\n0' != undef\n"; }
        print STDOUT "\n\n";

$var = "\n0";
        if ($var == undef) { print STDOUT "\"\\n0\" == undef\n"; }
        else
        { print STDOUT "\"\\n0\" != undef\n"; }
        print STDOUT "\n\n";

$var = '\n1';
        if ($var == undef) { print STDOUT "'\\n1' == undef\n"; }
        else
        { print STDOUT "'\\n1' != undef\n"; }
        print STDOUT "\n\n";

$var = "\n1";
        if ($var == undef) { print STDOUT "\"\\n1\" == undef\n"; }
        else
        { print STDOUT "\"\\n1\" != undef\n"; }
        print STDOUT "\n\n";

$var = '1';
        if ($var == undef) { print STDOUT "'1' == undef\n"; }
        else
        { print STDOUT "'1' != undef\n"; }
        print STDOUT "\n\n";

$var = "1";
        if ($var == undef) { print STDOUT "\"1\" == undef\n"; }
        else
        { print STDOUT "\"1\" != undef\n"; }
        print STDOUT "\n\n";

@tab = ();
        if (@tab == undef) { print STDOUT "Empty array == undef\n"; }
        else
        { print STDOUT "Empty array != undef\n"; }
        print STDOUT "\n\n";

%hash = ();
        if (%hash == undef) { print STDOUT "Empty hash == undef\n"; }
        else
        { print STDOUT "Empty hash != undef\n"; }
        print STDOUT "\n\n";

@tab = (1, 2);
        if (@tab == undef) { print STDOUT "Non empty array == undef\n"; }
        else
        { print STDOUT "Non empty array != undef\n"; }
        print STDOUT "\n\n";

%hash = ( '1' => 'A' );
        if (%hash == undef) { print STDOUT "Non empty hash == undef\n"; }
        else
        { print STDOUT "Non empty hash != undef\n"; }
        print STDOUT "\n\n";

# -------------------------------------------------------
# Other strange stuff
# -------------------------------------------------------

$var = '';
        if ($var == 0) { print STDOUT "'' == 0\n"; }
        else
        { print STDOUT "'' != 0\n"; }
        print STDOUT "\n\n";

$var = "\n";
        if ($var == 0) { print STDOUT "\"\\n\" == 0\n"; }
        else
        { print STDOUT "\"\\n\" != 0\n"; }
        print STDOUT "\n\n";

$var = "\n0";
        if ($var == 0) { print STDOUT "\"\\n0\" == 0\n"; }
        else
        { print STDOUT "\"\\n0\" != 0\n"; }
        print STDOUT "\n\n";

$var = "0\n0";
        if ($var == 0) { print STDOUT "\"0\\n0\" == 0\n"; }
        else
        { print STDOUT "\"0\\n0\" != 0\n"; }
        print STDOUT "\n\n";

$var = "\n1";
        if ($var == 0) { print STDOUT "\"\\n1\" == 0\n"; }
        else
        { print STDOUT "\"\\n1\" != 0\n"; }
        print STDOUT "\n\n";

@tab = ();
        if (@tab == 0) { print STDOUT "Empty array == 0\n"; }
        else
        { print STDOUT "Empty array != 0\n"; }
        print STDOUT "\n\n";

%hash = ();
        if (%hash == 0) { print STDOUT "Empty hash == 0\n"; }
        else
        { print STDOUT "Empty hash != 0\n"; }
        print STDOUT "\n\n";


defined($var not initialized)) returns FALSE
if ($var not initialized) returns FALSE


if (defined(undef)) returns FALSE
if (undef) returns FALSE


if (defined('')) returns TRUE
if ('') returns FALSE


if (defined("")) returns TRUE
if ("") returns FALSE


if (defined("\n0")) returns TRUE
if ("\n0") returns TRUE


if (defined('\n0')) returns TRUE
if ('\n0') returns TRUE


if (defined(' ')) returns TRUE
if (' ') returns TRUE


if (defined(" ")) returns TRUE
if (" ") returns TRUE


if (defined(0)) returns TRUE
if (0) returns FALSE


if (defined(empty array)) returns FALSE
if (empty array) returns FALSE


if (defined(empty hash)) returns FALSE
if (empty hash) returns FALSE


0 == undef


'' == undef


"" == undef


' ' == undef


" " == undef


'0' == undef


"0" == undef


'00' == undef


"00" == undef


'\n' == undef


"\n" == undef


'\n0' == undef


"\n0" == undef


'\n1' == undef


"\n1" != undef


'1' != undef


"1" != undef


Empty array == undef


Empty hash == undef


Non empty array != undef


Non empty hash != undef


'' == 0


"\n" == 0


"\n0" == 0


"0\n0" == 0


"\n1" != 0


Empty array == 0


Empty hash == 0


Files



open (fd, "./exx");
close (fd);
open (fd, "./exxx") || { print "\nthe file 'exxx' does not exist\n" };
Note: The file "./exx" exists but "./exxx" does not exist.

the file 'exxx' does not exist

open (fd, "./exx");
$i = 0;
while (<fd>)
{
   print "line $i = $_";
   $i++;
}
close (fd);
Note 1:
"$_" contains the current line.

Note 2:
The file "./exx" contains:
item1 item2 item3
item0 item9 item1
item3 item7 item2
item4 item2 item5

line 0 = item1 item2 item3
line 1 = item0 item9 item1
line 2 = item3 item7 item2
line 3 = item4 item2 item5

open (fd, ">./exxx");
print fd "ceci est un exemple";
close (fd);
the ">" means:
  • if "./exxx" does not exist, it is created.
  • if "./exxx" exists, it is overwritten.

open (fd, ">>./exxx");
print fd "\najout en fin de fichier";
close (fd);
the ">>" means:
  • if "./exxx" does not exist, it is created.
  • if "./exxx" exists, you add data at the end of the file.

chmod (0777, "exxx");
rename ("exxx", "exx_");
unlink ("exx_");
unlink ("exx_") || print "\nfile does not exist -- can't exist\n\n";
Like under UNIX. "unlink" means "delete".

if ( -e "exx" ) { print "\nfile exx exists\n"; }
if ((-e "exx") && (-d "..")) { print "file and directory\n"; }
Testing if files or directories exist. Like UNIX shell scripts.

opendir (dir, ".");
@files = readdir(dir);
  print "\n ===> @files";
  $N = @files;
  for ($i=0; $i<$N; $i++)
  {
    print "\n => $files[$i]";
  }
closedir (dir);
This code prints out the content of the directory (equivalent to the ls UNIX command).
Passing a file handle as argument for a function


#################################################################
# Function log_parser()
#
# This function is used to parse log files from the massive sen-
# -der.
#
# Arguments:
# 1. (file_handle): reference to the file handle that repre-
# -sents the open log file.
# 2. (err): reference to a string that will be used to store
# an error message.
# 3. (fields): reference to a hash table that will contain
# data extracted from the log file.
#
# Upon successful completion, "$err" is undefined. The function
# returns the line read from the log file.
# If an error occured, the value of "$err" is defined and it
# contains the description of the error.
#
# Example: It seems necessary to give an example that shows how
# to use this function.
#
# use strict;
#
# unless (open(FILE, "<toto"))
# {
# print STDERR "Can not open file";
# exit 1;
# }
#
# my $err;
# my %data;
#
#
# while (log_parser(\*FILE, \$err, \%data))
# {
# unless ($err)
# {
# print "\n date : $data{'date'}";
# print "\n time : $data{'time'}";
# print "\n pid : $data{'pid'}";
# print "\n level : $data{'level'}";
# print "\n message : $data{'message'}";
# }
# else
# {
# print "\n [$err]";
# }
# }
#
#################################################################


sub log_parser
{
  my ($file_handle, $err, $fields) = @_;
  my $line;
  my $date;
  my $time;
  my $pid;
  my $level;
  my $message;

  ${$err} = undef;
  %{$fields} = ();

  # ------------------------------------------------
  # Read the current line from the log file
  # ------------------------------------------------

  $line = <$file_handle>;

  unless(defined($line)) { return $line; }

  # ------------------------------------------------
  # Now extract data from line
  # ------------------------------------------------

  $date = $line;
  $time = $line;
  $pid = $line;
  $level = $line;
  $message = $line;

  unless ($line =~ m/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]+ [A-Z]+ \[.*\]$/)
  {
    $$err = "Invalid log line: \"$line\"";
    return $line;
  }

  $date =~ s/^([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/$1/;
  $time =~ s/^[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:[0-9]{2}:[0-9]{2}).*$/$1/;
  $pid =~ s/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ([0-9]+) .*$/$1/;
  $level =~ s/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]+ ([A-Z]+) .*$/$1/;
  $message =~ s/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]+ [A-Z]+ (\[.*\])$/$1/;

  # ------------------------------------------------
  # Remove trailing new line characters
  # ------------------------------------------------

  chomp($date);
  chomp($time);
  chomp($pid);
  chomp($level);
  chomp($message);

  # ------------------------------------------------
  # Now build the hash
  # ------------------------------------------------

  %{$fields}->{'date'} = "$date";
  %{$fields}->{'time'} = "$time";
  %{$fields}->{'pid'} = "$pid";
  %{$fields}->{'level'} = "$level";
  %{$fields}->{'message'} = "$message";

  return $line;
}

This code shows how to pass a file handle as argument to a function. This function reads a input file that looks something like:


2002-05-29 15:54:39 1512 INFO [starting SMTP sender]
2002-05-29 15:54:40 1512 WARNING [warning message]
...
      



References


The Six Types of References
Reference Assignment How to Dereference
$refScalar = \$scalar; ${$refScalar} is a scalar value.
$refArray = \@array; @{$refArray} is an array value. Use @{$refArray}->[index]
$refHash = \%hash; %{$refHash} is a hash value. Use %{$refHash}->{'key'}
$refFunction = \&function; &{$refFunction} is a function location.
$refGlob = \*FILE;$refGlob is a reference to a file handle and seems to be automatically dereferenced by Perl.
$refRef = \$refScalar; ${${$refScalar}} is a scalar value.
The function "ref($var)" returns FALSE is $var is not a reference. Otherwise, if $var is a reference, the function returns one of the following string:

  • "REF": $var is a reference to a reference.
  • "SCALAR": $var is a reference to a scalar.
  • "ARRAY": $var is a reference to an array.
  • "HASH": $var is a reference to a hash table.
  • "CODE": $var is a reference to a code.
  • "GLOB": $var is a reference to a glob.


use strict;


sub reference
{
  my ($arg) = @_;

  if (ref($arg))
  {
    if (ref($arg) eq "REF") { print "\t This is a reference to a reference (REF)\n"; }
    elsif (ref($arg) eq "SCALAR") { print "\t This is a reference to a scalar (SCALAR)\n"; }
    elsif (ref($arg) eq "ARRAY") { print "\t This is a reference to an array (ARRAY)\n"; }
    elsif (ref($arg) eq "HASH") { print "\t This is a reference to a hash table (HASH)\n"; }
    elsif (ref($arg) eq "CODE") { print "\t This is a reference to a code (CODE)\n"; }
    elsif (ref($arg) eq "GLOB") { print "\t This is a reference to a glob (GLOB)\n"; }
    else { print "\t This is a reference to an unknown type pf variable\n"; }
  }
  else {
         print "\t This is not a reference\n";
         print "\t The type of this variable is " . ref(\$arg) . "\n";
       }
}


my $string = "This is a simple string of characters";

print "Passing the reference to the variable:\n\n";
reference (\$string);
print "\n\n";

print "Passing the variable itself:\n\n";
reference ($string);
print "\n\n";


Passing the reference to the variable:

         This is a reference to a scalar (SCALAR)


Passing the variable itself:

         This is not a reference
         The type of this variable is SCALAR


Subroutines



sub routine
{
  local ($a, $b) = @_;

  print "\nfirst arg = $a";
  print "\nsecond arg = $b\n\n";
}

&routine (5, "toto");
"@_" means: "end of subroutine arguments declaration."
&routine: subroutine call.

Remarks:

  • You can not pass directly an array to a function (because "@_" is an array... so it will mess up the paramaters). If you want to pass an array, you must pass the reference to the array (\@array and in the function @{$refArray}->[index]).
  • Because hash tables are represented like arrays, the same remark aplies to hash tables.
  • To return an empty array (ot an empty hash), you should use the syntax "return ();". Please note that "()" is equivalvent to "false".


Getting a timestamp



sub Format
{
  my ($num) = @_;

  if ($num == 0) { return "00"; }
  if ($num == 1) { return "01"; }
  if ($num == 2) { return "02"; }
  if ($num == 3) { return "03"; }
  if ($num == 4) { return "04"; }
  if ($num == 5) { return "05"; }
  if ($num == 6) { return "06"; }
  if ($num == 7) { return "07"; }
  if ($num == 8) { return "08"; }
  if ($num == 9) { return "09"; }
  return $num;
}

sub Time_Stamp
{
  my $seconde;
  my $minute;
  my $heure;
  my $jour;
  my $mois;
  my $annee;

  ($seconde, $minute, $heure, $jour, $mois, $annee) = (localtime)[0,1,2,3,4,5];
  $annee += 1900;
  $mois += 1;

  return sprintf "%4d-%s-%s %s:%s:%s", $annee, &Format($mois), &Format($jour), &Format($heure), &Format($minute), &Format($seconde);
}
The function Time_Stamp returns the following string:

"YYYY-MM-DD HH:MM:SS"

  • YYYY: year
  • MM: month
  • DD: day
  • HH: hour
  • MM: minute
  • SS: second


HTTP request



#!/usr/bin/perl

use strict;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

...

sub HTTP_Request
{
  my ($URL,$values) = @_;

  # Create a HTTP client
  my $client_http = new LWP::UserAgent;

  # format the request
  # $URL: URL used for the HTTP request (ex: "http://www.gnu.org")
  # $values: couples "name=value" (ex: "form1=toto")
  my $request = new HTTP::Request GET => "$URL?$values";

  # send the request
  my $response = $client_http->request($request);

  # get the response
  if ($response->is_success)
  { return $response->content; }

  return "ERROR";
}


CGI: get data from the web server



#!/usr/local/bin/perl

use strict;
use CGI qw(:standard);

...

# get data from the Apache server
my $query = new CGI;

# analyse the data from the Apache server
# Here we suppose that we have something like
# "select=value1&..." from Apache
my $data = $query->param('select');
if (! defined $data) { &Error_Exit("Error while receiving data"); }

...


File inclusion



#!/usr/local/bin/perl

use strict;
...

use vars qw
(
  $MIN_Values
  @coefs
  $total
  %queryAttr
);

require "./config";
...
The variables MIN_Values, @coefs, $total and %queryAttr are defined in the file "./config".

Example of "config" file:


%queryAttr =
(
  "POP3" => {
              "Directory_Name" => "val1",
              "Value_Name" => "val2",
            },
  "SMTP" => {
              "Directory_Name" => "val11",
              "Value_Name" => "val22",
            },
);

$MIN_Values = 5;
@coefs = @{ [ 1, 1, 1, 1, 3 ] };
$total = 7;


Simple packaging


file "my_utilities.pm"


#!/usr/bin/perl

use strict;
use Exporter;
package my_utilities;

use vars qw (@ISA @EXPORT);
@ISA = qw(Exporter);
@EXPORT = qw(&Format &Log_Data &Time_Stamp &Wr_File &tmp_file);

...

# Here you define the functions:
# Format
# Log_Data
# Time_Stamp
# Wr_File
# tmp_file


#####################################################
# WARNING: don't forget the "1;" !!! #
#####################################################

1;

Using package "my_utilities.pm"


#!/usr/bin/perl

use my_utilities;
use strict;

...


Perl's special variables


Table 12.2 - Perl's Special Variables
Variable Name Description
Variables That Affect Arrays
$" The separator used between list elements when an array variable is interpolated into a double-quoted string. Normally, its value is a space character.
$[ Holds the base array index. Normally, set to 0. Most Perl authors recommend against changing it without a very good reason.
$; Holds the subscript separator for multi-dimensional array emulation. Its use is beyond the scope of this book. For a more in-depth look at Perl programming, see Que's Special Edition Using Perl for Web Programming.
Variables Used with Files.
$. This variable holds the current record or line number of the file handle last read. It is read-only and will be reset to 0 when the file handle is closed.
$/ This variable holds the input record separator. The record separator is usually the newline character. However, if $/ is set to an empty string, two or more newlines in the input file will be treated as one.
$| This variable, if nonzero, will flush the output buffer after every write() or print() function. Normally, it is set to 0.
$^F This variable holds the value of the maximum system file description. Normally, it's set to 2. The use of this variable is beyond the scope of this book.
$ARGV This variable holds the name of the current file being read when using the diamond operator (<>).
_ This file handle (the underscore) can be used when testing files. If used, the information about the last file tested will be used to evaluate the latest test.
DATA This file handle refers to any data following __END__.
STDERR This file handle is used to send output to the standard error file. Normally, this is connected to the display, but it can be redirected if needed.
STDIN This file handle is used to read input from the standard input file. Normally, this is connected to the keyboard, but it can be changed.
STDOUT This file handle is used to send output to the standard output file. Normally, this is the display, but it can be changed.
Variables Used with Patterns.
$& This variable holds the string that was matched by the last successful pattern match.
$` This variable holds the string that preceded whatever was matched by the last successful pattern match.
$' This variable holds the string that followed whatever was matched by the last successful pattern match.
$+ This variable holds the string matched by the last bracket in the last successful pattern match. For example, the statement /Fieldname: (.*)|Fldname: (.*)/ && ($fName = $+); will find the name of a field even if you don't know which of the two possible spellings will be used.
$* This variable changes the interpretation of the ^ and $ pattern anchors. Setting $* to 1 is the same as using the /m option with the regular expression matching and substitution operators. Normally, $* is equal to 0.
$<number> This group of variables ($1, $2, $3, and so on) holds the regular expression pattern memory. Each set of parentheses in a pattern stores the string that matches the components surrounded by the parentheses into one of the $<number> variables.
Variables Used with Printing
$, This variable is the output separator for the print() function. Normally, this variable is an empty string. However, setting $, to a newline might be useful if you need to print each element in the parameter list on a separate line.
$\ The variable is added as an invisible last element to the parameter list passed to the print() function. Normally, it's an empty string, but if you want to add a newline or some other suffix to everything that is printed, you can assign the suffix to $\.
$# This variable is the default format for printed numbers. Normally, it's set to %.20g, but you can use the format specifiers.
Variables Used with Processes
$$ This UNIX-based variable holds the process number of the process running the Perl interpreter.
$? This variable holds the status of the last pipe close, back-quote string, or system() function.
$0 This variable holds the name of the file containing the Perl script being executed.
$] This variable holds a string that identifies which version of Perl you are using. When used in a numeric context, it will be equal to the version number plus the patch level divided by 1000.
$! This variable, when used in a numeric context, holds the current value of errno. If used in a string context, it will hold the error string associated with errno.
$@ This variable holds the syntax error message, if any, from the last eval() function call.
$< This UNIX-based variable holds the read uid of the current process.
$> This UNIX-based variable holds the effective uid of the current process.
$) This UNIX-based variable holds the read gid of the current process. If the process belongs to multiple groups, then $) will hold a string consisting of the group names separated by spaces.
$^T This variable holds the time, in seconds, at which the script begins running.
$^X This variable holds the full path name of the Perl interpreter being used to run the current script.
%ENV This hash variable contains entries for your current environment variables. Changing or adding an entry will affect only the current process or a child process, never the parent process.
%SIG This hash variable contains entries for signal handlers.
Variables Used with Reports.
$% This variable holds the current page number for the default file handle. If you use select() to change the default file handle, $% will change to reflect the page number of the newly selected file handle.
$= This variable holds the current page length for the default file handle. Changing the default file handle will change $= to reflect the page length of the new file handle.
$- This variable holds the number of lines left to print for the default file handle. Changing the default file handle will change $- to reflect the number of lines left to print for the new file handle.
$~ This variable holds the name of the default line format for the default file handle. Normally, it is equal to the file handle's name.
$^ This variable holds the name of the default heading format for the default file handle. Normally, it is equal to the file handle's name with _TOP appended to it.
$: This variable holds a string that consists of the characters that can be used to end a word when word-wrapping is performed by the ^ report formatting character. Normally, the string consists of the space, newline, and dash characters.
$^L This variable holds the string used to eject a page for report printing.
Miscellaneous Variables
$_ This variable is used as the default parameter for a lot of functions.
$^D This variable holds the current value of the debugging flags.
$^I This variable holds the file extension used to create a backup file for the in-place editing specified by the -i command line option. For example, it could be equal to ".bak."
$^P This variable is an internal flag that the debugger clears so that it will not debug itself.
$^W This variable holds the current value of the -w command line option.
@ARGV This array variable holds a list of the command line arguments. You can use $#ARGV to determine the number of arguments minus one.
@F This array variable holds the list returned from autosplit mode. Autosplit mode is associated with the -a command line option.
@INC This array variable holds a list of directories where Perl can look for scripts to execute. The list is used mainly by the require statement.
%INC This hash variable has entries for each filename included by do or require statements. The key of the hash entries are the filenames and the values are the paths where the files were found.


How to specify path to specific modules


Sometimes you can not (or you don't want to) install Perl modules into the default location. If you do that, Perl won't be able to find the modules. You need to tell Perl where to search for the modules. Doing this is pretty easy, just add the following line at the top of your Perl script:

use lib "path to your Perl modules";


How to install a Perl module in a specific directory


  1. Download the modules from CPAN (Don't use CPAN.pm!)
  2. Gunzip and untar the package:
    • $ gunzip module.tar.gz | tar xvf -
  3. cd to the module source
    • $ cd module
  4. Run the configure script
    • perl Makefile.PL --PREFIX=/path/to/your/directory
    • The --PREFIX switch should be equal to the path to your home
    • directory.
    • EX: $ perl Makefile.PL --PREFIX=/u03/tapplega
  5. Run the make command.
    • $ make
  6. Run the make install command. (You may have to run 'make test' too. Read the README in the module directory.)
    • $ make install


MySql DBI


Connexion


  use strict;
  use DBI;
  ...
  $dbh = DBI->connect(
                     "DBI:mysql:$database:$hostname:$port" .
                     ";mysql_socket=$socket",
                     $user,
                     $password,
                     { PrintError => 0 }
                   );

  unless ($dbh)
  {
    print "\nCan not connect to the MySql server: " .
          $DBI::errstr .
          " (" .
          $DBI::err .
          ")\n";
    exit 1;
  }
  

  • The parameter "PrintError => 0", tells the DBI to mask the warning outputs.
  • The option "socket" defines the path to the UNIX socket used for local connexion. This is a MySql specific option (use the ";").

SQL query


  my $tickets;
  ...

  $sql_statement = "SELECT ...";

  $sth = $dbh->prepare($sql_statement);
  unless ($sth)
  {
    print "\nCan't prepare SQL request:\n\n$sql_statement\n\n" . $DBI::errstr . "\n";
    exit 1;
  }

  $rv = $sth->execute();
  unless ($rv)
  {
    print "\nCan't execute the query:\n\n$sql_statement\n\n" . $DBI::errstr . "\n";
    exit 1;
  }

  # Get all data from the SELECT statement. Put it into the array "tickets".

  my $tickets = $sth->fetchall_arrayref();

  unless ($tickets )
  {
    print "\nfetchall_arrayref() failed: " . $DBI::errstr . "\n";
    exit 1;
  }

  for $ticket (0 .. $#{$tickets})
  {
     $first_field = $tickets->[$ticket][0];
     $second_field = $tickets->[$ticket][1];
     ...
  }
  

Disconnexion


  $rc = $dbh->disconnect;
  unless ($rc)
  {
    print "\nErrot while disconnecting from the MySql server: " .
          $DBI::errstr .
          " (" .
          $DBI::err .
          ")\n";
    exit 1;
  }
  


Catching exceptions


Use the syntax:


  unless (eval { Perl code })
  {
     What to do if an error occured
  }
  

If an error occured, the literal error message can be printed using the variable "$@".

Example:


unless (eval { $r->load_dictionary($dictionnary); })
{
  print "ERROR: Unable to load dictionnary ($@)\n";
  exit 1;
}
  


Date manipulation


Formating the current date using the POSIX library
use strict;
use POSIX;



my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time());

my $now = sprintf ("%s-%02d-%02d %02d:%02d:%02d",
                                                 1900+$year,
                                                 $mon+1,
                                                 $mday,
                                                 $hour,
                                                 $min,
                                                 $sec
                 );

print STDOUT "Now: $now\n";
Now: 2002-11-15 17:04:47
Getting the calendar time associated with a given local date
use strict;
use POSIX;


my $tt;

# --------------------------------------------------------
# Get the timestamp for the 1 of March 2003 at 1h00 AM.
# --------------------------------------------------------

my $sec = 0;
my $min = 0;
my $hour = 1;
my $day = 1;
my $month = 3;
my $year = 2003;

$tt = mktime (
               $sec, ## sec
               $min, ## min
               $hour, ## hour
               $day, ## day in the month
               $month - 1, ## month (from 0 to 11 included)
               $year - 1900, ## year since 1900
               0, ## day of the week
               0, ## day of the year
               -1 ## daylight saving time
                             ## 0: no daylight saving time
                             ## 1: daylight saving time
                             ## -1: find out
             );

unless (defined($tt))
{
  printf STDERR "ERROR: Can not get the timestamp - Invalid date\n";
  exit 1;
}

print "1 of March 2003 at 1h00 am:\n";
print "calendar time (number of seconds since 00:00:00 UTC, January 1, 1970): $tt\n\n";

# --------------------------------------------------------
# Get the timestamp for the 1 of August 2003 at 1h00 AM.
# --------------------------------------------------------

my $sec = 0;
my $min = 0;
my $hour = 1;
my $day = 1;
my $month = 8;
my $year = 2003;

$tt = mktime (
               $sec, ## sec
               $min, ## min
               $hour, ## hour
               $day, ## day in the month
               $month - 1, ## month (from 0 to 11 included)
               $year - 1900, ## year since 1900
               0, ## day of the week
               0, ## day of the year
               -1 ## daylight saving time
                             ## 0: no daylight saving time
                             ## 1: daylight saving time
                             ## -1: find out
             );

unless (defined($tt))
{
  printf STDERR "ERROR: Can not get the timestamp - Invalid date\n";
  exit 1;
}

print "1 of August 2003 at 1h00 am:\n";
print "calendar time (number of seconds since 00:00:00 UTC, January 1, 1970): $tt\n\n";
This code is executed on a system which configuration is:
  • Winter: MET (Middle European Time: UTC+01)
  • Summer: MEST (Middle European Summer Time: UTC+02).

The numerical values represent: "the number of seconds since 00:00:00 UTC, January 1, 1970". In other words, this is the calendar time.


1 of March 2003 at 1h00 am:
calendar time (number of seconds since 00:00:00 UTC, January 1, 1970): 1046476800

1 of August 2003 at 1h00 am:
calendar time (number of seconds since 00:00:00 UTC, January 1, 1970): 1059692400



Hash table stored on disk


Using GDBM


use strict;
use GDBM_File;

my %h;
tie %h, 'GDBM_File', 'hash_file', &GDBM_WRCREAT, 0640;

$h{'Key1'} = "value1";
$h{'Key2'} = "value2";
$h{'Key3'} = "value3";

untie %h;

Note: To open a file that represents a hash, use the following syntax:

  unless (tie %hdsk, 'GDBM_File', "$db", &GDBM_READER, 0640)
  {
    # Error processing
    ...
  }

The content of the hash table is stored in a file.


Parsing command line arguments


There are many perl modules available, but I found "Getopt::Long" pretty good.


use strict;
use Getopt::Long;

my $out_format = ''; # output format
my $input_file = ''; # input file
my $output_file = ''; # output file
my $level = ''; # log level
my $help = ''; # help flag

unless (
         GetOptions (
                      'help' => \$help,
                      'format=s' => \$out_format,
                      'input=s' => \$input_file,
                      'output=s' => \$output_file,
                      'level=s' => \$level
                    )
       )
{
  print STDERR "\nERROR: invalid command line\n";
  exit 1;
}

# --------------------------------------------------------------
# Do we print the help ?
# --------------------------------------------------------------

if ($help == 1)
{
  print STDOUT "\nUsage: perl log_parser.pl " .
               "[-input=<file name>] " .
               "[-output=<file name>] " .
               "[-format=(text | html | xml)] " .
               "[-level=<level value>]";

  print STDOUT "\n";
  print STDOUT "\ninput : Specify the input file (default: standard input)";
  print STDOUT "\noutput : Specify the output file (default: standard output)";
  print STDOUT "\nformat : Specify the output format (default: text)";
  print STDOUT "\nlevel : Specify the log level (default: all)";
  print STDOUT "\n\n";

  exit 0;
}

print STDOUT "\n";
print STDOUT "\tInput file = $input_file\n";
print STDOUT "\tOutput file = $output_file\n";
print STDOUT "\tFormat = $out_format\n";
print STDOUT "\tLevel = $level\n";


perl test.pl -help

Usage: perl log_parser.pl [-input=<file name>] [-output=<file name>] [-format=(text | html | xml)] [-level=<level value>]

input : Specify the input file (default: standard input)
output : Specify the output file (default: standard output)
format : Specify the output format (default: text)
level : Specify the log level (default: all)



perl test.pl -input name1 -output name2

        Input file = name1
        Output file = name2
        Format =
        Level =


Signal management



use strict;
use POSIX qw(:signal_h);
use IO::Handle;

STDOUT->autoflush(1);



my $STOP = 0;
my $SIGNAL = 'no signal';
my $sigset;
my $old_sigset;

# -------------------------------------------------------
# Signal handler executed when a signal is received
# -------------------------------------------------------

sub signal_handler
{
  my ($sig) = @_;

  $STOP = 1;
  $SIGNAL = "Received signal $sig";
}

# -------------------------------------------------------
# Bind signal hadler to signals
# -------------------------------------------------------

$SIG{'INT'} = 'signal_handler';
$SIG{'TERM'} = 'signal_handler';
$SIG{'HUP'} = 'signal_handler';

# -------------------------------------------------------
# Sleep 5 seconds
# -------------------------------------------------------

print STDOUT "\n\n";
print STDOUT "=&dt; You have 5 seconds to press [CTRL]-[C] (signals activated)\n\n";
sleep 5;

# -------------------------------------------------------
# Block signals and restore signals, so we can look at
# variable STOP
# -------------------------------------------------------

$sigset = POSIX::SigSet-&dt;new(SIGINT, SIGHUP, SIGTERM);
$old_sigset = POSIX::SigSet-&dt;new;
unless (defined sigprocmask(SIG_BLOCK, $sigset, $old_sigset))
{ die "Could not block signal"; }
print STDOUT "Signals blocked\n";

print STDOUT "\tmain(): STOP = $STOP\n";
$STOP = 0;
$SIGNAL = 'no signal';

print STDOUT "\nUnblock signals now\n\n";
unless (defined sigprocmask(SIG_UNBLOCK, $old_sigset))
{ die "Could not unblock signal"; }

# -------------------------------------------------------
# Block signals and wait 5 seconds
# -------------------------------------------------------

$sigset = POSIX::SigSet-&dt;new(SIGINT, SIGHUP, SIGTERM);
$old_sigset = POSIX::SigSet-&dt;new;
unless (defined sigprocmask(SIG_BLOCK, $sigset, $old_sigset))
{ die "Could not block signal"; }
print STDOUT "Signals blocked\n";

print STDOUT "\t=&dt; You have 5 seconds to press [CTRL]-[C] (signals ignored)\n";
sleep 5;

print STDOUT "Unblock signals now\n\n";
unless (defined sigprocmask(SIG_UNBLOCK, $old_sigset))
{ die "Could not unblock signal"; }

# -------------------------------------------------------
# Block signals and restore signals, so we can look at
# variable STOP
# -------------------------------------------------------

$sigset = POSIX::SigSet-&dt;new(SIGINT, SIGHUP, SIGTERM);
$old_sigset = POSIX::SigSet-&dt;new;
unless (defined sigprocmask(SIG_BLOCK, $sigset, $old_sigset))
{ die "Could not block signal"; }
print STDOUT "Signals blocked\n";

print STDOUT "\tmain(): STOP = $STOP\n";
$STOP = 0;
$SIGNAL = 'no signal';

print STDOUT "Unblock signals now\n";
unless (defined sigprocmask(SIG_UNBLOCK, $old_sigset))
{ die "Could not unblock signal"; }
print STDOUT "\n\n";
This simple example shows how to:

  • set signal handlers.
  • block signals (but save signals into a stack).
  • restaure signals (look at saved signals).


Sorting



use strict;




## -------------------------------------------------- ##
## Note about the comparaison operator 'cmp' ##
## ##
## $a cmp $b = -1 if $a < $b ##
## $a cmp $b = 0 if $a == $b ##
## $a cmp $b = +1 if $a > $b ##
## -------------------------------------------------- ##



my $elem;
my @sorted;

# ------------------------------------------------------
# Sorting regular array
# ------------------------------------------------------

my @list = ('D1', 'A1', 'B1', 'C1', 'A0');

@sorted = sort { return ($a cmp $b); } @list;

print "Initial order:\n\n";
foreach $elem (@list) { print "=> $elem\n"; }
print "\n";
print "Alphabetical order:\n\n";
foreach $elem (@sorted) { print "=> $elem\n"; }
print "\n";

# ------------------------------------------------------
# Sorting associative array
# ------------------------------------------------------

my %hash = (
             'D1' => 'value D1',
             'A1' => 'value A1',
             'B1' => 'value B1',
             'C1' => 'value C1',
             'A0' => 'value A0'
           );

print "Initial order:\n\n";
foreach $elem (keys %hash) { print "=> $elem: " . $hash{"$elem"} . "\n"; }
print "\n";

print "Alphabetical order:\n\n";
foreach $elem (sort { return ($hash{"$a"} cmp $hash{"$b"}); } (keys %hash))
{ print "=> $elem: " . $hash{"$elem"} . "\n"; }
print "\n";

print "Revers alphabetical order:\n\n";
foreach $elem (sort { return -1 * ($hash{"$a"} cmp $hash{"$b"}); } (keys %hash))
{ print "=> $elem: " . $hash{"$elem"} . "\n"; }


Initial order:

=> D1
=> A1
=> B1
=> C1
=> A0

Alphabetical order:

=> A0
=> A1
=> B1
=> C1
=> D1

Initial order:

=> A0: value A0
=> A1: value A1
=> B1: value B1
=> C1: value C1
=> D1: value D1

Alphabetical order:

=> A0: value A0
=> A1: value A1
=> B1: value B1
=> C1: value C1
=> D1: value D1

Revers alphabetical order:

=> D1: value D1
=> C1: value C1
=> B1: value B1
=> A1: value A1
=> A0: value A0

use strict;

my $ref = [['a', 'b', 'c'], ['d', 'e', 'f'], ['a', 'a', 'c']];
my @res;
my $ligne;
my $colonne;

sub compare
{
  my ($a, $b) = @_;

  return (($a->[1]) cmp ($b->[1]));
}

@res = sort { compare($a, $b); } @{$ref};

foreach $ligne (@res)
{
  foreach $colonne (@{$ligne})
  {
    print $colonne . ' ';
  }

  print "\n";
}


a a c
a b c
d e f


How to write a UNIX deamon


The following function turns the curent script into a deamon.


use POSIX qw(setsid);


# ---------------------------------------------------------------
# Run the current process as deamon
#
# err: Reference to a string that will receive an error message
# if an error occured.
#
# return: Upon succesfull completion, the function returns the
# value 0. Otherwise, the function returns the vamlue 1.
# ---------------------------------------------------------------

sub daemonize
{
    my ($err) = @_;
    my $pid;

    $$err = 'OK';

    # flush the buffer
    $| = 1;

    unless (chdir '/') { $$err = "Can't chdir to '/': $!"; return 1; }
    unless (open (STDIN, '/dev/null')) { $$err = "Can't read /dev/null: $!"; return 1; }
    unless (open (STDOUT, '>>/dev/null')) { $$err = "Can't write to /dev/null: $!"; return 1; }
    unless (open (STDERR, '>>/dev/null')) { $$err = "Can't write to /dev/null: $!"; return 1; }
    unless (defined($pid = fork())) { $$err = "Can't fork: $!"; return 1; }
    exit if $pid;

    unless (setsid) { $$err = "Can't start a new session: $!"; return 1; }
    umask 0;

    return 0;
}


How to send attached file in an email


The following function returns the text to send via SMTP in order to attach a given file to an email. Note that you need to add the SMTP header.


use MIME::Base64;
use MIME::QuotedPrint;

# ---------------------------------------------------------------
# attached_file
#
# Format a memory buffer that represents the data to send through
# SMTP for file attaching.
#
# file_path: Path to the file to attach.
# file_name: Name of the file as it will appear in the mail.
# $mes: Message of the mail.
# err: Reference to a string that will receive an error
# message if an error occured.
# b: Reference to a string that will receive the boun-
# -dary.
# content_type: The defualt content type is:
# 'application/octet-stream'. If you want to
# change this, , set this paramter to the
# content type you want to use. To use the default
# set this argument to the value 'undef'.
#
# return: Upon successfull completion, the function returns the
# string to send through SMTP. Otherwise, the function
# returns the value "undef".
# ---------------------------------------------------------------

sub attached_file
{
  my ($file_path, $file_name, $mes, $err, $b, $content_type) = @_;

  my $boundary;
  my $message;
  my $mail_body;


  $$err = 'SUCCESS';
  $boundary = "====" . time() . "====";
  $$b = $boundary;
  $message = encode_qp("$mes");

  unless (defined($content_type)) { $content_type = 'application/octet-stream'; }

  unless (open (ATTACHEMENT, "<$file_path"))
  {
    $$err = "Can not open file to attach '$file_path': $!";
    return undef;
  }

  binmode ATTACHEMENT;
  undef $/;
  $mail_body = encode_base64(<ATTACHEMENT>);
  $/ = "\n";

  close ATTACHEMENT;

  $boundary = '--'.$boundary;
  $mail_body = <<END_OF_BODY;

$boundary
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

$message
$boundary
Content-Type: $content_type; name="$file_name"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="$file_name"

$mail_body
$boundary--
END_OF_BODY

  return $mail_body;
}



DEGIN statement


Sometimes you need to execute an action before anything else. This code tests the existence of an environment variable before executing the rest of the script. If the variable is not set, then it aborts its execution.


use strict;

BEGIN {
        unless ($ENV{'DIS_BASE'})
        {
          print STDERR "WARNING:\n\n";
          print STDERR "You did not set the environment variable 'DIS_BASE'! (see documentation). ";
          print STDERR "DIS can not run if this variable is not set.\n\n";
          exit 0;
        }

        my $path = $ENV{'DIS_BASE'} . "/MODULES";
        unshift @INC, $path;
      }

...


Removing HTML comments


This may not be the best way to do it, but it works.


sub remove_html_comments
{
  my ($html) = @_;
  my @tokens = ();
  my @pure = ();
  my $text = '';

  @tokens = split (/<\!\-\-/, $html);

  push (@pure, $tokens[0]);
  for (my $i=1; $i<int @tokens; $i++)
  {
    my @broken = split (/\-\->/, $tokens[$i]);
    if (int @broken > 1) { push (@pure, $broken[1]); }
  }

  $text = join (' ', @pure);
  return $text;
}


Removing C comments



use strict;

sub remove_comment
{
  my ($file) = @_;
  my $t = undef;

  unless(open(FD, "<$file")) { return undef; }

  $/ = undef;
  $t = <FD>;
  $/ = "\n";
  close FD;

  $t =~ s/\/\*(.|\n)*?\*\///mg; ### This means 'multi-line global'
  $t =~ s/^\s+$//mg;
  $t =~ s/\n{2,}/\n\n/mg;

  return $t;
}

while (<STDIN>)
{
  my $file = undef;
  my $t = undef;

  chomp ($_);
  $file = $_;
  print STDOUT "Processing file <$file>\n";

  $t = remove_comment($file);
  unless (defined($t))
  {
    print STDERR "Error on file <$file>\n";
    exit 1;
  }
  unless (unlink($file) == 1) { print STDERR "Could not delete file <$file>: $!\n"; exit 1; }
  unless(open(FD, ">$file")) { print STDERR "Could not create file <$file>: $!\n"; exit 1; }
  print FD $t;
  close FD;
}


Running a CGI


Did you ever wonder how to execute an external command with the following constraints:

  • You want to send data to the external command via STDOUT (you write to STDOUT, and the external program reads from STDIN).
  • You want to get the result external command via STDIN (the external command writes to STDOUT, and you read from STDIN).
  • You want the exit status of the command.

The following code whows ho to proceed:


package post;

use strict;

use FileHandle;
use IPC::Open2;

use vars qw (@ISA @EXPORT);
@ISA = qw(Exporter);
@EXPORT = qw(
              &run_cgi
            );


my $GOT_SIGPIPE = 0;
my $GOT_SIGALARM = 0;


#****f* post/URLEncode
# NAME
# URLEncode - URL encode a given string of characters.
# FUNCTION
# URL encode a given string of characters.
# INPUTS
# theURL - String to URL encode.
# OUTPUT
# The function returns the URL encoded string.
# SOURCE

sub URLEncode
{
  my $theURL = $_[0];
  $theURL =~ s/([\W])/"%" . uc(sprintf("%2.2x",ord($1)))/eg;
  return $theURL;
}

#****f* post/run_cgi
# NAME
# run_cgi - Run a given plugin using the CGI POST method (to pass parameters).
# FUNCTION
# Run a given plugin using the CGI POST method (to pass parameters).
# INPUTS
# path - Path to the plugin to execute.
# params - Reference to a hash that contains the couples "name=value" (sent to the plugin).
# response - Reference to a string used to store the plugin's response.
# timeout - Timeout (in seconds) for the plugin's execution.
# pid - Reference to an integer that will be used to store the PID of the plugin.
# status - Reference to an integer that will be used to store the plugin's status (return value).
# err - Reference to a string used to store an error message, if required.
# OUTPUT
# The function may return one of the following values:
# o 'PLUGIN_OK': Plugin's execution was successful.
# This means that everything was OK and that the plugin returned the value 0.
# o 'PLUGIN_ERROR': Plugin's execution was unsuccessful.
# This means that everything was OK but the value returned by the plugin is not 0.
# o 'TIMEOUT': The plugin has been killed because its execution time exceeded the timeout).
# o 'BROKEN_PIPE': The plugin terminates prematurely.
# o 'SYSTEM_ERROR': A system error occured.
# o 'UNEXPECTED_ERROR': An unclassified error occured.
# SOURCE

sub run_cgi
{
  my ($path, $params, $response, $timeout, $pid, $status, $err) = @_;
  my $pidd;
  my $post_string;
  my $return_status;



  $$err = 'SUCCESS';
  $$response = '';
  $GOT_SIGPIPE = 0;
  $GOT_SIGALARM = 0;
  $$status = -1;
  $return_status = 'OK';

  # --------------------------------------------------
  # Format the URL encoded query string
  # --------------------------------------------------

  $post_string = '';
  foreach my $key (keys %{$params})
  {
    my $value = $params->{"$key"};
    $post_string .= URLEncode("$key") . '=' . URLEncode("$value") . '&';
  }
  $post_string =~ s/\&$//;

  # --------------------------------------------------
  # Set the CGI's environement variables. We set
  # only the very necessary.
  # --------------------------------------------------

  $ENV{'CONTENT_LENGTH'} = length("$post_string");
  $ENV{'REQUEST_METHOD'} = 'POST';

  # --------------------------------------------------
  # Execute an external program
  # --------------------------------------------------

  eval { $pidd = open2(\*READER, \*WRITER, "$path"); };

  if ($@)
  {
    $$err = "Error while starting plugin (open2() failed): $@";
    return 'SYSTEM_ERROR';
  }

  autoflush WRITER 1;
  $$pid = $pidd;

  # --------------------------------------------------
  # Write data to the plugin. This may raise SIGPIPE
  # --------------------------------------------------

  eval {
         local $SIG{'PIPE'} = sub { $GOT_SIGPIPE = 1; die "PIPE\n"; };
         print WRITER "$post_string";
         close WRITER;
       };

  if ($@)
  {
    if ($GOT_SIGPIPE == 1) {
                             $$err = "Can not run plugin: $@";
                             $return_status = 'BROKEN_PIPE';
                             goto END_OF_PLUGIN;
                           }

    $$err = "Unexpected error: $@";
    $return_status = 'UNEXPECTED_ERROR';
    goto END_OF_PLUGIN;
  }

  # --------------------------------------------------
  # Now, get the CGI's response.
  # - This may raise SIGPIPE.
  # - This may block.
  # - This may involve a timeout.
  # --------------------------------------------------

  eval {
         local $SIG{'PIPE'} = sub { $GOT_SIGPIPE = 1; die "PIPE\n"; };
         local $SIG{'ALRM'} = sub { $GOT_SIGALARM = 1; die "ALRM\n"; };
         alarm $timeout;
         while (<READER>) { $$response .= "$_"; }
         alarm 0;
       };

  if ($@) {
             if ($GOT_SIGPIPE == 1) {
                                      $$err = "Premature end of plugin... Broken pipe";
                                      $return_status = 'BROKEN_PIPE';
                                      goto END_OF_PLUGIN;
                                    }

             if ($GOT_SIGALARM == 1) {
                                       $$err = "Timeout ($timeout seconds)";
                                       $return_status = 'TIMEOUT';
                                       goto END_OF_PLUGIN;
                                     }

             $$err = "Unexpected error: $@";
             $return_status = 'UNEXPECTED_ERROR';
             goto END_OF_PLUGIN;
          }

  # --------------------------------------------------
  # Clean plugin's system ressources (avoid zombies)
  # --------------------------------------------------

  eval {
         local $SIG{'ALRM'} = sub { $GOT_SIGALARM = 1; die "ALRM\n"; };
         alarm $timeout;
         if (waitpid ($pidd, 0) == -1)
         {
           $$err = "System call waitpid() failed: $!";
           return 'SYSTEM_ERROR';
         }
         alarm 0;
       };

  if ($@) {
             if ($GOT_SIGALARM == 1) {
                                       $$err = "Timeout ($timeout seconds)";
                                       $return_status = 'TIMEOUT';
                                       goto END_OF_PLUGIN;
                                     }

             $$err = "Unexpected error: $@";
             $return_status = 'UNEXPECTED_ERROR';
             goto END_OF_PLUGIN;
          }

  # --------------------------------------------------
  # If the program's pointer reaches this point, it
  # means that the plugin is stoped.
  # --------------------------------------------------

  $$status = $?;

  if ($$status != 0) {
                       $$err = "Plugin's return value is " . $$status . " (not 0), this means error";
                       return 'PLUGIN_ERROR';
                     }

  return 'PLUGIN_OK';

  # --------------------------------------------------
  # If the program's pointer reaches this point, it
  # means that the plugin may be stoped. But in all
  # cases, it should be stoped.
  # --------------------------------------------------

  END_OF_PLUGIN:

  kill 9, $pidd;
  waitpid $pidd, 0;
  $$status = $?;

  return $return_status;
}

#******

1;