Perl

A quick introduction

It's hard to describe Perl succinctly other than to say that it's a very handy language for many kinds of tasks.

It is particularly good at manipulating ASCII text files. For example, suppose you have an ASCII dataset from a survey in which each line looks like this:

     1982 F 45 16 ......
       |  |  |  |   |
       |  |  |  |   +- Other information
       |  |  |  +----- Number of children
       |  |  +-------- Age of respondent
       |  +----------- Sex of respondent
       +-------------- Year of observation
If you wanted to know the average number of children for men and women over 30 in 1982, you could use a Perl program like this:
     while( <> )                         # go through all lines in the file
        {
        next unless /^1982/;             # skip other years
        ($yr,$sex,$age,$kids) = split;   # break the line into whitespace-separated fields
        next unless $age > 30;           # skip people aged 30 or younger
        $parents{$sex}++;                # add one to the tally for this sex
        $totkids{$sex} += $kids;         # add kids to the total for this sex
        }
     $avg_dads = $totkids{"M"}/$parents{"M"};  # figure out average for M
     $avg_moms = $totkids{"F"}/$parents{"F"};  # figure out average for F
     print "Average for moms: $avg_moms\n";
     print "Average for dads: $avg_dads\n";
You would put this program in a file, say "countem.p", and then run it like this:
     perl countem.p data_set_name_here
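If you want to try the logic before pointing it at a real survey file, here is a minimal sketch that wraps the same steps in a subroutine and runs them on a few made-up records. The subroutine name `average_kids`, its parameters, and the sample lines are hypothetical. Unlike the program above, it uses `use strict` with `my` declarations and compares the year field as a string instead of matching `/^1982/`.

```perl
#!/usr/bin/perl
# Self-contained sketch of the countem.p logic; the sample records
# below are made up for illustration.
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping sex => average number
# of children among respondents in $year who are older than $min_age.
sub average_kids {
    my ($year, $min_age, @lines) = @_;
    my (%count, %totkids);
    for my $line (@lines) {
        my ($yr, $sex, $age, $kids) = split ' ', $line;  # split on whitespace
        next unless defined $kids && $yr eq $year;       # skip short lines and other years
        next unless $age > $min_age;                     # skip younger respondents
        $count{$sex}++;                                  # respondents, by sex
        $totkids{$sex} += $kids;                         # children, by sex
    }
    my %avg;
    $avg{$_} = $totkids{$_} / $count{$_} for keys %count;
    return \%avg;
}

my @sample = (
    "1982 F 45 3",
    "1982 F 35 1",
    "1982 M 50 2",
    "1982 M 60 1",
    "1982 M 25 4",    # excluded: not over 30
    "1981 F 40 5",    # excluded: different year
);

my $avg = average_kids('1982', 30, @sample);
printf "Average for %s: %.2f\n", $_, $avg->{$_} for sort keys %$avg;
```

Run with plain `perl` and no arguments. With these sample records it prints "Average for F: 2.00" and "Average for M: 1.50".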
OK, this example is contrived, and the program looks very arcane with all those $'s. Also, Perl certainly isn't a substitute for more sophisticated statistical packages.

However, notice the benefits of using Perl: (1) the dataset need not be converted to SAS, TSP or any other format; (2) it would be easy to change the cutoff age or other details (find the average age of people with more than 2 kids, etc.); (3) there are none of the number/character/formatting/initialization hassles you would face in an ordinary programming language like C or Fortran. Perl is also very good for managing files and for other tasks that statistical packages can't do at all, and that programming languages can do only with a lot more work.
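As a sketch of point (2), the following variation computes the average age of 1982 respondents with more than 2 kids. It assumes the same field layout as countem.p (year, sex, age, kids, then other information). The subroutine `average_age` and the `$cutoff` variable are hypothetical names; changing `$cutoff` (or the year) is all it takes to ask a slightly different question.

```perl
#!/usr/bin/perl
# Sketch of a variation on countem.p: average age of 1982 respondents
# with more than $cutoff children. Assumes the same field layout
# (year, sex, age, kids, other information).
use strict;
use warnings;

my $cutoff = 2;    # hypothetical threshold; edit to ask a different question

# Hypothetical helper: average age of respondents in $year with more
# than $min_kids children, or undef if nobody matches.
sub average_age {
    my ($year, $min_kids, @lines) = @_;
    my ($n, $total_age) = (0, 0);
    for my $line (@lines) {
        my ($yr, undef, $age, $kids) = split ' ', $line;  # sex is not needed here
        next unless defined $kids && $yr eq $year;        # skip short lines and other years
        next unless $kids > $min_kids;                    # keep people with enough kids
        $n++;
        $total_age += $age;
    }
    return $n ? $total_age / $n : undef;
}

# Like countem.p, read the data file(s) named on the command line.
if (@ARGV) {
    my $avg = average_age('1982', $cutoff, <>);
    print defined $avg ? "Average age: $avg\n" : "No matching respondents\n";
}
```

It would be run the same way as countem.p, e.g. `perl kidsage.p data_set_name_here`, where "kidsage.p" is just an illustrative file name.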

There is an excellent book that explains all about Perl: "Programming Perl" by Larry Wall (the author of Perl) and co-authors. It is published by O'Reilly and Associates and is carried by the Co-op in its computer trade section. The manual page ("man perl") is comprehensive but less convenient.

Perl is also available for PCs and most other platforms.

URL: https://wilcoxen.maxwell.insightworks.com/pages/86.html
Peter J Wilcoxen, The Maxwell School, Syracuse University
Revised 06/07/2004