JDT

 

John Dixon
Technology
Limited

 

Using Perl and Regular Expressions to Process ASCII Files - Part 4


In Part 1 we had a quick look at what Perl and regular expressions are, and introduced the idea of using them to process HTML files. In Part 2 we developed a Perl script to process a single HTML file. In Part 3 we looked at one way of processing multiple files. In this part we'll look at another way to select the files to be processed.

In Part 3 we wrote a script (script2.pl) that enabled us to enter filenames at the command prompt:

     C:\>perl script2.pl file1.htm file2.htm file3.htm

Although this script enables us to process as many files as we want to, the drawback is that every filename has to be typed in manually. This is fine if you only want to process a few files, but it is not practical if you have hundreds or thousands to process.

script2.pl

1 foreach $file (@ARGV) {
2     rename $file, "$file.bak";
3     open (IN, "<$file.bak");
4     open (OUT, ">$file");
5     while ($line = <IN>) {
6         $line =~ s/<h1>/<h1 class="big">/;
7         print OUT $line;
8     }
9     close IN;
10     close OUT;
11 }

In script2.pl, it is line 1 that enables us to enter filenames at the command prompt. script3.pl, which is listed below, instead processes every HTML file with a .htm extension in the current directory/folder. In this tutorial, that is the directory that holds both the files to be processed and the script itself, and it must be the directory you run the script from.

script3.pl

1 opendir(DIR, ".") or die "can't opendir: $!";
2 @allfiles = grep (/\.htm$/i, readdir DIR);
3 closedir(DIR);
4 foreach $file (@allfiles) {
5     rename $file, "$file.bak";
6     open (IN, "<$file.bak");
7     open (OUT, ">$file");
8     while ($line = <IN>) {
9         $line =~ s/<h1>/<h1 class="big">/;
10         print OUT $line;
11     }
12     close IN;
13     close OUT;
14 }

script3.pl differs from script2.pl only in its first four lines: lines 1 to 3 are new, and the foreach loop on line 4 now iterates over @allfiles instead of @ARGV. Let's look at the new lines in script3.pl.

Line 1
Opens the current directory (signified by a dot ".") for reading and associates it with the directory handle DIR. If the directory cannot be opened, die stops the script and displays an error message that includes the system's reason for the failure ($!).
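
As a rough illustration of the open, read, close pattern that Line 1 begins, the sketch below wraps it in a small helper. The helper name list_entries and the lexical handle $dh are assumptions made for this example; script3.pl itself uses the bareword handle DIR.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of all entries in a directory,
# or die with the system error message ($!) if it cannot be opened.
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @entries = readdir $dh;   # bare names, including "." and ".."
    closedir($dh);
    return @entries;
}

# List everything in the current directory, one name per line.
print "$_\n" foreach list_entries(".");
```

Passing a directory that does not exist makes the helper die with a message that begins "can't opendir", which is the same behaviour Line 1 relies on.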

Line 2
This line reads the names of all the entries in the directory, keeps only those that end in .htm, and puts them in an array called @allfiles. In Perl, a '@' indicates an array, and a '$' indicates a scalar variable. A scalar stores a single value, whereas an array stores a list of values.

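grep is named after a search command from the UNIX world. In Perl it is a built-in function that returns the elements of a list that match a condition, which here is the regular expression /\.htm$/i. The \. matches a literal dot, the $ anchors the match to the end of the filename, and the i makes the match case-insensitive, so INDEX.HTM is matched as well as index.htm.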
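
To see the filter from Line 2 in isolation, here is a minimal sketch that applies the same pattern to a hard-coded list instead of a real directory. The filenames and the variable names @names, @htm_files and $count are made up for this example.

```perl
use strict;
use warnings;

# An array (@) holds a list of values; a scalar ($) holds a single value.
my @names = ("index.htm", "notes.txt", "ABOUT.HTM", "page.html", "htm.bak");

# grep keeps only the elements for which the pattern matches:
# a literal dot, then "htm", at the very end of the name, in any case.
my @htm_files = grep { /\.htm$/i } @names;

# Assigning an array to a scalar gives the number of elements.
my $count = @htm_files;

print "Matched $count files: @htm_files\n";
# prints: Matched 2 files: index.htm ABOUT.HTM
```

Note that page.html is not matched, because the pattern requires the name to end immediately after "htm".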

Line 3
This line closes the DIR directory handle.

Running the Script

To run the script, at the command line type:

     C:\>perl script3.pl
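
One possible, more defensive variant of script3.pl is sketched below. It is not the author's script: the process_dir helper, the use of strict and warnings, the lexical filehandles, the three-argument form of open, and the error checks are all assumptions added for illustration. The substitution itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: add class="big" to every <h1> tag in each .htm
# file found in $dir, keeping the original of each file as a .bak copy.
# Returns the number of files processed.
sub process_dir {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @allfiles = grep { /\.htm$/i } readdir $dh;
    closedir($dh);

    foreach my $name (@allfiles) {
        my $file = "$dir/$name";   # readdir returns bare names, not paths

        rename($file, "$file.bak")      or die "can't rename $file: $!";
        open(my $in,  '<', "$file.bak") or die "can't read $file.bak: $!";
        open(my $out, '>', $file)       or die "can't write $file: $!";

        while (my $line = <$in>) {
            $line =~ s/<h1>/<h1 class="big">/;
            print $out $line;
        }

        close $in;
        close $out or die "can't close $file: $!";
    }
    return scalar @allfiles;
}

# Process the current directory, as script3.pl does.
my $processed = process_dir(".");
print "Processed $processed file(s)\n";
```

Checking the result of rename and open means the script stops at the first file it cannot handle, rather than carrying on silently with an unopened filehandle.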


Author: John Dixon
John Dixon Technology Ltd














© 2007-2009 - John Dixon Technology Ltd
