Perl Free Tutorial

Web based School

Chapter 17

System Variables



This chapter's lesson describes the built-in system variables that can be referenced from every Perl program. These system variables are divided into five groups:

  • Global scalar variables
  • Pattern system variables
  • File system variables
  • Array system variables
  • Built-in file variables

The following sections describe these groups of system variables, and also describe how to provide English-language equivalents of their variable names.

Global Scalar Variables

The global scalar variables are built-in system variables that behave just like the scalar variables you create in the main body of your program. This means that these variables have the following properties:

  • Each built-in global scalar variable stores only one scalar value.
  • Only one copy of a global scalar variable is defined in a program.

Other kinds of built-in scalar variables, which you will see later in this lesson, do not behave in this way.

The following sections describe the global scalar variables your Perl programs can use.

The Default Scalar Variable: $_

The most commonly used global scalar variable is the $_ variable. Many Perl functions and operators modify the contents of $_ if you do not explicitly specify the scalar variable on which they are to operate.

The following functions and operators work with the $_ variable by default:

  • The pattern-matching operator
  • The substitution operator
  • The translation operator
  • The <> operator, if it appears in a while or for conditional expression
  • The chop function
  • The print function
  • The study function

The Pattern-Matching Operator and $_

Normally, the pattern-matching operator examines the value stored in the variable specified by a corresponding =~ or !~ operator. For example, the following statement prints hi if the string abc is contained in the value stored in $val:


print ("hi") if ($val =~ /abc/);

By default, the pattern-matching operator examines the value stored in $_. This means that you can leave out the =~ operator if you are searching $_:


print ("hi") if ($_ =~ /abc/);

print ("hi") if (/abc/);         # these two are the same


NOTE
If you want to use the !~ (true-if-pattern-not-matched) operator, you will always need to specify it explicitly, even if you are examining $_:
print ("hi") if ($_ !~ /abc/);
If the Perl interpreter sees just a pattern enclosed in / characters, it assumes the existence of a =~ operator.

$_ enables you to use pattern-sequence memory to extract subpatterns from a string and assign them to an array variable:


$_ = "This string contains the number 25.11.";

@array = /-?(\d+)\.?(\d+)/;

In the second statement shown, each subpattern enclosed in parentheses becomes an element of the list assigned to @array. As a consequence, @array is assigned (25,11).

In Perl 5, a statement such as


@array = /-?(\d+)\.?(\d+)/;

also assigns the extracted subpatterns to the pattern-sequence scalar variables $1, $2, and so on. This means that the statement assigns 25 to $1 and 11 to $2. Perl 4 supports assignment of subpatterns to arrays, but does not assign the subpatterns to the pattern-sequence variables.
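The following is a minimal sketch of the same extraction, written for Perl 5 with strict and warnings enabled. The names @parts, $whole, and $fraction are illustrative and do not come from the original example.

```perl
use strict;
use warnings;

# The match operator examines $_ implicitly; in list context it
# returns the text captured by each parenthesized subpattern.
local $_ = "This string contains the number 25.11.";
my @parts = /-?(\d+)\.?(\d+)/;

# In Perl 5 the same match also sets $1 and $2.
my ($whole, $fraction) = ($1, $2);

print "list: @parts\n";             # list: 25 11
print "vars: $whole $fraction\n";   # vars: 25 11
```

Copying $1 and $2 into named variables right away is a common precaution, because the next successful match overwrites them.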

The Substitution Operator and $_

The substitution operator, like the pattern-matching operator, normally modifies the contents of the variable specified by the =~ or !~ operator. For example, the following statement searches for abc in the value stored in $val and replaces it with def:


$val =~ s/abc/def/;

The substitution operator uses the $_ variable if you do not specify a variable using =~. For example, the following statement replaces the first occurrence of abc in $_ with def:


s/abc/def/;

Similarly, the following statement replaces each run of white space (spaces, tabs, and newline characters) in $_ with a single space:


s/\s+/ /g;

When you substitute inside $_, the substitution operator returns the number of substitutions performed:


$subcount = s/abc/def/g;

Here, $subcount contains the number of occurrences of abc that have been replaced by def. If abc is not contained in the value stored in $_, the operator returns a false value (an empty string, which behaves as 0 in a numeric context).
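As a small sketch of this return value (the sample string and the names $none and $none_count are assumptions made for illustration):

```perl
use strict;
use warnings;

local $_ = "abc xyz abc abc";
my $subcount = s/abc/def/g;     # operates on $_ and returns the count
print "$subcount: $_\n";        # 3: def xyz def def

# With no match, the operator returns a false value (an empty string),
# so fall back to 0 when a numeric count is wanted for printing.
my $none = s/nomatch/x/g;
my $none_count = $none || 0;
print "none: $none_count\n";    # none: 0
```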

The Translation Operator and $_

The behavior of the translation operator is similar to that of the pattern-matching and substitution operators: it normally operates on the variable specified by =~, and it operates on $_ if no =~ operator is included. For example, the following statement translates all lowercase letters in the value stored in $_ to their uppercase equivalents:


tr/a-z/A-Z/;

Like the substitution operator, if the translation operator is working with $_, it returns the number of operations performed. For example:


$conversions = tr/a-z/A-Z/;

Here, $conversions contains the number of lowercase letters converted to uppercase.

You can use this feature of tr to count the number of occurrences of particular characters in a file. Listing 17.1 is an example of a program that performs this operation.


Listing 17.1. A program that counts using tr.

1:  #!/usr/local/bin/perl

2:  

3:  print ("Specify the nonblank characters you want to count:\n");

4:  $countstring = <STDIN>;

5:  chop ($countstring);

6:  @chars = split (/\s*/, $countstring);

7:  while ($input = <>) {

8:          $_ = $input;

9:          foreach $char (@chars) {

10:                 eval ("\$count = tr/$char/$char/;");

11:                 $count{$char} += $count;

12:         }

13: }

14: foreach $char (sort (@chars)) {

15:         print ("$char appears $count{$char} times\n");

16: }



$ program17_1 file1

Specify the nonblank characters you want to count:

abc

a appears 8 times

b appears 2 times

c appears 3 times

$

This program first asks the user for a line of input containing the characters to be counted. These characters can be separated by spaces or jammed into a single word.

Line 5 takes the line of input containing the characters to be counted and removes the trailing newline character. Line 6 then splits the line of input into separate characters, each of which is stored in an element of the array @chars. The pattern /\s*/ matches zero or more whitespace characters, so the line is split between every pair of adjacent nonblank characters, and any blank characters between them are discarded.

Line 7 reads a line of input from a file whose name is specified on the command line. Line 8 takes this line and stores it in the system variable $_. (In most cases, system variables can be assigned to, just like other variables.)

Lines 9-12 count how many times each character specified in line 4 occurs in the current input line. Each character, in turn, is stored in $char, and the value of $char is substituted into the string in line 10. This string is then passed to eval, which executes the translate operation contained in the string.

The translate operation doesn't actually do anything because it is "translating" a character to itself. However, it returns the number of translations performed, which means that it returns the number of occurrences of the character. This count is assigned to $count.

For example, suppose that the variable $char contains the character e and that $_ contains Hi there!. In this case, the string in line 10 becomes the following because e is substituted for $char in the string:


$count = tr/e/e/;

The call to eval executes this statement, which counts the number of e's in Hi there!. Because there are two e's in Hi there!, $count is assigned 2.

An associative array, %count, keeps track of the number of occurrences of each of the characters being counted. Line 11 adds the count returned by line 10 to the associative array element whose subscript is the character currently being counted. For example, if the program is currently counting the number of e's, this number is added to the element $count{"e"}.

After all input lines have been read and their characters counted, lines 14-16 print the total number of occurrences of each character by examining the elements of %count.
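Listing 17.1 needs eval because tr does not interpolate variables into its character lists. One alternative, sketched here under that assumption, counts the matches of a quoted pattern instead. The subroutine name count_chars is hypothetical.

```perl
use strict;
use warnings;

# Count how often each nonblank character of $wanted occurs in $text.
sub count_chars {
    my ($wanted, $text) = @_;
    my %count;
    foreach my $char (grep { /\S/ } split(//, $wanted)) {
        # \Q...\E quotes regex metacharacters. Assigning the matches to
        # an empty list and then to a scalar yields the number of matches.
        my $n = () = $text =~ /\Q$char\E/g;
        $count{$char} = $n;
    }
    return %count;
}

my %count = count_chars("e h", "Hi there!");
print "$_ appears $count{$_} times\n" foreach sort keys %count;
# e appears 2 times
# h appears 1 times
```

The match is case-sensitive, so the capital H in Hi is not counted as an h.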

The <> Operator and $_

In Listing 17.1, which you've just seen, the program reads a line of input into a scalar variable named $input and then assigns it to $_. There is a quicker way to carry out this task, however. You can replace


while ($input = <>) {

        $_ = $input;

        # more stuff here

}

with the following code:


while (<>) {

        # more stuff here

}

If the <> operator is the entire conditional expression of a loop statement such as while or for (that is, it is not to the right of an assignment operator), the Perl interpreter automatically assigns the resulting input line to the scalar variable $_.

For example, Listing 17.2 shows a simple way to print the first character of every input line read from the standard input file.


Listing 17.2. A simple program that assigns to $_ using <STDIN>.

1:  #!/usr/local/bin/perl

2:  

3:  while (<STDIN>) {

4:          ($first) = split (//, $_);

5:          print ("$first\n");

6:  }



$ program17_2

This is a test.

T

Here is another line.

H

^D

$

Because <STDIN> is inside a conditional expression and is not assigned to a scalar variable, the Perl interpreter assigns the input line to $_. The program then retrieves the first character by passing $_ to split.

NOTE
The <> operator assigns to $_ only if it is contained in a conditional expression in a loop. The statement
<STDIN>;
reads a line of input from the standard input file and throws it away without changing the contents of $_. Similarly, the following statement does not change the value of $_:
if (<>) {
print ("The input files are not all empty.\n");
}
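The loop behavior can be sketched without real input files by reading from an in-memory file, which is an assumption made only to keep the example self-contained. It uses chomp, the Perl 5 function that removes a trailing newline, and the array name @firsts is illustrative.

```perl
use strict;
use warnings;

# An in-memory file stands in for real input.
my $data = "first line\nsecond line\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @firsts;
while (<$fh>) {        # shorthand for: while (defined($_ = <$fh>))
    chomp;             # like chop, chomp works on $_ by default
    push @firsts, substr($_, 0, 1);
}
close($fh);

print "@firsts\n";     # f s
```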

The chop Function and $_

By default, the chop function operates on the value stored in the $_ variable. For example:


while (<>) {

        chop;

        # you can do things with $_ here

}

Here, the call to chop removes the last character from the value stored in $_. Because the conditional expression in the while statement has just assigned a line of input to $_, chop gets rid of the newline character that terminates each input line.

The print Function and $_

The print function also operates on $_ by default. The following statement writes the contents of $_ to the standard output file:


print;

Listing 17.3 is an example of a program that simply writes out its input, which it assumes is stored in $_. This program is an implementation of the UNIX cat command, which reads input files and displays their contents.


Listing 17.3. A simple version of the cat command using $_.

1:  #!/usr/local/bin/perl

2:  

3:  print while (<>);



$ program17_3 file1

This is the only line in file "file1".

$

This program uses the <> operator to read a line of input at a time and store it in $_. If the line is nonempty, the print function is called; because no variable is specified with print, it writes out the contents of $_.

NOTE
You can use this default version of print only if you are writing to the default output file (which is usually STDOUT but can be changed using the select function). If you are specifying a file variable when you call print, you also must specify the value you are printing.
For example, to send the contents of $_ to the output file MYFILE, use the following command:
print MYFILE ($_);

The study Function and $_

If you do not specify a variable when you call study, this function uses $_ by default:


study;

The study function increases the efficiency of programs that repeatedly search the same variable. It is described in Chapter 13, "Process, String, and Mathematical Functions."

Benefits of the $_ Variable

The default behavior of the functions listed previously is useful to remember when you are writing one-line Perl programs for use with the -e option. For example, the following command is a quick way to display the contents of the files file1, file2, and file3:


$ perl -e "print while <>;" file1 file2 file3

Similarly, the following command changes all occurrences of abc in file1, file2, and file3 to def:


$ perl -i -pe "s/abc/def/g" file1 file2 file3

TIP
Although $_ is useful in cases such as the preceding one, don't overuse it. Many Perl programmers write programs that have references to $_ running like an invisible thread through their programs.
Programs that overuse $_ are hard to read and are easier to break than programs that explicitly reference scalar variables you have named yourself.

The Program Name: $0

The $0 variable contains the name of the program you are running. For example, if your program is named perl1, the statement


print ("Now executing $0...\n");

displays the following on your screen:


Now executing perl1...

The $0 variable is useful if you are writing programs that call other programs. If an error occurs, you can determine which program detected the error:


die ("$0: can't open input file\n");

Here, including $0 in the string passed to die enables you to specify the filename in your error message. (Of course, you can always leave off the trailing newline, which tells Perl to print the filename and the line number when printing the error message. However, $0 enables you to print the filename without the line number, if that's what you want.)

NOTE
You can change your program name while it is running by modifying the value stored in $0.

The User ID: $< and $>

The $< and $> variables contain, respectively, the real user ID and effective user ID for the program. The real user ID is the ID under which the user of the program logged in. The effective user ID is the ID associated with this particular program (which is not always the same as the real user ID).

NOTE
If you are not running your Perl program on the UNIX operating system, the $< and $> variables might have no meaning. Consult your local documentation for more details.

Listing 17.4 uses the real user ID to determine the user name of the person running the program.


Listing 17.4. A program that uses the $< variable.

1:  #!/usr/local/bin/perl

2:  

3:  ($username) = getpwuid($<);

4:  print ("Hello, $username!\n");



$ program17_4

Hello, dave!

$

The $< variable contains the real user ID, which is the login ID of the person running this program. Line 3 passes this user ID to getpwuid, which retrieves the password file entry corresponding to this user ID. The user name is the first element of this password file entry, and it is stored in the scalar variable $username. Line 4 then prints this user name.

NOTE
On certain UNIX machines, you can assign $< to $> (set the effective user ID to be the real user ID) or vice versa. If you have superuser privileges, you can set $< or $> to any defined user ID.

The Group ID: $( and $)

The $( and $) variables define the real group ID and the effective group ID for this program. The real group ID is the group to which the real user ID (stored in the variable $<) belongs; the effective group ID is the group to which the effective user ID (stored in the variable $>) belongs.

If your system enables users to be in more than one group at a time, $( and $) contain a list of group IDs, with each pair of group IDs being separated by spaces. You can convert this into an array by calling split.

Normally, you can only assign $( to $), and vice versa. If you are the superuser, you can set $( or $) to any defined group ID.
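A minimal sketch of the split call mentioned above, assuming a UNIX-like system where $( holds one or more space-separated group IDs. The names $gid_string, @groups, and $real_gid are illustrative.

```perl
use strict;
use warnings;

# Copy $( first; the first number is the real group ID, and any
# further numbers are additional groups the user belongs to.
my $gid_string = $(;
my @groups     = split(' ', $gid_string);
my $real_gid   = $groups[0];

print "real group ID: $real_gid\n";
print "all group IDs: @groups\n";
```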

NOTE
$( and $) might not have any useful meaning if you are running Perl on a machine running an operating system other than UNIX.

The Version Number: $]

The $] system variable contains the current version number. You can use this variable to ensure that the Perl on which you are running this program is the right version of Perl (or is a version that can run your program).

In Perl 4, $] contains a character string similar to this:


$RCSfile: perl.c,v $$Revision: 4.0.1.8 $$Date: 1993/02/05 19:39:30 $

Patch level: 36

The useful parts of this string are the revision number and the patch level. The first part of the revision number indicates that this is version 4 of Perl. The version number and the patch level are often combined; in this notation, this is version 4.036 of Perl.

You can use the pattern-matching operator to extract the useful information from $]. Listing 17.5 shows one way to do it.


Listing 17.5. A program that extracts information from the $] variable.

1:  #!/usr/local/bin/perl

2:  

3:  $] =~ /Revision: ([0-9.]+)/;

4:  $revision = $1;

5:  $] =~ /Patch level: ([0-9]+)/;

6:  $patchlevel = $1;

7:  print ("revision $revision, patch level $patchlevel\n");



$ program17_5

revision 4.0.1.8, patch level 36

$

This program just extracts the revision and patch level from $] using the pattern-matching operator. The built-in system variable $1, described later in this chapter, is defined when a pattern is matched. It contains the substring that appears in the first subpattern enclosed in parentheses. In line 3, the first subpattern enclosed in parentheses is [0-9.]+. This subpattern matches one or more digits mixed with decimal points, and so it matches 4.0.1.8. This means that 4.0.1.8 is assigned to $1 by line 3 and is assigned to $revision by line 4.

Similarly, line 5 assigns 36 to $1 (because the subpattern [0-9]+, which matches one or more digits, is the first subpattern enclosed in parentheses). Line 6 then assigns 36 to $patchlevel.

On some machines, the value contained in $] might be completely different from the value used in this example; in Perl 5, for instance, $] holds a plain number such as 5.006 rather than an RCS string. If you are not sure whether $] has a useful value, write a little program that just prints $]. If this program prints something useful, you'll know that you can run programs that compare $] with an expected value.
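Under Perl 5, where $] is a number rather than the Perl 4 string shown above, a version check can be a simple numeric comparison. This is a sketch only; the threshold 5.006 and the name $have_perl5 are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# In Perl 5, $] is a number such as 5.036001 (meaning version 5.36.1).
my $have_perl5 = ($] >= 5.006) ? 1 : 0;

print "running Perl $]\n";
print "this interpreter is recent enough\n" if $have_perl5;
```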

The Input Line Separator: $/

When the Perl interpreter is told to read a line of input from a file, it usually reads characters until it reads a newline character. The newline character can be thought of as an input line separator; it indicates the end of a particular line.

The system variable $/ contains the current input line separator. To change the input line separator, change the value of $/. The $/ variable can be more than one character long to handle the case in which lines are separated by more than one character. If you set $/ to the empty string, the Perl interpreter assumes that the input line separator is two newline characters (that is, a blank line).

Listing 17.6 shows how changing $/ can affect your program.


Listing 17.6. A program that changes the value of $/.

1:  #!/usr/local/bin/perl

2:  

3:  $/ = ":";

4:  $line = <STDIN>;

5:  print ("$line\n");



$ program17_6

Here is some test input: here is the end.

Here is some test input:

$

Line 3 sets the value of $/ to a colon. This means that when line 4 reads from the standard input file, it reads until it sees a colon. As a consequence, $line contains the following character string:


Here is some test input:

Note that the colon is included as part of the input line (just as, in the normal case, the trailing newline character is included as part of the line).

The -0 (zero, not the letter O) switch sets the value of $/. If you change the value of $/ in your program, the value specified by -0 will be thrown away.
To temporarily change the value of $/ and then restore it to the value specified by -0, save the current value of $/ in another variable before changing it.
For more information on -0, refer to Chapter 16, "Command-Line Options."
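In Perl 5, one way to change $/ temporarily is the local operator, which restores the previous value automatically when the enclosing block ends. The sketch below assumes an in-memory file as input; the names $field and $rest are illustrative.

```perl
use strict;
use warnings;

# An in-memory file stands in for real input.
my $data = "alpha:beta\ngamma\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $field;
{
    local $/ = ":";       # the new separator applies only inside this block
    $field = <$fh>;       # reads up to and including the colon
}
my $rest = <$fh>;         # $/ is restored, so this reads up to the newline
close($fh);

print "field: $field\n";  # field: alpha:
print "rest: $rest";      # rest: beta
```

Saving $/ in an ordinary variable and assigning it back afterward, as described above, has the same effect; local simply makes the restore impossible to forget.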

The Output Line Separator: $\

The system variable $\ contains the current output line separator. This is a character or sequence of characters that is automatically printed after every call to print.

By default, $\ is empty, which indicates that no output line separator is to be printed. Listing 17.7 shows how you can set an output line separator.


Listing 17.7. A program that uses the $\ variable.

1:  #!/usr/local/bin/perl

2:  

3:  $\ = "\n";

4:  print ("Here is one line.");

5:  print ("Here is another line.");



$ program17_7

Here is one line.

Here is another line.

$

Line 3 sets the output line separator to the newline character. This means that a list passed to a subsequent print statement always appears on its own output line. Lines 4 and 5 now no longer need to include a newline character as the last character in the line.

The -l option sets the value of $\. If you change $\ in your program without saving it first, the value supplied with -l will be lost. See Chapter 16 for more information on the -l option.

The Output Field Separator: $,

The $, variable contains the character or sequence of characters to be printed between elements when print is called. For example, in the following statement the Perl interpreter first writes the contents of $a:


print ($a, $b);

It then writes the contents of $, and then finally, the contents of $b.

Normally, the $, variable is empty, which means that the elements of a print statement are printed next to one another. Listing 17.8 is a program that sets $, before calling print.


Listing 17.8. A program that uses the $, variable.

1:  #!/usr/local/bin/perl

2:  

3:  $a = "hello";

4:  $b = "there";

5:  $, = " ";

6:  $\ = "\n";

7:  print ($a, $b);



$ program17_8

hello there

$

Line 5 sets the value of $, to a space. Consequently, line 7 prints a space after printing $a and before printing $b.

Note that $\, the output line separator, is set to the newline character. This setting ensures that the terminating newline character immediately follows $b. By contrast, the following statement prints a space before printing the trailing newline character:


print ($a, $b, "\n");

NOTE
Here's another way to print the newline immediately after the final element that doesn't involve setting $\:
print ($a, $b . "\n");
Here, the trailing newline character is part of the second element being printed. Because $b and \n are part of the same element, no space is printed between them.
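The interaction of $, and $\ can be checked by printing to an in-memory file and inspecting the result. This is a sketch under that assumption; the name $out is illustrative, and local is used so that the separators are reset when the block ends.

```perl
use strict;
use warnings;

# Capture the output in a string so the separators are easy to inspect.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory file: $!";
{
    local $, = " ";       # printed between the elements of the list
    local $\ = "\n";      # printed once, after the whole list
    print {$fh} "hello", "there";
}
close($fh);

print $out;               # hello there
```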

The Array Element Separator: $"

Normally, if an array is printed inside a string, the elements of the array are separated by a single space. For example:


@array = ("This", "is", "a", "list");

print ("@array\n");

Here, the print statement prints


This is a list

A space is printed between each pair of array elements.

The built-in system variable that controls this situation is the $" variable. By default, $" contains a space. Listing 17.9 shows how you can control your array output by changing the value of $".


Listing 17.9. A program that uses the $" variable.

1:  #!/usr/local/bin/perl

2:  

3:  $" = "::";

4:  @array = ("This", "is", "a", "list");

5:  print ("@array\n");



$ program17_9

This::is::a::list

$

Line 3 sets the array element separator to :: (two colons). Array element separators, like other separators you can define, can be more than one character long.

Line 5 prints the contents of @array. Each pair of elements is separated by the value stored in $", which is two colons.

NOTE
The $" variable affects only entire arrays printed inside strings. If you print two variables together in a string, as in
print ("$a$b\n");
the contents of the two variables are printed with nothing separating them regardless of the value of $".
To change how arrays are printed outside strings, use $, (the output field separator), described earlier in this chapter.
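A short sketch of limiting a change to $" to one block with local, so that later interpolation goes back to the default single space. The names $joined and $default are illustrative.

```perl
use strict;
use warnings;

my @array = ("This", "is", "a", "list");

my $joined;
{
    local $" = "::";      # affects only arrays interpolated into strings
    $joined = "@array";
}
my $default = "@array";   # the separator is a single space again

print "$joined\n";        # This::is::a::list
print "$default\n";       # This is a list
```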

The Number Output Format: $#

By default, when the print function prints a number, it prints it as a 20-digit floating point number in compact format. This means that the following statements are identical if the value stored in $x is a number:


print ($x);

printf ("%.20g", $x);

To change the default format that print uses to print numbers, change the value of the $# variable. For example, to specify only 15 digits of precision, use this statement:


$# = "%.15g";

This value must be a floating-point field specifier, as used in printf and sprintf.

NOTE
The $# variable does not affect values that are not numbers and has no effect on the printf, write, and sprintf functions.

For more information on the field specifiers you can use as the default value in $#, see "Formatting Output Using printf" in Chapter 11, "Formatting Your Output."

NOTE
The $# variable is deprecated in Perl 5. This means that although $# is supported, it is not recommended for use and might be removed from future versions of Perl.

The eval Error Message: $@

If a statement executed by the eval function contains an error, or an error occurs during the execution of the statement, the error message is stored in the system variable $@. The program that called eval can decide either to print the error message or to perform some other action.

For example, the statement


eval ("This is not a perl statement");

assigns the following string to $@:


syntax error in file (eval) at line 1, next 2 tokens "This is"

The $@ variable also holds the error generated by a call to die inside an eval. For example, the statement


eval ("die (\"nothing happened\")");

assigns the following string to $@:


nothing happened at (eval) line 1.

NOTE
The $@ variable also returns error messages generated by the require function. See Chapter 19, "Object-Oriented Programming in Perl," for more information on require.
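Perl 5 also accepts a block after eval, which traps run-time errors in the same way as the string form. The sketch below uses that form; the names $result and $error are illustrative.

```perl
use strict;
use warnings;

# The block form of eval still traps errors raised by die.
my $result = eval {
    die "nothing happened\n";   # a trailing newline suppresses "at ... line N"
    1;
};
my $error = $@;                 # copy $@ promptly; a later eval overwrites it

if (!defined $result) {
    print "caught: $error";     # caught: nothing happened
}
```

Testing the return value of eval rather than $@ itself is a common precaution, because $@ can be cleared by other code before it is inspected.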

The System Error Code: $?

The $? variable returns the error status generated by calls to the system function or by commands enclosed in back quotes, as in the following:


$username = `hostname`;

The error status stored in $? consists of two parts:

  • The exit value (return code) of the process called by system or specified in back quotes
  • A status field that indicates how the process was terminated, if it terminated abnormally

The value stored in $? is a 16-bit integer. The upper eight bits are the exit value, and the lower eight bits are the status field. To retrieve the exit value, use the >> operator to shift the eight bits to the right:


$retcode = $? >> 8;

For more information on the status field, refer to the online manual page for the wait function or to the file /usr/include/sys/wait.h. For more information on commands in back quotes, refer to Chapter 20, "Miscellaneous Features of Perl."
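As a sketch of decoding $?, the example below runs a child Perl process that exits with status 3. It relies on $^X, the path of the running interpreter, as a portable command; the names $retcode and $signal are illustrative.

```perl
use strict;
use warnings;

# Run a child process that exits with a known status.
system($^X, '-e', 'exit 3');

my $retcode = $? >> 8;     # upper eight bits: the exit value
my $signal  = $? & 127;    # low seven bits: the terminating signal, if any

print "exit value $retcode, signal $signal\n";   # exit value 3, signal 0
```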

The System Error Message: $!

Some Perl library functions call system library functions. If a system library function generates an error, the error code generated by the function is assigned to the $! variable. The Perl library functions that call system library functions vary from machine to machine.

NOTE
The $! variable in Perl is equivalent to the errno variable in the C programming language.
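A minimal sketch of reading $! after a failed open. The path is hypothetical and is assumed not to exist; the names $errno and $message are illustrative.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine.
my $path = "/no/such/dir/no-such-file.txt";

my ($errno, $message) = (0, '');
unless (open(my $fh, '<', $path)) {
    # $! is meaningful only right after a failure, so copy it at once.
    $errno   = $! + 0;     # numeric context: the errno value
    $message = "$!";       # string context: the matching error text
}

print "open failed ($errno): $message\n";
```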

The Current Line Number: $.

The $. variable contains the line number of the last line read from an input file. If more than one input file is being read, $. contains the line number for the file that was read from most recently. Listing 17.10 shows how $. works.


Listing 17.10. A program that uses the $. variable.

1:  #!/usr/local/bin/perl

2:  

3:  open (FILE1, "file1") ||

4:          die ("Can't open file1\n");

5:  open (FILE2, "file2") ||

6:          die ("Can't open file2\n");

7:  $input = <FILE1>;

8:  $input = <FILE1>;

9:  print ("line number is $.\n");

10: $input = <FILE2>;

11: print ("line number is $.\n");

12: $input = <FILE1>;

13: print ("line number is $.\n");



$ program17_10

line number is 2

line number is 1

line number is 3

$

When line 9 is executed, the input file FILE1 has had two lines read from it. This means that $. contains the value 2. Line 10 then reads from FILE2. Because it reads the first line from this file, $. now has the value 1. When line 12 reads a third line from FILE1, $. is set to the value 3. The Perl interpreter remembers that two lines have already been read from FILE1.

NOTE
If the program is reading using <>, which reads from the files listed on the command line, $. treats the input files as if they are one continuous file. The line number is not reset when a new input file is opened.
You can use eof to test whether a particular file has ended, and then reset $. yourself (by assigning zero to it) before reading from the next file.
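The reset can be sketched as follows. To keep the example self-contained, it creates two temporary files with the standard File::Temp module and places their names in @ARGV, which is an assumption; a real program would take the names from the command line. The name @seen is illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary input files.
my @files;
for my $text ("a\nb\n", "c\nd\ne\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    push @files, $name;
}

my @seen;
{
    local @ARGV = @files;          # stand-in for command-line arguments
    while (my $line = <>) {
        chomp $line;
        push @seen, "$.:$line";
        $. = 0 if eof;             # end of current file: restart numbering
    }
}

print "@seen\n";                   # 1:a 2:b 1:c 2:d 3:e
```

Note that eof is written without parentheses here, which tests the end of the file most recently read.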

Multiline Matching: $*

Normally, the operators that match patterns (the pattern-matching operator and the substitution operator) assume that the character string being searched is a single line of text. If the character string being searched consists of more than one line of text (in other words, it contains newline characters), set the system variable $* to 1.

NOTE
By default, $* is set to 0, which indicates that multiline pattern matches are not required.


The $* variable is deprecated in Perl 5. If you are running Perl 5, use the m pattern-matching option when matching in a multiple-line string. See Chapter 7, "Pattern Matching," for more details on this option.
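The effect of the m option in Perl 5 can be sketched by matching the same string with and without it. The sample text and the names @plain and @multi are illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without the m option, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With the m option, ^ also matches just after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;

print scalar(@plain), " match without m: @plain\n";   # 1 match without m: first
print scalar(@multi), " matches with m: @multi\n";    # 2 matches with m: first second
```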

The First Array Subscript: $[

Normally, when a program references the first element of an array, it does so by specifying the subscript 0. For example:


@myarray = ("Here", "is", "a", "list");

$here = $myarray[0];

The array element $myarray[0] contains the string Here, which is assigned to $here.

If you are not comfortable with using 0 as the subscript for the first element of an array, you can change this setting by changing the value of the $[ variable. This variable indicates which value is to be used as the subscript for the first array element.

Here is the preceding example, modified to use 1 as the first array element subscript:


$[ = 1;

@myarray = ("Here", "is", "a", "list");

$here = $myarray[1];

In this case, the subscript 1 now references the first array element. This means that $here is assigned Here, as before.

TIP
Don't change the value of $[. It is too easy for a casual reader of your program to forget that the subscript 0 no longer references the first element of the array. Besides, using 0 as the subscript for the first element is standard practice in many programming languages, including C and C++.

NOTE
$[ is deprecated in Perl 5.

Multidimensional Associative Arrays and the $; Variable

So far, all the arrays you've seen have been one-dimensional arrays, which are arrays in which each array element is referenced by only one subscript. For example, the following statement uses the subscript foo to access an element of the associative array named %array:


$myvar = $array{"foo"};

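Perl 4 does not support multidimensional arrays directly. In Perl 4, the following statement is not legal (Perl 5 accepts it, but there it builds a nested structure using references, which is a different mechanism from the one described here):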


$myvar = $array{"foo"}{"bar"};

However, Perl enables you to simulate a multidimensional associative array using the built-in system variable $;.

Here is an example of a statement that accesses a (simulated) multidimensional array:


$myvar = $array{"foo","bar"};

When the Perl interpreter sees this statement, it converts it to this:


$myvar = $array{"foo" . $; . "bar"};

The system variable $; serves as a subscript separator. It automatically replaces any comma that is separating two array subscripts.

Here is another example of two equivalent statements:


$myvar = $array{"s1", 4, "hi there"};

$myvar = $array{"s1".$;.4.$;."hi there"};

The second statement shows how the value of the $; variable is inserted into the array subscript.

By default, the value of $; is \034 (the Ctrl+\ character). You can define $; to be any value you want. Listing 17.11 is an example of a program that sets $;.


Listing 17.11. A program that uses the $; variable.

1:  #!/usr/local/bin/perl

2:  

3:  $; = "::";

4:  $array{"hello","there"} = 46;

5:  $test1 = $array{"hello","there"};

6:  $test2 = $array{"hello::there"};

7:  print ("$test1 $test2\n");



$ program17_11

46 46

$

Line 3 sets $; to the string ::. As a consequence, the subscript "hello","there" in lines 4 and 5 is really hello::there because the Perl interpreter replaces the comma with the value of $;.

Line 7 shows that both "hello","there" and hello::there refer to the same element of the associative array.

If you set $;, be careful not to set it to a character that you are actually using in a subscript. For example, if you set $; to ::, the following statements reference the same element of the array:
$array{"a::b", "c"} = 1;
$array{"a", "b::c"} = 2;
In each case, the Perl interpreter replaces the comma with ::, producing the subscript a::b::c.
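Because a simulated multidimensional subscript is really one string joined with $;, the individual subscripts can be recovered with split. The following is a minimal sketch, assuming the default value of $;. The %sales array, the region names, and the quarter numbers are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

# %sales, the region names, and the quarters are hypothetical.
my %sales;
$sales{"east", 1} = 100;
$sales{"east", 2} = 150;
$sales{"west", 1} = 75;

my %total;      # running total for each region
foreach my $key (sort keys %sales) {
    # Each key is really "region" . $; . "quarter", so split on $;.
    my ($region, $quarter) = split(/$;/, $key);
    $total{$region} += $sales{$key};
    print ("$region, quarter $quarter: $sales{$key}\n");
}
print ("east total: $total{'east'}\n");
```

This works only as long as none of the subscripts contains the value of $; itself, which is the reason for the caution above.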

The Word-Break Specifier: $:

In Chapter 11, you learned how to format your output using print formats and the write statement. Each print format contains one or more value fields that specify how output is to appear on the page.

If a value field in a print format begins with the ^ character, the Perl interpreter puts a word in the value field only if there is room enough for the entire word. For example, in the following program (a duplicate of Listing 11.9),


1:  #!/usr/local/bin/perl

2:  

3:  $string = "Here\nis an unbalanced line of\ntext.\n";

4:  $~ = "OUTLINE";

5:  write;

6:  

7:  format OUTLINE =

8:  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<

9:  $string

10: .

the call to write uses the OUTLINE print format to write the following to the screen:


Here is an unbalanced line

Note that the word of is not printed because it cannot fit into the OUTLINE value field.

To determine whether a word can fit in a value field, the Perl interpreter counts the number of characters between the next character to be formatted and the next word-break character. A word-break character is one that denotes either the end of a word or a place where a word can be split into two parts.

By default, the legal word-break characters in Perl are the space character, the newline character, and the - (hyphen) character. The acceptable word break characters are stored in the system variable $:.

To change the list of acceptable word-break characters, change the value of $:. For example, to ensure that all hyphenated words are in the same line of formatted output, define $: as shown here:


$: = " \n";

Now only the space and newline characters are legal word-break characters.

NOTE
Normally, the tab character is not a word-break character. To allow lines to be broken on tabs, add the tab character to the list specified by the $: variable:
$: = " \t\n-";

The Perl Process ID: $$

The $$ system variable contains the process ID for the Perl interpreter itself. This is also the process ID for your program.
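A common use of $$ is to build a scratch filename that is unlikely to clash with other running copies of the same program. The following is a minimal sketch, assuming a writable /tmp directory. The myprog prefix and the SCRATCH file variable are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# The /tmp location and the "myprog" prefix are assumptions for illustration.
my $scratch = "/tmp/myprog.$$";

open(SCRATCH, ">$scratch") || die ("cannot create $scratch: $!\n");
print SCRATCH ("written by process $$\n");
close(SCRATCH);

print ("scratch file for process $$ is $scratch\n");
unlink($scratch);       # clean up when finished
```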

The Current Filename: $ARGV

When you use the <> operator, the Perl interpreter reads input from each file named on the command line. For example, suppose that you are executing the program myprog as shown here:


$ myprog test1 test2 test3

In myprog, the first occurrence of the <> operator reads from test1. Subsequent occurrences of <> continue reading from test1 until it is exhausted; at this point, <> reads from test2. This process continues until all the input files have been read.

In Chapter 6, "Reading from and Writing to Files," you learned that the @ARGV array lists the elements of the command line and that the first element of @ARGV is removed each time the <> operator opens the next file. (@ARGV is also discussed later in this lesson.)

When the <> operator reads from a file for the first time, it assigns the name of the file to the $ARGV system variable. This enables you to keep track of what file is currently being read. Listing 17.12 shows how you can use $ARGV.


Listing 17.12. A simple file-searching program using $ARGV.

1:  #!/usr/local/bin/perl

2:  

3:  print ("Enter the search pattern:\n");

4:  $string = <STDIN>;

5:  chop ($string);

6:  while ($line = <>) {

7:          if ($line =~ /$string/) {

8:                  print ("$ARGV:$line");

9:          }

10: }



$ program17_12 file1 file2 file3

Enter the search pattern:

the

file1:This line contains the word "the".

$

This program reads each line of the input files supplied on the command line. If a line contains the pattern specified by $string, line 8 prints the name of the file and then the line itself. Note that the pattern in $string can contain special pattern characters.

NOTE
If <> is reading from the standard input file (which occurs when you have not specified any input files on the command line), $ARGV contains the string - (a single hyphen).

The Write Accumulator: $^A

The $^A variable is used by write to store formatted lines to be printed. The contents of $^A are erased after the line is printed.

This variable is defined only in Perl 5.
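In Perl 5, the accumulator can also be filled directly with the built-in formline function, which formats one picture line and appends the result to $^A. The following is a minimal sketch of that idea, assuming Perl 5. The picture line and the values are made up for illustration.

```perl
use strict;
use warnings;

# Single quotes keep the @ characters in the picture from being interpolated.
my $picture = '@<<<<<<< @>>>>' . "\n";

$^A = "";                            # start with an empty accumulator
formline($picture, "apples", 12);    # appends one formatted line to $^A
formline($picture, "pears", 7);      # appends a second line

my $report = $^A;                    # copy the accumulated text
$^A = "";                            # clear it, as write does after printing
print ($report);
```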

The Internal Debugging Value: $^D

The $^D variable contains the current internal debugging value. This variable is defined only when the -D switch has been specified and when your Perl interpreter has been compiled with debugging included.

See your online Perl documentation for more details on debugging Perl. (Unless you are using an experimental version of Perl, you are not likely to need to debug it.)

The System File Flag: $^F

The $^F variable controls whether files are to be treated as system files. Its value is the largest UNIX file descriptor that is treated as a system file.

Normally, only STDIN, STDOUT, and STDERR are treated as system files, and the value assigned to $^F is 2. Unless you are on a UNIX machine, are familiar with file descriptors, and want to do something exotic with them, you are not likely to need to use the $^F system variable.

Controlling File Editing Using $^I

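When you specify the -i option (which edits files in place as they are read by the <> operator), the Perl interpreter sets the $^I variable to the backup-file extension supplied with -i (the empty string if no extension was given). When -i is not specified, $^I is undefined.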

The following statement turns off the editing of files being read by <>:


undef ($^I);

When $^I is undefined, the next input file is opened for reading, and the standard output file is no longer changed.

DO open the files for input and output yourself if your program needs to edit some of its input files and not others; this approach is easier to follow.
DON'T use $^I if you are reading files using the -n or -p option unless you really know what you are doing, because you are not likely to get the behavior you expect. If -i has modified the default output file, undefining $^I does not automatically set the default output file back to STDOUT.

The Format Form-Feed Character: $^L

The $^L variable contains the character or characters written out whenever a print format wants to start a new page. The default value is \f, the form-feed character.

Controlling Debugging: $^P

The $^P variable is used by the Perl debugger. When this variable is set to zero, debugging is turned off.

You normally won't need to use $^P yourself, unless you want to specify that a certain chunk of code does not need to be debugged.

The Program Start Time: $^T

The $^T variable contains the time at which your program began running. This time is in the same format as is returned by the time function: the number of seconds since January 1, 1970.

The following statement sets the file-access and -modification times of the file test1 to the time stored in $^T:


utime ($^T, $^T, "test1");

For more information on the time and utime functions, refer to Chapter 12, "Working with the File System."

NOTE
The file test operators -A, -C, and -M are measured relative to $^T: each returns the age of the file, in days, at the time the program started.
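Because $^T and the time function use the same units, subtracting one from the other gives the number of seconds the program has been running. A minimal sketch follows; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $started = localtime($^T);    # start time as a readable string
my $elapsed = time() - $^T;      # seconds since the program began

print ("started at $started\n");
print ("running for $elapsed seconds\n");
```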

Suppressing Warning Messages: $^W

The $^W system variable controls whether warning messages are to be displayed. Normally, $^W is set to a nonzero value only when the -w option is specified.

You can set $^W to zero to turn off warnings inside your program. This capability is useful if your program contains statements that generate warnings you want to ignore (because you know that your statements are correct). For example:


$^W = 0;    # turn off warning messages

# code that generates warnings goes here

$^W = 1;    # turn warning messages back on

Some warnings are printed before program execution starts (for example, warnings of possible typos). You cannot turn off these warnings by setting $^W to zero.
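A variation on the code above uses local inside a block, so that the previous setting of $^W is restored automatically when the block ends. This is a sketch under two assumptions: the program runs under Perl 5, and it does not use the warnings pragma, which takes precedence over $^W. The $SIG{"__WARN__"} hook is a Perl 5 feature used here only to count the warnings; the variable names are hypothetical.

```perl
$^W = 1;                         # same effect as running with -w

my @messages;                    # collects any warnings that are issued
$SIG{"__WARN__"} = sub { push(@messages, $_[0]); };

my $unset;                       # deliberately left undefined
my $text;

{
    local $^W = 0;               # warnings are off only inside this block
    $text = "value: " . $unset;  # no warning here
}

$text = "value: " . $unset;      # $^W is 1 again, so this line warns

print ("warnings issued: ", scalar(@messages), "\n");
```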

The $^X Variable

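The $^X variable contains the name under which the Perl interpreter itself was started (the first word of the command line as the operating system passed it to perl). If you used the perl command to start this program, $^X contains perl; on many systems it instead contains a full pathname such as /usr/local/bin/perl. (The name of your program is stored in $0, not in $^X.)

The following statement checks whether the interpreter was started under the exact name perl; because $^X can hold a full pathname, treat the comparison as illustrative: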


if ($^X ne "perl") {

        print ("You did not use the 'perl' command ");

        print ("to start this program.\n");

}

Pattern System Variables

The system variables you have seen so far are all defined throughout your program. The following system variables are defined only in the current block of statements you are running. (A block of statements is any group of statements enclosed in the brace characters { and }.) These pattern system variables are set by the pattern-matching operator and the other operators that use patterns (such as the substitution operator). Many of these pattern system variables were first introduced in Chapter 7.

TIP
Even though the pattern system variables are defined only inside a particular block of statements, your programs should not take advantage of that fact. The safest way to use the pattern-matching variables is to assign any variable that you might need to a scalar variable of your own.

Retrieving Matched Subpatterns

When you specify a pattern for the pattern-matching or substitution operator, you can enclose parts of the pattern in parentheses. For example, the following pattern encloses the subpattern \d+ in parentheses. (The parentheses themselves are not part of the pattern.)


/(\d+)\./

This subpattern matches one or more digits.

After a pattern has been matched, the system variables $1, $2, and so on contain the text matched by the subpatterns enclosed in parentheses. For example, suppose that the following pattern is successfully matched:


/(\d+)([a-z]+)/

In this case, the match found must consist of one or more digits followed by one or more lowercase letters. After the match has been found, $1 contains the sequence of one or more digits, and $2 contains the sequence of one or more lowercase letters.

Listing 17.13 is an example of a program that uses $1, $2, and $3 to retrieve matched subpatterns.


Listing 17.13. A program that uses variables containing matched subpatterns.

1:  #!/usr/local/bin/perl

2:  

3:  while (<>) {

4:          while (/(-?\d+)\.(\d+)([eE][+-]?\d+)?/g) {

5:                  print ("integer part $1, decimal part $2");

6:                  if ($3 ne "") {

7:                          print (", exponent $3");

8:                  }

9:                  print ("\n");

10:         }

11: }



$ program17_13 file1

integer part 26, decimal part 147, exponent e-02

integer part -8, decimal part 997

$

This program reads each input line and searches for floating-point numbers. Line 4 matches if a floating-point number is found. (Line 4 is a while statement, not an if, to enable the program to detect lines containing more than one floating-point number. The loop starting in line 4 iterates until no more matches are found on the line.)

When a match is found, the first set of parentheses matches the digits before the decimal point; these digits are copied into $1. The second set of parentheses matches the digits after the decimal point; these matched digits are stored in $2. The third set of parentheses matches an optional exponent; if the exponent exists, it is stored in $3.

Line 5 prints the values of $1 and $2 for each match. If $3 is not empty (that is, an exponent was matched), its value is printed by line 7.

DO use $1, not $0, to retrieve the first matched subpattern. $0 contains the name of the program you are running.
DON'T confuse $1 with \1. \1, \2, and so on are defined only inside a pattern. See Chapter 7 for more information on \1.

In patterns, parentheses are counted starting from the left. This rule tells the Perl interpreter how to handle nested parentheses:


/(\d+(\.)?\d+)/

This pattern matches a sequence of two or more digits that optionally contains a decimal point. When this pattern is matched, the outer set of parentheses is considered to be the first set of parentheses; these parentheses contain the entire matched number, which is stored in $1.

The inner set of parentheses is treated as the second set of parentheses because it includes the second left parenthesis seen by the pattern matcher. The variable $2, which contains the subpattern matched by the second set of parentheses, contains . (a period) if a decimal point is matched and is empty if it is not. (In Perl 5, $2 is undefined in that case, which prints as the empty string.)
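The numbering rule can be checked with a short test. The following is a minimal sketch; the two sample strings and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my @results;
foreach my $number ("3.14", "42") {     # sample inputs, chosen for illustration
    if ($number =~ /(\d+(\.)?\d+)/) {
        my $whole = $1;                         # outer (first) parentheses
        my $point = defined($2) ? $2 : "";      # inner (second) parentheses
        push(@results, "$whole [$point]");
        print ("number $whole, decimal point '$point'\n");
    }
}
```

For 3.14 the inner parentheses capture the period; for 42 they capture nothing, so the second value is empty.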

Retrieving the Entire Pattern: $&

When a pattern is matched successfully, the matched text string is stored in the system variable $&. This is the most direct way to retrieve the matched text, because the pattern matcher itself returns only a true or false value indicating whether the match succeeded. (You could instead enclose the entire pattern in parentheses and then check the value of $1, but $& is easier to use.) Listing 17.14 is a program that uses $& to count all the digits in a set of input files.


Listing 17.14. A program that uses $&.

1:  #!/usr/local/bin/perl

2:  

3:  while ($line = <>) {

4:          while ($line =~ /\d/g) {

5:                $digitcount[$&]++;

6:          }

7:  }

8:  print ("Totals for each digit:\n");

9:  for ($i = 0; $i <= 9; $i++) {

10:         print ("$i: $digitcount[$i]\n");

11: }



$ program17_14 file1

Totals for each digit:

0: 11

1: 6

2: 3

3: 1

4: 2

5:

6: 1

7:

8:

9: 1

$

This program reads one line at a time from the files specified on the command line. Line 4 matches each digit in the input line in turn; the matched digit is stored in $&.

Line 5 takes the value of $& and uses it as the subscript for the array @digitcount. This array keeps a count of the number of occurrences of each digit.

When the input files have all been read, lines 9-11 print the totals for each digit.

NOTE
If you need the value of $&, be sure to get it before exiting the while loop or other statement block in which the pattern is matched. (A statement block is exited when the Perl interpreter sees a } character.)
For example, the pattern matched in line 4 cannot be accessed outside of lines 4-6 because this copy of $& is defined only in these lines. (This rule also holds true for all the other pattern system variables defined in this lesson.)
The best rule to follow is to either use or assign a pattern system variable immediately following the statement that matches the pattern.
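The block rule can be seen by matching inside a nested block and then looking at $& again after the block ends. This is a minimal sketch, assuming Perl 5 behavior; the strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my ($outer_before, $inner, $outer_after);

if ("order 42" =~ /\d+/) {
    $outer_before = $&;          # text matched in the outer block
    {
        if ("item 7" =~ /\d+/) {
            $inner = $&;         # this copy exists only in the inner block
        }
    }
    $outer_after = $&;           # the outer block's value is visible again
}

print ("outer: $outer_before, inner: $inner, outer again: $outer_after\n");
```

Copying the value into a variable of your own, as done here with $inner, is what keeps it available after the block ends.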

Retrieving the Unmatched Text: the $` and $' Variables

When a pattern is matched, the text of the match is stored in the system variable $&. The rest of the string is stored in two other system variables:

  • The unmatched text preceding the match is stored in the $` variable.
  • The unmatched text following the match is stored in the $' variable.

For example, if the Perl interpreter searches for the /\d+/ pattern in the string qwerty1234uiop, it matches 1234, which is stored in $&. The substring qwerty, which precedes the match, is stored in $`. The rest of the string, uiop, is stored in $'.

If the beginning of a text string is matched, $` is set to the empty string. Similarly, if the last character in the string is part of the match, $' is set to the empty string.
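The example string above can be tried directly. A minimal sketch follows; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = "qwerty1234uiop";
my ($before, $matched, $after);

if ($string =~ /\d+/) {
    # Copy the values right away, before another match can replace them.
    ($before, $matched, $after) = ($`, $&, $');
    print ("before: $before, match: $matched, after: $after\n");
}
```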

The $+ Variable

The $+ variable contains the text matched by the last subpattern enclosed in parentheses. For example, when the following pattern is matched, $+ contains the digits after the decimal point:


/(\d+)\.(\d+)/

This variable is useful when the last part of a pattern is the only part you really need to look at.
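$+ is also convenient when a pattern contains alternatives and you do not know which set of parentheses took part in the match. The sketch below assumes Perl 5, in which $+ holds the text of the highest-numbered subpattern that actually matched. The input lines are made up for illustration.

```perl
use strict;
use warnings;

my @found;
foreach my $line ("Version: 2.0", "Revision: 1.7") {    # sample input lines
    if ($line =~ /Version: (.*)|Revision: (.*)/) {
        # Only one alternative matched, so only one of $1 and $2 is set;
        # $+ holds whichever one it was.
        push(@found, $+);
        print ("number found: $+\n");
    }
}
```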

File System Variables

Several system variables are associated with file variables. One copy of each file system variable is defined for each file that is referenced in your Perl program. Many of these system variables were first introduced in Chapter 11. The variables mentioned there are described again here for your convenience.

The Default Print Format: $~

When the write statement sends formatted output to a file, it uses the value of the $~ system variable for that file to determine the print format to use.

When a program starts running, the default value of $~ for each file is the same as the name of the file variable that represents the file. For example, when you write to the file represented by the file variable MYFILE, the default value of $~ is MYFILE. This means that write normally uses the MYFILE print format. (For the standard output file, this default print format is named STDOUT.)

If you want to specify a different print format, change the value of $~ before calling the write function. For example, to use the print format MYFORMAT when writing to the standard output file, use the following code:


select (STDOUT);  # making sure you are writing to STDOUT

$~ = "MYFORMAT";

write;

This call to write uses MYFORMAT to format its output.

Remember that one copy of $~ is defined for each file variable. Therefore, the following code is incorrect:
$~ = "MYFORMAT";
select (MYFILE);
write;
In this example, the assignment to $~ changes the default print format for whatever the current output file happens to be. This assignment does not affect the default print format for MYFILE because MYFILE is selected after $~ is assigned. To change the default print format for MYFILE, select it first:
select (MYFILE);
$~ = "MYFORMAT";
write;
This call to write now uses MYFORMAT to write to MYFILE.

Specifying Page Length: $=

The $= variable defines the page length (number of lines per page) for a particular output file. $= is normally initialized to 60, which is the value that the Perl interpreter assumes is the page length for every output file. This page length includes the lines left for page headers, and it is the length that works for most printers.

If you are directing a particular output file to a printer with a nonstandard page length, change the value of $= for this file before writing to it:


select ("WEIRDLENGTH");

$= = 72;

This code sets the page length for the WEIRDLENGTH file to 72.

$= is set to 60 by default only if a page header format is defined for the page. If no page header is defined, $= is set to 9999999 because Perl assumes that you want your output to be a continuous stream.
If you want paged output without a page header, define an empty page header for the output file.

Lines Remaining on the Page: $-

The $- variable associated with a particular file variable lists the number of lines left on the current page of that file. Each call to write subtracts the number of lines printed from $-. If write is called when $- is zero, a new page is started. (If $- is greater than zero, but write is printing more lines than the value of $-, write starts a new page in the middle of its printing operation.)

When a new page is started, the initial value of $- is the value stored in $=, which is the number of lines on the page.

The program in Listing 17.15 displays the value of $-.


Listing 17.15. A program that displays $-.

1:  #!/usr/local/bin/perl

2:  

3:  open (OUTFILE, ">outfile");

4:  select ("OUTFILE");

5:  write;

6:  print STDOUT ("lines to go before write: $-\n");

7:  write;

8:  print STDOUT ("lines to go after write: $-\n");

9:  format OUTFILE =

10:  This is a test.

11: .

12: format OUTFILE_TOP =

13: This is a test.

14: .



$ program17_15

lines to go before write: 58

lines to go after write: 57

$

Line 3 opens the output file outfile and associates the file variable OUTFILE with this file. Line 4 then calls select, which sets the default output file to OUTFILE.

Line 5 calls write, which starts a new page. Line 6 then sends the value of $- to the standard output file, STDOUT, by specifying STDOUT in the call to print. Note that the copy of $- printed is the copy associated with OUTFILE, not STDOUT, because OUTFILE is currently the default output file.

Line 7 calls write, which sends a line of output to OUTFILE and decreases the value of $- by one. Line 8 prints this new value of $-.

NOTE
If you want to force your next output to appear at the beginning of a new page, you can set $- to zero yourself before calling write.
When a file is opened, the copy of $- for this file is given the initial value of zero. This initial value ensures that the first call to write always starts a page (and generates the header for the page).

The Page Header Print Format: $^

When write starts a new page, you can specify the page header that is to appear on the page. To do this, define a page header print format for the output file to which the page is to be sent.

The system variable $^ contains the name of the print format to be used for printing page headers. If this format is defined, page headers are printed; if it does not exist, no page headers are printed.

By default, the copy of $^ for a particular file is set equal to the name of the file variable plus the string _TOP. For example, for the file represented by the file variable MYFILE, $^ is given an initial value of MYFILE_TOP.

To change the page header print format for a particular file, set the default output file by calling select, and then set $^ to the print format you want to use. For example:


select (MYFILE);

$^ = "MYHEADER";

This code changes the default output file to MYFILE and then changes the page header print format for MYFILE to MYHEADER. As always, you must remember to select the file before changing $^ because each file has its own copy of $^.

Buffering Output: $|

When you send output to a file using print or write, the operating system might not write it right away. Some systems first send the output to a special array known as a buffer; when the buffer becomes full, it is written all at once. This process of output buffering is usually a more efficient way to write data.

In some circumstances, you might want to send output straight to your output file without using an intervening buffer. (For example, two processes might be sending output to the standard output file at the same time.)

The $| system variable indicates whether a particular file is buffered. By default, the Perl interpreter defines a buffer for each output file, and $| is set to 0. To eliminate buffering for a particular file, select the file and then set the $| variable to a nonzero value. For example, the following code eliminates buffering for the MYFILE output file:


select ("MYFILE");

$| = 1;

These statements set MYFILE as the default output file and then turn off buffering for it.

If you want to eliminate buffering for a particular file, it is simplest to set $| before writing to the file for the first time, so that no output is ever held in a buffer. (In Perl 5, setting $| to a nonzero value also writes out any output already waiting in the buffer for the selected file.)
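Because select changes the default output file for everything that follows, it is common to remember the previous default and restore it after setting $|. The sketch below assumes a writable /tmp directory. The scratch filename and the CHECK file variable are hypothetical names used only to show that the line reaches the file before it is closed.

```perl
use strict;
use warnings;

# The scratch filename is an assumption for illustration.
my $path = "/tmp/unbuffered_demo.$$";
open(MYFILE, ">$path") || die ("cannot open $path: $!\n");

my $previous = select(MYFILE);   # select returns the old default output file
$| = 1;                          # turn off buffering for MYFILE only
select($previous);               # restore the old default output file

print MYFILE ("first line\n");

# MYFILE is still open, but the line is already in the file.
open(CHECK, $path) || die ("cannot reopen $path: $!\n");
my $seen = <CHECK>;
close(CHECK);

close(MYFILE);
unlink($path);
print ("read back: $seen");
```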

The Current Page Number: $%

Each output file opened by a Perl program has a copy of the $% variable associated with it. This variable stores the current page number. When write starts a new page, it adds one to the value of $%. Each copy of $% is initialized to 0, which ensures that $% is set to 1 when the first page is printed. $% often is displayed by page header print formats.

Array System Variables

The system variables you've seen so far have all been scalar variables. The following sections describe the array variables that are automatically defined for use in Perl programs. All of these variables, except for the @_ variable, are global variables: their value is the same throughout a program.

The @_ Variable

The @_ variable, which is defined inside each subroutine, is a list of all the arguments passed to the subroutine.

For example, suppose that the subroutine my_sub is called as shown here:


&my_sub("hello", 46, $var);

The values hello and 46, plus the value stored in $var, are combined into a three-element list. Inside my_sub, this list is stored in @_.

In a subroutine, the @_ array can be referenced or modified, just as with any other array variable. Most subroutines, however, assign @_ to locally defined scalar variables using the local function:


sub my_sub {

        local ($arg1, $arg2, $arg3) = @_;

        # more stuff goes here

}

Here, the local statement defines three local variables, $arg1, $arg2, and $arg3. $arg1 is assigned the first element of the list stored in @_, $arg2 is assigned the second, and $arg3 is assigned the third.

For more information on subroutines, refer to Chapter 9, "Using Subroutines."

NOTE
If the shift function is called inside a subroutine with no argument specified, the @_ variable is assumed, and its first element is removed.
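The two ways of reading @_ can be combined: shift removes the leading argument, and whatever remains in @_ is treated as a list. The following is a minimal sketch that uses Perl 5's my in place of local; the describe subroutine and its arguments are hypothetical.

```perl
use strict;
use warnings;

# describe is a hypothetical subroutine used only for illustration.
sub describe {
    my $label = shift;           # no argument given, so shift works on @_
    my $count = scalar(@_);      # the remaining arguments are still in @_
    return ("$label: $count value(s): @_");
}

my $summary = &describe("scores", 46, 12, 7);
print ("$summary\n");
```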

The @ARGV Variable

When you run a Perl program, you can specify values that are to be passed to the program by including them on the command line. For example, the following command calls the Perl program myprog and passes it the values hello and 46:


$ myprog "hello" 46

Inside the Perl program, these values are stored in a special built-in array named @ARGV. In this example, @ARGV contains the list ("hello", 46).

Here is a simple statement that prints the values passed on the command line:


print ("@ARGV\n");

The @ARGV array also is associated with the <> operator. This operator treats the elements in @ARGV as filenames; each file named in @ARGV is opened and read in turn. Refer to Chapter 6 for a description of the <> operator.

NOTE
If the shift function is called in the main body of a program (outside a subroutine) and no arguments are passed with it, the Perl interpreter assumes that the @ARGV array is to have its first element removed.
The following loop assigns each element of @ARGV, in turn, to the variable $var:
while ($var = shift) {
# stuff
}
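One caution about this loop: it stops as soon as shift returns a false value, so an argument of 0 or the empty string ends it early. A hedged variation, assuming Perl 5, tests the result with defined instead. The sample arguments are assigned inside the program only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Sample arguments, assigned here only so the example is self-contained.
@ARGV = ("alpha", "0", "beta");

my @seen;
while (defined(my $var = shift(@ARGV))) {
    push(@seen, $var);           # reached for every argument, including "0"
}
print ("processed: @seen\n");
```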

The @F Variable

In Perl, if you specify the -n or -p option, you can also supply the -a option. This option tells the Perl interpreter to break each input line into individual words (throwing away all tabs and spaces). These words are stored in the built-in array variable @F. After an input line has been (automatically) read, the @F array variable behaves like any other array variable.

For more information on the -a, -n, or -p options, refer to Chapter 16, "Command-Line Options."

NOTE
When the -a option is specified and an input line is broken into words, the original input line can still be accessed because it is stored in the $_ system variable.
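The effect of -a can be imitated inside an ordinary program with split, which is a convenient way to experiment with @F. This is a sketch, not the interpreter's own code; the sample line is made up, and @F is declared here as an ordinary array.

```perl
use strict;
use warnings;

# A sample input line; with -n or -p this would be read into $_ automatically.
$_ = "  alice\t42   engineering\n";

my @F = split(' ', $_);          # the split that -a performs for each line

print ("name: $F[0], department: $F[2]\n");
print ("fields on this line: ", scalar(@F), "\n");
```

Note that $_ still holds the complete line, including its leading spaces and trailing newline.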

The @INC Variable

The @INC array variable contains a list of directories to be searched for files requested by the require function. This list consists of the following items, in order from first to last:

  • The directories specified by the -I option
  • The Perl library directory, which is normally /usr/local/lib/perl (the exact location depends on how Perl was installed)
  • The current working directory (represented by the . character)

Like any array variable, @INC can be added to or modified.

For more information on the require function, refer to Chapter 19.

The %INC Variable

The built-in associative array %INC lists the files requested by the require function that have already been found.

When require finds a file, the associative array element $INC{file} is defined, in which file is the name of the file. The value of this associative array element is the location of the actual file.

When require requests a file, the Perl interpreter first looks to see whether an associative array element has already been created for this file. This action ensures that the interpreter does not try to include the same code twice.
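The interaction between @INC and %INC can be observed with a small library file that counts how often it is executed. The sketch below assumes Perl 5 and a writable /tmp directory. The directory name, the file greeting.pl, and the $load_count variable are all hypothetical.

```perl
use strict;
use warnings;

# The directory and file names below are hypothetical scratch names.
my $dir  = "/tmp/inc_demo.$$";
my $file = "greeting.pl";
our $load_count = 0;             # incremented by the library each time it runs

mkdir($dir, 0755) || die ("cannot create $dir: $!\n");
open(LIB, ">$dir/$file") || die ("cannot write $dir/$file: $!\n");
print LIB ("\$main::load_count++;\n1;\n");
close(LIB);

unshift(@INC, $dir);             # search the new directory first
require $file;                   # found through @INC and executed
require $file;                   # already listed in %INC, so not run again

my $location = $INC{$file};      # where require found the file
print ("$file loaded from $location\n");
print ("times executed: $load_count\n");

unlink("$dir/$file");
rmdir($dir);
```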

The %ENV Variable

The %ENV associative array lists the environment variables defined for this program and their values. The environment variables are the array subscripts, and the values of the variables are the values of the array elements.

For example, the following statement assigns the value of the environment variable TERM to the scalar variable $term:


$term = $ENV{"TERM"};
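An environment variable that is not set yields an undefined value, so it is often worth supplying a default. Elements of %ENV can also be assigned, and the new values are passed to any command the program starts afterward. The following is a minimal sketch, assuming a UNIX shell that provides echo; GREETING is a hypothetical variable name.

```perl
use strict;
use warnings;

# Fall back to a default when the variable is not set in the environment.
my $term = defined($ENV{"TERM"}) ? $ENV{"TERM"} : "unknown";
print ("terminal type: $term\n");

# Changes to %ENV are seen by commands that the program starts afterward.
# GREETING is a hypothetical variable name used only for this example.
$ENV{"GREETING"} = "hello";
my $echoed = `echo \$GREETING`;
print ("child process saw: $echoed");
```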

The %SIG Variable

In the UNIX environment, processes can send signals to other processes. These signals can, for example, interrupt a running program, trigger an alarm in the program, or kill off the program.

You can control how your program responds to signals it receives. To do this, modify the %SIG associative array. This array contains one element for each available signal, with the signal name serving as the subscript for the element. For example, the INT (interrupt) signal is represented by the $SIG{"INT"} element.

The value of a particular element of %SIG is the action that is to be performed when the signal is received. By default, the value of an array element is DEFAULT, which tells the program to do what it normally does when it receives this signal.

You can override the default action for some of the signals in two ways: you can tell the program to ignore the signal, or you can define your own signal handler. (Some signals, such as KILL, cannot be overridden.)

To tell the program to ignore a particular type of signal, set the value of the associative array element for this signal to IGNORE. For example, the following statement indicates that the program is to ignore any INT signals it receives:


$SIG{"INT"} = "IGNORE";

If you assign any value other than DEFAULT or IGNORE to a signal array element, this value is assumed to be the name of a function that is to be executed when this signal is received. For example, the following statement tells the program to jump to the subroutine named interrupt when it receives an INT signal:


$SIG{"INT"} = "interrupt";

Subroutines that can be jumped to when a signal is received are called interrupt handlers, because signals interrupt normal program execution. Listing 17.16 is an example of a program that defines an interrupt handler.


Listing 17.16. A program containing an interrupt handler.

1:  #!/usr/local/bin/perl

2:  

3:  $SIG{"INT"} = "wakeup";

4:  sleep();

5:  

6:  sub wakeup {

7:          print ("I have woken up!\n");

8:          exit();

9:  }



$ program17_16

I have woken up!

$

Line 3 tells the Perl interpreter that the program is to jump to the wakeup subroutine when it receives the INT signal. Line 4 tells the program to go to sleep. Because no argument is passed to sleep, the program will sleep until a signal wakes it up.

To wake up the process, get the process ID using the ps command, and then send an INT signal to the process using the kill command. (See the manual page for kill, and the related documentation for signal handling, to see how to perform this task in your environment.)

When the program receives the INT signal, it executes the wakeup subroutine. This subroutine prints the following message and then exits:


I have woken up!

If desired, you can use the same subroutine to handle more than one signal. The signal actually sent is passed as an argument to the called subroutine, which ensures that your subroutine can determine which signal triggered it:


sub interrupt {

        local ($signal) = @_;



        print ("Interrupted by the $signal signal.\n");

}

If a subroutine exits normally, the program returns to where it was executing when it was interrupted. If a subroutine calls exit or die, the program execution is terminated.

NOTE
On some systems, when a program continues executing after being interrupted, the element of %SIG corresponding to the received signal is reset to DEFAULT. To ensure that repeated signals are trapped by your interrupt handler on such systems, assign the handler to the appropriate element of %SIG again inside the handler.
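The pieces described in this section can be combined into one handler that serves two signals, records which signal arrived, and re-installs itself. This is a sketch assuming a UNIX system that provides the INT and USR1 signals; to stay self-contained, the program sends the signals to itself with kill rather than waiting for another process. The @caught array is a hypothetical name.

```perl
use strict;
use warnings;

my @caught;                      # signal names, in the order received

sub interrupt {
    my ($signal) = @_;           # the name of the signal, without "SIG"
    push(@caught, $signal);
    $SIG{$signal} = "interrupt"; # re-install the handler for the next signal
}

$SIG{"INT"}  = "interrupt";
$SIG{"USR1"} = "interrupt";

kill("USR1", $$);                # send two signals to this very process
kill("INT", $$);

print ("signals handled: @caught\n");
```

Because the handler returns normally instead of calling exit, the program carries on after each signal and reaches the final print statement.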

Built-In File Variables

Perl provides several built-in file variables, most of which you have previously seen. The only file variables that have not yet been discussed are DATA and _ (underscore). The others are briefly described here for the sake of completeness.

STDIN, STDOUT, and STDERR

The file variable STDIN is, by default, associated with the standard input file. Using STDIN with the <> operator, as in <STDIN>, normally reads data from your keyboard. If your shell has used < or some equivalent redirection operator to specify input from a file, <STDIN> reads from that file.

The file variable STDOUT normally writes to the standard output file, which is usually directed to your screen. If your shell has used > or the equivalent to redirect standard output to a file, writing to STDOUT sends output to that file.

STDERR represents the standard error file, which is almost always directed to your screen. Writing to STDERR ensures that you see error messages even when you have redirected the standard output file.

You can associate STDIN, STDOUT, or STDERR with some other file using open:


open (STDIN, "myinputfile");
open (STDOUT, ">myoutputfile");
open (STDERR, ">myerrorfile");

Opening a file and associating it with STDIN overrides the default value of STDIN, which means that you can no longer read from the standard input file. Similarly, opening a file and associating it with STDOUT or STDERR means that writing to that particular file variable no longer sends output to the screen.

To associate a file variable with the standard input file after you have redirected STDIN, specify a filename of -:


open (MYSTDIN, "-");

To associate a file variable with the standard output file, specify a filename of >-:


open (MYSTDOUT, ">-");

You can, of course, specify STDIN with - or STDOUT with >- to restore the original values of these file variables.
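As a sketch of redirecting STDOUT and then putting it back, the example below uses a different technique from the >- filename described above: it duplicates STDOUT with the >& open mode before redirecting, and restores it from the duplicate afterward. The scratch file created with File::Temp is an assumption that stands in for a file such as myoutputfile.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for "myoutputfile".
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close($tmp_fh);

# Save a duplicate of the current STDOUT before redirecting it.
open(my $saved_stdout, ">&", \*STDOUT) or die "cannot dup STDOUT: $!";

# Associate STDOUT with the scratch file; print now writes to the file.
open(STDOUT, ">", $tmp_name) or die "cannot redirect STDOUT: $!";
print "this line goes to the file\n";
close(STDOUT);

# Restore STDOUT from the saved duplicate.
open(STDOUT, ">&", $saved_stdout) or die "cannot restore STDOUT: $!";
print "this line goes to the original standard output\n";

# Read the file back to see what was captured while redirected.
open(my $in, "<", $tmp_name) or die "cannot read $tmp_name: $!";
my $captured = <$in>;
close($in);
```

Keeping a duplicate makes the restore step independent of where the standard output was originally directed.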

ARGV

ARGV is a special file variable that is associated with the current input file being read by the <> operator. For example, consider the following statement:


$line = <>;

This statement reads from the current input file. Because ARGV represents the current input file, the preceding statement is equivalent to this:


$line = <ARGV>;

You normally will not need to access ARGV yourself except via the <> operator.
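One case where naming ARGV directly is useful is closing it at the end of each input file, which resets $. so that line numbers restart for the next file. The sketch below assumes two small scratch files created with File::Temp in place of filenames supplied on the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hypothetical input files created just for this sketch.
my @names;
for my $text ("a\nb\n", "c\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close($fh);
    push @names, $name;
}

@ARGV = @names;    # pretend the names were given on the command line

my @seen;
while (my $line = <>) {          # <> reads through the ARGV file variable
    chomp $line;
    push @seen, sprintf("%d:%s", $., $line);
    close(ARGV) if eof;          # resets $. before the next file starts
}
print "$_\n" for @seen;
```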

DATA

The DATA file variable is used with the __END__ special value, which indicates the end of a program. Reading from DATA reads the lines that follow __END__, one at a time, which enables you to include a program and its data in the same file.

Listing 17.17 is an example of a program that reads from DATA.


Listing 17.17. An example of the DATA file variable.

1:  #!/usr/local/bin/perl
2:  
3:  $line = <DATA>;
4:  print ("$line");
5:  __END__
6:  This is my line of data.

$ program17_17
This is my line of data.
$

The __END__ value in line 5 indicates the end of the program. When line 3 reads from the DATA file variable, the first line after __END__ (line 6) is read in and is assigned to $line. (Subsequent requests for input from DATA read successive lines, if any exist.) Line 4 then prints this input line.

NOTE
For more information on __END__ and methods of indicating the end of the program, refer to Chapter 20, "Miscellaneous Features of Perl."

The Underscore File Variable

The _ (underscore) file variable represents the file specified by the last call to either the stat function or a file test operator. For example:


$readable = -r "/u/jqpublic/myfile";
$writeable = -w _;

Here, the _ file variable used in the second statement refers to /u/jqpublic/myfile because this is the filename that was passed to -r.

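You can use _ only with stat, lstat, and the file test operators. It refers to the status information saved by the previous test, not to an open file, so it cannot be used for reading or writing:

if (-T $myoutfile) {
        $size = -s _;
        print ("$myoutfile is a text file of $size bytes.\n");
}

Here, the status information for the file whose name is stored in $myoutfile is associated with _ because this name was passed to -T (which tests whether the file is a text file). The -s operator then retrieves the size of the file from this saved information.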

The main benefit of _ is that it saves time when you are using several file-test operators at once:


if (-r "myfile" || -w _ || -x _) {
        print ("I can read, write, or execute myfile.\n");
}

Using _ rather than myfile saves time because file test operators normally call the UNIX system function stat. If you specify _, the Perl interpreter is told to use the results of the preceding call to the UNIX stat function and to not bother calling it again.
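A small sketch of this pattern follows: the first test names the file, and each later test uses _ to reuse the saved results. The scratch file and the @facts array are assumptions introduced only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file to run the tests against.
my ($fh, $name) = tempfile(UNLINK => 1);
print $fh "hello\n";
close($fh);

my @facts;
if (-e $name) {                          # status information is fetched here
    push @facts, "readable"   if -r _;   # each _ reuses the saved results
    push @facts, "plain file" if -f _;
    push @facts, "size " . (-s _);
}
print "$_\n" for @facts;
```

The saving grows with the number of tests, because only the first operator has to ask the operating system about the file.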

Specifying System Variable Names as Words

As you have seen, the system variables defined by Perl normally consist of a $, @ or % followed by a single non-alphanumeric character. This ensures that you cannot define a variable whose name is identical to that of a Perl system variable.

If you find Perl system variable names difficult to remember or type, Perl 5 provides an alternative for most of them. If you add the statement


use English;

at the top of your program, Perl defines alternative variable names that more closely resemble English words. This makes it easier to understand what your program is doing. Table 17.1 lists these alternative variable names.

Table 17.1. Alternative names for Perl system variables.

Variable    Alternative name(s)
$_          $ARG
$0          $PROGRAM_NAME
$<          $REAL_USER_ID or $UID
$>          $EFFECTIVE_USER_ID or $EUID
$(          $REAL_GROUP_ID or $GID
$)          $EFFECTIVE_GROUP_ID or $EGID
$]          $PERL_VERSION
$/          $INPUT_RECORD_SEPARATOR or $RS
$\          $OUTPUT_RECORD_SEPARATOR or $ORS
$,          $OUTPUT_FIELD_SEPARATOR or $OFS
$"          $LIST_SEPARATOR
$#          $OFMT
$@          $EVAL_ERROR
$?          $CHILD_ERROR
$!          $OS_ERROR or $ERRNO
$.          $INPUT_LINE_NUMBER or $NR
$*          $MULTILINE_MATCHING
$[          none (deprecated in Perl 5)
$;          $SUBSCRIPT_SEPARATOR or $SUBSEP
$:          $FORMAT_LINE_BREAK_CHARACTERS
$$          $PROCESS_ID or $PID
$^A         $ACCUMULATOR
$^D         $DEBUGGING
$^F         $SYSTEM_FD_MAX
$^I         $INPLACE_EDIT
$^L         $FORMAT_FORMFEED
$^P         $PERLDB
$^T         $BASETIME
$^W         $WARNING
$^X         $EXECUTABLE_NAME
$&          $MATCH
$`          $PREMATCH
$'          $POSTMATCH
$+          $LAST_PAREN_MATCH
$~          $FORMAT_NAME
$=          $FORMAT_LINES_PER_PAGE
$-          $FORMAT_LINES_LEFT
$^          $FORMAT_TOP_NAME
$|          $OUTPUT_AUTOFLUSH
$%          $FORMAT_PAGE_NUMBER
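As a brief sketch of the alternative names in use, the example below sets $LIST_SEPARATOR instead of $" and compares $PROCESS_ID with $$. The -no_match_vars import option is a feature of the English module that skips the aliases for the pattern system variables; the variable names @parts, $joined, and $same_pid are assumptions made for this example.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my @parts = ("a", "b", "c");

my $joined;
{
    local $LIST_SEPARATOR = "-";     # the same variable as $"
    $joined = "@parts";              # elements are joined with "-"
}

# Each alternative name is an alias, so both names hold the same value.
my $same_pid = ($PROCESS_ID == $$) ? 1 : 0;

print "joined: $joined\n";
print "program: $PROGRAM_NAME\n";
```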

Summary

In this chapter you learned about the built-in system variables available within every Perl program. These system variables are divided into five groups:

  • Global scalar variables, which are defined everywhere in the program and contain a single scalar value
  • Pattern system variables, which are defined immediately after a pattern-matching or substitution operation has been performed
  • File system variables, which are defined for each input or output file accessible from the program
  • Array system variables, each of which contains a list
  • Built-in file variables, which are associated with files that are automatically open or automatically available

You also learned how to specify English-language equivalents for Perl system variables.

Q&A

Q: Why do some system variables use special characters rather than letters in their names?
A: To distinguish them from variables that you define and to ensure that the reset function (described in the next chapter) cannot affect them.
Q: Why do some functions use $_ as the default, whereas others do not?
A: The functions that use $_ as the default are those that are likely to appear in Perl programs specified on the command line using the -e option.
Q: What is the current line number when $. is used with the <> operator?
A: Effectively, the <> operator treats its input files as if they are a single file. This means that $. contains the total number of lines seen, not the line number of the current input file. (If you want $. to contain the line number of the current file, set $. to zero each time eof returns true.)
Q: Are pattern system variables local or global?
A: Each pattern system variable is defined only in the current subroutine or block of statements.
Q: Why does Perl define both the $" and the $, system variables?
A: Some programs like to treat the following statements differently:
print ("@array");
print (@array);
(In fact, by default, the first statement puts a space between each pair of elements in the array, and the second does not.) The $" and $, variables handle these two separate cases.
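The difference described in the last answer can be sketched as follows. To make the result easy to inspect, the example prints the list to an in-memory file (a string opened as a file), which is an assumption of this sketch rather than something the two statements require.

```perl
use strict;
use warnings;

my @array = ("a", "b", "c");

# Capture print output in a string through an in-memory file.
my $buffer = "";
open(my $out, ">", \$buffer) or die "cannot open in-memory file: $!";

my $interpolated;
{
    local $" = "+";              # separator used when an array is interpolated
    local $, = "/";              # separator used between the arguments of print
    $interpolated = "@array";    # affected by $" only
    print $out @array;           # affected by $, only
}
close($out);

print "interpolated: $interpolated\n";
print "printed list: $buffer\n";
```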

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you've learned.

Quiz

  1. List the functions and operators that use $_ by default.
  2. What do the following variables contain?
    a.    $=
    b.    $/
    c.    $?
    d.    $!
    e.    @_
  3. Explain the differences between ARGV, $ARGV, and @ARGV.
  4. Explain the difference between @INC and %INC.
  5. Explain the difference between $0 and $1.

Exercises

  1. Write a program that reads lines of input, replaces multiple blanks and tabs with a single space, converts all uppercase letters to lowercase, and prints the resulting lines. Use no explicit variable names in this program.
  2. Write a program that uses $' and $_ to remove all extra spaces from input lines.
  3. Write a program that prints the directories in your PATH environment variable, one per line.
  4. Write a program that prints numbers, starting with 1 and continuing until interrupted by an INT signal.
  5. Write a program whose data consists of one or more numbers per input line. Put the input lines in the program file itself. Add the numbers and print their total.
  6. BUG BUSTER: What is wrong with the following statement?
    if ($line =~ /abc/) {
    $' =~ s/ +/ /;
    }