Perl Free Tutorial

Web based School

Chapter 5

Lists and Array Variables


The Perl programs you have seen so far deal with scalar values, which are single units of data, and scalar variables, which can store one piece of information.

Perl also enables you to define an ordered collection of values, known as a list; this collection of values can be stored in variables known as array variables.

This chapter describes lists and array variables, and it shows you what you can do with them. In this chapter, you learn about the following:

  • What lists are
  • The relationship between scalar variables and lists
  • Storing lists in array variables
  • Accessing an element of an array variable or list
  • How to use list ranges
  • Assigning to array variables
  • Assigning to scalar variables from array variables
  • Retrieving the length of a list
  • Using array slices
  • Using an array to store input
  • Sorting a list or array variable
  • Reversing a list or array variable
  • Creating a string from a list
  • Creating a list from a string

Introducing Lists

A list is a sequence of scalar values enclosed in parentheses. The following is a simple example of a list:


(1, 5.3, "hello", 2)

This list contains four elements, each of which is a scalar value: the numbers 1 and 5.3, the string hello, and the number 2.

Lists can be as long as needed, and they can contain any scalar value. A list can have no elements at all, as follows:


()

This list also is called an empty list.

NOTE
A list with one element and a scalar value are different entities. For example, the list
(43.2)
and the scalar value
43.2
are not the same thing. This is not a severe limitation because one can be converted to or assigned to the other. See the section titled "Assigning to Scalar Variables from Array Variables" later in this chapter.

Scalar Variables and Lists

A scalar variable name can always be included as part of a list. In this case, the current value of the scalar variable becomes the list element value. For example:


(17, $var, "a string")

If $var has been assigned the value 26, the second element of the list becomes 26. (It remains 26 even if a different value is assigned to $var.)

Similarly, you can use the value of an expression as an element of a list. For example:


(17, 26 << 2)

This list contains two elements: 17 and 104 (which is 26 left-shifted two places). Expressions in lists, like other expressions, can contain scalar variables.


(17, $var1 + $var2)

Here, the expression $var1 + $var2 is evaluated and its value becomes the second element of the list.
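The following is a minimal sketch of the point made above: a list element receives the value a variable or expression has at the moment the list is built. It stores the list in an array variable (covered later in this chapter) so that the elements can be inspected afterward. The name @snapshot is illustrative, and use strict, use warnings, and my are additions that the chapter's own listings do not use.

```perl
use strict;
use warnings;

my $var = 26;

# The list is built from the current value of $var and of the expression.
my @snapshot = (17, $var, 26 << 2);

# Assigning a new value to $var afterward does not touch the stored element.
$var = 99;

print "second element: $snapshot[1]\n";
print "third element:  $snapshot[2]\n";
```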

Lists and String Substitution

Because character strings are scalar values, they can be used in lists, as follows:


("my string", 24.3, "another string")

You can substitute for scalar variable names in character strings in lists, as follows:


($value, "The answer is $value")

This list contains two elements: the value of the scalar variable $value, and a string in which the name $value is replaced by that value. If the current value of $value is 26, the two elements of the list are 26 and The answer is 26.

Storing Lists in Array Variables

Perl enables you to store lists in special variables designed for that purpose. These variables are called array variables (or arrays for short).

The following is an example of a list being assigned to an array variable:


@array = (1, 2, 3);

Here, the list (1, 2, 3) is assigned to the array variable @array.

Note that the name of the array variable starts with the character @. This enables Perl to distinguish array variables from other kinds of variables (for example, scalar variables, which start with the character $). As with scalar variables, the character after the @ must be a letter or, in Perl 5, an underscore; subsequent characters of the name can be letters, digits, or underscores. Array variable names can be as long as you want.

The following are legal array-variable names:


@my_array

@list2

@_array

@a_very_long_array_name_with_lots_of_underscores

The following are not legal array-variable names:


@1array         # can't start with a digit

@a.new.array    # . is not a legal variable-name character

When an array variable is first created (that is, seen for the first time), it is assumed to contain the empty list () unless it is assigned to.

NOTE
Because Perl uses @ and $ to distinguish array variables from scalar variables, the same name can be used in an array variable and in a scalar variable. For example:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var and the array variable @var. These are two completely separate variables.
Normally, you won't want to use the same name in both an array and a scalar variable, because this is confusing.

Accessing an Element of an Array Variable

After you have assigned a list to an array variable, you can refer to any element of the array variable as if it is a scalar variable.

For example, to assign the first element of the array variable @array to the scalar variable $scalar, use the following statement:


$scalar = $array[0];

The character sequence [0] is an example of a subscript. A subscript indicates a particular element of an array. In this case, 0 refers to the first element of the array. Similarly, the subscript 1 refers to the second element of the array, as follows:


$scalar = $array[1];

Here, the second element of the array @array is assigned to $scalar. The general rule is this:

An array subscript n, where n is any non-negative integer, always refers to array element n+1.

This convention matches the C programming language, which also starts its array subscripting with 0.

You can assign a scalar value to an individual array element in the same way:


@array = (1, 2, 3, 4);

$array[3] = 5;

After the second assignment, the value of @array becomes


(1, 2, 3, 5)

This is because the fourth element of the array has been replaced.

NOTE
If you try to access an array element that does not exist, the Perl interpreter supplies the undefined value, which behaves like the null string in string expressions and like zero in numeric expressions.
@array = (1, 2, 3, 4);
$scalar = $array[4];
Here, $array[4] refers to the fifth element of @array, which does not exist. In this case, $scalar is assigned the undefined value.

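NOTE
A negative subscript does not behave the same way. In Perl 5, negative subscripts count backward from the end of the array, so the subscript -1 refers to the last element:
@array = (1, 2, 3, 4);
$scalar = $array[-1];
Here, 4 is assigned to $scalar. The undefined value results only when the negative subscript reaches back past the first element, as in $array[-5].
Note also that arrays automatically grow when a previously unreferenced element is assigned to for the first time:
@array = (1, 2, 3, 4);
$array[6] = 17;
Because the seventh element of @array is assigned 17, @array now contains seven elements: 1, 2, 3, 4, two undefined elements, and 17.
The missing fifth and sixth elements behave like the null string when they are used.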
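The three behaviors described in these notes can be checked with a short sketch of what a Perl 5 interpreter does. The built-in function defined, which reports whether a value is undefined, is not covered in this chapter and is used here only to make the result visible; use strict, use warnings, and my are likewise additions to the chapter's style.

```perl
use strict;
use warnings;

my @array = (1, 2, 3, 4);

# Reading past the end yields the undefined value and does not grow the array.
my $missing = $array[4];
print "element 5 is ", (defined $missing ? $missing : "undefined"), "\n";

# In Perl 5, a negative subscript counts backward from the end of the array.
my $last = $array[-1];
print "last element is $last\n";

# Assigning past the end grows the array; the gap holds undefined values.
$array[6] = 17;
my $length = @array;
my $gap_state = defined $array[5] ? "defined" : "undefined";
print "length is $length, sixth element is $gap_state\n";
```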

You can use the value of a scalar variable as a subscript, as follows:


$index = 1;

$scalar = $array[$index];

Here, the value of $index, 1, becomes the subscript. This means that the second element of @array is assigned to $scalar.

When you use a scalar variable as a subscript, make sure that the value stored in the scalar variable corresponds to an array element that exists. For example:
@array = (1, 2, 3, 4);
$index = 4;
$scalar = $array[$index];
Here, the third statement tries to access the fifth element of @array, which does not exist. In this case, $scalar is assigned the undefined value, which behaves like the null string, and the Perl interpreter doesn't tell you that anything went wrong unless warnings are enabled and the value is later used.

More Details on Array Element Names

Note that the first character of an array-element variable name is the $ character, not the @ character. For example, to refer to the first element of the array @potato, use


$potato[0]

and not


@potato[0]

The basic rule is as follows:

Things that reference one value, such as scalar variables and array elements, must start with a $.

NOTE
Even though references to elements of array variables start with a $, the Perl interpreter still has no trouble distinguishing scalar variables from array-variable elements. For example, if you have defined a scalar variable $potato and an array variable @potato, the Perl interpreter uses the subscript to distinguish between the scalar variable and the array-variable element.
$result = $potato; # the scalar variable $potato
$result = $potato[0]; # the first element of @potato

Using Lists and Arrays in Perl Programs

Now that you have seen how lists and array variables work, it's time to take a look at a simple program that uses them. Listing 5.1 is a simple program that prints the elements of a list.


Listing 5.1. A program that prints the elements of a list.

1:  #!/usr/local/bin/perl

2:  

3:  @array = (1, "chicken", 1.23, "\"Having fun?\"", 9.33e+23);

4:  $count = 1;

5:  while ($count <= 5) {

6:          print ("element $count is $array[$count-1]\n");

7:          $count++;

8:  }



$ program5_1

element 1 is 1

element 2 is chicken

element 3 is 1.23

element 4 is "Having fun?"

element 5 is 9.33e+23

$

Line 3 assigns a list containing five elements to the array variable @array.

Line 5 tests whether $count is less than or equal to 5. This conditional expression ensures that the while statement loops five times.

Line 6 prints the current value of $count and the corresponding element of @array. Note that the expression used in the subscript is $count-1, not $count, because subscripting starts from 0. For example, when $count is 3, the subscript is 2, which means that the third element of @array is printed.

When you examine line 6, you see that Perl lets you substitute for array elements in character strings. When the Perl interpreter sees $array[$count-1] in the character string, it replaces this array element name with its corresponding value.

Listing 5.2 is another example of a program that uses arrays. This one is a little more interesting; it uses the built-in functions rand and int to generate random integers between 1 and 10.


Listing 5.2. A program that generates random integers between 1 and 10.

1:  #!/usr/local/bin/perl

2:  

3:  # collect the random numbers

4:  $count = 1;

5:  while ($count <= 100) {

6:          $randnum = int( rand(10) ) + 1;

7:          $randtotal[$randnum] += 1;

8:          $count++;

9:  }

10: 

11: # print the total of each number

12: $count = 1;

13: print ("Total for each number:\n");

14: while ($count <= 10) {

15:         print ("\tnumber $count: $randtotal[$count]\n");

16:         $count++;

17: }



$ program5_2

Total for each number:

        number 1: 11

        number 2: 8

        number 3: 13

        number 4: 6

        number 5: 10

        number 6: 9

        number 7: 12

        number 8: 11

        number 9: 11

        number 10: 9

$

This program is divided into two parts: the first part collects the random numbers, and the second part prints them.

Line 5 ensures that the loop iterates (is performed) 100 times. You can just as easily have the program generate any other quantity of random numbers just by changing the value in this conditional expression.

Line 6 generates a random number between 1 and 10 and assigns it to the scalar variable $randnum. To see how it does this, first note that the code fragment


int ( rand (10) )

actually is two function calls, one inside another. When the Perl interpreter sees this, it first calls the inner one, which is rand. The value returned by rand becomes the argument to the library function int.

Here's how line 6 generates a random number:

  1. First, it calls the Perl library function rand. This function generates a floating-point random number that is greater than or equal to 0 and less than 1, and then multiplies it by the argument it is passed. In this program, rand is passed 10, which means that the result is a floating-point number that is greater than or equal to 0 and less than 10.
  2. The value returned by rand is then passed to the library function int, which takes a floating-point number and gets rid of the non-integer part. This operation is known as truncation. The integer produced by this truncation operation becomes the return value of the function. For example, the following returns 5:
    int (5.7)
    In this program, int truncates the random number returned by rand and returns the resulting integer, which is now a random number between 0 and 9.
  3. The value 1 is added to the number returned by int, resulting in a random number between 1 and 10.
  4. This number is assigned to the scalar variable $randnum.

Line 7 now adds 1 to the element of the array @randtotal corresponding to the number generated. For example, if the random number is 7, the array element $randtotal[7] has 1 added to it.

NOTE
As you can see, line 7 works even though @randtotal is not initialized. When the program refers to an array element for the first time, the element has no defined value. In a numeric expression, this undefined value is treated as 0, which means that adding 1 for the first time produces the result 1, which is what you want.
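As a rough check of the reasoning about line 6, the sketch below draws 1000 numbers with the same expression and records the smallest and largest values seen. The variable names $lowest, $highest, and $truncated are illustrative, and use strict, use warnings, and my are additions to the style of the listing.

```perl
use strict;
use warnings;

my $truncated = int(5.7);             # int drops the fractional part

my ($lowest, $highest) = (10, 1);     # start with the extremes swapped
my $count = 1;
while ($count <= 1000) {
    # rand(10) is >= 0 and < 10; int truncates; adding 1 shifts the range.
    my $randnum = int( rand(10) ) + 1;
    $lowest  = $randnum if $randnum < $lowest;
    $highest = $randnum if $randnum > $highest;
    $count++;
}

print "int(5.7) is $truncated\n";
print "lowest seen: $lowest, highest seen: $highest\n";
```

Whatever numbers are drawn, the lowest value seen can never be less than 1 and the highest can never be more than 10.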

The second part of the program, which prints the total of each random number, starts with lines 12 and 13. These lines get things started by resetting the counter variable $count to 1 and printing an introductory message.

The conditional expression in line 14 ensures that the loop iterates 10 times, once for each possible random number.

Line 15 prints the total for a particular random number.

Using Brackets and Substituting for Variables

As you have just seen, Perl lets you substitute for array-element variable names in strings, as follows:


print ("element $count is $array[$count-1]\n");

This might lead to problems if you want to include the characters [ and ] in character strings. For example, suppose that you have defined the scalar variable $var and the array variable @var. The character string


"$var[0]"

substitutes the value of the first element of @var in the string. To substitute the value of $var and keep the [0] as it is, you must use one of the following:


"${var}[0]"

"$var\[0]"

"$var" . "[0]"

The character string


"${var}[0]"

uses the brace characters { and } to keep var and [ separate; this tells the Perl interpreter to substitute for the variable $var, not $var[0]. After the substitution, the brace characters are not included in the string.

NOTE
To include a literal $ followed by a brace character in a string, put a backslash in front of the $, as follows:
"\${var}"
This character string contains the text ${var}.

The character string


"$var\[0]"

uses \ to indicate that the [ character is to be given a different meaning than normal; in this case, this means that [ is to be treated as a printable character and not as part of the variable name to be substituted.

The expression


"$var" . "[0]"

consists of two character strings joined together by the . operator. Here, the Perl interpreter replaces the first character string with the current value of $var.
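The four forms discussed in this section can be compared side by side. This is a small sketch with illustrative values for $var and @var; use strict, use warnings, and my are additions that the chapter's listings do not use.

```perl
use strict;
use warnings;

my $var = "apple";
my @var = ("first", "second");

my $element = "$var[0]";         # element 0 of the array @var
my $braces  = "${var}[0]";       # value of $var, then the text [0]
my $escaped = "$var\[0]";        # same result, using a backslash
my $joined  = "$var" . "[0]";    # same result, using the . operator

print "$element\n$braces\n$escaped\n$joined\n";
```

The first string contains first; the other three each contain apple[0].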

Using List Ranges

Suppose that you want to define a list consisting of the numbers 1 through 10, inclusive. You can do this by typing each of the numbers in turn.


(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

However, there is a simpler way to do it: Use the list-range operator, which is .. (two consecutive period characters). The following is an example of a list created using the list-range operator:


(1..10)

This tells Perl to define a list that has a first value of 1, a second value of 2, and so on up to 10.

The list-range operator can be used to define part of a list.


(2, 5..7, 11)

This list consists of five elements: the numbers 2, 5, 6, 7, and 11.

The list-range operator works with integers. If you give it floating-point values, Perl 5 truncates each value to an integer before building the list. For example:


(2.1..5.3)

This list consists of four elements: 2, 3, 4, and 5. Here, 2.1 is truncated to 2 and 5.3 is truncated to 5, so the list is the same as (2..5). Each element of the list is one greater than the previous element.

NOTE
If the value to the left of the .. operator is greater than the value to the right, an empty list is created.
(4.5..1.6)
Because 4.5 is greater than 1.6, this list is empty.
If the two values are equal, a one-element list is created.
(3..3)
This is equivalent to the list (3).

List-range operators can specify ranges of strings. For example, the list ("aaa", "aab", "aac", "aad") can be expressed as ("aaa".."aad"). Similarly, the list ("BCY", "BCZ", "BDA", "BDB") is equivalent to ("BCY".."BDB"), and the statement @alphabet = ("a".."z"); creates a list consisting of the 26 lowercase letters of the alphabet and assigns this list to the array variable @alphabet.

List ranges also enable you to use strings to specify numbers that contain leading zeros.


@day_of_month = ("01".."31");

This statement creates a list consisting of the strings 01, 02, 03 and so on, up to 31, and then assigns this list to @day_of_month. Because each string contains two characters, this array is suitable for use when you are printing a date in a format such as 08-June-1960.
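The range forms described in this section can be tried together in one short sketch. The array names are illustrative, and use strict, use warnings, and my are additions to the chapter's style. The floating-point range shows what a Perl 5 interpreter does, which is to truncate both values to integers.

```perl
use strict;
use warnings;

my @partial = (2, 5..7, 11);       # a range as part of a longer list
my @floats  = (2.1..5.3);          # Perl 5 truncates to (2..5)
my @empty   = (4..1);              # left value is greater: empty list
my @single  = (3..3);              # equal values: one element
my @letters = ("a".."z");          # the 26 lowercase letters
my @codes   = ("BCY".."BDB");      # string range that carries into the next letter
my @padded  = ("01".."31");        # two-character strings with leading zeros

my $empty_length  = @empty;
my $letter_length = @letters;

print "partial: @partial\n";
print "floats:  @floats\n";
print "empty has $empty_length elements, letters has $letter_length\n";
print "codes:   @codes\n";
print "padded runs from $padded[0] to $padded[30]\n";
```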

Expressions and List Ranges

The values that define the range of a list-range operator can be expressions, and these expressions can contain scalar variables. For example:


($var1..$var2+5)

This list consists of all values between the current value of $var1 and the current value of the expression $var2+5.

Listing 5.3 is an example of a program that uses list ranges. This program asks for a start number and an end number, and it prints all the numbers between them.


Listing 5.3. A program that uses list ranges to print a list of numbers.

1:  #!/usr/local/bin/perl

2:  

3:  print ("Enter the start number:\n");

4:  $start = <STDIN>;

5:  chop ($start);

6:  print ("Enter the end number:\n");

7:  $end = <STDIN>;

8:  chop ($end);

9:  @list = ($start..$end);

10: $count = 0;

11: print ("Here is the list:\n");

12: while ($list[$count] != 0 || $list[$count-1] == -1 ||

13:         $list[$count+1] == 1) {

14:         print ("$list[$count]\n");

15:         $count++;

16: }



$ program5_3

Enter the start number:

-2

Enter the end number:

2

Here is the list:

-2

-1

0

1

2

$

Lines 3 through 5 retrieve the start of the range to be printed. Line 3 prints a prompt. Line 4 reads a line from the standard input file and assigns it to the scalar variable $start. Line 5 chops the trailing newline character.

Lines 6 through 8 repeat the same process for the end of the range, assigning the end of the range to the scalar variable $end.

Line 9 creates a list that consists of the numbers between $start and $end, and stores the list in the array variable @list.

Line 10 initializes the counter variable $count to 0.

Line 11 is a print statement that indicates that the list is about to be printed.

Lines 12 and 13 are the start of the loop that prints the range. The conditional expression to be evaluated consists of three subexpressions that are operands for the logical or operator ||. If any of these subexpressions is true, the loop continues.

The first subexpression tests for the end of the range. To do this, it takes advantage of the fact that an undefined list element behaves like the null string and that the null string is equivalent to 0 in a numeric comparison. When the list element $list[$count] is undefined, the following subexpression is false:


$list[$count] != 0

The second and third subexpressions cover the cases in which 0 is actually a part of the list. If the list to be printed contains 0, one or both of the following conditions must be true:

  • The number 1 must be the next element in the list.
  • The number -1 must be the previous element in the list.

The second and third subexpressions test for these conditions. If either or both of these conditions is true, at least one of the following subexpressions also must be true:


$list[$count-1] == -1

$list[$count+1] == 1

This ensures that the loop continues. Of course, this doesn't cover the case in which the list consists of just 0; however, that's not a meaningful case. (If you want to be finicky, you can add a special chunk of code that prints 0 if $start and $end are both 0, but that's not really worth bothering with.)

After this, the rest of the program is straightforward. Line 14 prints a number in the range, line 15 adds one to the counter variable $count, and line 16 ends the while statement.

TIP
One of the problems with Perl is that it is sometimes difficult to distinguish the following scalar variable or array-element values:
  • The null string "", which is converted to 0 in numeric expressions
  • An undefined variable or element, which behaves like the null string and is therefore converted to 0 in numeric expressions
  • The string 0, which is converted to the number 0 in numeric expressions
  • A non-numeric string such as string, which is converted to 0 in numeric expressions
There are several ways of dealing with this confusion:
  1. Retrieve the length of the list stored in an array variable before processing it. This ensures that you don't go past the end of the list. See the section titled "Retrieving the Length of a List" later in this chapter for more details on how to do this.
  2. Compare the value with the string 0 rather than the number 0, as follows:
    if ($value eq "0") ...
    This handles the strings that convert to 0 in numeric expressions that are not 0 itself. (It doesn't handle strings such as 0000 or 0.0, which you might want your program to consider equivalent to 0; to deal with these, see the discussion of the split function later in this chapter.)
  3. Initialize the scalar variable or array element to a value other than 0 that you know is not going to appear naturally in your program, such as -99999.
Which particular method is best depends on the program you want to write, the input it expects, and how "bulletproof" the program needs to be.
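The first suggestion in this tip can be sketched as a variant of the loop in Listing 5.3 that stops when the subscript reaches the length of the list, so a 0 in the list needs no special handling. Fixed values stand in for the numbers the listing reads from STDIN. The variable $printed and the use of the built-in function defined are additions for illustration, as are use strict, use warnings, and my.

```perl
use strict;
use warnings;

# Fixed values stand in for the input that Listing 5.3 reads.
my ($start, $end) = (-2, 2);
my @list = ($start..$end);

my $count   = 0;
my $printed = 0;

# Compare the subscript with the length of the list instead of testing
# the element values, so that 0 is treated like any other element.
while ($count < @list) {
    print "$list[$count]\n";
    $printed++;
    $count++;
}

# defined tells an undefined element apart from "" and from "0".
my $past_end = defined $list[$count] ? "defined" : "undefined";
print "element after the end is $past_end\n";
```

With this condition, a list that consists of just 0 is also printed correctly.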

More on Assignment and Array Variables

So far, you've seen that you can assign lists to array variables.


@array = (1, 2, 3, 4, 5);

You've also seen that you can assign an element of an array to a scalar variable.


$scalar = $array[3];

The following sections describe the other ways you can use assignment with lists and array variables.

Copying from One Array Variable to Another

You also can assign one array variable to another.


@result = @original;

Here, the list currently stored in the array variable @original is copied to the array variable @result. Each element of the new array @result is the same as the corresponding element of the array @original. Listing 5.4 shows that this is true.


Listing 5.4. A program that copies an array and compares the elements of the two arrays.

1:  #!/usr/local/bin/perl

2:  

3:  @array1 = (14, "cheeseburger", 1.23, -7, "toad");

4:  @array2 = @array1;

5:  $count = 1;

6:  while ($count <= 5) {

7:          print("element $count: $array1[$count-1] ");

8:          print("$array2[$count-1]\n");

9:          $count++;

10: }



$ program5_4

element 1: 14 14

element 2: cheeseburger cheeseburger

element 3: 1.23 1.23

element 4: -7 -7

element 5: toad toad

$

Line 3 assigns the list


(14, "cheeseburger", 1.23, -7, "toad")

to the array variable @array1. Line 4 then copies this array into a second array variable, @array2.

The rest of the program prints the elements of each array, as follows:

  • Line 5 initializes the counter variable $count to 1.
  • The conditional expression in line 6 ensures that the loop is performed five times.
  • Lines 7 and 8 print the matching element of each array. (Note that the subscript is $count-1, not $count, because the subscript 0 is the first element of the array.)
  • Line 9 adds one to the counter variable $count.

NOTE
You can assign to multiple arrays in one statement. For example:
@array1 = @array2 = (1, 2, 3);
This assigns a copy of the list (1, 2, 3) to both @array1 and @array2.
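Because assignment copies the list, the two arrays are independent afterward. The sketch below changes the copy and shows that the original is untouched; the element values are illustrative, and use strict, use warnings, and my are additions to the chapter's style.

```perl
use strict;
use warnings;

my @original = (14, "cheeseburger", 1.23);
my @result   = @original;        # copies every element

# Changing the copy leaves the original alone.
$result[1] = "salad";

print "original: @original\n";
print "result:   @result\n";
```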

Using Array Variables in Lists

As you've already seen, lists can contain scalar variables. For example:


@list = (1, $scalar, 3);

Here, the value of the scalar variable $scalar becomes the second element of the list assigned to @list.

You also can specify that the value of an array variable is to appear in a list, as follows:


@list1 = (2, 3, 4);

@list2 = (1, @list1, 5);

Here, the value of the array variable @list1, the list (2, 3, 4), is substituted for the name @list1, and the resulting list (1, 2, 3, 4, 5) is assigned to @list2.
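A quick way to confirm that the array is expanded in place, rather than stored as a single element, is to look at the length of the result. This sketch relies on two features described later in this chapter (array length and arrays inside strings); use strict, use warnings, and my are additions.

```perl
use strict;
use warnings;

my @list1 = (2, 3, 4);
my @list2 = (1, @list1, 5);      # @list1 is expanded in place

my $length = @list2;             # number of elements in @list2
print "list2 has $length elements: @list2\n";
```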

Listing 5.5 shows an example of a list being contained in another list.


Listing 5.5. A program that assigns a list as part of another list.

1:  #!/usr/local/bin/perl

2:  

3:  @innerlist = " never ";

4:  @outerlist = ("I", @innerlist, "fail!\n");

5:  print @outerlist;



$ program5_5

I never fail!

$

Although this program is quite simple, it contains a couple of new tricks. The first of these is in line 3. Here, a scalar value, " never " (note the surrounding spaces), is assigned to the array variable @innerlist. This works because the Perl interpreter automatically converts the scalar value into a one-element list before assigning it to the array variable.

Line 4 assigns a list to the array variable @outerlist. This list is assembled by taking the following list:


("I", @innerlist, "fail!\n")

and substituting in the current value of the array variable @innerlist. As a result, the list assigned to @outerlist is


("I", " never ", "fail!\n")

Line 5 prints the list. To do this, it calls the library function print and passes it the array variable @outerlist. When print is given an array variable or a list to print, it prints each element in turn. This means that the following is written to the standard output file:


I never fail!

Note that print doesn't leave any spaces between the elements of the list when it prints them. The only reason the output is readable is because the character string contains spaces around never. This means that print isn't usually used to print a list of numbers in this way:


@list = (1, 2, 3);

print @list;

This prints the following, which isn't quite what you want:


123

TIP
In Listing 5.5, the argument passed to the print function is not enclosed in parentheses. This is perfectly acceptable. In Perl, the parentheses enclosing arguments to functions are optional. For example, when you call the library function chop, instead of writing
chop ($number);
you can write
chop $number;
Although this saves a few extra keystrokes, it makes things a little less readable (in this author's opinion).
Besides, eliminating the parentheses can lead to problems. Consider the following example:
$fred = "Fred";
print (("Hello, " . $fred . "!\n") x 2);
This code prints
Hello, Fred!
Hello, Fred!
In this case, the parentheses enclosing the arguments to print are absolutely necessary. Without them, you have
print ("Hello, " . $fred . "!\n") x 2;
When the Perl interpreter sees this statement, it assumes that print is being called with the following argument, which is not what you want:
"Hello, " . $fred . "!\n"
As always in programming, the basic rule to follow is this: Do whatever makes your program easier to work with, and use your best judgment.

Substituting for Array Variables in Strings

As you have seen, Perl does not leave spaces if you pass an array variable to print:


@array = (1, 2, 3);

print (@array, "\n");

This prints the following on your screen:


123

To get around this problem, put the array you want to print into a string:


print ("@array\n");

When the Perl interpreter sees the array variable inside the string, it substitutes the values of the list assigned to the array variables, and leaves a space between each pair of elements. For example:


@array = (1, 2, 3);

print ("@array\n");

This prints the following on your screen:


1 2 3
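The same substitution works in any double-quoted string, not only in the argument to print. A minimal sketch, with $line and $message as illustrative names and use strict, use warnings, and my as additions:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

my $line    = "@array";                    # elements separated by single spaces
my $message = "The values are @array.";    # the array inside a longer string

print "$line\n";
print "$message\n";
```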

Assigning to Scalar Variables from Array Variables

Consider the following assignment, which you've already seen:


@array = ($var1, $var2);

Here, the values of the scalar variables $var1 and $var2 are used to form a two-element list that is assigned to the array variable @array.

Perl also enables you to take the current value of an array variable and assign its components to a group of scalar variables. For example:


@array = (5, 7);

($var1, $var2) = @array;

Here, the first element of the list currently stored in @array, 5, is assigned to $var1. The second element, 7, is assigned to $var2.

Additional elements in an array, if they exist, are ignored. For example:


@array = (5, 7, 11);

($var1, $var2) = @array;

Here, 5 is assigned to $var1, 7 is assigned to $var2, and 11 is not assigned to anything.

If there are more scalar variables than elements in an array variable, the excess scalar variables are left undefined, as follows:


@array = (5, 7);

($var1, $var2, $var3) = @array;

This assigns 5 to $var1 and 7 to $var2. Because there are not enough elements in @array to assign anything to $var3, $var3 is given the undefined value, which behaves like the null string "".

NOTE
You also can assign to several scalar variables using a list. For example:
($var1, $var2, $var3) = ("one", "two", "three");
This assigns one to $var1, two to $var2, and three to $var3.
As with array variables, extra values in the list are ignored and extra scalar variables are left undefined, as follows:
($var1, $var2) = (1, 2, 3); # 3 is ignored
($var1, $var2, $var3) = (1, 2); # $var3 is now undefined
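Both cases can be seen in one short sketch. The names $first, $second, $third, and $third_state are illustrative; the built-in function defined is used only to show that the leftover variable has no value, and use strict, use warnings, and my are additions to the chapter's style.

```perl
use strict;
use warnings;

my @array = (5, 7, 11);

# Two variables, three elements: 11 is not assigned to anything.
my ($var1, $var2) = @array;

# Three variables, two values: the third variable is left undefined.
my ($first, $second, $third) = (1, 2);
my $third_state = defined $third ? "defined" : "undefined";

print "var1 = $var1, var2 = $var2\n";
print "first = $first, second = $second, third is $third_state\n";
```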

Retrieving the Length of a List

As you've seen, lists and array variables can be any length you want. As a consequence, Perl provides a way of determining the length of the list assigned to an array variable.

Here's how it works: If an array variable appears anywhere that a scalar value is expected, the Perl interpreter obtains a scalar value by calculating the length of the list assigned to the array variable.

Consider the following example:


@array = (1, 2, 3);

$scalar = @array;

In the assignment to $scalar, the Perl interpreter replaces @array with the length of the list currently assigned to @array, which is 3. $scalar, therefore, is assigned the value 3.

NOTE
Note that the following two statements are not equivalent:
$scalar = @array;
($scalar) = @array;
In the first statement, the length of the list in @array is assigned to $scalar. In the second statement, the first element of @array is assigned to $scalar.
It is always important to remember that $scalar and ($scalar) are not the same thing. $scalar is a scalar variable, and ($scalar) is a one-element list containing $scalar.
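The difference between the two statements is easy to see when the array's first element is not the same as its length. A minimal sketch, with $length and $first as illustrative names and use strict, use warnings, and my as additions:

```perl
use strict;
use warnings;

my @array = (10, 20, 30);

my $length  = @array;      # scalar on the left: the length of the list
my ($first) = @array;      # list on the left: the first element

print "length = $length, first = $first\n";
```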

Being able to access the length of an array is useful if you want to write a loop that performs an operation on every element of an array. Listing 5.6 is an example of a program that does just that.


Listing 5.6. A program that prints every element of an array.

1:  #!/usr/local/bin/perl

2:  

3:  @array = (14, "cheeseburger", 1.23, -7, "toad");

4:  $count = 1;

5:  while ($count <= @array) {

6:          print("element $count: $array[$count-1]\n");

7:          $count++;

8:  }



$ program5_6

element 1: 14

element 2: cheeseburger

element 3: 1.23

element 4: -7

element 5: toad

$

The only new feature of this program is line 5, which compares the counter variable $count to the length of the array @array. Because the list assigned to @array contains five elements, the conditional expression


$count <= @array

ensures that the loop iterates five times.

Once again, note that the subscript in line 6 is $count-1, not $count. This caution bears repeating: It is very easy to forget to subtract 1 when you use a value as a subscript.

If you like, you can write your loop in a different way and use $count as a subscript. For example:


$count = 0;

while ($count < @array) {

        print ("element ", $count+1, ": $array[$count]\n");

        $count++;

}

As you can see, this isn't any easier to follow because you now have to remember these two things:

  1. The conditional expression now must use the < operator, not the <= operator. If you use <= here, the loop iterates six times, not five.
  2. The value of $count is now not the same as the element you are referring to. For example, if you are printing the third element of the array, $count has the value 2. This means that output that identifies the element must use the expression
    $count+1
    to get the result you want. Because expressions are not evaluated inside character strings, $count+1 is passed to print as a separate argument rather than written inside the string.

As you can see, there is no intuitive or obvious way of writing programs that loop through arrays. Generally, it's best to pick the way that is easiest for you to remember.

You cannot retrieve the length of a list without first assigning the list to an array variable. For example:
@array = (10, 20, 30);
$scalar = @array;
This assigns 3 to $scalar. Compare this with the following statement:
$scalar = (10, 20, 30);
This statement actually assigns 30 to $scalar, not 3. In this statement, the subexpression
(10, 20, 30)
is treated as three scalar values separated by comma operators.
For more details on the comma operator, refer to "The Comma Operator" in Chapter 4.
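
Here is a minimal sketch that puts the two assignments side by side. The variable names $from_array and $from_list are invented for this example:

```perl
@array = (10, 20, 30);
$from_array = @array;           # the length of the list stored in @array
$from_list = (10, 20, 30);      # comma operator: the last value, 30

print ("from array: $from_array\n");    # prints "from array: 3"
print ("from list: $from_list\n");      # prints "from list: 30"
```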

Using Array Slices

As you've seen, array subscripting enables you to change or access one element of an array. For example:


$var = $array[2];

$array[2] = $var;

Perl enables you to access more than one element of an array at a time in much the same way. Following is a simple example:


@subarray = @array[0,1];

Here, the code fragment


@array[0,1]

refers to the first two elements of the list stored in the array variable. This portion of the array is known as an array slice. An array slice is treated just like any other list. In the statement


@subarray = @array[0,1];

the list consisting of the first two elements of @array is assigned to the array variable @subarray.

Here is another example:


@slice = @array[1,2,3];

This statement assigns the array slice consisting of the second, third, and fourth elements of @array to the array variable @slice.

Although single elements of an array are referenced using the $ character, array slices are referenced using @:
$var = $array[0];
@subarray = @array[0,1];
The basic rules are as follows:
  • References to single items, such as scalar variables or single array elements, start with a $.
  • References to array variables or array slices, which refer to lists, start with a @.

Listing 5.7 shows a simple example of an array slice.


Listing 5.7. A program that demonstrates the use of an array slice.

1:  #!/usr/local/bin/perl

2:  

3:  @array = (1, 2, 3, 4);

4:  @subarray = @array[1,2];

5:  print ("The first element of subarray is $subarray[0]\n");

6:  print ("The second element of subarray is $subarray[1]\n");



$ program5_7

The first element of subarray is 2

The second element of subarray is 3

$

Line 3 of this program assigns the following list to the array variable @array:


(1, 2, 3, 4)

Line 4 assigns a slice of the array variable @array to the array variable @subarray. The array slice


@array[1,2]

specifies that the second and third elements of the array are to be treated as a list and assigned to @subarray.

NOTE
In array slices, as in references to single elements of an array, subscripts start from zero. For example, the array slice
@array[1,2]
refers to the second and third elements of an array.

The final two lines of the program print the two elements of the array variable @subarray. As you can see, these elements are identical to the second and third elements of @array.

Using List Ranges in Array-Slice Subscripts

Perl provides a convenient way to refer to large array slices. Instead of writing


@array[0,1,2,3,4]

to refer to the first five elements of array @array, you can use the list range operator, as follows:


@array[0..4]

This enables you to assign large array slices easily:


@subarray = @array[0..19];

This assigns the first 20 elements of @array to @subarray.

Using Variables in Array-Slice Subscripts

You can use the value of a scalar variable in a list range in an array slice subscript. The following is an example:


$endrange = 19;

@subarray = @array[0..$endrange];

Here, the scalar variable $endrange contains the upper limit of the array slice, which in this case is 19. This means that the array slice to assign is


@array[0..19]

which assigns the first 20 elements of @array to @subarray.
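
The following sketch shows that the two forms select the same elements. The sample data (the numbers 1 through 30) and the names @literal and @variable are assumptions made for this illustration:

```perl
@array = (1..30);                   # sample data: the numbers 1 through 30
$endrange = 19;

@literal = @array[0..19];           # range written with constants
@variable = @array[0..$endrange];   # same range, upper limit taken from a variable

$length = @variable;
print ("$length elements, from $variable[0] to $variable[$endrange]\n");
# prints "20 elements, from 1 to 20"
```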

You can also use the list stored in an array variable to define an array slice. Listing 5.8 shows how this works.


Listing 5.8. A program that uses an array variable as an array-slice subscript.

1:  #!/usr/local/bin/perl

2:  

3:  @array = ("one", "two", "three", "four", "five");

4:  @range = (1, 2, 3);

5:  @subarray = @array[@range];

6:  print ("The array slice is: @subarray\n");



$ program5_8

The array slice is: two three four

$

Line 3 of this program assigns the following list to the array variable @array:


("one", "two", "three", "four", "five")

Line 4 assigns the list (1, 2, 3) to the array variable @range, which is to serve as the list of subscripts.

Line 5 uses the value of @range as the array subscript for an array slice. Because @range contains (1, 2, 3), the slice of @array that is selected consists of the second, third, and fourth elements. These elements are then assigned to the array variable @subarray.

Line 6 prints the selected array slice. When the Perl interpreter sees the variable name @subarray in the character string to be printed, it substitutes the value of @subarray for its name. Because @subarray is inside a character string, the Perl interpreter leaves a space between each pair of elements when printing.

Compare line 6 with the following:


print (@subarray, "\n");

Here, print leaves no spaces between the elements of @subarray, which means that it prints


twothreefour

Which outcome you want depends, of course, on what you want your program to do.

Assigning to Array Slices

You can assign to array slices using the notation you have just seen. The following is an example:


@array[0,1] = ("string", 46);

Here, the first two elements of the array @array become string and 46, respectively.

You can use list-range operators and variables when you assign to array slices as well. The following is an example:


@array[0..3] = (1, 2, 3, 4);

@array[0..$endrange] = (1, 2, 3, 4);

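If there are more items in the array slice than in the list, the extra items in the array slice are assigned the null string (strictly speaking, the undefined value, which behaves like the null string when it is used as a string), as follows: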


@array[0..2] = ("string1", "string2");

The third element of @array now holds the null string.

If there are fewer items in the array slice than in the list, the extra items in the list are ignored, as in the following:


@array[0..2] = (1, 2, 3, 4);

In this assignment, the fourth element in the list, 4, is not assigned to anything.
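
The two mismatched cases can be sketched as follows. The array names @first and @second and their starting contents are made up for this example:

```perl
# more items in the slice than in the list
@first = ("a", "b", "c");
@first[0..2] = ("string1", "string2");
print ("third element: '$first[2]'\n");     # prints "third element: ''"

# more items in the list than in the slice
@second = ("a", "b", "c");
@second[0..2] = (1, 2, 3, 4);
$length = @second;
print ("@second ($length elements)\n");     # prints "1 2 3 (3 elements)"
```

Note that the second array still has three elements; the extra value 4 does not make the array grow.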

When an array slice is assigned to, the remainder of the array is not changed. Listing 5.9 shows how this works.


Listing 5.9. A program that assigns to an array slice.

1:  #!/usr/local/bin/perl

2:  

3:  @array = ("old1", "old2", "old3", "old4");

4:  @array[1,2] = ("new2", "new3");

5:  print ("@array\n");



$ program5_9

old1 new2 new3 old4

$

In the preceding program, the only statement that did not appear in previous programs is line 4, which assigns the list ("new2", "new3") to the array slice of @array consisting of the second and third elements. This assignment changes the value of @array from


("old1", "old2", "old3", "old4")

to


("old1", "new2", "new3", "old4")

Line 5 then prints the changed array.

Overlapping Array Slices

As you've seen, Perl enables you to use array slices on either side of an assignment statement. The following is an example:


@newarray = @array[2,3,4];

@array[2,3,4] = @newarray;

This means that you can assign from one array slice to another, even if the two slices overlap, as in the following:


@array[1,2,3] = @array[2,3,4];

The Perl interpreter has no problem with this statement because it copies the list stored in @array[2,3,4] into a temporary location (invisible to you) before assigning it to @array[1,2,3].

Listing 5.10 provides an example of overlapping array slices in use.


Listing 5.10. A program containing overlapping array slices.

1:  #!/usr/local/bin/perl

2:  

3:  @array = ("one", "two", "three", "four", "five");

4:  @array[1,2,3] = @array[2,3,4];

5:  print ("@array\n");



$ program5_10

one three four five five

$

Line 4 is an example of an assignment with overlapping array slices. At the time of assignment, the array slice @array[2,3,4] contains the list


("three", "four", "five")

This list consists of the last three elements of @array. Assigning this list to @array[1,2,3] means that the list stored in @array changes from


("one", "two", "three", "four", "five")

to


("one", "three", "four", "five", "five")


NOTE
Overlapping array slices of varying lengths are dealt with in the same way as other array slice assignments of non-matching lengths. For example:
@array = (1, 2, 3, 4, 5);
@array[0..2] = @array[3,4];
This assignment assigns the array slice @array[3,4], which is the list (4, 5), to the array slice @array[0..2]. After this assignment, the value of @array is the list
(4, 5, "", 4, 5)
The third element of @array is now the null string because there are only two elements in the array slice being assigned.

Using the Array-Slice Notation as a Shorthand

So far, I've been using the following array-slice notation to refer to consecutive elements of an array:


@array[0,1]

In Perl, however, there is no real difference between an array slice and a list containing consecutive elements of the same array. For example, the following statements are equivalent:


@subarray = @array[0,1];

@subarray = ($array[0], $array[1]);

Because of this, you can use the array-slice notation to refer to any elements of an array, regardless of whether they are in order. For example, the following two statements are equivalent:


@subarray = ($array[4], $array[1], $array[3]);

@subarray = @array[4,1,3];

In both cases, the array variable @subarray is assigned a list consisting of three elements: the fifth, second, and fourth elements of @array.

You can use this array-slice notation in a variety of ways. For example, you can refer to one element of an array multiple times:


@subarray = @array[0,0,0];

This creates a list consisting of three copies of the first element of @array, and then assigns this list to @subarray.
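
As a quick sketch of both ideas, the following uses a five-element sample array. The names @picked and @copies are invented for this example:

```perl
@array = ("a", "b", "c", "d", "e");

@picked = @array[4,1,3];        # fifth, second, and fourth elements
@copies = @array[0,0,0];        # the first element, three times

print ("@picked\n");            # prints "e b d"
print ("@copies\n");            # prints "a a a"
```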

The array-slice notation provides an easy way to swap elements in a list. The following is an example:


@array[1,2] = @array[2,1];

This statement swaps the second and third elements of @array. As with the overlapping array slices you saw earlier, the Perl interpreter copies @array[2,1] into a temporary location before assigning it, which ensures that the assignment takes place properly.
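
To see what the slice assignment saves you, compare it with the same swap written out using an explicit temporary variable. This is only a sketch; the sample values and the name $temp are assumptions:

```perl
@array = ("one", "two", "three");
@array[1,2] = @array[2,1];      # swap the second and third elements
print ("@array\n");             # prints "one three two"

# the same swap, using a temporary variable you manage yourself
@array = ("one", "two", "three");
$temp = $array[1];
$array[1] = $array[2];
$array[2] = $temp;
print ("@array\n");             # prints "one three two"
```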

For an example of a program that swaps array elements, look at Listing 5.11, which sorts the elements in an array using a simple sort algorithm.


Listing 5.11. A program that sorts an array.

1:  #!/usr/local/bin/perl

2:  

3:  # read the array from standard input one item at a time

4:  print ("Enter the array to sort, one item at a time.\n");

5:  print ("Enter an empty line to quit.\n");

6:  $count = 1;

7:  $inputline = <STDIN>;

8:  chop ($inputline);

9:  while ($inputline ne "") {

10:         $array[$count-1] = $inputline;

11:         $count++;

12:         $inputline = <STDIN>;

13:         chop ($inputline);

14: }

15: 

16: # now sort the array

17: $count = 1;

18: while ($count < @array) {

19:         $x = 1;

20:         while ($x < @array) {

21:                 if ($array[$x - 1] gt $array[$x]) {

22:                         @array[$x-1,$x] = @array[$x,$x-1];

23:                 }

24:                 $x++;

25:         }

26:         $count++;

27: }

28: 

29: # finally, print the sorted array

30: print ("@array\n");



$ program5_11

Enter the array to sort, one item at a time.

Enter an empty line to quit.

foo

baz

dip

bar



bar baz dip foo

$

This program is divided into three parts:

  • Reading the array
  • Sorting the array
  • Printing the array

Lines 3-14 read the array into the variable @array. The conditional expression in line 9, $inputline ne "", is true as long as the line is not empty. (Recall that an empty line consists of just the newline character, which the library function chop removes.) In this example, the list foo baz dip bar is read into the array variable @array.

Lines 17-27 perform the sort. The sort consists of two loops, one inside the other. The inner loop works like this:

  • Line 21 compares the first item in the list with the item next to it. If the first item is greater, line 22 swaps the two items. Otherwise, the two items are left where they are. In this example, foo is greater than baz, so foo becomes the second element in the list. At this point, the list is
    baz foo dip bar
  • The program then loops back to line 21, which now compares the second pair in the list (the second and third elements). The new second element, foo, is compared to dip. foo is greater, so foo becomes the new third element, and dip becomes the second element:
    baz dip foo bar
  • Line 20 terminates the loop when the last pair is compared. (Note that the conditional expression compares the inner counting variable $x with the length of the array variable @array. When $x becomes equal to @array, every pair of elements in the list has been compared.)

At this point, the largest element in the list is at the far end of the list:


baz dip bar foo

The largest value in the list, foo, has been moved to the far right end of the list, where it belongs. The other elements have been displaced to make room.

Lines 17-19 and 26-27 contain the outer loop. This outer loop just makes sure that the inner loop is repeated n-1 times, where n is the number of elements in the list. When the inner loop is repeated a second time, the second-largest element moves up to the second position from the right:


baz bar dip foo

The final pass through the inner loop sorts the final two elements:


bar baz dip foo

Line 30 then prints the sorted list.

NOTE
You rarely need to write your own program to sort the values in a list because Perl has a library function, sort, that does it for you. See the section "Array Library Functions" later in this chapter for more details.
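
As a brief preview, here is a sketch that applies sort to the same four items used in the sample run of Listing 5.11. The name @sorted is made up for this example:

```perl
@array = ("foo", "baz", "dip", "bar");  # the sample input from Listing 5.11
@sorted = sort (@array);
print ("@sorted\n");                    # prints "bar baz dip foo"
```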

Reading an Array from the Standard Input File

In the programs you have seen so far, single lines of input are read from the standard input file and stored in scalar variables. For example:


$var = <STDIN>;

In this case, every appearance of <STDIN> means that another line of input is obtained from the standard input file.

Perl also provides a quicker approach: If you assign <STDIN> to an array variable instead of a scalar variable, the Perl interpreter reads in all of the data from the standard input file at once and assigns it. For example, the statement


@array = <STDIN>;

reads everything typed in and assigns it all to the array variable @array. The variable @array now contains a list; each element of the list is a line of input.

Listing 5.12 is an example of a simple program that reads its input data into an array.


Listing 5.12. A program that reads data into an array and writes the array.

1:  #!/usr/local/bin/perl

2:  

3:  @array = <STDIN>;

4:  print (@array);



$ program5_12

Here is my first line of data.

Here is another line.

Here is the last line.

^D

Here is my first line of data.

Here is another line.

Here is the last line.

$

As you can see, this program is very short. Line 3 reads the input from the standard input file. In this example, the input that is entered consists of the three lines


Here is my first line of data.

Here is another line.

Here is the last line.

followed by the Ctrl+D key combination. Ctrl+D produces a special character that indicates end of file; when the Perl interpreter sees this, it knows that there is no more input.

NOTE
A blank line is perfectly acceptable input and does not terminate the reading of input from the standard input file. Only the Ctrl+D character can do that.
Also note that the Ctrl+D character is a non-printing character. When you type it, nothing appears on the screen. In the examples in this guide, control characters that are part of the input, such as Ctrl+D, are represented by the ^ character followed by the letter typed. For example, Ctrl+D is represented as
^D
This representation is the standard one used in the computing world.

After line 3 is executed, the array variable @array contains a list comprising three elements: the three lines of input you just entered. The last character of each input line is the newline character (because you didn't call chop to get rid of it).

Line 4 prints the lines of input you just read. Note that you do not need to separate the lines with spaces or newline characters because each line in @array is terminated by a newline character.

When you use the following statement:
@array = <STDIN>;
every line of input you enter is stored in @array all at once. If you enter a lot of input, @array can get very large.
Use this statement only when you really need to work with the entire input file at once.

Array Library Functions

Perl provides a number of library functions that work on lists and array variables. You can use them to do the following:

  • Sort array elements in alphabetical order
  • Reverse the elements of an array
  • Remove the last character from all elements of an array
  • Merge the elements of an array into a single string
  • Split a string into array elements

The following sections describe these array library functions.

Sorting a List or Array Variable

The library function sort sorts the elements of an array in alphabetical order and returns the sorted list.

The syntax for the sort library function is


retlist = sort (array);

In this syntax, array is the list to sort, and retlist is the sorted list.

Here are some examples:


@array = ("this", "is", "a", "test");

@array2 = sort (@array);

After sort is called, the value of @array2 is the list


("a", "is", "test", "this")

Note that sort does not modify the original list. The statement


@array2 = sort (@array);

does not change the value of @array. To replace the contents of an array variable with the sorted list, put the array variable on both sides of the assignment, as follows:


@array = sort (@array);

Here, the sorted list is put back in @array.

The sorted list must be assigned to an array variable, or passed on to another function such as print, in order to be used. The statement
sort (@array);
doesn't do anything useful because the sorted list is not assigned to anything.

Note that sort treats its items as strings, not integers; items are sorted in alphabetical, not numeric, order. For example:


@array = (70, 100, 8);

@array = sort (@array);

In this case, sort produces


(100, 70, 8)

not


(8, 70, 100)

Because sort is treating the elements of the list as strings, the strings to be sorted are 70, 100, and 8. When sorting characters that are not alphabetic, sort looks at the internal representation of the characters to be sorted. If you are not familiar with ASCII (which will be described shortly), this might sound complicated, but it's not too difficult to understand.

Here's how it works: When Perl (or any other programming language) stores a character such as r or 1, what it actually does is store a unique eight-bit number that corresponds to this character. For example, the letter r is represented by the number 114, and 1 is represented by the number 49. Every possible character has its own unique number.

The sort function uses these unique numbers to determine how to sort character strings. When sorting 70, 100, and 8, sort looks at the unique numbers corresponding to 7, 1, and 8, which are the first characters in each of the strings. As it happens, the unique number for 1 is less than that for 7, which is less than that for 8 (which makes sense when you think of it). This means that 100 is "less than" 70, and 70 is "less than" 8.

Of course, if two strings have identical first characters, sort then compares the second characters. For example, when sort sorts 72 and 7$, the first characters are identical; sort then compares the unique number representing 2 with the number representing $. As it happens, the number for $ is smaller, so 7$ is "less than" 72.
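
You can look at these numbers yourself. The sketch below assumes a machine that uses the ASCII character set, and it uses the built-in function ord, which is not covered in this chapter; ord returns the number that represents the first character of a string. The name @pair is invented for this example:

```perl
@array = (70, 100, 8);
@sorted = sort (@array);
print ("@sorted\n");                        # prints "100 70 8"

# the numbers that represent the characters 1, 7, and 8
print (ord("1"), " ", ord("7"), " ", ord("8"), "\n");   # prints "49 55 56"

# 7$ sorts before 72 because the number for $ (36) is less than the number for 2 (50)
@pair = sort ("72", "7\$");
print ("@pair\n");                          # prints "7$ 72"
```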

NOTE
The set of unique numbers that correspond to the characters understood by the computer is known as the ASCII character set.
Most computers today use the ASCII character set, with a couple of exceptions as follows:
  • Some IBM computers use an IBM-developed character set called EBCDIC. EBCDIC works the same way as ASCII. In both cases, a character such as r or 1 is translated into a number that represents it. The only difference between EBCDIC and ASCII is that the translated numbers are different.
  • Computers that print a variety of spoken languages, or which deal with languages such as Japanese or Chinese, use a more complicated 16-bit code to represent the wide variety of characters they understand.
You don't really need to worry about what character set your machine uses, except to take note of the sorting order. A complete listing of the ASCII characters can be found in Appendix B, "ASCII Character Set."

Using Other Sort Keys

Normally, sort sorts in alphabetical order. You can tell the Perl interpreter to sort using any criterion you like. To learn more about sort keys, refer to Chapter 9, "Using Subroutines."
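
As a small taste of what is possible, the following sketch sorts the earlier example in numeric order. It assumes Perl 5, where sort accepts a comparison block, and it uses the numeric comparison operator <=> with the special variables $a and $b; Chapter 9 may present the same idea differently, for example with a named subroutine. The name @numeric is made up for this example:

```perl
@array = (70, 100, 8);
@numeric = sort { $a <=> $b } @array;   # compare items as numbers, not as strings
print ("@numeric\n");                   # prints "8 70 100"
```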

Reversing a List or Array Variable

The library function reverse reverses the order of the elements of a list or array variable, and returns the reversed list.

The syntax for the reverse library function is


retlist = reverse (array);

array is the list to reverse, and retlist is the reversed list.

Here is an example:


@array = ("backwards", "is", "array", "this");

@array2 = reverse(@array);

The value assigned to @array2 is the list


("this", "array", "is", "backwards")

As with sort, reverse does not change the original array.

If you like, you can sort and reverse the same list by passing the list returned by sort to reverse. Listing 5.13 shows an example of this. It reads lines of data from the standard input file and sorts them in reverse order.


Listing 5.13. A program that sorts input lines in reverse order.

1:  #!/usr/local/bin/perl

2:  

3:  @input = <STDIN>;

4:  @input = reverse (sort (@input));

5:  print (@input);



$ program5_13

foo

bar

dip

baz

^D

foo

dip

baz

bar

$

Line 3 reads all the input lines from the standard input file into the array variable @input. Each element of @input consists of a single line of input terminated with a newline character.

Line 4 sorts and reverses the input lines. First, sort is called to sort the input lines in alphabetical order. (Recall that when one library function appears inside another, the innermost one is called first.) The list returned by sort is then passed to reverse, which reverses the order of the elements of the list. The result is a list sorted in reverse order, which is then assigned to @input.

Line 5 prints the sorted lines. Because each line is terminated by a newline character, no extra spaces or newline characters need to be added to make the output readable.

NOTE
If you like, you can omit the parentheses in the call to reverse. This gives you the following statement:
@input = reverse sort (@input);
Here is a case where eliminating a set of parentheses actually makes the code more readable; it is obvious that the statement sorts @input in reverse order.

Using chop on Array Variables

As you've seen, the chop library function removes the last character from a character string. The following is an example:


$var = "bathe";

chop ($var);      # $var now contains "bath"

The chop function also can work on lists in array variables. If you pass an array variable to chop, it removes the last character from every element in the list stored in the array variable. For example:


@list = ("rabbit", "12345", "quartz");

chop (@list);

After chop is called, the list stored in @list is


("rabbi", "1234", "quart")

The chop function often is used on arrays read from the standard input file, as shown in the following:


@array = <STDIN>;

chop (@array);

This call to chop removes the newline character from each input line. In the following section, you will see programs in which this is helpful.
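
Keep in mind that chop removes the last character whether or not it is a newline. The sketch below shows what happens when the final element has no trailing newline, and compares chop with chomp, a related function available in Perl 5 that is not covered in this chapter; chomp removes only a trailing newline. The sample strings and array names are made up for this example:

```perl
@lines = ("first\n", "second\n", "third");  # the last element has no newline

@chopped = @lines;
chop (@chopped);            # removes the last character of every element
print ("@chopped\n");       # prints "first second thir"

@chomped = @lines;
chomp (@chomped);           # removes a trailing newline only, if there is one
print ("@chomped\n");       # prints "first second third"
```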

Creating a Single String from a List

The library function join creates a single string from a list of strings, which then can be assigned to a scalar variable.

The syntax for the join library function is


string = join (array);

array is the list to join together, and string is the resulting character string.

The following is an example using join:


$string = join(" ", "this", "is", "a", "string");

The first element of the list supplied to join contains the characters that are to be used to join the parts of the created string together. In this example, $string becomes this is a string.

join can specify other join strings besides " ". For example, the following statement uses a pair of colons to join the strings:


$string = join("::", "words", "and", "colons");

In this statement, $string becomes words::and::colons.

You can use any list or array variable as part or all of the argument to join. For example:


@list = ("here", "is", "a");

$string = join(" ", @list, "string");

This assigns here is a string to $string.
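
The following sketch joins the same list with three different join strings, including the null string. The names $spaced, $dashed, and $packed are invented for this example:

```perl
@list = ("here", "is", "a");

$spaced = join(" ", @list, "string");   # one space between items
$dashed = join("-", @list, "string");   # a hyphen between items
$packed = join("", @list, "string");    # nothing between items

print ("$spaced\n");    # prints "here is a string"
print ("$dashed\n");    # prints "here-is-a-string"
print ("$packed\n");    # prints "hereisastring"
```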

Listing 5.14 is a simple program that uses join. It joins together all the input lines from the standard input file.


Listing 5.14. A program that takes its input and joins it into a single string.

1:  #!/usr/local/bin/perl

2:  

3:  @input = <STDIN>;

4:  chop (@input);

5:  $string = join(" ", @input);

6:  print ("$string\n");



$ program5_14

This

is

my

input

^D

This is my input

$

Line 3 reads all of the input lines into the array variable @input. Each element of @input is a single line of input terminated by a newline character.

Line 4 passes the array variable @input to the library function chop, which removes the last character from each element of the list stored in @input. This removes all of the trailing newline characters.

Line 5 calls join, which joins all the input lines into a single string. The first argument passed to join is " ", which tells join to put one space between each pair of lines. This turns the list


("This", "is", "my", "input")

into the string


This is my input

Line 6 prints the string produced by join. Note that the call to print has to specify a newline character because all the newline characters in the input lines have been removed by the call to chop.

Splitting a String into a List

As you've seen, the library function join creates a character string from a list. To undo the effects of join (that is, to split a character string into separate items), call the function split.

The syntax for the library function split is


array = split (string);

string is the character string to split, and array is the resulting array.

The following is a simple example of the use of split:


$string = "words::separated::by::colons";

@array = split(/::/, $string);

The first argument passed to split tells it where to break the string into separate parts. In this example, the first argument is :: (two colons); because there are three pairs of colons in the string, split breaks the string into four separate parts. The result is the list


("words", "separated", "by", "colons")

which is assigned to the array variable @array.

NOTE
The / characters surrounding the :: in the call to split indicate that the :: is a pattern to be matched. Perl supports a wide variety of special pattern-matching sequences, which you will learn about in Chapter 7, "Pattern Matching."
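
Here is a short sketch of a round trip: the string is split into a list, and the list is then joined back together with the same separator. The names $count and $rebuilt are made up for this example:

```perl
$string = "words::separated::by::colons";

@array = split(/::/, $string);
$count = @array;
print ("$count parts: @array\n");       # prints "4 parts: words separated by colons"

$rebuilt = join("::", @array);          # joining with the same separator undoes the split
print ("$rebuilt\n");                   # prints "words::separated::by::colons"
```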

The split function is used in a variety of applications. Listing 5.15 uses split to count the number of words in the standard input file.


Listing 5.15. A simple word-count program.

1:  #!/usr/local/bin/perl

2:  

3:  $wordcount = 0;

4:  $line = <STDIN>;

5:  while ($line ne "") {

6:          chop ($line);

7:          @array = split(/ /, $line);

8:          $wordcount += @array;

9:          $line = <STDIN>;

10: }

11: print ("Total number of words: $wordcount\n");



$ program5_15

Here is some input.

Here are some more words.

Here is my last line.

^D

Total number of words: 14

$

When you enter a Ctrl+D (End-of-File) character and read it using <STDIN>, the resulting line is the null string. Line 5 of this program tests for this null string.

Note that line 5 has no problem distinguishing the end of file from a blank input line because a blank input line contains the newline character, and chop has not yet been called. Once the Perl interpreter knows that the program is not at the end of file, line 6 can be called; it chops the newline character off the end of the input line.

Line 7 splits the input line into words. The first argument to split, / /, indicates that the line is to be broken whenever the Perl interpreter sees a space. The resulting list is stored in @array.

Because each element of the list in @array is one word in the input line, the total number of words in the line is equivalent to the number of elements in the array. Line 8 takes advantage of this to count the number of words in the input line. Here's how line 8 works:

  • When an array variable appears in a place where the Perl interpreter normally expects a scalar value, the number of elements in the list stored in the array variable is substituted for the variable name. In this program, when the Perl interpreter sees @array, it replaces it with the number of elements in @array.
  • Because the number of elements in the array is the same as the number of words in the input line, the statement
    $wordcount += @array;
    actually adds the number of words in the line to $wordcount.

NOTE
Listing 5.15 does not work properly if an input line contains more than one space between words. The following is an example:
This  is a line
Because there are two spaces between This and is, the split function breaks
This  is
into three words: This, an empty word "", and is. Because of this, the line
This  is a line
appears to contain five words when it really contains only four.
To get around this problem, what you need is a pattern that matches one or more spaces. To learn about special patterns such as this, see Chapter 7.
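
As a preview, one such pattern is a space followed by the + character, which matches one or more spaces in a row. The sketch below is only an illustration under that assumption; the variable names are invented, and Chapter 7 covers patterns properly:

```perl
$line = "This  is a line";              # two spaces between This and is

@single = split(/ /, $line);            # breaks at every single space
$single_count = @single;

@multi = split(/ +/, $line);            # breaks at each run of one or more spaces
$multi_count = @multi;

print ("single-space pattern: $single_count words\n");  # prints "single-space pattern: 5 words"
print ("one-or-more pattern: $multi_count words\n");    # prints "one-or-more pattern: 4 words"
```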

Listing 5.16 is an example of a program that uses split, join, and reverse to reverse the word order of the input read from the standard input file.


Listing 5.16. A program that reverses the word order of the input file.

1:  #!/usr/local/bin/perl

2:  

3:  @input = <STDIN>;

4:  chop (@input);

5:  

6:  # first, reverse the order of the words in each line

7:  $currline = 1;

8:  while ($currline <= @input) {

9:          @words = split(/ /, $input[$currline-1]);

10:         @words = reverse(@words);

11:         $input[$currline-1] = join(" ", @words) . "\n";

12:         $currline++;

13: }

14: 

15: # now, reverse the order of the input lines and print them

16: @input = reverse(@input);

17: print (@input);



$ program5_16

This sentence

is in

reverse order.

^D

order. reverse

in is

sentence This

$

Line 3 reads all of the standard input file into the array @input. Line 4 then removes the trailing newline characters from the input lines.

Lines 7-13 reverse each individual line. Line 7 sets the current line number, stored in $currline, to 1, and line 8 compares it with the number of lines of input. (Recall that the number of elements in the list is used whenever an array variable appears where a scalar value is expected.)

Line 9 splits a line of input into words. The first argument to split, / /, indicates that a split is to occur every time a space is seen. The list of words is stored in the array variable @words.

Line 10 reverses the order of the list of words stored in @words. After the list has been reversed, line 11 joins the input line back together again. Note that line 11 appends a newline character to the input line.

Now that the words in each individual line have been reversed, all that the program needs to do is reverse the order of the lines themselves. Line 16 accomplishes this.

Line 17 prints the reversed input file. Note that the period character (.) appears at the end of the first word; this is because the reversing program isn't smart enough to detect and get rid of it. (You can use split to get rid of this, too, if you want.)
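
The same steps can be sketched more compactly by nesting the calls to split, reverse, and join. This is only an illustrative variant, not a replacement for the listing: it assumes a hard-coded array @input in place of the standard input file, and it appends the newline after the call to join so that no space precedes it.

```perl
# Hypothetical input standing in for the lines read from <STDIN>.
@input = ("This sentence\n", "is in\n", "reverse order.\n");
chop (@input);    # remove the trailing newline from each line

# Reverse the word order within each line.
$currline = 0;
while ($currline < @input) {
        $input[$currline] = join(" ", reverse(split(/ /, $input[$currline]))) . "\n";
        $currline++;
}

# Reverse the order of the lines themselves and print them.
@input = reverse(@input);
print (@input);
```

Here the subscript starts at 0, so the loop condition uses < rather than <=.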

Other List-Manipulation Functions

Perl provides several other list-manipulation functions. To learn about these, refer to Chapter 14, "Scalar-Conversion and List-Manipulation Functions."

Summary

In today's lesson, you learned about lists and array variables. A list is an ordered collection of scalar values. A list can consist of any number of scalar values.

Lists can be stored in array variables, which are variables whose names begin with the character @.

Individual elements of array variables can be accessed using subscripts. The subscript 0 refers to the first element of the list stored in the array variable, the subscript 1 refers to the second element, and so on. If an array element is not defined, it is assumed to hold the null string "". If a previously undefined array element is assigned to, the array grows appropriately.
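
A minimal sketch of subscripts and array growth follows; the variable names are made up for illustration.

```perl
@array = (1, 2, 3);
$first = $array[0];     # subscript 0 refers to the first element
$second = $array[1];    # subscript 1 refers to the second element

$array[5] = "six";      # assigning past the end makes the array grow
$length = @array;       # the array now holds six elements

print ("first = $first, second = $second, length = $length\n");
```

The elements with subscripts 3 and 4 were never assigned, so they are left undefined.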

The list-range operator provides a convenient way to create a list containing consecutive numbers.

You can copy lists from one array variable to another. In addition, you can include an array variable in a list, which means that the list stored in the array variable is copied into the list containing the array-variable name.

Array-variable names can appear in character strings; in this case, the elements of the list are included in place of the variable name, with a space separating each pair of elements.

You can assign values to scalar variables from array variables, and vice versa.

If an array variable appears in a place where a scalar variable is expected, the length of the list stored in the array variable is used.

You can access any part of a list stored in an array variable by using the array-slice notation. You can assign values to array slices, and they can be used anywhere a list is expected.
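
As a short illustration of array slices, with made-up array names and values:

```perl
@array = ("a", "b", "c", "d");

@part = @array[1,2];          # copies the elements with subscripts 1 and 2
@array[0,3] = ("x", "y");     # assigns to the elements with subscripts 0 and 3

print ("@part\n");            # prints b c
print ("@array\n");           # prints x b c y
```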

The entire contents of the standard input file can be stored in a single array variable.

The library function sort sorts a list, and the library function reverse reverses one. The function chop removes the last character from each element of a list. The function split breaks a single string into a collection of list elements. The function join takes a collection of list elements and joins them into a single string.
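
The following sketch exercises each of these functions once. The lists and strings are invented purely for illustration.

```perl
@fruit = ("pear", "apple", "fig");
@sorted = sort (@fruit);          # ("apple", "fig", "pear")
@backward = reverse (@sorted);    # ("pear", "fig", "apple")

@lines = ("one\n", "two\n");
chop (@lines);                    # ("one", "two")

@words = split (/ /, "a b c");    # ("a", "b", "c")
$string = join ("-", @words);     # "a-b-c"

print ("@sorted\n@backward\n@lines\n$string\n");
```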

Q&A

Q:How can I tell whether a reference to an array variable such as @array refers to the stored list or to the length of the list?
A:It's usually pretty easy to tell. In a lot of places, using a list makes no sense:
$result = $number + @array;
For example, it makes no sense here to add a list to $number, so the length of the list stored in @array is used.
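A small sketch of this case, using made-up values:

```perl
@array = (10, 20, 30);
$number = 5;

# A scalar value is expected here, so the length of @array (3) is used.
$result = $number + @array;

print ("$result\n");    # prints 8
```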
Q:Why do array elements use $ for the first character of the element name, and not @? Wouldn't it make more sense to refer to an array element as
@array[2]
because we all know that the @ indicates an array variable?
A:This relates to the first question. The Perl interpreter needs to know as soon as possible whether a variable reference is a scalar value or a list. The $ indicates right away that the upcoming item is a scalar value.
Eventually, you'll get used to this notation.
Q:Is there a difference between an undefined array variable and an array variable containing the empty list?
A:No. By default, all array variables contain the empty list. Note, however, that the empty list is not the same as a list containing the null string:
@array = ("");
This list contains one element, which happens to be a null string.
Q:How large an input file can I read in using the following statement?
@array = <STDIN>;
A:Perl imposes no limit on the size of arrays. Your computer, however, has a finite amount of memory, which limits how large your arrays can be.
Q:Why does Perl add spaces when you substitute for an array variable in a string?
A:The most common use of string substitution is in the print statement. Normally, when you print a list you don't want to have the elements of the list running together, because you want to see where one element stops and the next one starts.
To print the elements of a list without spaces between them, pass the list to print without enclosing it in a string, as follows:
print ("Here is my list", @list, "\n");
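The difference can be seen side by side in the following sketch. The list contents are made up, and the call to join with an empty separator is used here only to capture what print produces for a bare list.

```perl
@list = ("a", "b", "c");

$quoted = "@list";              # elements separated by spaces: "a b c"
$unquoted = join("", @list);    # elements run together: "abc"

print ("Here is my list: @list\n");          # prints Here is my list: a b c
print ("Here is my list: ", @list, "\n");    # prints Here is my list: abc
```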
Q:Why does $ appear before 1 in the ASCII character set?
A:The short answer is: Just because. (This reasoning occurs more often in computing than you might think.)
Here's a more detailed explanation: On early machines that used the ASCII character set, processing was more efficient if there was a fixed relationship between, for instance, the location of the uppercase alphabetic characters and the lowercase alphabetic characters. (In fact, if you add 0x20, or 20 hexadecimal, to the ASCII representation of an uppercase letter, you get the corresponding lowercase letter.)
Establishing relationships such as these meant that gaps existed between, for example, the representation of Z (which is 90) and the representation of a (which is 97). These gaps are filled by printable non-alphanumeric characters; for example, the representation of [ is 91.
As for why $ appears before 1, as opposed to ?, which appears after 1, the explanation is: Just because.
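If you want to check these values yourself, the following sketch prints them. It relies on the built-in functions ord and chr, which are not covered in this lesson: ord returns the numeric representation of a character, and chr converts a number back into a character.

```perl
$dollar = ord('$');               # 36
$one = ord('1');                  # 49
$question = ord('?');             # 63
$bracket = ord('[');              # 91, in the gap between Z (90) and a (97)
$lower = chr(ord('A') + 0x20);    # adding 0x20 to A gives a

print ("\$ = $dollar, 1 = $one, ? = $question, [ = $bracket\n");
print ("A plus 0x20 gives $lower\n");
```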

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. Define the following terms:
    a.   list
    b.   empty list
    c.   array variable
    d.   subscript
    e.   array slice
  2. Assume the following assignments have been performed:
    @list = (1, 2, 3);
    $scalar1 = "hello";
    $scalar2 = "there";
    What is assigned to the array variable @newlist in each of the following cases?
    a.   @newlist = @list;
    b.   @newlist = reverse(@list[1,2]);
    c.   @newlist = ($scalar1, @list[1,1]);
    d.   ($dummy, @newlist) = @list;
    e.   @newlist[2,1,3] = @list[1,2,1];
    f.   @newlist = <STDIN>;
  3. Assume that the following assignments have been performed:
    @list1 = (1, 2, 3, 4);
    @list2 = ("one", "two", "three");
    What is the value of $result in each of the following cases?
    a.   ($dummy, $result) = @list1;
    b.   $result = @list1;
    c.   ($result) = @list2;
    d.   ($result) = @list1[1..2];
    e.   $result = $list2[$list1[$list1[0]]];
    f.   $result = $list2[3];
  4. What is the difference between a list and an array variable?
  5. How does the Perl interpreter distinguish between an array element and a scalar variable?
  6. How can you ensure that the @, $, and [ characters are not substituted for in strings?
  7. How can you obtain the length of a list stored in an array variable?
  8. What happens when you refer to an array element that has not yet been defined?
  9. What happens when you assign to an array element whose subscript is larger than the current length of the array?

Exercises

  1. Write a program that counts all occurrences of the word the in the standard input file.
  2. Write a program that reads lines of input containing numbers, each of which is separated by exactly one space, and prints out the following:
    a.   The total for each line
    b.   The grand total
  3. Write a program that reads all input from the standard input file and sorts all the words in reverse order, printing out one word per line with duplicates omitted.
  4. BUG BUSTER: What is wrong with the following statement?
    $result = @array[4];
  5. BUG BUSTER: What is wrong with the following program? (See if you can figure out what's wrong without checking the listings in today's lesson.)
    #!/usr/local/bin/perl

    @input = <STDIN>;
    $currline = 1;
    while ($currline < @input) {
    @words = split(/ /, $input[$currline]);
    @words = sort(@words);
    $input[$currline] = join(" ", @words);
    $currline++;
    }
    print (@input);