Previous Up Next

This HTML version of Think Perl 6 is provided for convenience, but it is not the best format of the book. You might prefer to read the PDF version.

Chapter 9  Arrays and Lists

This chapter presents some of Perl’s most useful built-in types, arrays and lists.

9.1  Lists and Arrays Are Sequences

Like strings, lists and arrays are sequences of values. In a string, the values are characters; in a list or in an array, they can be any type. The values in a list or in an array are called elements or sometimes items.

There are several important differences between lists and arrays. The main ones are that lists are ordered and immutable collections of items: you can’t change the number of items in a list and you can’t change the individual items either. Arrays, by contrast, are variables and are generally mutable: you can add elements to an array, or remove elements from it. And you can access the individual elements of an array and modify them. For this to be possible, arrays usually have a name (as do other variables), although some arrays are anonymous, which means that they have no name, but have some other ways of accessing them.

A list is also ephemeral (unless it is assigned to a variable or some other thing): it ceases to exist as soon as it has been used, usually as soon as the program control flow goes to the next code line. An array, on the other hand, has some form of persistence: you may be able to use it somewhere else in the program if the variable containing it is still within scope.

There are several ways to create a new list; the simplest is to enumerate its values, separated by commas:

> 3, 4, 5
(3 4 5)
> say (3, 4, 5).WHAT;
(List)
say $_ for 1, 2, 3;
1
2
3

You don’t need parentheses to create a list, but they are often useful to delimit it, i.e., to stipulate where it starts and where it ends, and, in some cases, to override precedence.

We used lists earlier in this book. If we write:

> print "$_ " for 1, 3, 5, 9, "\n";
1 3 5 9
 >
> print "$_ " for 1..10;
1 2 3 4 5 6 7 8 9 10 >

we are basically creating and using a list of integers (from the standpoint of the type hierarchy of Perl; this observation is not entirely accurate technically for the second example, since 1..10 has a Range type, and it gets transformed into a Seq type, but this approximation is good enough for our purposes here).

Arrays are variables whose names start with the sigil @. Named arrays need to be declared before they are used, just as any other variable we’ve seen so far (except the topical variable, $_). One of the easiest ways to create an array is to assign a list to it:

> my @odd_digits = 1, 3, 5, 7, 9;
[1 3 5 7 9]
> say @odd_digits.WHAT;
(Array)
> my @single_digit_numbers = 0..9;
[0 1 2 3 4 5 6 7 8 9]

Under the Perl REPL, an array is displayed between square brackets ([ and ]), while lists are displayed between round parentheses.

If the items don’t contain any space characters, it is quite handy to construct a list (and assign it to an array if needed) using the <...> quote-word operator:

> my @weekdays = <mon tue wed thu fri>;
[mon tue wed thu fri]
> my @weekend = <sat sun>;
[sat sun]

The advantage of this construction is that there is no need to separate the items with commas and no need to insert them between quotes when the items are strings. Basically, the quote-word operator breaks up its content on whitespace and returns a list of words, which can then be used in a loop or assigned to an array as in the example above.

Most of the rest of this chapter will be devoted to arrays rather than lists, but keep in mind that many of the array functions and operators we will study here also work on lists (at least most of those that would not violate the immutability property of lists).

The items of an array (or a list) don’t need to be of the same type:

> my @heterogeneous-array = 1, 2.3, pi, "str", (1, 2, 4);
[1 2.3 3.14159265358979 str (1 2 4)]

Here, the array is composed of an integer, a rational, a float (Num type), a string, and a list of three integers. It may not be advisable for the sake of the developer’s mental sanity to use an array with such wildly heterogeneous items, but Perl will not complain about that: it is up to you to make sense of your data.

The array above even contains a list of items. If you iterate over the elements of this array for example with a for loop statement, this list will arrive as one distinct element; it will not get “flattened” as three elements of the array. Similarly, elems is a method to count the number of items of an array (or of a list). Using it on the above array produces the following result:

> say @heterogeneous-array.elems;
5

As you can see, the (1, 2, 4) list “counts” as one single array element.

A list within another list is nested.

An array that contains no elements is called an empty array; you can create one with empty parentheses, ():

> my @empty = ();
[]

This code is really assigning an empty list to the array. But this syntax is usually not needed when creating a new empty array, since just declaring an array without defining it has the same effect:

> my @empty;
[]

So using the empty parentheses (i.e., assigning an empty list) would be needed essentially for resetting an existing array to an empty array.

9.2  Arrays Are Mutable

The syntax for accessing the elements of an array or a list uses the square brackets operator. The expression inside the brackets specifies the index or subscript, which can be a literal integer (or some value that can be coerced into an integer), a variable containing a numerical value, a list or a range of numerical values, an expression or a piece of code returning a numerical value, etc. Indices are offsets compared to the beginning of the array or the list (much in the same way as the values returned by the index function on strings), so that they start at 0. Thus, the first item of an array has the index 0, the second item the index 1, and so on:

say <sat sun>[1];             # -> sun  (accessing a list item)
my @weekdays = <mon tue wed thu fri>;     # assigning an array
say "The third day is @weekdays[2]";      # -> The third day is wed

You may also use ranges or lists of indices to access slices of an array or a list:

> my @even-digits = 0, 2, 4, 6, 8;
[0 2 4 6 8]
> my @small-even_digits = @even-digits[0..2];
[0 2 4]
> my @min-max-even-digits = @even-digits[0, 4]
[0 8]

If you need a slice in the opposite order, you can use the reverse function to reverse the range:

> my @reverse-small-even_digits = @even-digits[reverse 0..2];
[4 2 0]

or reverse the data returned by the slice expression:

> my @reverse-small-even_digits = reverse @even-digits[0..2];
[4 2 0]

Unlike lists, arrays are mutable. When the bracket operator appears after an array on the left side of an assignment, it identifies the element of the array that will be assigned:

> my @even-digits = 0, 2, 2, 6, 8;   # Oops, error on the second 2
[0 2 2 6 8]
> @even-digits[2] = 4; # fixing the faulty third digit
4
> say @even-digits
[0 2 4 6 8]

The third element of -digits, which was (presumably by mistake) 2, is now 4. If the index corresponds to an item which does not exist yet in the array, the array will be expanded to include the new element:

> my @odds = 1, 3, 5;
[1 3 5]
> @odds[3] = 7;
7
> say @odds;
[1 3 5 7]

The elems function or method returns the number of elements of an array. The end function or method returns the index of the last elements of an array:

my @nums = 1..5;      # -> [1 2 3 4 5]
say @nums.elems;      # -> 5
say elems @nums;      # -> 5
say @nums.end;        # -> 4

The end method returns the result of the elems method minus one because, since indices start at 0, the index of the last element is one less than the number of elements.

The unique function or method returns a sequence of unique elements of the input list or array (i.e., it returns the original list without any duplicate values):

> say < a b d c a f d g>.unique;
(a b d c f g)

If you know that the input is sorted (and that, therefore, duplicates are adjacent), use the squish function instead of unique, as this is likely to be more efficient. The squish function removes adjacent duplicates.

To know whether two arrays are identical (structurally the same, with the same type and same values), use the eqv equivalence operator. To know whether they just contain the same elements, use the ~~ smart match operator. Between two arrays or lists, the == numeric equality operator will return True if the arrays have the same number of elements and False otherwise, because == coerces its arguments to numeric type, so that it compares the number of elements:

> my @even1 = 0, 2, 4, 6, 8;
[0 2 4 6 8]
> my @even2 = reverse 8, 6, 4, 2, 0;
[0 2 4 6 8]
> say @even1 eqv @even2           # same items, structurally the same
True
> say <1 2 3 4 5> eqv 1..5;       # same items, structurally different
False
> say <1 2 3 4 5> ~~ 1..5;        # same items, True
True
> my @array = 1..5;               
[1 2 3 4 5]
>  say <1 2 3 4 5> ~~ @array;     # same elements, True
True
>  say <1 2 3 4 6> ~~ @array;     # not the same elements
False
> say <1 2 3 4 5> == <5 6 7 8 9>; # compares the numbers of items
True

The <1 2 3 4 5> eqv 1..5 statement returns False because, although they have the same items, the arguments are structurally different entities (one is a list and the other one a range).

9.3  Adding New Elements to an Array or Removing Some

We’ve just seen that assigning an item to an index that does not exists yet will expand the array. There are other ways of expanding an array.

Perl has operators to add elements to, or remove one element from, an array:

  • shift: removes the first item of an array and returns it;
  • pop: removes the last item of an array and returns it;
  • unshift: adds an item or a list of items to the beginning of an array;
  • push: adds an item or a list of items to the end of an array;

These are a few examples for each:

> my @numbers = <2 4 6 7>;
[2 4 6 7]
> push @numbers, 8, 9;
[2 4 6 7 8 9]
> unshift @numbers, 0, 1;
[0 1 2 4 6 7 8 9]
> my $num = shift @numbers
0
> $num = pop @numbers
9
> say @numbers
[1 2 4 6 7 8]

As you might expect by now, these routines also come with a method invocation syntax. For example:

> my @numbers = <2 4 6 7>;
[2 4 6 7]
> @numbers.push(8, 9)
[2 4 6 7 8 9]

Note, however, that if you push or unshift an array onto another array, you’ll get something different than what you might expect:

> my @numbers = <2 4 6 7>;
[2 4 6 7]
> my @add-array = 8, 10;
[8 10]
> @numbers.push(@add-array);
[2 4 6 7 [8 10]]

As you can see, when @add-array is added as an entity to the @numbers array, @add-array becomes the new last item of the original array. If you want to add the items of @add-array to the original array, you may use the append method instead of push:

> my @numbers = <2 4 6 7>;
[2 4 6 7]
> @numbers.append(@add-array);
[2 4 6 7 8 10]

Or you can use the “|” prefix operator to flatten the added array into a list of arguments:

> my @numbers = <2 4 6 7>;
[2 4 6 7]
> @numbers.push(|@add-array);
[2 4 6 7 8 10]

There is also a prepend method that can replace unshift to add individual items of an array at the beginning of an existing array (instead of adding the array as a single entity).

9.4  Stacks and Queues

Stacks and queues are very commonly used data structures in computer science.

A stack is a last in / first out (LIFO) data structure. Think of piled-up plates. When you put a clean plate onto the stack, you usually put it on top; when you take out one, you also take it from the top. So the first plate that you take is the last one that you added. A CS stack implements the same idea: you use it when the first piece of data you need from a data structure is the last one you added.

A queue, by contrast, is a first in / first out (FIFO) data structure. This is the idea of people standing in a line waiting to pay at the supermarket. The first person that will be served is the first person who entered the queue.

A stack may be implemented with an array and the push and pop functions, which respectively add an item (or several) at the end of an array and take one from the end of the array. This is a somewhat simplistic implementation of a stack:

sub put-in-stack (@stack, $new_item) {
    push @stack, $new_item;
}
sub take-from-stack (@stack) {
    my $item = pop @stack;
    return $item;
}
my @a-stack = 1, 2, 3, 4, 5;
put-in-stack @a-stack, 6;
say @a-stack;
say take-from-stack @a-stack for 1..3;

This example will print this:

[1 2 3 4 5 6]
6
5
4

This stack is simplistic because, at the very least, a more robust implementation should do something sensible when you try to take-from-stack from an empty stack. It would also be wise to add signatures to the subroutines. In addition, you might want to put-in-stack more than one element in one go. Take a look at the solution to the exercise on queues below (Subsection ??) to figure out on how this stack may be improved.

You could obtain the same stack features using the unshift and shift functions instead of push and pop. The items will be added at the beginning of the array and taken from the beginning, but you will still have the LIFO behavior.

As an exercise, try to implement a FIFO queue on the same model. Hint: you probably want to use an array and the unshift and pop functions (or the push and shift functions). Solution: ??.

9.5  Other Ways to Modify an Array

The shift and pop functions remove respectively the first and the last item of an array and return that item. It is possible to do almost the same operation on any item of an array, using the delete adverb:

my @fruit = <apple banana pear cherry pineapple orange>;
my $removed = @fruit[2]:delete; 
say $removed;  # -> pear
say @fruit;    # -> [apple banana (Any) cherry pineapple orange]

Notice that the third element (“pear”) is removed and returned, but the array is not reorganized; the operation leaves a sort of “empty slot”, an undefined item, in the middle of the array. The colon (“:”) syntax used here is the operator for an adverb (we discussed adverbs in Section ?? about regexes); for the time being, you may think of it as a kind of special method operating on one element of an item collection.

We have seen how to use array slices to retrieve several items of an array or a list at a time. The same slice syntax can also be used on the left side of an assignment to modify some elements of an array:

my @digits = <1 2 3 6 5 4 7 8 9>
@digits[2..4] = 4, 5, 6
say @digits;   # -> [1 2 4 5 6 4 7 8 9]

Of course, you can’t do this with lists, since, as you remember, they are immutable.

The splice function may be regarded as the Swiss Army knife of arrays. It can add, remove, and return one or several items to or from an array. The general syntax is as follows:

my @out_array = splice @array, $start, $num_elems, @replacement;

The arguments for splice are the input array, the index of the first element on which to make changes, the number of elements to be affected by the operation, and a list of replacements for the elements to be removed1. For example, to perform the slice assignment shown just above, it is possible to do this:

my @digits = <1 2 3 6 5 4 7 8 9>
my @removed_digits = splice @digits, 3, 3, 4, 5, 6;
say @removed_digits;     # -> [6 5 4]
say @digits;             # -> [1 2 4 5 6 7 8 9]

Here, the splice statement removed three elements (6, 5, 4) and replaced them with the replacement arguments (4, 5, 6). It also returned the removed items into @removed_digits. The number of replacements does not need to be the same as the number of removed items, in which case the array size will grow or shrink. For example, if no replacement is provided, then splice will just remove and return the required number of elements and the array size will be reduced by the same number:

my @digits = 1..9;
my @removed_digits = splice @digits, 3, 2;
say @removed_digits;     # -> [4 5]
say @digits;             # -> [1 2 3 6 7 8 9]

Conversely, if the number of elements to be removed is zero, no element will be removed, an empty array will be returned, and the elements in the replacement list will be added in the right place:

my @digits = <1 2 3 6 4 7 8 9>;
my @removed_digits = splice @digits, 3, 0, 42;
say @removed_digits;     # -> []
say @digits;             # -> [1 2 3 42 6 4 7 8 9]

Assuming the shift function did not exist in Perl, you could write a my-shift subroutine to simulate it:

sub my-shift (@array) {
    my @result = splice @array, 0, 1;
    return @result[0];
}
my @letters = 'a'..'j';
my $letter = my-shift @letters;
say $letter;             # -> a
say @letters;            # -> [b c d e f g h i j]

We might raise an exception if the array passed to my-shift is empty. This could be done by modifying the subroutine as follows:

sub my-shift (@array) {
    die "Cannot my-shift from an empty array" unless @array;
    my @result = splice @array, 0, 1;
    return @result[0];
}

or by adding a nonempty constraint on the array in the subroutine signature:

sub my-shift (@array where @array > 0) {
    my @result = splice @array, 0, 1;
    return @result[0];
}    

The @array > 0 expression evaluates to True if the number of elements of the array is more than 0, i.e., if the array is not empty. It is equivalent to @array.elems > 0.

As an exercise, write subroutines using splice to simulate the pop, unshift, push, and delete built-ins. Solution: ??.

9.6  Traversing a List

The most common way to traverse the elements of a list or an array is with a for loop. The syntax for an array is the same as what we have already seen in earlier chapters for lists:

my @colors = <red orange yellow green blue indigo violet>;
for @colors -> $color {
    say $color;
}

This works well if you only need to read the elements of the list. But if you want to write or update the elements of an array, you need a doubly pointy block. For example, you might use the tc (title case) function to capitalize the first letter of each word of the array:

my @colors = <red orange yellow green blue indigo violet>;
for @colors <-> $color {$color = tc $color};
say @colors;   # -> [Red Orange Yellow Green Blue Indigo Violet]

Here the $color loop variable is a read-and-write alias on the array’s items, so that changes made to this alias will be reflected in the array. This works well with arrays, but would not work with lists, which are immutable. You would get an error with a list:

> for <red orange yellow> <-> $color { $color = tc $color}
Parameter '$color' expected a writable container, but got Str value...

You may also use the syntax of a for loop with the $_ topical variable. For example, this uses the uc (upper case) function to capitalize each word of the previous array:

for @colors { 
    $_ = $_.uc 
}
say @colors; # -> [RED ORANGE YELLOW GREEN BLUE INDIGO VIOLET]

Sometimes, you want to traverse an array and need to know the index of the elements you are visiting. A common way to do that is to use the .. range operator to iterate on the indices. For instance, to print the index and the value of each element of an array:

for 0..@colors.end -> $idx { 
    say "$idx  @colors[$idx]"; 
}

This is useful, for example, for traversing two (or more) arrays in parallel:

my @letters = 'a'..'e';
my @numbers = 1..5;
for 0..@letters.end -> $idx { 
    say "@letters[$idx] -> @numbers[$idx]"; 
}

This will print:

a -> 1
b -> 2
c -> 3
d -> 4
e -> 5

You don’t really need to specify the index range yourself, as the keys function will return a list of indices for the array or the list:

for keys @colors -> $idx { 
    say "$idx  @colors[$idx]"; 
}

Another way to iterate over the indices and values of an array is the kv (“keys values”) function or method, which returns the index and value of each array item:

for @letters.kv -> $idx, $val { 
    say "$idx $val";
}

In list context, @letters.kv simply returns an interleaved sequence of indexes and values:

my @letters = 'a'..'e';
say @letters.kv;    # -> (0 a 1 b 2 c 3 d 4 e)

It is the pointy block with two iteration variables that makes it possible to process both an index and a value at each step of the loop. You can of course have more than two iteration variables if needed.

9.7  New Looping Constructs

Since the subject of this chapter is arrays and lists, it is probably the right time to briefly study two looping constructs that I had left aside so far.

The first one uses the same for keyword as above, but with a different syntax for the iteration variable(s):

my @letters = 'a'..'e';
for @letters { 
    say $^a-letter; 
}

The ^ in the $^a-letter variable is called a twigil, i.e., sort of a secondary sigil. When there is a twigil, the first symbol (here, the $ sign) has the same meaning as usual sigils (here, it denotes a scalar variable), and the second one (here, ^) extends the variable description and usually modifies its scope. In this specific case, the second character states that the $^a-letter variable is a placeholder parameter or a self-declared positional parameter. This is a positional parameter of the current block that needs not be declared in the signature.

If the block uses more than one placeholder, they are associated to the input according to their lexicographic (alphabetic) order:

my @letters = 'a'..'e';
for @letters.kv { 
    say "$^a -> $^b"; 
}

This will print:

0 -> a
1 -> b
2 -> c
3 -> d
4 -> e

As seen just above, the kv function returns an interleaved sequence of indexes and values. Since $^a comes before $^b in the alphabetic order, $^a will be bound to the index and $^b with the value for each pair of the input.

Placeholders can also be used for subroutines:

> sub divide { $^first / $^second }
sub divide ($first, $second) { #`(Sub|230787048) ... }
> divide 6, 4
1.5

These placeholders aren’t used very often for simply traversing arrays, but we will see later how they are very useful in cases where is would be quite unpractical to have to declare the parameters.

The second new looping construct I want to introduce here uses the loop keyword and is similar to the C-style for loop (i.e., the loop of the C programming language). In this type of loop, you declare between a pair of parentheses three expressions separated by semi-colons: the iteration variable’s initial value, the condition on which the loop should terminate, and the change made to the iteration variable on each iteration:

loop (my $i = 0; $i < 5; $i++) {
    say $i, " -> " ~ @letters[$i];
}

For most common loops, the for loops seen earlier are easier to write and usually more efficient than this construct. This special loop construct should probably be used only when the exit condition or the change made to the iteration variable is quite unusual and would be difficult to express in a regular for loop. As an exception, the loop construct with no three-part specification is quite common and even idiomatic for making an infinite loop:

loop {
    # do something 
    # last if ...
}

9.8  Map, Filter and Reduce

When traversing the elements of an array (or a list) so far, we have processed so far one item at a time with a loop. We will now study ways to process all the elements in one go.

9.8.1  Reducing a List to a Value

To add up all the numbers in a list, you can use a for loop like this:

sub add_all (@numbers) {
    my $total = 0;
    for @numbers -> $x {
        $total += $x;
    }
    return $total;
}

$total is initialized to 0. Each time through the loop, $x gets one element from the list and is added to $total. As the loop runs, total accumulates the sum of the elements; a variable used this way is sometimes called an accumulator.

An operation like this that combines a sequence of elements into a single value is often called a reduction operation because its effect is to reduce all the items to one element (this is also sometimes called “folding” in some other programming languages). These ideas are derived from functional programming languages such as LISP (whose name stands for “list processing”).

Perl 6 has a reduce function, which generates a single "combined" value from a list of values, by iteratively applying to each item of a list a function that knows how to combine two values. Using the reduce function to compute the sum of the first ten numbers might look like this:

> my $sum = reduce { $^a + $^b }, 1..10;
55

Remember the factorial function of Section ??? It used a for loop to compute the product of the n first integers up to a limit. It could be rewritten as follows using the reduce function:

sub factorial (Int $num) { 
    return reduce { $^a * $^b }, 1..$num;
}
say factorial 10;   # -> 3628800

In fact, the code to compute the factorial is so short with the reduce function that it may be argued that it has become unnecessary to write a subroutine for that. You could just “inline” the code:

my $fact10 = reduce { $^a * $^b }, 1..10;     # -> 3628800

We can do many more powerful things with that, but we’ll come back to that later, as it requires a few syntactic features that we haven’t seen yet.

9.8.2  The Reduction Metaoperator

Perl 6 also has a reduction operator, or rather a reduction metaoperator. An operator usually works on variables or values; a metaoperator acts on other operators. Given a list and an operator, the [...] metaoperator iteratively applies the operator to all the values of the list to produce a single value.

For example, the following also prints the sum of all the elements of a list:

say [+] 1, 2, 3, 4;           # ->  10

This basically takes the first two values, adds them up, and adds the result to the next value, and so on. Actually, there is a form of this operator, with a backslash before the operator, which also returns the intermediate results:

say [\+] 1, 2, 3, 4;          # -> (1 3 6 10)

This metaoperator can be used to transform basically any associative infix operator2 into a list operator returning a single value.

The factorial function can now be rewritten as:

sub fact(Int $x){
    [*] 1..$x; 
}
my $factorial = fact(10);     # -> 3628800

The reduction metaoperator can also be used with relational operators to check whether the elements of an array or a list are in the correct numerical or alphabetical order:

say [<] 3, 5, 7;              # -> True
say [<] 3, 5, 7, 6;           # -> False
say [lt] <a c d f r t y>;     # -> True

9.8.3  Mapping a List to Another List

Sometimes you want to traverse one list while building another. For example, the following function takes a list of strings and returns a new list that contains capitalized strings:

sub capitalize_all(@words):
    my @result;
    push @result, $_.uc for @words;
    return @result;
}
my @lc_words = <one two three>;
my @all_caps = capitalize_all(@lc_words); # -> [ONE TWO THREE]

@result is declared as an empty array; each time through the loop, we add the next element. So @result is another kind of accumulator.

An operation like capitalize_all is sometimes called a map because it “maps” a function (in this case the uc method) to each of the elements in a sequence.

Perl has a map function that makes it possible to do that in just one statement:

my @lc_words = <one two three>;
my @all_caps = map { .uc }, @lc_words;      # -> [ONE TWO THREE]

Here, the map function applies the uc method to each item of the @lc_words array and returns them into the @all_caps array. More precisely, the map function iteratively assigns each item of the @lc_words array to the $_ topical variable, applies the code block following the map keyword to $_ in order to create new values, and returns a list of these new values.

To generate a list of even numbers between 1 and 10, we might use the range operator to generate numbers between 1 and 5 and use map to multiply them by two:

my @evens = map { $_ * 2 }, 1..5;  # -> [2 4 6 8 10]

Instead of using the $_ topical variable, we might also use a pointy block syntax with an explicit iteration variable:

my @evens = map -> $num { $num * 2 }, 1..5;  # -> [2 4 6 8 10]

or an anonymous block with a placeholder variable:

my @evens = map { $^num * 2 }, 1..5;  # -> [2 4 6 8 10]

Instead of a code block, the first argument to map can be a code reference (a subroutine reference):

sub double-sq-root-plus-one (Numeric $x) { 
    1 + 2 * sqrt $x;
}
my @results = map &double-sq-root-plus-one, 4, 16, 42;
say @results;     # -> [5 9 13.9614813968157]

The subroutine name needs to be prefixed with the ampersand sigil to make clear that it is a parameter to map and not a direct call of the subroutine.

If the name of the array on the left side and on the right side of the assignment is the same, then the modification seems to be made “in place,” i.e., it appears as if the original array is modified in the process.

This is an immensely powerful and expressive function; we will come back to it later.

9.8.4  Filtering the Elements of a List

Another common list operation is to select some elements from a list and return a sublist. For example, the following function takes a list of strings and returns a list that contains only the strings containing a vowel:

sub contains-vowel(Str $string) {
    return True if $string ~~ /<[aeiouy]>/;
}
sub filter_words_with_vowels (@strings) {
    my @kept-string;
    for @string -> $st { 
        push @kept-string, $st if contains-vowel $st;
    }
    return @kept-string;
}  

contains-vowel is a subroutine that returns True if the string contains at least one vowel (we consider “y” to be a vowel for our purpose).

The filter_words_with_vowels subroutine will return a list of strings containing at least one vowel.

An operation like filter_words_with_vowels is called a filter because it selects some of the elements and filters out the others.

Perl has a function called grep to do that in just one statement:

my @filtered = grep { /<[aeiouy]>/ }, @input;

The name of the grep built-in function used to filter some input comes from the Unix world, where it is a utility that filters the lines that match a given pattern from an input file.

In the code example above, all of @input strings will be tested against the grep block, and those matching the regex will go into the array. Just like map, the grep function iteratively assigns each item of the @input array to the $_ topical variable, applies the code block following the grep keyword to $_, and returns a list of the values for which the code block evaluates to true. Here, the code block is a simple regex applied to the $_ variable.

Just as for map, we could have used a function reference as the first argument to grep:

my @filtered = grep &contains-vowel, @input;

To generate a list of even numbers between 1 and 10, we might use the range operator to generate numbers between 1 and 10 and use grep to filter out odd numbers:

my @evens = grep { $_ %% 2 }, 1..10;  # -> [2 4 6 8 10]

As an exercise, write a program using map to produce an array containing the square of the numbers of the input list and a program using grep to keep only the numbers of an input list that are perfect squares. Solution: ??.

Most common list operations can be expressed as a combination of map, grep, and reduce.

9.8.5  Higher Order Functions and Functional Programming

Besides their immediate usefulness, the reduce, map and grep functions we have been using here do something qualitatively new. The arguments to these functions are not just data: their first argument is a code block or a function. We are not only passing to them the data that they will have to use or transform, but we are also passing the code that will process the data.

The reduce, map, and grep functions are what are often called higher-order functions, functions that manipulate not only data, but also other functions. These functions can be thought of as generic abstract functions— they perform a purely technical operation: process the elements of a list and apply to each of them a behavior defined in the code block or the function of the first parameter.

These ideas are to a large extent rooted in functional programming, a programming paradigm that is very different from what we have seen so far and that has been implemented historically in languages such as Lisp, Caml, Ocaml, Scheme, Erlang, or Haskell. Perl 6 is not a functional programming language in the same sense as these languages, because it can also use other programming paradigms, but it has incorporated most of their useful features, so that you can use the expressive power and inherent safety of this programming model, without being forced to do so if and when you would prefer a different model, and without having to learn a totally new syntax that may sometimes look somewhat abstruse or even clunky.

This is immensely useful and can give you an incredible expressive power for solving certain types of problems. But other types of problems might be better solved with the more “traditional” procedural or imperative programming model, while others may benefit from an object-oriented approach. Perl 6 lets you choose the programming model you want to use, and even makes it possible to combine seamlessly several of them in the same program.

Functional programming is so important in my opinion that a full chapter of this book will be devoted to the functional programming features of Perl (see Chapter ??). Before that, make sure to read the Subsection ?? in the array and list section of the exercise solution chapter.

9.9  Fixed-Size, Typed and Shaped Arrays

By default, arrays can contain items of any type, including items of different types, and can auto-extend as you need. Perl will take care of the underlying gory details for you, so that you don’t have to worry about them. This is very practical but also comes with a cost: some array operations might be unexpectedly slow, because Perl may have to perform quite a bit of house-cleaning behind the scene, such as memory allocation or reallocation, copying a full array within memory, etc.

In some cases, however, it is possible to know beforehand the size of an array and the data type of its items. If Perl knows about these, it might be able to work faster and to use much less memory. It might also helps you to prevent subtle bugs.

To declare the type of the elements of an array, just specify it when declaring the array. For example, to declare an array of integers:

> my Int @numbers = 1..20;
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
> @numbers[7] = 3.5;     # ERROR
Type check failed in assignment to @numbers; expected Int but got Rat
  in block <unit> at <unknown file> line 1

Similarly, you can declare the size of an array. Such arrays are sometimes called shaped arrays. There are twelve months in a year, so you might tell Perl that your @months array will never contain more that twelve items:

> my @months[12] = 1..7;
[1 2 3 4 5 6 7 (Any) (Any) (Any) (Any) (Any)]
> say @months.elems
12
> say @months[3];
4
> say @months[12];
Index 12 for dimension 1 out of range (must be 0..11)

Here, Perl has allocated 12 “slots” to the array, even though the last five are currently undefined. Perl may not need to reallocate memory when you define the tenth item of the array. And Perl tells you about your mistake if you accidentally try to access an out-of-range item.

Defining both the type of the elements and the maximal size of the array may lead to a noticeable performance gain in terms of execution speed (at least for some operations) and reduce significantly the memory usage of the program, especially when handling large arrays.

9.10  Multidimensional Arrays

The arrays we have seen so far are one-dimensional. In some languages, such arrays are called vectors. But arrays can also be multidimensional (you may then call them matrices).

For example, you might use a two-dimensional array to store a list of employees with their respective salaries:

> my @employees;
[]
> @employees[0;0] = "Liz";
Liz
> @employees[0;1] = 3000;
3000
> @employees[1] = ["Bob"; 2500];
[Bob 2500]
> @employees[2] = ["Jack"; 2000];
[Jack 2000]
> @employees[3] = ["Betty"; 1800];
[Betty 1800]
> say @employees[1;1];
2500
> say @employees[2];
[Jack 2000]
> say @employees;
[[Liz 3000] [Bob 2500] [Jack 2000] [Betty 1800]]

It is possible to have more than two dimensions. For example, we could have a tridimensional matrix to store the temperatures in a chemical reactor, measured in various locations identified by their x, y, and z coordinates:

my @temp;
@temp[1;3;4] = 80;

For this type of data, however, it is often easier to use the data structure that we will cover in the next chapter, hashes.

Multidimensional arrays can also have a fixed size. For example, this may be a declaration for two-dimensional array where the first dimension is the month in the year and the second the day in the month:

my @date[12, 31];

9.11  Sorting Arrays or Lists

Sorting data is a very common operation in computer science. Perl has a sort function that can sort an array or a list and return the sorted result:

say sort <4 6 2 9 1 5 11>;  # -> (1 2 4 5 6 9 11)

There are several types of sorts. The most common are numeric sort and lexicographic (or alphabetic) sort. They differ in the way they compare individual items to be sorted.

In alphabetic sort, you first compare the first letter of the words to be compared; a word starting with an “a” will always come before a word starting with a “b” (or any other letter) in an ascending sort, irrespective of the value or number of the other characters. You need to compare the second character of two words only if the first character of these words is the same.

Numeric sorting is very different: it is the overall value of the number that is of interest. For example, if we are sorting integers, 11 is larger than 9 because it has more digits. But alphabetic sorting of 9 and 11 would consider 11 to be smaller than 9, because the first digit is smaller.

So an alphabetic or lexicographic sort of the list of integers above would return:

(1 11 2 4 5 6 9)

The consequence is that, with many programming languages, when you want to sort data, you need to specify which type of sort you want. With consistent data (every item of the same type), Perl 6 is usually clever enough to find out which type of sort is best suited to your needs. So, for example, this code will perform the type of sort that you probably expect:

say sort <ac a bc ab abc cb ca>; # ->(a ab abc ac bc ca cb)

Even with mixed data types, sort can do a pretty good job at providing a result that may very well be what you are looking for:

say sort <1 b 11 5 cb 4 12 a ab abc 42 ac bc ca >;
         # -> (1 4 5 11 12 42 a ab abc ac b bc ca cb)

There are cases, however, where this simple use of the sort function will fail to return what you probably want:

say sort <a ab abc A bc BAC AC>; # -> (A AC BAC a ab abc bc)

Here, sort puts all strings starting with an uppercase letter before any string starting with a lower case letter, probably not what you want. It looks even worse if the strings use extended ASCII characters:

say sort <a ab àb abc Ñ A bc BAC AC>;
        # -> (A AC BAC a ab abc bc Ñ àb)

The reason is that, when sorting strings, sort uses the internal numeric encoding of letters. This was sometimes called "ASCIIbetical" order (by contrast with alphabetical order), but the term is now too limited and somewhat obsolete, because Perl 6 is using Unicode rather than ASCII.

Clearly, these are cases where more advanced sorting techniques are needed.

9.12  More Advanced Sorting Techniques

The sort routine typically takes two arguments, a code object and a list of items to be sorted, and returns a new sorted list. If no code object is specified, as in the examples we have seen above, the cmp built-in comparison operator is used to compare the elements. If a code object is provided (and if it accepts two arguments), then it is used to perform the comparison, which tells sort which of the two elements should come first in the final order.

There are three built-in comparison operators that can be used for sorting. They are sometimes called three-way comparators because they compare their operands and return a value meaning that the first operand should be considered less than, equal to or more than the second operand for the purpose of determining in which order these operands should be sorted. The leg operator coerces its arguments to strings and performs a lexicographic comparison. The <=> operator coerces its arguments to numbers (Real) and does a numeric comparison. The aforementioned cmp operator is the “smart” three-way comparator, which compares strings with string semantics and numbers with number semantics.

Most of our simple examples above worked well with strings and numbers because they implicitly used the default cmp operator, which “guesses” quite well which type of comparison to perform.

In other words, this:

say sort <4 6 2 9 1 5 11>;  # -> (1 2 4 5 6 9 11)

is equivalent to this:

say sort { $^a cmp $^b }, <4 6 2 9 1 5 11>;
     # -> (1 2 4 5 6 9 11)

The code block used here as the first argument to the sort routine uses the placeholder parameters (or self-declared parameters) seen earlier in this chapter. The cmp routine receives two arguments that are bound to $^a and $^b and returns to the sort function information about which of the two items should come first in the resulting order.

If you wanted to sort in reverse order, you could just swap the order of the two placeholder parameters:

say sort { $^b cmp $^a }, <4 6 2 9 1 5 11>;
     # -> (11 9 6 5 4 2 1)

Note that this example is given only for the purpose of explaining some features of the placeholder parameters. To sort the array we’ve presented here in descending order, it might just be easier to obtain the same result with the following code:

say reverse sort <4 6 2 9 1 5 11>;  # -> (11 9 6 5 4 2 1)

The reason sort does a good job even with mixed strings and integers is because the default comparison function, cmp, is pretty clever and guesses by looking at its arguments whether it should perform a lexicographic order or numeric order comparison.

If sorting gets too complicated for cmp, or, more generally, when a specific or custom order is required, then you have to write your own ad-hoc comparison subroutine.

For example, if we take again the example of strings with mixed-case letters, we may achieve a case-insensitive alphabetical order this way:

say sort { $^a.lc cmp $^b.lc}, <a ab abc A bc BAC AC>;
     # -> (a A ab abc AC BAC bc)

or this way:

say sort { $^a.lc leg $^b.lc}, <a ab abc A bc BAC AC>;
     # -> (a A ab abc AC BAC bc)

Here, when the comparison code block receives its two arguments, the lc method casts them to lowercase before performing the comparison. Notice that this has no impact on the case of the output, since the lowercase transformation is local to the comparison code block and has no impact on the data handled by sort. We will see shortly that there is a simpler and more efficient way of doing such a transformation before comparing the arguments.

If the comparison specification is more complicated, we may need to write it in a separated subroutine and let sort call that subroutine. Suppose we have a list of strings that are all formed of leading digits followed by a group of letters and possibly followed by other irrelevant characters, and that we want to sort the strings according to the group of letters that follows the digits.

Let’s start by writing the comparison subroutine:

sub my_comp ($str1, $str2) {
    my $cmp1 = $0 if $str1 ~~ /\d+(\w+)/; 
    my $cmp2 = $0 if $str2 ~~ /\d+(\w+)/; 
    return $cmp1 cmp $cmp2;
}

Nothing complicated: it takes two arguments, uses a regex for extracting the group of letters in each of the arguments, and returns the result of the cmp function on the extracted strings. In the real world, something might need to be done if either of the extractions fails, but we will assume for our purposes here that this will not happen.

The sorting is now quite straight forward, we just need to pass the above subroutine to the sort function:

say sort &my_comp, < 22ac 34bd 56aa3 12c; 4abc( 1ca 45bc >;
     # -> (56aa3 4abc( 22ac 45bc 34bd 12c; 1ca)

We only need to prefix the comparison subroutine with the “&” ampersand sigil and it works fine: the strings are sorted in accordance to the letter groups that follow the leading digits.

In all the examples above, the comparison subroutine accepted two parameters, the two items to be compared. The sort function may also work with a code object taking only one parameter. In that case, the code object is not a comparison code block or subroutine, but is a code object implementing the transformation to be applied to the items before using the default cmp comparison routine.

For example, if we take once more the example of strings with mixed-case letters, we may achieve a case-insensitive alphabetical order yet in a new way:

say sort { $_.lc }, <a ab abc A bc BAC AC>;
     # -> (a A ab abc AC BAC bc)

This could also be written with a placeholder parameter:

say sort { $^a.lc }, <a ab abc A bc BAC AC>;
     # -> (a A ab abc AC BAC bc)

Here, since the comparison code block takes only one argument, it is meant to transform each of the items to be compared before performing the standard cmp routine on the arguments. This not only makes things simpler, but is also probably more efficient, especially if the number of items to be sorted is large and if the transformation subroutine is relatively costly: the transformed values are actually cached (i.e., stored in memory for repeated use), so that the transformation is done only once for each item, despite the fact that the comparison routine is called many times for each item in a sort.

Similarly, we could sort numbers according to their absolute values:

say sort {$_.abs}, <4 -2 5 3 -12 42 8 -7>; # -> (-2 3 4 5 -7 8 -12 42)

If you think about it, the “more complicated” example with digits and letters requiring a separate subroutine is also applying the same transformation to both its arguments. As an exercise, write a (simpler) sorting program using a transformation subroutine and the default cmp operator on transformed items. Solution: ??.

Needless to say, the (so-called) advanced uses of the sort function presented in this section are yet more examples of the functional programming style. The comparison subroutines and the transformation subroutines are passed around as arguments to the sort function, and, more broadly, all of the functions, subroutines, and code blocks used here are higher-order functions or first-class functions.

9.13  Debugging

Careless use of arrays (and other mutable objects) can lead to long hours of debugging. Here are some common pitfalls and ways to avoid them:

  1. Some array built-in functions and methods modify their argument(s) and others don’t.

    It may be tempting to write code like this:

    @array = splice @array, 1, 2, $new_item;   # WRONG!
    

    The splice function returns the elements it has deleted from the array, not the array itself, which is modified “in place.”

    Before using array methods and operators, you should read the documentation carefully and perhaps test them in interactive mode.

    When traversing an array, for example with for or map, the $_ topical variable is an alias for the successive items of the array, and not a copy of them. This means that if you change $_, the change will be reflected in the array. There may be some cases where this is what you want, and others where you don’t care (if you no longer need the original array), but this technique is error-prone and should perhaps be avoided (or at least used only with great care).

    my @numbers = <1 2 3>;
    push @doubles, $_*=2 for @numbers;  # WRONG (probably)
    say @numbers; # -> [2 4 6]
    

    The error here is that the $_*=2 statement is modifying $_, so that the @numbers array is also modified, whereas the intent was certainly to populate the new numbers into @doubles, not to modify @numbers.

    The same code applied to a literal list instead of an array leads to a run time error because a list is immutable:

    > push @doubles, $_*=2 for <1 2 3>; # WRONG (definitely)
    Cannot assign to an immutable value
    

    The fix is quite easy in this case and consists of using an expression that does not modify $_ but returns the new desired value:

    push @doubles, $_ * 2 for @numbers; # OK
    

    The same goes for map:

    my @numbers = <1 2 3>;
    say map { ++$_}, @numbers;          # WRONG (probably)
    say @numbers; # -> [2 3 4]
    

    Here again, using an expression that does not modify $_ but instead returns the new desired value will fix the problem:

    my @numbers = <1 2 3>;
    say map { $_ + 1}, @numbers;        # -> (2 3 4)
    say @numbers; # -> [1 2 3]
    

9.14  Glossary

List
An immutable sequence of values.
Array
A variable containing a mutable sequence of values.
Element
One of the values in a list or an array (or some other sequence), also called items.
Nested array
An array that is an element of another array.
Accumulator
A variable used in a loop to add up or accumulate a result.
Reduce
A processing pattern that traverses a sequence and accumulates the elements into a single result.
Map
A processing pattern that traverses a sequence and performs an operation on each element. Also the name of a Perl built-in function that performs such a processing pattern.
Filter
A processing pattern that traverses a list and selects the elements that satisfy some criterion. grep is a Perl implementation of a filter.
Alias
A circumstance where an identifier refers directly to some variable or value, so that a change to it would lead to a change to the variable or value. It essentially means having two names for the same value, container or object.

9.15  Exercises

Exercise 1  

Write a subroutine called nested-sum that takes an array of arrays of integers and adds up the elements from all of the nested arrays. For example:

[fontshape=up]
my @AoA = [[1, 2], [3], [4, 5, 6]];
say nested-sum(@AoA);        # -> 21

Solution: ??.

Exercise 2  

Write a subroutine called cumul-sum that takes a list of numbers and returns the cumulative sum, that is, a new list where the ith element is the sum of the first i+1 elements from the original list. For example:

[fontshape=up]
my @nums = [1, 2, 3, 4];
say cumul-sum(@nums);           # -> [1, 3, 6, 10]

Solution: ??.

Exercise 3  

Write a subroutine called middle that takes a list and returns a new list that contains all but the first and last elements. For example:

[fontshape=up]
say middle(1, 2, 3, 4);      # -> (2, 3)

Solution: ??.

Exercise 4  

Write a subroutine called chop-it that takes an array, modifies it by removing the first and last elements, and returns nothing useful. For example:

[fontshape=up]
my @nums = 1, 2, 3, 4;
chop-it(@nums);
say @nums;                   # -> [2, 3]

Solution: ??.

Exercise 5   Write a subroutine called is-sorted that takes a list (or array) of numbers as a parameter and returns True if the list is sorted in ascending order and False otherwise. For example:
[fontshape=up]
> is-sorted (1, 2, 2);
True
> is-sorted (1, 2, 1);
False

Solution: ??.

Exercise 6  

Two words are anagrams if you can rearrange the letters from one to spell the other. Write a subroutine called is-anagram that takes two strings and returns True if they are anagrams.

Solution: ??.

Exercise 7  

Write a subroutine called has-duplicates that takes a list or an array and returns True if there is any element that appears more than once. It should not modify the original input.

Solution: ??.

Exercise 8  

This exercise pertains to the so-called Birthday Paradox, which you can read about at http://en.wikipedia.org/wiki/Birthday_paradox.

If there are 23 students in your class, what are the chances that two of you have the same birthday? You can estimate this probability by generating random samples of 23 birthdays and checking for duplicates. Hint: you can generate random birthdays with the rand and the int functions.

Solution: ??.

Exercise 9  

Write a subroutine that reads the file words.txt and builds a list with one element per word. Write two versions of this function, one using the push method and the other using the idiom unshift. Which one takes longer to run? Why?

Solution: ??.

Exercise 10  

To check whether a word is in our standard word list, you could check each element in turn, but it would be slow because it searches through the words in order.

If the words are in alphabetical order (which is the case of our word list), we can speed things up considerably with a bisection search (also known as binary search), which is similar to what you do when you look a word up in the dictionary. You start somewhere in the middle and check to see whether the word you are looking for comes before the word in the middle of the list. If so, you search the first half of the list the same way. Otherwise, you search the second half.

Either way, you cut the remaining search space in half. If the word list has 113,809 words, it will take at most about 17 steps to find the word or conclude that it’s not there.

Write a function called bisect that takes a sorted list and a target value and returns information about whether the target value is in the list or not.

Solution: ??

Exercise 11  

Two words are a “reverse pair” if each is the reverse of the other. For example, “depot” and “toped” form a reverse pair; other examples include “reward” and “drawer”, or “desserts” and “stressed.” Write a program that finds all the reverse pairs in the word.txt file.

Solution: ??.

Exercise 12  

Two words “interlock” if taking alternating letters from each forms a new word. For example, “shoe” and “cold” interlock to form “schooled.”

Write a program that finds in word.txt all pairs of words that interlock. Hint: don’t enumerate all pairs, there are many of them!

Solution: ??

Credit: this exercise is inspired by an example at http://puzzlers.org.


1
Notice that the splice function on arrays has almost the same syntax as the substr function on strings. This may make it easier to understand them and remember their syntax.
2
An infix operator is an operator that is placed between its two operands.

Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.


Think DSP

Think Java

Think Bayes

Think Python 2e

Think Stats 2e

Think Complexity


Previous Up Next