This HTML version of Think Perl 6 is provided for convenience, but it is not the best format of the book. You might prefer to read the PDF version. Chapter 9 Arrays and ListsThis chapter presents some of Perl’s most useful built-in types, arrays and lists. 9.1 Lists and Arrays Are SequencesLike strings, lists and arrays are sequences of values. In a string, the values are characters; in a list or in an array, they can be any type. The values in a list or in an array are called elements or sometimes items. There are several important differences between lists and arrays. The main ones are that lists are ordered and immutable collections of items: you can’t change the number of items in a list and you can’t change the individual items either. Arrays, by contrast, are variables and are generally mutable: you can add elements to an array, or remove elements from it. And you can access the individual elements of an array and modify them. For this to be possible, arrays usually have a name (as do other variables), although some arrays are anonymous, which means that they have no name, but have some other ways of accessing them. A list is also ephemeral (unless it is assigned to a variable or some other thing): it ceases to exist as soon as it has been used, usually as soon as the program control flow goes to the next code line. An array, on the other hand, has some form of persistence: you may be able to use it somewhere else in the program if the variable containing it is still within scope. There are several ways to create a new list; the simplest is to enumerate its values, separated by commas: > 3, 4, 5 (3 4 5) > say (3, 4, 5).WHAT; (List) say $_ for 1, 2, 3; 1 2 3 You don’t need parentheses to create a list, but they are often useful to delimit it, i.e., to stipulate where it starts and where it ends, and, in some cases, to override precedence. We used lists earlier in this book. If we write: > print "$_ " for 1, 3, 5, 9, "\n"; 1 3 5 9 > > print "$_ " for 1..10; 1 2 3 4 5 6 7 8 9 10 > we are basically creating and using a list of integers (from
the standpoint of the type hierarchy of Perl; this observation is
not entirely accurate technically for the second example, since
Arrays are variables whose names start with the sigil > my @odd_digits = 1, 3, 5, 7, 9; [1 3 5 7 9] > say @odd_digits.WHAT; (Array) > my @single_digit_numbers = 0..9; [0 1 2 3 4 5 6 7 8 9] Under the Perl REPL, an array is displayed between square
brackets ( If the items don’t contain any space characters, it is quite handy
to construct a list (and assign it to an array if needed) using
the > my @weekdays = <mon tue wed thu fri>; [mon tue wed thu fri] > my @weekend = <sat sun>; [sat sun] The advantage of this construction is that there is no need to separate the items with commas and no need to insert them between quotes when the items are strings. Basically, the quote-word operator breaks up its content on whitespace and returns a list of words, which can then be used in a loop or assigned to an array as in the example above. Most of the rest of this chapter will be devoted to arrays rather than lists, but keep in mind that many of the array functions and operators we will study here also work on lists (at least most of those that would not violate the immutability property of lists). The items of an array (or a list) don’t need to be of the same type: > my @heterogeneous-array = 1, 2.3, pi, "str", (1, 2, 4); [1 2.3 3.14159265358979 str (1 2 4)] Here, the array is composed of an integer, a rational, a float (Num type), a string, and a list of three integers. It may not be advisable for the sake of the developer’s mental sanity to use an array with such wildly heterogeneous items, but Perl will not complain about that: it is up to you to make sense of your data. The array above even contains a list of items. If you iterate over the elements of this array for example with a for loop statement, this list will arrive as one distinct element; it will not get “flattened” as three elements of the array. Similarly, elems is a method to count the number of items of an array (or of a list). Using it on the above array produces the following result: > say @heterogeneous-array.elems; 5 As you can see, the A list within another list is nested. An array that contains no elements is
called an empty array; you can create one with empty
parentheses, > my @empty = (); [] This code is really assigning an empty list to the array. But this syntax is usually not needed when creating a new empty array, since just declaring an array without defining it has the same effect: > my @empty; [] So using the empty parentheses (i.e., assigning an empty list) would be needed essentially for resetting an existing array to an empty array. 9.2 Arrays Are MutableThe syntax for accessing the elements of an array or a list uses the square brackets operator. The expression inside the brackets specifies the index or subscript, which can be a literal integer (or some value that can be coerced into an integer), a variable containing a numerical value, a list or a range of numerical values, an expression or a piece of code returning a numerical value, etc. Indices are offsets compared to the beginning of the array or the list (much in the same way as the values returned by the index function on strings), so that they start at 0. Thus, the first item of an array has the index 0, the second item the index 1, and so on: say <sat sun>[1]; # -> sun (accessing a list item) my @weekdays = <mon tue wed thu fri>; # assigning an array say "The third day is @weekdays[2]"; # -> The third day is wed You may also use ranges or lists of indices to access slices of an array or a list: > my @even-digits = 0, 2, 4, 6, 8; [0 2 4 6 8] > my @small-even_digits = @even-digits[0..2]; [0 2 4] > my @min-max-even-digits = @even-digits[0, 4] [0 8] If you need a slice in the opposite order, you can use the reverse function to reverse the range: > my @reverse-small-even_digits = @even-digits[reverse 0..2]; [4 2 0] or reverse the data returned by the slice expression: > my @reverse-small-even_digits = reverse @even-digits[0..2]; [4 2 0] Unlike lists, arrays are mutable. When the bracket operator appears after an array on the left side of an assignment, it identifies the element of the array that will be assigned: > my @even-digits = 0, 2, 2, 6, 8; # Oops, error on the second 2 [0 2 2 6 8] > @even-digits[2] = 4; # fixing the faulty third digit 4 > say @even-digits [0 2 4 6 8] The third element of -digits, which was (presumably by mistake) 2, is now 4. If the index corresponds to an item which does not exist yet in the array, the array will be expanded to include the new element: > my @odds = 1, 3, 5; [1 3 5] > @odds[3] = 7; 7 > say @odds; [1 3 5 7] The elems function or method returns the number of elements of an array. The end function or method returns the index of the last elements of an array: my @nums = 1..5; # -> [1 2 3 4 5] say @nums.elems; # -> 5 say elems @nums; # -> 5 say @nums.end; # -> 4 The end method returns the result of the elems method minus one because, since indices start at 0, the index of the last element is one less than the number of elements. The unique function or method returns a sequence of unique elements of the input list or array (i.e., it returns the original list without any duplicate values): > say < a b d c a f d g>.unique; (a b d c f g) If you know that the input is sorted (and that, therefore, duplicates are adjacent), use the squish function instead of unique, as this is likely to be more efficient. The squish function removes adjacent duplicates. To know whether two arrays are identical (structurally the
same, with the same type and same values), use the eqv equivalence
operator. To know whether they just contain the same elements, use
the > my @even1 = 0, 2, 4, 6, 8; [0 2 4 6 8] > my @even2 = reverse 8, 6, 4, 2, 0; [0 2 4 6 8] > say @even1 eqv @even2 # same items, structurally the same True > say <1 2 3 4 5> eqv 1..5; # same items, structurally different False > say <1 2 3 4 5> ~~ 1..5; # same items, True True > my @array = 1..5; [1 2 3 4 5] > say <1 2 3 4 5> ~~ @array; # same elements, True True > say <1 2 3 4 6> ~~ @array; # not the same elements False > say <1 2 3 4 5> == <5 6 7 8 9>; # compares the numbers of items True The 9.3 Adding New Elements to an Array or Removing SomeWe’ve just seen that assigning an item to an index that does not exists yet will expand the array. There are other ways of expanding an array. Perl has operators to add elements to, or remove one element from, an array:
These are a few examples for each: > my @numbers = <2 4 6 7>; [2 4 6 7] > push @numbers, 8, 9; [2 4 6 7 8 9] > unshift @numbers, 0, 1; [0 1 2 4 6 7 8 9] > my $num = shift @numbers 0 > $num = pop @numbers 9 > say @numbers [1 2 4 6 7 8] As you might expect by now, these routines also come with a method invocation syntax. For example: > my @numbers = <2 4 6 7>; [2 4 6 7] > @numbers.push(8, 9) [2 4 6 7 8 9] Note, however, that if you push or unshift an array onto another array, you’ll get something different than what you might expect: > my @numbers = <2 4 6 7>; [2 4 6 7] > my @add-array = 8, 10; [8 10] > @numbers.push(@add-array); [2 4 6 7 [8 10]] As you can see, when > my @numbers = <2 4 6 7>; [2 4 6 7] > @numbers.append(@add-array); [2 4 6 7 8 10] Or you can use the “|” prefix operator to flatten the added array into a list of arguments: > my @numbers = <2 4 6 7>; [2 4 6 7] > @numbers.push(|@add-array); [2 4 6 7 8 10] There is also a prepend method that can replace unshift to add individual items of an array at the beginning of an existing array (instead of adding the array as a single entity). 9.4 Stacks and QueuesStacks and queues are very commonly used data structures in computer science. A stack is a last in / first out (LIFO) data structure. Think of piled-up plates. When you put a clean plate onto the stack, you usually put it on top; when you take out one, you also take it from the top. So the first plate that you take is the last one that you added. A CS stack implements the same idea: you use it when the first piece of data you need from a data structure is the last one you added. A queue, by contrast, is a first in / first out (FIFO) data structure. This is the idea of people standing in a line waiting to pay at the supermarket. The first person that will be served is the first person who entered the queue. A stack may be implemented with an array and the push and pop functions, which respectively add an item (or several) at the end of an array and take one from the end of the array. This is a somewhat simplistic implementation of a stack: sub put-in-stack (@stack, $new_item) { push @stack, $new_item; } sub take-from-stack (@stack) { my $item = pop @stack; return $item; } my @a-stack = 1, 2, 3, 4, 5; put-in-stack @a-stack, 6; say @a-stack; say take-from-stack @a-stack for 1..3; This example will print this: [1 2 3 4 5 6] 6 5 4 This stack is simplistic because, at the very least, a more robust implementation should do something sensible when you try to take-from-stack from an empty stack. It would also be wise to add signatures to the subroutines. In addition, you might want to put-in-stack more than one element in one go. Take a look at the solution to the exercise on queues below (Subsection ??) to figure out on how this stack may be improved. You could obtain the same stack features using the unshift and shift functions instead of push and pop. The items will be added at the beginning of the array and taken from the beginning, but you will still have the LIFO behavior. As an exercise, try to implement a FIFO queue on the same model. Hint: you probably want to use an array and the unshift and pop functions (or the push and shift functions). Solution: ??. 9.5 Other Ways to Modify an ArrayThe shift and pop functions remove respectively the first and the last item of an array and return that item. It is possible to do almost the same operation on any item of an array, using the delete adverb: my @fruit = <apple banana pear cherry pineapple orange>; my $removed = @fruit[2]:delete; say $removed; # -> pear say @fruit; # -> [apple banana (Any) cherry pineapple orange] Notice that the third element (“pear”) is removed and returned, but the array is not reorganized; the operation leaves a sort of “empty slot”, an undefined item, in the middle of the array. The colon (“:”) syntax used here is the operator for an adverb (we discussed adverbs in Section ?? about regexes); for the time being, you may think of it as a kind of special method operating on one element of an item collection. We have seen how to use array slices to retrieve several items of an array or a list at a time. The same slice syntax can also be used on the left side of an assignment to modify some elements of an array: my @digits = <1 2 3 6 5 4 7 8 9> @digits[2..4] = 4, 5, 6 say @digits; # -> [1 2 4 5 6 4 7 8 9] Of course, you can’t do this with lists, since, as you remember, they are immutable. The splice function may be regarded as the Swiss Army knife of arrays. It can add, remove, and return one or several items to or from an array. The general syntax is as follows: my @out_array = splice @array, $start, $num_elems, @replacement; The arguments for splice are the input array, the index of the first element on which to make changes, the number of elements to be affected by the operation, and a list of replacements for the elements to be removed1. For example, to perform the slice assignment shown just above, it is possible to do this: my @digits = <1 2 3 6 5 4 7 8 9> my @removed_digits = splice @digits, 3, 3, 4, 5, 6; say @removed_digits; # -> [6 5 4] say @digits; # -> [1 2 4 5 6 7 8 9] Here, the splice statement removed three elements (6, 5, 4)
and replaced them with the replacement arguments (4, 5, 6).
It also returned the removed items into my @digits = 1..9; my @removed_digits = splice @digits, 3, 2; say @removed_digits; # -> [4 5] say @digits; # -> [1 2 3 6 7 8 9] Conversely, if the number of elements to be removed is zero, no element will be removed, an empty array will be returned, and the elements in the replacement list will be added in the right place: my @digits = <1 2 3 6 4 7 8 9>; my @removed_digits = splice @digits, 3, 0, 42; say @removed_digits; # -> [] say @digits; # -> [1 2 3 42 6 4 7 8 9] Assuming the shift function did not exist in Perl, you could write a my-shift subroutine to simulate it: sub my-shift (@array) { my @result = splice @array, 0, 1; return @result[0]; } my @letters = 'a'..'j'; my $letter = my-shift @letters; say $letter; # -> a say @letters; # -> [b c d e f g h i j] We might raise an exception if the array passed to my-shift is empty. This could be done by modifying the subroutine as follows: sub my-shift (@array) { die "Cannot my-shift from an empty array" unless @array; my @result = splice @array, 0, 1; return @result[0]; } or by adding a nonempty constraint on the array in the subroutine signature: sub my-shift (@array where @array > 0) { my @result = splice @array, 0, 1; return @result[0]; } The
As an exercise, write subroutines using 9.6 Traversing a ListThe most common way to traverse the elements of a list or an array is with a for loop. The syntax for an array is the same as what we have already seen in earlier chapters for lists: my @colors = <red orange yellow green blue indigo violet>; for @colors -> $color { say $color; } This works well if you only need to read the elements of the list. But if you want to write or update the elements of an array, you need a doubly pointy block. For example, you might use the tc (title case) function to capitalize the first letter of each word of the array: my @colors = <red orange yellow green blue indigo violet>; for @colors <-> $color {$color = tc $color}; say @colors; # -> [Red Orange Yellow Green Blue Indigo Violet] Here the > for <red orange yellow> <-> $color { $color = tc $color} Parameter '$color' expected a writable container, but got Str value... You may also use the syntax of a for loop with the
for @colors { $_ = $_.uc } say @colors; # -> [RED ORANGE YELLOW GREEN BLUE INDIGO VIOLET] Sometimes, you want to traverse an array and need to know the
index of the elements you are visiting. A common way to do
that is to use the for 0..@colors.end -> $idx { say "$idx @colors[$idx]"; } This is useful, for example, for traversing two (or more) arrays in parallel: my @letters = 'a'..'e'; my @numbers = 1..5; for 0..@letters.end -> $idx { say "@letters[$idx] -> @numbers[$idx]"; } This will print: a -> 1 b -> 2 c -> 3 d -> 4 e -> 5 You don’t really need to specify the index range yourself, as the keys function will return a list of indices for the array or the list: for keys @colors -> $idx { say "$idx @colors[$idx]"; } Another way to iterate over the indices and values of an array is the kv (“keys values”) function or method, which returns the index and value of each array item: for @letters.kv -> $idx, $val { say "$idx $val"; } In list context, my @letters = 'a'..'e'; say @letters.kv; # -> (0 a 1 b 2 c 3 d 4 e) It is the pointy block with two iteration variables that makes it possible to process both an index and a value at each step of the loop. You can of course have more than two iteration variables if needed. 9.7 New Looping ConstructsSince the subject of this chapter is arrays and lists, it is probably the right time to briefly study two looping constructs that I had left aside so far. The first one uses the same for keyword as above, but with a different syntax for the iteration variable(s): my @letters = 'a'..'e'; for @letters { say $^a-letter; }
The If the block uses more than one placeholder, they are associated to the input according to their lexicographic (alphabetic) order: my @letters = 'a'..'e'; for @letters.kv { say "$^a -> $^b"; } This will print: 0 -> a 1 -> b 2 -> c 3 -> d 4 -> e As seen just above, the kv function returns an
interleaved sequence of indexes and values. Since
Placeholders can also be used for subroutines: > sub divide { $^first / $^second } sub divide ($first, $second) { #`(Sub|230787048) ... } > divide 6, 4 1.5 These placeholders aren’t used very often for simply traversing arrays, but we will see later how they are very useful in cases where is would be quite unpractical to have to declare the parameters. The second new looping construct I want to introduce here uses the loop keyword and is similar to the C-style for loop (i.e., the loop of the C programming language). In this type of loop, you declare between a pair of parentheses three expressions separated by semi-colons: the iteration variable’s initial value, the condition on which the loop should terminate, and the change made to the iteration variable on each iteration: loop (my $i = 0; $i < 5; $i++) { say $i, " -> " ~ @letters[$i]; } For most common loops, the for loops seen earlier are easier to write and usually more efficient than this construct. This special loop construct should probably be used only when the exit condition or the change made to the iteration variable is quite unusual and would be difficult to express in a regular for loop. As an exception, the loop construct with no three-part specification is quite common and even idiomatic for making an infinite loop: loop { # do something # last if ... } 9.8 Map, Filter and ReduceWhen traversing the elements of an array (or a list) so far, we have processed so far one item at a time with a loop. We will now study ways to process all the elements in one go. 9.8.1 Reducing a List to a ValueTo add up all the numbers in a list, you can use a for loop like this: sub add_all (@numbers) { my $total = 0; for @numbers -> $x { $total += $x; } return $total; } $total is initialized to 0. Each time through the loop, $x gets one element from the list and is added to $total. As the loop runs, total accumulates the sum of the elements; a variable used this way is sometimes called an accumulator. An operation like this that combines a sequence of elements into a single value is often called a reduction operation because its effect is to reduce all the items to one element (this is also sometimes called “folding” in some other programming languages). These ideas are derived from functional programming languages such as LISP (whose name stands for “list processing”). Perl 6 has a reduce function, which generates a single "combined" value from a list of values, by iteratively applying to each item of a list a function that knows how to combine two values. Using the reduce function to compute the sum of the first ten numbers might look like this: > my $sum = reduce { $^a + $^b }, 1..10; 55 Remember the factorial function of Section ??? It used a for loop to compute the product of the n first integers up to a limit. It could be rewritten as follows using the reduce function: sub factorial (Int $num) { return reduce { $^a * $^b }, 1..$num; } say factorial 10; # -> 3628800 In fact, the code to compute the factorial is so short with the reduce function that it may be argued that it has become unnecessary to write a subroutine for that. You could just “inline” the code: my $fact10 = reduce { $^a * $^b }, 1..10; # -> 3628800 We can do many more powerful things with that, but we’ll come back to that later, as it requires a few syntactic features that we haven’t seen yet. 9.8.2 The Reduction MetaoperatorPerl 6 also has a reduction operator,
or rather a reduction metaoperator. An operator
usually works on variables or values; a metaoperator acts
on other operators. Given a list and an operator, the
For example, the following also prints the sum of all the elements of a list: say [+] 1, 2, 3, 4; # -> 10 This basically takes the first two values, adds them up, and adds the result to the next value, and so on. Actually, there is a form of this operator, with a backslash before the operator, which also returns the intermediate results: say [\+] 1, 2, 3, 4; # -> (1 3 6 10) This metaoperator can be used to transform basically any associative infix operator2 into a list operator returning a single value. The factorial function can now be rewritten as: sub fact(Int $x){ [*] 1..$x; } my $factorial = fact(10); # -> 3628800 The reduction metaoperator can also be used with relational operators to check whether the elements of an array or a list are in the correct numerical or alphabetical order: say [<] 3, 5, 7; # -> True say [<] 3, 5, 7, 6; # -> False say [lt] <a c d f r t y>; # -> True 9.8.3 Mapping a List to Another ListSometimes you want to traverse one list while building another. For example, the following function takes a list of strings and returns a new list that contains capitalized strings: sub capitalize_all(@words): my @result; push @result, $_.uc for @words; return @result; } my @lc_words = <one two three>; my @all_caps = capitalize_all(@lc_words); # -> [ONE TWO THREE]
An operation like Perl has a map function that makes it possible to do that in just one statement: my @lc_words = <one two three>; my @all_caps = map { .uc }, @lc_words; # -> [ONE TWO THREE] Here, the map function applies the uc method to
each item of the To generate a list of even numbers between 1 and 10, we might use the range operator to generate numbers between 1 and 5 and use map to multiply them by two: my @evens = map { $_ * 2 }, 1..5; # -> [2 4 6 8 10] Instead of using the my @evens = map -> $num { $num * 2 }, 1..5; # -> [2 4 6 8 10] or an anonymous block with a placeholder variable: my @evens = map { $^num * 2 }, 1..5; # -> [2 4 6 8 10] Instead of a code block, the first argument to map can be a code reference (a subroutine reference): sub double-sq-root-plus-one (Numeric $x) { 1 + 2 * sqrt $x; } my @results = map &double-sq-root-plus-one, 4, 16, 42; say @results; # -> [5 9 13.9614813968157] The subroutine name needs to be prefixed with the ampersand sigil to make clear that it is a parameter to map and not a direct call of the subroutine. If the name of the array on the left side and on the right side of the assignment is the same, then the modification seems to be made “in place,” i.e., it appears as if the original array is modified in the process. This is an immensely powerful and expressive function; we will come back to it later. 9.8.4 Filtering the Elements of a ListAnother common list operation is to select some elements from a list and return a sublist. For example, the following function takes a list of strings and returns a list that contains only the strings containing a vowel: sub contains-vowel(Str $string) { return True if $string ~~ /<[aeiouy]>/; } sub filter_words_with_vowels (@strings) { my @kept-string; for @string -> $st { push @kept-string, $st if contains-vowel $st; } return @kept-string; } contains-vowel is a subroutine that returns True if the string contains at least one vowel (we consider “y” to be a vowel for our purpose). The An operation like Perl has a function called grep to do that in just one statement: my @filtered = grep { /<[aeiouy]>/ }, @input; The name of the grep built-in function used to filter some input comes from the Unix world, where it is a utility that filters the lines that match a given pattern from an input file. In the code example above, all of Just as for map, we could have used a function reference as the first argument to grep: my @filtered = grep &contains-vowel, @input; To generate a list of even numbers between 1 and 10, we might use the range operator to generate numbers between 1 and 10 and use grep to filter out odd numbers: my @evens = grep { $_ %% 2 }, 1..10; # -> [2 4 6 8 10] As an exercise, write a program using map to produce an array containing the square of the numbers of the input list and a program using grep to keep only the numbers of an input list that are perfect squares. Solution: ??. Most common list operations can be expressed as a combination of map, grep, and reduce. 9.8.5 Higher Order Functions and Functional ProgrammingBesides their immediate usefulness, the reduce, map and grep functions we have been using here do something qualitatively new. The arguments to these functions are not just data: their first argument is a code block or a function. We are not only passing to them the data that they will have to use or transform, but we are also passing the code that will process the data. The reduce, map, and grep functions are what are often called higher-order functions, functions that manipulate not only data, but also other functions. These functions can be thought of as generic abstract functions— they perform a purely technical operation: process the elements of a list and apply to each of them a behavior defined in the code block or the function of the first parameter. These ideas are to a large extent rooted in functional programming, a programming paradigm that is very different from what we have seen so far and that has been implemented historically in languages such as Lisp, Caml, Ocaml, Scheme, Erlang, or Haskell. Perl 6 is not a functional programming language in the same sense as these languages, because it can also use other programming paradigms, but it has incorporated most of their useful features, so that you can use the expressive power and inherent safety of this programming model, without being forced to do so if and when you would prefer a different model, and without having to learn a totally new syntax that may sometimes look somewhat abstruse or even clunky. This is immensely useful and can give you an incredible expressive power for solving certain types of problems. But other types of problems might be better solved with the more “traditional” procedural or imperative programming model, while others may benefit from an object-oriented approach. Perl 6 lets you choose the programming model you want to use, and even makes it possible to combine seamlessly several of them in the same program. Functional programming is so important in my opinion that a full chapter of this book will be devoted to the functional programming features of Perl (see Chapter ??). Before that, make sure to read the Subsection ?? in the array and list section of the exercise solution chapter. 9.9 Fixed-Size, Typed and Shaped ArraysBy default, arrays can contain items of any type, including items of different types, and can auto-extend as you need. Perl will take care of the underlying gory details for you, so that you don’t have to worry about them. This is very practical but also comes with a cost: some array operations might be unexpectedly slow, because Perl may have to perform quite a bit of house-cleaning behind the scene, such as memory allocation or reallocation, copying a full array within memory, etc. In some cases, however, it is possible to know beforehand the size of an array and the data type of its items. If Perl knows about these, it might be able to work faster and to use much less memory. It might also helps you to prevent subtle bugs. To declare the type of the elements of an array, just specify it when declaring the array. For example, to declare an array of integers: > my Int @numbers = 1..20; [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] > @numbers[7] = 3.5; # ERROR Type check failed in assignment to @numbers; expected Int but got Rat in block <unit> at <unknown file> line 1 Similarly, you can declare the size of an array. Such arrays
are sometimes called shaped arrays. There are
twelve months in a year, so you might tell Perl that your
> my @months[12] = 1..7; [1 2 3 4 5 6 7 (Any) (Any) (Any) (Any) (Any)] > say @months.elems 12 > say @months[3]; 4 > say @months[12]; Index 12 for dimension 1 out of range (must be 0..11) Here, Perl has allocated 12 “slots” to the array, even though the last five are currently undefined. Perl may not need to reallocate memory when you define the tenth item of the array. And Perl tells you about your mistake if you accidentally try to access an out-of-range item. Defining both the type of the elements and the maximal size of the array may lead to a noticeable performance gain in terms of execution speed (at least for some operations) and reduce significantly the memory usage of the program, especially when handling large arrays. 9.10 Multidimensional ArraysThe arrays we have seen so far are one-dimensional. In some languages, such arrays are called vectors. But arrays can also be multidimensional (you may then call them matrices). For example, you might use a two-dimensional array to store a list of employees with their respective salaries: > my @employees; [] > @employees[0;0] = "Liz"; Liz > @employees[0;1] = 3000; 3000 > @employees[1] = ["Bob"; 2500]; [Bob 2500] > @employees[2] = ["Jack"; 2000]; [Jack 2000] > @employees[3] = ["Betty"; 1800]; [Betty 1800] > say @employees[1;1]; 2500 > say @employees[2]; [Jack 2000] > say @employees; [[Liz 3000] [Bob 2500] [Jack 2000] [Betty 1800]] It is possible to have more than two dimensions. For example, we could have a tridimensional matrix to store the temperatures in a chemical reactor, measured in various locations identified by their x, y, and z coordinates: my @temp; @temp[1;3;4] = 80; For this type of data, however, it is often easier to use the data structure that we will cover in the next chapter, hashes. Multidimensional arrays can also have a fixed size. For example, this may be a declaration for two-dimensional array where the first dimension is the month in the year and the second the day in the month: my @date[12, 31]; 9.11 Sorting Arrays or ListsSorting data is a very common operation in computer science. Perl has a sort function that can sort an array or a list and return the sorted result: say sort <4 6 2 9 1 5 11>; # -> (1 2 4 5 6 9 11) There are several types of sorts. The most common are numeric sort and lexicographic (or alphabetic) sort. They differ in the way they compare individual items to be sorted. In alphabetic sort, you first compare the first letter of the words to be compared; a word starting with an “a” will always come before a word starting with a “b” (or any other letter) in an ascending sort, irrespective of the value or number of the other characters. You need to compare the second character of two words only if the first character of these words is the same. Numeric sorting is very different: it is the overall value of the number that is of interest. For example, if we are sorting integers, 11 is larger than 9 because it has more digits. But alphabetic sorting of 9 and 11 would consider 11 to be smaller than 9, because the first digit is smaller. So an alphabetic or lexicographic sort of the list of integers above would return: (1 11 2 4 5 6 9) The consequence is that, with many programming languages, when you want to sort data, you need to specify which type of sort you want. With consistent data (every item of the same type), Perl 6 is usually clever enough to find out which type of sort is best suited to your needs. So, for example, this code will perform the type of sort that you probably expect: say sort <ac a bc ab abc cb ca>; # ->(a ab abc ac bc ca cb) Even with mixed data types, sort can do a pretty good job at providing a result that may very well be what you are looking for: say sort <1 b 11 5 cb 4 12 a ab abc 42 ac bc ca >; # -> (1 4 5 11 12 42 a ab abc ac b bc ca cb) There are cases, however, where this simple use of the sort function will fail to return what you probably want: say sort <a ab abc A bc BAC AC>; # -> (A AC BAC a ab abc bc) Here, sort puts all strings starting with an uppercase letter before any string starting with a lower case letter, probably not what you want. It looks even worse if the strings use extended ASCII characters: say sort <a ab àb abc Ñ A bc BAC AC>; # -> (A AC BAC a ab abc bc Ñ àb) The reason is that, when sorting strings, sort uses the internal numeric encoding of letters. This was sometimes called "ASCIIbetical" order (by contrast with alphabetical order), but the term is now too limited and somewhat obsolete, because Perl 6 is using Unicode rather than ASCII. Clearly, these are cases where more advanced sorting techniques are needed. 9.12 More Advanced Sorting TechniquesThe sort routine typically takes two arguments, a code object and a list of items to be sorted, and returns a new sorted list. If no code object is specified, as in the examples we have seen above, the cmp built-in comparison operator is used to compare the elements. If a code object is provided (and if it accepts two arguments), then it is used to perform the comparison, which tells sort which of the two elements should come first in the final order. There are three built-in comparison operators that can be used
for sorting. They are sometimes called three-way comparators
because they compare their operands and return a value meaning
that the first operand should be considered less than, equal to
or more than the second operand for the purpose of determining
in which order these operands should be sorted. The leg
operator coerces its arguments to strings and performs a
lexicographic comparison. The Most of our simple examples above worked well with strings and numbers because they implicitly used the default cmp operator, which “guesses” quite well which type of comparison to perform. In other words, this: say sort <4 6 2 9 1 5 11>; # -> (1 2 4 5 6 9 11) is equivalent to this: say sort { $^a cmp $^b }, <4 6 2 9 1 5 11>; # -> (1 2 4 5 6 9 11)
The code block used here as the first argument to the
sort routine uses the placeholder
parameters (or self-declared parameters) seen earlier
in this chapter. The cmp routine receives two
arguments that are bound to If you wanted to sort in reverse order, you could just swap the order of the two placeholder parameters: say sort { $^b cmp $^a }, <4 6 2 9 1 5 11>; # -> (11 9 6 5 4 2 1) Note that this example is given only for the purpose of explaining some features of the placeholder parameters. To sort the array we’ve presented here in descending order, it might just be easier to obtain the same result with the following code: say reverse sort <4 6 2 9 1 5 11>; # -> (11 9 6 5 4 2 1) The reason sort does a good job even with mixed strings and integers is because the default comparison function, cmp, is pretty clever and guesses by looking at its arguments whether it should perform a lexicographic order or numeric order comparison. If sorting gets too complicated for cmp, or, more generally, when a specific or custom order is required, then you have to write your own ad-hoc comparison subroutine. For example, if we take again the example of strings with mixed-case letters, we may achieve a case-insensitive alphabetical order this way: say sort { $^a.lc cmp $^b.lc}, <a ab abc A bc BAC AC>; # -> (a A ab abc AC BAC bc) or this way: say sort { $^a.lc leg $^b.lc}, <a ab abc A bc BAC AC>; # -> (a A ab abc AC BAC bc) Here, when the comparison code block receives its two arguments, the lc method casts them to lowercase before performing the comparison. Notice that this has no impact on the case of the output, since the lowercase transformation is local to the comparison code block and has no impact on the data handled by sort. We will see shortly that there is a simpler and more efficient way of doing such a transformation before comparing the arguments. If the comparison specification is more complicated, we may need to write it in a separated subroutine and let sort call that subroutine. Suppose we have a list of strings that are all formed of leading digits followed by a group of letters and possibly followed by other irrelevant characters, and that we want to sort the strings according to the group of letters that follows the digits. Let’s start by writing the comparison subroutine: sub my_comp ($str1, $str2) { my $cmp1 = $0 if $str1 ~~ /\d+(\w+)/; my $cmp2 = $0 if $str2 ~~ /\d+(\w+)/; return $cmp1 cmp $cmp2; } Nothing complicated: it takes two arguments, uses a regex for extracting the group of letters in each of the arguments, and returns the result of the cmp function on the extracted strings. In the real world, something might need to be done if either of the extractions fails, but we will assume for our purposes here that this will not happen. The sorting is now quite straight forward, we just need to pass the above subroutine to the sort function: say sort &my_comp, < 22ac 34bd 56aa3 12c; 4abc( 1ca 45bc >; # -> (56aa3 4abc( 22ac 45bc 34bd 12c; 1ca) We only need to prefix the comparison subroutine with the “&” ampersand sigil and it works fine: the strings are sorted in accordance to the letter groups that follow the leading digits. In all the examples above, the comparison subroutine accepted two parameters, the two items to be compared. The sort function may also work with a code object taking only one parameter. In that case, the code object is not a comparison code block or subroutine, but is a code object implementing the transformation to be applied to the items before using the default cmp comparison routine. For example, if we take once more the example of strings with mixed-case letters, we may achieve a case-insensitive alphabetical order yet in a new way: say sort { $_.lc }, <a ab abc A bc BAC AC>; # -> (a A ab abc AC BAC bc) This could also be written with a placeholder parameter: say sort { $^a.lc }, <a ab abc A bc BAC AC>; # -> (a A ab abc AC BAC bc) Here, since the comparison code block takes only one argument, it is meant to transform each of the items to be compared before performing the standard cmp routine on the arguments. This not only makes things simpler, but is also probably more efficient, especially if the number of items to be sorted is large and if the transformation subroutine is relatively costly: the transformed values are actually cached (i.e., stored in memory for repeated use), so that the transformation is done only once for each item, despite the fact that the comparison routine is called many times for each item in a sort. Similarly, we could sort numbers according to their absolute values: say sort {$_.abs}, <4 -2 5 3 -12 42 8 -7>; # -> (-2 3 4 5 -7 8 -12 42) If you think about it, the “more complicated” example with digits and letters requiring a separate subroutine is also applying the same transformation to both its arguments. As an exercise, write a (simpler) sorting program using a transformation subroutine and the default cmp operator on transformed items. Solution: ??. Needless to say, the (so-called) advanced uses of the sort function presented in this section are yet more examples of the functional programming style. The comparison subroutines and the transformation subroutines are passed around as arguments to the sort function, and, more broadly, all of the functions, subroutines, and code blocks used here are higher-order functions or first-class functions. 9.13 DebuggingCareless use of arrays (and other mutable objects) can lead to long hours of debugging. Here are some common pitfalls and ways to avoid them:
9.14 Glossary
9.15 ExercisesExercise 1 Write a subroutine called [fontshape=up] my @AoA = [[1, 2], [3], [4, 5, 6]]; say nested-sum(@AoA); # -> 21 Solution: ??. Exercise 2
Write a subroutine called cumul-sum that takes a list of numbers and returns the cumulative sum, that is, a new list where the ith element is the sum of the first i+1 elements from the original list. For example: [fontshape=up] my @nums = [1, 2, 3, 4]; say cumul-sum(@nums); # -> [1, 3, 6, 10] Solution: ??. Exercise 3 Write a subroutine called [fontshape=up] say middle(1, 2, 3, 4); # -> (2, 3) Solution: ??. Exercise 4 Write a subroutine called [fontshape=up] my @nums = 1, 2, 3, 4; chop-it(@nums); say @nums; # -> [2, 3] Solution: ??. Exercise 5
Write a subroutine called is-sorted that takes
a list (or array) of numbers as a parameter and returns
True if the list is sorted in ascending order and
False otherwise. For example:
[fontshape=up] > is-sorted (1, 2, 2); True > is-sorted (1, 2, 1); False Solution: ??. Exercise 6
Two words are anagrams if you can rearrange the letters
from one to spell the other. Write a subroutine called
Solution: ??. Exercise 7
Write a subroutine called Solution: ??. Exercise 8 This exercise pertains to the so-called Birthday Paradox, which you can read about at http://en.wikipedia.org/wiki/Birthday_paradox. If there are 23 students in your class, what are the chances that two of you have the same birthday? You can estimate this probability by generating random samples of 23 birthdays and checking for duplicates. Hint: you can generate random birthdays with the rand and the int functions. Solution: ??. Exercise 9 Write a subroutine that reads the file words.txt and builds a list with one element per word. Write two versions of this function, one using the push method and the other using the idiom unshift. Which one takes longer to run? Why? Solution: ??. Exercise 10
To check whether a word is in our standard word list, you could check each element in turn, but it would be slow because it searches through the words in order. If the words are in alphabetical order (which is the case of our word list), we can speed things up considerably with a bisection search (also known as binary search), which is similar to what you do when you look a word up in the dictionary. You start somewhere in the middle and check to see whether the word you are looking for comes before the word in the middle of the list. If so, you search the first half of the list the same way. Otherwise, you search the second half. Either way, you cut the remaining search space in half. If the word list has 113,809 words, it will take at most about 17 steps to find the word or conclude that it’s not there. Write a function called Solution: ?? Exercise 11
Two words are a “reverse pair” if each is the reverse of the other. For example, “depot” and “toped” form a reverse pair; other examples include “reward” and “drawer”, or “desserts” and “stressed.” Write a program that finds all the reverse pairs in the word.txt file. Solution: ??. Exercise 12
Two words “interlock” if taking alternating letters from each forms a new word. For example, “shoe” and “cold” interlock to form “schooled.” Write a program that finds in word.txt all pairs of words that interlock. Hint: don’t enumerate all pairs, there are many of them! Solution: ?? Credit: this exercise is inspired by an example at http://puzzlers.org. |
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|