Previous Up Next

This HTML version of Think Perl 6 is provided for convenience, but it is not the best format of the book. You might prefer to read the PDF version.

Chapter 14  Functional Programming in Perl

Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data. It is a declarative programming paradigm, which means programming is done with expressions or declarations instead of statements. In functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function twice with the same argument will produce the same result each time. Eliminating side effects, i.e., changes in state that do not depend on the function inputs, can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.

Perl is not a functional language in the sense that it also uses several other programming paradigms that we have seen abundantly throughout this book. It does however offer extensive functional programming features and capabilities, some of which have been introduced in various sections of this book and will be briefly reviewed here before we get further.

14.1  Higher-Order Functions

As early as Chapter ?? on functions and subroutines, in Section ?? (p. ??), we have seen that functions, subroutines, and other code objects are first-class objects or first-class citizens in Perl, which means that they can be passed around as values. A Perl 6 function is a value you can assign to a variable or pass around as an argument to another function or a return value from another function.

14.1.1  A Refresher on Functions as First-Class Objects

Our initial very simple example of a higher-order function was something like this:

sub do-twice(&code) {
    &code(); 
    &code();
}
sub greet {
    say "Hello World!";
}
do-twice &greet;

in which the greet subroutine was passed as an argument to the do-twice subroutine, with the effect of printing the greeting message twice. A function that is passed as an argument to another function is often called a callback function.

The & sigil placed before the greet subroutine name in the argument list (as well as before the code parameter in the signature and in the body of the do-twice subroutine) tells Perl that you are passing around a subroutine or some other callable code object.

In computer science, a subroutine that can take another subroutine as an argument is sometimes called a higher-order function.

More interesting examples of higher-order function are found with the reduce, map, and grep functions studied in Section ?? (p. ??), as well as the sort function (Section ?? and Section ??).

Let’s consider for example the task of sorting by date records consisting of an identifier followed by a date in the DD-MM-YYYY format, such as “id1;13-05-2015” or “id2;17-04-2015.” The records need quite a bit of transformation before we can compare them for the sake of finding the chronological order in which they should be sorted, so we might write a separate comparison function:

sub compare ($rec1, $rec2) {
    my $cmp1 = join ",", reverse split /<[;-]>/, $rec1;
    my $cmp2 = join ",", reverse split /<[;-]>/, $rec2;
    return $cmp1 cmp $cmp2;
}   

Each modified record is constructed by chaining three functions. These lines should be read from right to left: first, the input value is split into four items; these items are then reversed and then joined, so that the result for “id1;13-05-2015” is “2015,05,13,id1,” which is adapted for a comparison with the cmp operator. We will come back soon to this form of pipeline programming and others ways of performing these operations.

We can now pass the compare subroutine to the sort function:

.say for sort &compare, <id1;13-05-2015 id2;17-04-2015 
                         id3;21-02-2016 id4;12-01-2015>;

This displays:

id4;12-01-2015
id2;17-04-2015
id1;13-05-2015
id3;21-02-2016

Please note that this is provided as an example of callback functions used with the sort built-in function. We will see at the end of the next subsection a simpler way to accomplish the same type of sort using an anonymous subroutine.

14.1.2  Anonymous Subroutines and Lambdas

We have also seen that a subroutine does not need to have a name and can be anonymous. For example, it may be stored directly in a scalar variable:

my $greet = sub {
    say "Hello World!";
};
do-twice $greet;                 # prints "Hello World" twice

We don’t even need to store the code of the anonymous function in the $greet variable; we can pass it directly as an argument to the do-twice subroutine:

do-twice(sub {say "Hello World!"} );

Since our anonymous subroutine does not take any argument and does not return a useful value, we can simplify the syntax further and pass a simple anonymous code block to do-twice:

do-twice {say "Hello World!"};   # prints "Hello World" twice

You’ve already seen several useful examples of anonymous subroutines in this book (see Section ?? for explanations or details):

  • With the reduce function to compute the sum of the first 20 integers:
    my $sum = reduce { $^a + $^b }, 1..20; # -> 210
    
  • With the map function to capitalize the first letter of a list of cities (using the tc built-in):
    > .say for map {.tc}, <london paris rome washington madrid>;
    London
    Paris
    Rome
    Washington
    Madrid
    
  • With the grep function to generate a list of even numbers by filtering out odd numbers:
    my @evens = grep { $_ %% 2 }, 1..17; # -> [2 4 6 8 10 12 14 16]
    

The example with reduce is of special interest. In principle, contrary to a subroutine, you cannot easily pass arguments to a code block (because it has no signature). But the use of the self-declared positional parameters (or placeholders) with the $^ twigil makes it possible to use parameters within the block.

Because of this possibility, the anonymous code block becomes what is commonly called a lambda in computer science (and in mathematics), i.e., a kind of nameless function. Lambda calculus, a mathematical theory invented in the 1930s by Alonzo Church, is the root of most of today’s functional programming languages.

Actually, the two other examples above using the $_ topical variable are also lambdas. Although we haven’t mentioned it at the time, some other constructs we saw earlier are also lambdas. In particular, consider the “pointy block” syntax used twice in the following for loops displaying the multiplication tables:

for 1..9 -> $mult {
    say "Multiplication table for $mult";
    for 1..9 -> $val {
        say "$mult * $val = ", $mult * $val;
    }
}

This is another form of lambda where the “function” parameter is defined by the pointy block loop variable.

The sorting example presented in Subsection (??) just above may also be rewritten with an anonymous code block (taking advantage of the sort syntax using a code block with a single argument described in Section ??):

my @in = <id1;13-05-2015 id2;17-04-2015 id3;21-02-2016>;
.say for sort { join ",", reverse split /<[;-]>/, $_ }, @in;

Here again, the somewhat long code block passed as an argument to the sort function is a lambda.

14.1.3  Closures

In computer programming, a closure (or lexical closure) is a function that can access to variables that are lexically available where the function is defined, even if those variables are no longer in scope in the code section where the function is called.

Consider the following example:

sub create-counter(Int $count) {
    my $counter = $count;
    sub increment-count {
        return $counter++
    }
    return &increment-count;
}
my &count-from-five = create-counter(5);
say &count-from-five() for 1..6; # prints numbers 5 to 10

The create-counter subroutine initializes the $counter variable to the value of the received parameter, defines the increment-count subroutine, and returns this subroutine. The main code calls create-counter to dynamically create the &count-from-five code reference (and could call it many times to create other counters counting from 6, 7, and so on). Then, &count-from-five is called six times and prints out numbers between 5 and 10, each on its own line.

The magical thing here is that the $counter variable is out of scope when &count-from-five is called, but &count-from-five can access it, return its value, and increment it because $counter was within the lexical scope at the time the increment-count was defined. It is commonly said that increment-count “closes over” the $counter variable. The increment-count subroutine is a closure.

The above example is a bit contrived and its syntax somewhat awkward because I wanted to show an example of a named closure (increment-count is a named subroutine). It is usually simpler and more idiomatic to use anonymous closures and to rewrite the example as follows:

sub create-counter(Int $count) {
    my $counter = $count;
    return sub {
        return $counter++
    }
}
my &count-from-five = create-counter(5);
say &count-from-five() for 1..6; # prints numbers 5 to 10

You could even simplify create-counter further with implicit return statements:

sub create-counter(Int $count) {
    my $counter = $count;
    sub { $counter++ }
}

but this is arguably less clear because the code’s intent is less explicit.

The last create-fifo example in the solution to the FIFO queue exercise (Subsection ??) is another example of the same mechanism:

sub create-fifo {
    my @queue;
    return (
        sub {return shift @queue;}, 
        sub ($item) {push @queue, $item;}
        ) ;
}
my ($fifo-get, $fifo-put) = create-fifo();
$fifo-put($_) for 1..10;
print " ", $fifo-get() for 1..5; # ->  1 2 3 4 5

In Perl 6, all subroutines are closures, which means that all subroutines can access to lexical variable that existed in the environment at the time of their definition, but they don’t necessarily act as closures.

In fact, all code objects, even simple anonymous code blocks, can act as closures, which means that they can reference lexical variables from the outer scope, and this is in effect what is going on with the loop variable of a pointy block or in the following map block:

my $multiplier = 7;
say  map {$multiplier * $_}, 3..6; # -> (21 28 35 42)

Here the block passed to map references the variable $multiplier from the outer scope, making the block a closure.

Languages without closures cannot easily provide higher-order functions that are as easy to use and powerful as map.

Here is yet another example of a block acting as a closure for a counter implementation:

my &count;
{
    my $counter = 10;
    &count = { say $counter++ };
}
&count() for 1..5;  

This closure saves a reference to the $counter variable when the closure is created. The call to the &count code block successfully displays and updates $counter, even though that variable is no longer in lexical scope when the block is executed.

14.2  List Processing and Pipeline Programming

Quite often, a computation can be expressed as a series of successive transformations on a list of values. Perl provides functions able to work on the items of a list and apply simple actions, callback functions, or code blocks to these items. We have already seen and used abundantly several such functions:

  • map applies a transformation to each item of a list.
  • grep is a filter that keeps only the items for which the function or code block associated with grep evaluates to true.
  • reduce uses each item of a list to compute a single scalar value.
  • sort sorts the elements of a list in accordance to the rules defined in the passed code block or subroutine.

We have discussed several examples where these functions can be used together in a sort of data pipeline in which the data produced by each step of the pipeline is fed to the next step. For example, earlier in this chapter (Subsection ??), we used this:

    my $cmp1 = join ",", reverse split /<[;-]>/, $rec1;

As we said, this type of code should be read from right to left (and from bottom to top if written over several code lines): $rec1 is fed to split, which splits the data item into four pieces; the pieces are then reversed and fed to join to reconstruct a single data item where the pieces are now in reversed order.

Similarly, we could produce a list of pet animals belonging to single women living in Kansas with the following code chaining several methods:

my @single-kansas-women-pets =
    map  {  .pets },
    grep { !.is-married },
    grep {  .gender eq "Female" },
    grep {  .state eq "Kansas" },
         @citizens;

This one should be read bottom to top. It takes a list of all citizens, filters those from Kansas who are female, filters those who are not married, and finally generates the list of pets of such persons. Note that .pets may return one animal, a list of several animals, or an empty list. map “flattens” the lists thus produced, so the final result going into the array is a flat list of animals (and not a nested list of lists).

These pipelines are very powerful and expressive, and can get a lot done in a few code lines.

14.2.1  Feed and Backward Feed Operators

In the previous examples, the pipeline steps were laid out in reverse order; you may consider this inconvenient, although it is easy to get used to.

Perl 6 provides the ==> feed operator (sometimes called pipe in other languages) that makes it possible to write the various pipeline steps in a “more natural,” left to right and top to bottom, order.

For example, reusing the example of sorting records by dates from earlier in this chapter, you could rewrite it like so:

"id1;13-05-2015" 
    ==> split(/<[;-]>/) 
    ==> reverse() 
    ==> join(",") 
    ==> my @out;  # @out is now: [2015,05,13,id1]

By the way, if you’re using such pipeline operations on a large input, and depending on your platform architecture, Perl 6 may be able to run these various operations in parallel on different CPUs or cores, thereby improving significantly the performance of the overall process.

There is also a backward feed operator, <==, enabling to write the pipeline in reverse order:

my $out <== join(",") 
        <== reverse() 
        <== split(/<[;-]>/) 
        <== "id1;13-05-2015";

14.2.2  The Reduction Metaoperator

We already met this metaoperator in Section ??. A metaoperator acts on other operators. Given a list and an operator, the [...] reduction metaoperator applies the operator iteratively to all the values of the list to produce a single value.

For example, the following prints the sum of all the elements of a list or a range:

say [+] 1..10;      # -> 55

Similarly, we can write a factorial function as:

sub fact (Int $n where $n >= 0) {
    return [*] 1..$n;
}
say fact 20;        # -> 2432902008176640000
say fact 0;         # -> 1

(Note that this yields the correct result even for the edge case of factorial 0, which is defined mathematically to be 1.)

14.2.3  The Hyperoperator

A hyperoperator applies the specified operator to each item of a list (or two lists in parallel) and returns a modified list (somewhat similarly to the map function). It uses the so-called French or German quote marks, « » (Unicode codepoints U+00AB and U+00BB), but you can use their ASCII-equivalent double angle brackets, << >>, if you prefer (or don’t know how to enter those Unicode characters with your editor).

Our first example will multiply each element of a list by a given number (5):

my @b = 6..10;
my @c = 5 «*» @b;
say @c;             # prints 30 35 ... 50 (5*6, 5*7, ...)

We can also combine two lists and, for example, add respective values:

my @a = 1..5;
my @b = 6..10;
my @d = @a >>+<< @b;
say @d;             # -> [7 9 11 13 15]

You can also use hyperoperators with a unary operator:

my @a = 2, 4, 6;
say -« @a;          # prints:  -2 -4 -6

Hyperoperators with unary operators always return a list the same size as the input list. Infixed hyperoperators have a different behavior depending on the size of their operands:

@a >>+<< @b;   # @a and @b must have the same size
@a <<+<< @b;   # @a can be smaller
@a >>+>> @b;   # @b can be smaller
@a <<+>> @b;   # Either can be smaller, Perl will do 
               # probably what you mean (DWIM principle)

Hyperoperators also work with modified assignment operators:

@x >>+=<< @y   # Same as: @x = @x >>+<< @y

14.2.4  The Cross (X) and Zip (Z) Operators

The cross operator uses the capital letter X. It takes two or more lists as arguments and returns a list of all lists that can be constructed combining the elements of each list (a form of “Cartesian product”):

my @a = 1, 2;
my @b = 3, 4;
my @c = @a X @b;       # -> [(1,3), (1,4), (2,3), (2,4)]

The cross operator may also be used as a metaoperator and apply the operator that it modifies to each item combination derived from its operands:

my @a = 3, 4;
my @b = 6, 8;
say @a X* @b;    # -> 18 24 24 32

If no additional operator is supplied (as in the first example above), X acts as if a comma is provided as default additional operator.

The Z zip operator interleaves the lists like a zipper:

say 1, 2 Z <a b c > Z 9, 8;   # -> ((1 a 9) (2 b 8))

The Z operator also exists as a metaoperator, in which case, instead of producing nested inner lists as in the example above, the zip operator will apply the supplied additional operator and replace these nested list with the values thus generated. In the following example, the   concatenate operator is used to merge the inner lists produced by the zip operator into strings:

say 1, 2, 3 Z~ <a b c > Z~ 9, 8, 7; # -> (1a9 2b8 3c7)

14.2.5  List Operators, a Summary

The list operators above are powerful and can be combined together to produce incredibly expressive constructs.

As an exercise, try to solve the following small quizzes (please don’t read on until you have tried them, and try to do them with the operators we’ve just seen):

  • Given that the lcm built-in function returns the least common multiple of two numbers, write a program which displays the smallest positive number divisible by all numbers between 1 and 20.
  • Write a program that computes the sum of all digits of factorial 100.
  • Find the difference between the square of the sum of the 100 first integers and the sum of the squares of the 100 first integers.

Please, again, don’t read further until you have tried to solve these quizzes (and hopefully succeeded).

The reduction operator makes it possible to apply an operator to all elements of a list. Thus, using it with the lcm function gives the LCM of numbers between 1 and 20:

say [lcm] 1..20;                           # -> 232792560

For the sum of the digits of factorial 100, we use the [] reduction metaoperator twice, once with the multiplication operator to compute factorial 100, and another time with the addition operator to sum the digits of the result:

say [+] split '', [*] 2..100;              # -> 648

For the square of the sum minus the sum of the squares, it is easy to compute the sum of the 100 first integers with the reduction operator. The <<...>> hyperoperator easily supplies a list of the squares of these integers, and another application of the reduction operator reduces this list to a sum:

say ([+] 1..100)**2 - [+] (1..100) «**» 2; # -> 25164150

14.2.6  Creating New Operators

We have briefly seen (Section ??) that you can build new operators or redefine existing ones for new types.

The example we gave was to define the minus sign as an infix operator between two hashes in order to perform a kind of mathematical set subtraction, i.e., to find all keys of the first hash that are not in the second hash.

In the previous paragraph, the word infix means that this is a binary operator (two operands) that will be placed between the two operands.

There are other flavors of operators:

  • Prefix: a unary operator placed before the operand, for example the minus sign in the expression −1
  • Postfix: a unary operator placed after the operand, for example the exclamation mark used as a mathematical symbol for the factorial: 5!
  • Circumfix: an operator made of two symbols placed around the operand(s), for example the parentheses (...) or the angle brackets <...>
  • Postcircumfix: an operator made of two symbols placed after an operand and around another one, for example the square brackets in @a[1]

To declare a new operator, you usually need to specify its type (prefix, postfix, etc.), followed by a colon, followed by the operator symbol or name between angle brackets, followed by the function signature and body defining the operator. For example, we could define a prefix % operator as follows:

multi sub prefix:<%> (Int $x) {   # double operator
    2 *  $x;
}
say % 21;   # -> 42

This is just an example to show how operator construction works; % would probably not be a good name for a double operator. The interesting point, though, is that we have reused an existing operator (the modulo operator), but the compiler does not get confused because modulo is an infix operator and our new operator is defined as a prefix operator.

A better naming example might be to use an exclamation mark (!) as a postfix factorial operator, just as in mathematical notation:

multi sub postfix:<!> (Int $n where $n >= 0) {
    [*] 2..$n;
}
say 5!;                     # -> 120

Note that the exclamation mark used as a prefix operator (i.e., placed before its operand) is a negation operator, but there is usually no possible confusion between the two operators because one is a prefix operator and our new operator is a postfix operator (although you might have to be a bit careful on where to put whitespace if your expression is somewhat complicated). The multi keyword isn’t strictly required here, but it is probably good practice to put it anyway, just to cover the cases where it is needed.

As another example, you could define the Σ (sum) operator as follows:

multi sub prefix:<Σ> (@*num-list) {
    [+] @num-list;
}
say Σ (10, 20, 12);         # -> 42

The benefit of using the Σ operator over using [+] directly may not be very obvious, but it is sometimes very useful to create a “domain-specific language”(DSL), i.e., a sublanguage specifically adapted for a specific context or subject matter (e.g., math or chemistry), which allows a particular type of problem or solution to be expressed more clearly than the existing core language would allow. In Perl 6, the grammars and the ease of creating new operators make this creation of DSL quite an easy task.

The new operator does not have to be declared between angle brackets. The following declarations could all be used to define an addition operator:

infix:<+>
infix:<<+>>
infix:«+»
infix:('+')
infix:("+")

You can also specify the precedence of your new operators (relative to existing ones). For example:

multi sub infix:<mult> is equiv(&infix:<*>) { ... }
multi sub infix:<plus> is equiv(&infix:<+>) { ... }
mutli sub infix:<zz> is tighter(&infix:<+>) { ... }
mutli sub infix:<yy> is  looser(&infix:<+>) { ... }

In one of his articles (“Structured Programming with go to statements”, December 1974), Donald Knuth, a very famous computer scientist, uses the :=: symbol as a pseudocode operator to express the variable interchange (or swap) of two values, i.e., the following operation:

# Caution: this is pseudocode, not working code, at this point
my $a = 1; my $b = 2;
$a :=: $b; 
say "$a $b";  # -> 2 1 

In Knuth’s paper, this is just a pseudocode shortcut to discuss more easily Tony Hoare’s quicksort algorithm (described in exercise ), but we can easily implement that symbol:

multi sub infix:<:=:> ($a is rw, $b is rw) {
    ($a, $b) = $b, $a;
}

Note that this can also be written this way:

multi sub infix:<:=:> ($a is rw, $b is rw) {
    ($a, $b) .= reverse;   # equivalent to: ($a, $b) = ($a, $b).reverse 
}

We can now test it for real on the following examples:

my ($c, $d) = 2, 5;
say $c :=: $d;             # -> (5 2)
# using it to swap two array elements:
my @e = 1, 2, 4, 3, 5;
@e[2] :=: @e[3];
say @e;                    # -> [1 2 3 4 5]

Now, the pseudocode above now just works fine as real code. A sort algorithm such as the one presented below (Section ??) may typically have code lines like these to swap two elements of an array:

if $some-condition {
    my ($x, $y) = @array[$i], @array[$i + gap];
    @array[$i], @array[$i + gap] = $y, $x;
}

If the above :=: operator is defined, we could just rewrite these lines as follows:

@array[$i] :=: @array[$i + gap] if $some-condition;

A final interesting point. Suppose we want to use the ⊕ operator for the mathematical set union between two hashes. This could easily be written as follows:

multi sub infix:<⊕> (%a, %b) {
    my %c = %a;
    %c{$_} = %b{$_} for keys %b;
    return %c
}

This works fine:

my %q1 = jan => 1, feb => 2, mar => 3;
my %q2 = apr => 4, may => 5, jun => 6;
my %first_half = %q1 ⊕ %q2;
say %first_half;
# {apr => 4, feb => 2, jan => 1, jun => 6, mar => 3, may => 5}

So far, so good, nothing really new. But the new infix ⊕ operator has now become almost the same as a Perl built-in operator, so that we can use it together with the reduction metaoperator:

my %q1 = jan => 1, feb => 2, mar => 3;
my %q2 = apr => 4, may => 5, jun => 6;
my %q3 = jul => 7, aug => 8, sep => 9;
my %q4 = oct => 10, nov => 11, dec => 12;
my %year = [⊕] %q1, %q2, %q3, %q4;
say %year;
# {apr => 4, aug => 8, dec => 12, feb => 2, jan => 1, 
# jul => 7, jun => 6, mar => 3, may => 5, nov => 11, 
# oct => 10, sep => 9}

Everything works as if this new operator was part of the Perl 6 grammar. And that’s in effect what has happened here: we have extended the language with a new operator. This possibility of extending the language is key to the ability of Perl 6 to cope with future needs that we can’t even think of at present time.

14.3  Creating Your Own Map-Like Functions

We have seen in this chapter and in Section ?? (p. ??) how higher-order functions such as the reduce, map, grep, and sort functions can be powerful and expressive. There are some other such built-in functions in Perl, but we would like to be able to create our own.

14.3.1  Custom Versions of map, grep, etc.

Let’s see how we could write our own custom versions of such functions.

14.3.1.1  my-map, a pure Perl version of map

Let’s start by trying to rewrite in pure Perl the map function. It needs to take a subroutine or a code block as its first argument, to apply it to an array or a list, and to return the modified list.

sub my-map (&code, @values) { 
    my @temp;
    push @temp, &code($_) for @values;
    return @temp;
}
my @result = my-map { $_ * 2 }, 1..5; 
say @result;                   # -> [2 4 6 8 10]

This works exactly as expected on the first trial. (I have attempted the same experiment with some other languages in the past , including Perl 5; it took quite a few tries before getting it right, especially regarding the calling syntax. Here, everything falls into place naturally.) To tell the truth, the test in this code example is very limited and there may very well be some edge cases when my-map does not work the same way as map, but our aim was not to clone map exactly; the point is that it is quite simple to build a higher-order subroutine that behaves essentially the same way as map.

14.3.1.2  my-grep

Writing our pure-Perl version of grep is just about as easy:

sub my-grep (&code, @values) { 
    my @temp;
    for @values -> $val {
        push @temp, $val if &code($val);
    }
    return @temp;
}
my @even = my-grep { $_ %% 2 }, 1..10; 
say @even;                   # -> [2 4 6 8 10]

14.3.2  Our Own Version of a Sort Function

We can similarly write our own version of the sort function.

The Perl sort function implements a sort algorithm known as merge sort1. Some previous versions of the Perl language (prior to version 5.8) implemented another algorithm known as quick sort2. The main reason for this change was that, although quick sort is on average a bit faster than merge sort, there are specific cases where quick sort is much less efficient than merge sort (notably when the data is almost sorted). These cases are very rare with random data, but not in real situations: it is quite common that you have to sort a previously sorted list in which only a few elements have been added or modified.

In computing theory, it is frequently said that, for sorting n items, both merge sort and quick sort have an average complexity of O(n logn), which essentially means that the number of operations to be performed is proportional to n logn if the number of items to be sorted is n, with quick sort being usually slightly faster; but quick sort has a worst-case complexity of O(n2), whereas merge sort has a worst-case complexity of O(n logn). When the number n of items to be sorted grows large, n2 becomes very significantly larger than n logn. In other words, merge sort is deemed to be better because it remains efficient in all cases.

Suppose now that we want to implement another sorting algorithm whose performance is alleged to be better. For this example, we will use a somewhat exotic sorting algorithm known as comb sort (a.k.a. Dobosiewicz’s sort), which is described on this page of Wikipedia: https://en.wikipedia.org/wiki/Comb_sort. This algorithm is said to be in place, which means that it does not need to copy the items into auxiliary data structures, and has generally good performance (often better than merge sort), but is not very commonly used in practice because its theoretical analysis is very difficult (in particular, it seems that it has a good worst-case performance, but no one has been able to prove this formally so far). In fact, we don’t really care about the real performance of this sort algorithm; it is very unlikely that a pure Perl implementation of the comb sort will outperform the built in sort function implemented in C and probably very carefully optimized by its authors. We only want to show how a sort subroutine could be implemented.

To work the same way as the internal sort, a sort function must receive as parameters a comparison function or code block and the array to be sorted, and the comparison routine should use placeholder parameters ($^a and $^b in the code below). This is a possible basic implementation:

sub comb_sort (&code, @array) {
    my $max = @array.elems;
    my $gap = $max;
    loop {
        my $swapped = False;
        $gap = Int($gap / 1.3);    # 1.3: optimal shrink factor
        $gap = 1 if $gap < 1;
        my $lmax = $max - $gap - 1;
        for (0..$lmax) -> $i {
            my ($x, $y) = (@array[$i], @array[$i+$gap]);
            (@array[$i], @array[$i+$gap], $swapped) = ($y, $x, True)
                if &code($x, $y) ~~ More;  # or: if &code($x, $y) > 0
        }
        last if $gap == 1 and ! $swapped;
    }
}

This can be tested with the following code:

my @v;
my $max = 500;
@v[$_] = Int(20000.rand) for (0..$max);

comb_sort {$^a <=> $^b}, @v;
.say for @v[0..10], @v[493..499]; # prints array start and end
# prints (for example):
# (14 22 77 114 119 206 264 293 298 375 391)
# (19672 19733 19736 19873 19916 19947 19967)

The inner loop compares items that are distant from each other by $gap values, and swaps them if they are not in the proper order. At the beginning, $gap is large, and it is divided by a shrink factor at each iteration of the outer loop. Performance heavily depends on the value of the shrink factor. At the end, the gap is 1 and the comb sort becomes equivalent to a bubble sort. The optimal shrink factor lies somewhere between 1.25 and 1.33; I have used a shrink factor of 1.3, the value suggested by the authors of the original publications presenting the algorithm.

14.3.3  An Iterator Version of map

These my-map, my-grep, and comb_sort functions are pedagogically interesting, but they aren’t very useful if they do the same thing as their built-in counterparts (and are probably slower). However, now that we have seen how to build them, we can create our own versions that do things differently.

Say we want to create a function that acts like map in the sense that it applies a transformation on the items of the input list, but does that on the items one by one, on demand from a consumer process, and pauses when and as long as the consumer process does not need anything. This could be described as an iterator returning modified elements on demand from the source list. You might think that this does not have much to do with map, but it might also be considered as a form of map with delayed evaluation, which processes only the elements of the input lists that are necessary for the program, not more than that.

The idea of processing only what is strictly required is often called laziness, and this is a very useful idea. Lazy list processing can be very useful not only because it avoids processing data that is not needed, and therefore can contribute to better resource usage and better performance, but also because it makes it possible to consider infinite lists: so long as you can guarantee that you are only going to use a limited number of elements, you don’t have any problem considering lists that are potentially unlimited. Perl 6 provides the concepts and tools to do this.

To reflect these considerations, we will call our subroutine iter-map. Since we might want to also write a iter-grep subroutine and possibly others, we will write separately an iterator and a data transformer.

We can use a closure to manufacture an iterator:

sub create-iter(@array) {
    my $index = 0;
    return sub { @array[$index++];}
}
my $iterator = create-iter(1..200);
say $iterator() for 1..5; # -> 1, 2, 3, 4, 5

Now that the iterator returns one value at a time, we can write the iter-map subroutine:

sub iter-map (&code-ref, $iter) {
    return &code-ref($iter);
}
my $iterator = create-iter(1..200);
say iter-map { $_ * 2 }, $iterator() for 1..5; # -> 2, 4, 6, 8, 10

Since we have called the iter-map function only 5 times, it has done the work of multiplying values by 2 only 5 times, instead of doing it 200 times, 195 of which are for nothing. Of course, multiplying a number by 2 isn’t an expensive operation and the array isn’t very large, but this shows how laziness can prevent useless computations. We will come back to this idea, since Perl 6 offers native support to lazy lists and lazy processing.

As already noted, an additional advantage of using a function such as iter-map is that it is possible to use virtually infinite lists. This implementation using an infinite list works just as before:

my $iterator = create-iter(1..*);
say iter-map { $_ * 2 }, $iterator() for 1..5;
     # prints 2, 4, 6, 8, 10

14.3.4  An Iterator Version of grep

If we try to write a iter-grep subroutine on the same model:

my $iterator = create-iter(reverse 1..10);
sub iter-grep (&code_ref, $iter) {
    my $val = $iter();
    return $val if &code_ref($val);
}
# simulating ten calls
say iter-grep { $_ % 2 }, $iterator for 1..10;

it doesn’t quite work as desired, because this will print alternatively odd values (9, 7, 5, etc.) and undefined values (for the even values of the array). Although we haven’t specified it yet, we would prefer iter-grep to supply the next value for which the $code-ref returns true. This implies that iter-grep has to loop over the values returned by the iterator until it receives a proper value.

That might look like this:

my $iterator = create-iter(reverse 1..10);
sub iter-grep (&code_ref, $iter) {
    loop {
        my $val = $iter();
        return unless defined $val;  # avoid infinite loop
        return $val if &code_ref($val);
 }
}
# simulating ten calls
for 1..10 {
    my $val = iter-grep { $_ % 2 }, $iterator;
    say "Input array exhausted!" and last unless defined $val;
    say $val;
}

This now works as expected:

9
7
5
3
1
Input array exhausted!

However, we still have a problem if the array contains some undefined values (or “empty slots”). This would be interpreted as the end of the input array, whereas there might be some additional useful values in the array. This is sometimes known in computer science as the “semi-predicate” problem. Here, iter-grep has no way to tell the difference between an empty slot in the array and the end of the array. A more robust implementation therefore needs a better version of create-iter returning something different for an undefined array item and array exhaustion. For example, the iterator might return a false value when done with the array, and a pair with the array item as a value otherwise. A pair will be considered to be true, even if its value isn’t defined:

sub create-iter(@array) {
    my $index = 0;
    my $max-index = @array.end;
    return sub { 
        return False if $index >= $max-index; 
        return ("a_pair" => @array[$index++]);
    }
}
my @array = 1..5;
@array[7] = 15;
@array[9] = 17;
push @array, $_ for 20..22;
.say for '@array is now: ', @array;
my $iterator = create-iter(@array);
sub iter-grep (&code_ref, $iter) {
    loop {
        my $returned-pair = $iter();
        return unless $returned-pair;  # avoid infinite loop
        my $val = $returned-pair.value;
        return $val if defined $val and &code_ref($val);
 }
}
for 1..10 {
    my $val = iter-grep { $_ % 2 }, $iterator;
    say "Input array exhausted!" and last unless defined $val;
    say $val;
}

Running this script displays the following:

@array is now:
[1 2 3 4 5 (Any) (Any) 15 (Any) 17 20 21 22]
1
3
5
15
17
21
Input array exhausted!

This now works fully as desired.

Although iter-map did not suffer from the same problem, you might want as an exercise to modify iter-map to use our new version of create-iter.

The advantage of the iterator functions seen above is that they process only the items that are requested by the user code, so that they perform only the computations strictly required and don’t waste CPU cycles and time doing unnecessary work. We have gone through these iterating versions of the map and grep functions as a form of case study for pedagogical purposes, in order to explain in practical terms the idea of laziness.

This is what would have been necessary to implement lazy iterators in earlier versions of Perl (e.g., Perl 5), but much of this is not required with Perl 6 which has built-in support for lazy lists and lazy operators, as we will see soon.

14.4  The gather and take Construct

A useful construct for creating (possibly lazy) lists is gather { take }. A gather block acts more or less like a loop and runs until take supplies a value. This construct is also a form of iterator.

For example, the following code returns a list of numbers equal to three times each of the even numbers between 1 and 10:

my @list = gather { 
    for 1..10 {
        take 3 * $_ if $_ %% 2
    } 
};
say @list;                 # -> [6 12 18 24 30]

Here, gather loops on the values of the range and take “returns” the wanted values.

If you think about it, the code above seems to be doing a form of combination of a map and a grep.

We can indeed simulate a map. For example:

my @evens = map { $_ * 2 }, 1..5;

could be rewritten with a gather { take } block :

my @evens = gather {take $_ * 2 for 1.. 5}; # [2 4 6 8 10]

And we could simulate a grep similarly:

my @evens = gather {take $_ if $_ %% 2 for 1..10};

Since take also admits a method syntax, this could be rewritten as:

my @evens = gather {.take if $_ %% 2 for 1..10};

These code examples don’t bring any obvious advantage over their map or grep counterparts and are not very useful in themselves, but they illustrate how a gather { take } block can be thought of as a generalization of the map and grep functions. And, as already mentioned, the first example in this section actually does combine the actions of a map and a grep.

In fact, we can write a new version of my-map:

sub my-map (&coderef, @values) {
   return gather {
      take &coderef($_) for @values;
   };
}
say join " ", my-map {$_ * 2}, 1..10;
# prints: 2 4 6 8 10 12 14 16 18 20

Writing a new version of my-grep is just about as easy and left as an exercise to the reader.

Calling the take function only makes sense within the context of a gather block, but it does not have to be within the block itself (or within the lexical scope of the gather block); it can be within the dynamic scope of the gather block

Although we haven’t covered this concept before, Perl has the notion of dynamic scope: contrary to lexical scope, dynamic scope encloses not only the current block, but also the subroutines called from within the current block. Dynamic scope variables use the “*” twigil. Here is an example:

sub write-result () { say $*value; }
sub caller (Int $val) { 
    my $*value = $val * 2; 
    write-result();
}
caller 5;               # -> 10

In the code above, the $*value dynamic variable is declared and defined in the caller subroutine and used in the write-result subroutine. This would not work with a lexical variable, but it works with a dynamic variable such as $*value, because the scope of $*value extends to the write-result subroutine called by caller.

Similarly, the take function can work within the dynamic scope of the gather block, which essentially means that the take function can be called within a subroutine called from the gather block. For example:

my @list = gather {
    compute-val($_) for 1..10; 
}
sub compute-val(Numeric $x) {
    take $x * $x + 2 * $x - 6;
}
say @list[0..5];        # -> (-3 2 9 18 29 42)

As you can see, the take function is not called within the gather block, but it works fine because it is within the dynamic scope of the gather block, i.e., within the compute-val subroutine, which is itself called in the gather block.

One last example will show how powerful the gather { take } construct can be.

Let’s consider this problem posted on the Rosetta Code site (http://rosettacode.org/wiki/Same_Fringe): write a routine that will compare the leaves (“fringe”) of two binary trees to determine whether they are the same list of leaves when visited left-to-right. The structure or balance of the trees does not matter; only the number, order, and value of the leaves is important.

The solution in Perl 6 uses a gather { take } block and consists of just six code lines:

sub fringe ($tree) {
    multi sub fringey (Pair $node) {fringey $_ for $node.kv;}
    multi sub fringey ( Any $leaf) {take $leaf;}
    gather fringey $tree;
}
sub samefringe ($a, $b) { fringe($a) eqv fringe($b) }

Perl 6 is the clear winner in terms of the shortest code to solve the problem.

As a comparison, the Ada example is almost 300 lines long, the C and Java programs slightly over 100 lines. By the way, the shortest solutions besides Perl 6 (Clojure, Picolisp, Racket) run in about 20 lines and are all functional programming languages, or (for Perl 5 for example) are written using functional programming concepts. Although the number of code lines is only one of many criteria to compare programs and languages, this is in my humble opinion a testimony in favor of the functional programming paradigm and its inherent expressiveness.

14.5  Lazy Lists and the Sequence Operator

Let’s come back now to the idea of lazy lists and study how Perl 6 can handle and use them.

14.5.1  The Sequence Operator

Perl provides the ... sequence operator to build lazy lists. For example, this:

my $lazylist := (0, 1 ... 200);
say $lazylist[42];                  # -> 42

produces a lazy list of successive integers between 0 and 200. The Perl 6 compiler may or may not allocate some of the numbers (depending on the implementation), but it is not required to produce the full list immediately. The numbers that have not been generated yet may be created and supplied later, if and when the program tries to use these values.

As explained below, if you want to generate consecutive integers, you can actually simplify the lazy list definition:

my $lazylist := (0 ... 200);

If you assign a sequence to an array, it will generate all the values of the sequence immediately, since assignment to an array is eager (nonlazy). However, you can force laziness with the lazy built-in when assigning to an array:

my @lazyarray = lazy 1 ... 200;     # -> [...]
say @lazyarray.elems;               # -> Cannot .elems a lazy list
say @lazyarray[199];                # -> 200
say @lazyarray[200];                # -> (Any)
say @lazyarray.elems;               # -> 200

Here, the @lazylist array is originally lazy. Evaluating one item past the last element of the array forces Perl to actually generate the full array (and the array is no longer lazy). After that, no further elements can be generated, and .elems stays at 200 (unless you actually assign values to elements past the 200th element).

When given two integers, one for the first and the last items of a list, the sequence operator will generate a list of consecutive integers between the two supplied integers. If you supply two initial items defining implicitly a step, this will generate an arithmetic sequence:

my $odds = (1, 3 ... 15);           # (1 3 5 7 9 11 13 15)
my $evens = (0, 2 ... 42);          # (0 2 4 6 8 ... 40 42)

You may remember that, in Section ?? of the chapter on arrays and lists, we said that parentheses are usually not necessary for constructing a list, unless needed for precedence reasons. The above code is one such example: try to run that code without parentheses and observe the content of the $odds and $evens variables.

When three initial numbers in geometric progression are supplied, the sequence operator will produce a geometric sequence, as in this example producing the powers of two:

say (1, 2, 4 ... 32);              # -> (1 2 4 8 16 32)

The sequence operator may also be used to produce noninteger numbers, as shown in this example under the REPL:

> say (1, 1.1 ... 2);
(1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2)

Contrary to the .. range operator, the sequence operator can also count down:

say (10 ... 1);                    #  (10 9 8 7 6 5 4 3 2 1)

14.5.2  Infinite Lists

One of the great things about lazy lists is that, since item evaluation is postponed, they can be infinite without consuming infinite resources from the computer:

my $evens = (0, 2 ... Inf);        # (...)
say $evens[18..21];                # -> (36 38 40 42)

The Inf operand is just the so-called “Texas” or ASCII equivalent of the ∞ infinity symbol. The above could have been written:

my $evens = (0, 2 ... ∞); 
say $evens[21];                    # -> 42

The most common way to indicate an infinite lazy list, though, is to use the * whatever argument:

my $evens = (0, 2 ... *); 
say $evens[21];                    # -> 42

14.5.3  Using an Explicit Generator

The sequence operator ... is a very powerful tool for generating lazy lists. Given one number, it just starts counting up from that number (unless the end of the sequence is a lower number, in which case it counts down). Given two numbers to start a sequence, it will treat it as an arithmetic sequence, adding the difference between those first two numbers to the last number generated to generate the next one. Given three numbers, it checks to see if they represent the start of an arithmetic or a geometric sequence, and will continue it.

However, many interesting sequences are neither arithmetic nor geometric. They can still be generated with the sequence operator provided one term can be deduced from the previous one (or ones). For this, you need to explicitly provide the code block to generate the next number in the sequence. For example, the list of odd integers could also be generated with a generator as follows:

say (1, { $_ + 2 } ... 11);        # -> (1 3 5 7 9 11)

We now have yet another way of defining the factorial function:

my $a;
my @fact = $a = 1, {$_ * $a++} ... *;
say @fact[0..8];          # -> (1 1 2 6 24 120 720 5040 40320)

or, possibly more readable:

my @fact = 1, { state $a = 1; $_ * $a++} ... *;
say @fact[0..8];          # -> (1 1 2 6 24 120 720 5040 40320)

This approach is much more efficient than those we have seen before for repeated use, since it automatically caches the previously computed values in the lazy array. As you might remember from Section ?? (p. ??), caching is the idea of storing a value in memory in order to avoid having to recompute it, with the aim of saving time and CPU cycles.

And we can similarly construct a lazy infinite list of Fibonacci numbers:

my @fibo = 0, 1, -> $a, $b { $a + $b } ... *;
say @fibo[0..10];   # -> (0 1 1 2 3 5 8 13 21 34 55)

This can be rewritten in a more concise (albeit possibly less explicit and less clear) way using the * whatever placeholder parameter:

my @fibo = 0, 1, * + * ... *;
say @fibo[^10];      # -> (0 1 1 2 3 5 8 13 21 34)

Just as for factorial, this is more efficient than the implementations we’ve seen previously, because the computed values are cached in the lazy array.

Similarly the sequence of odd integers seen at the beginning of this section could be generated in a slightly more concise form with the whatever "*" parameter:

say (1, * + 2  ... 11);        # -> (1 3 5 7 9 11)

This syntax with an asterisk is called a whatever closure; we will come back to it below.

There is, however, a small caveat in using the sequence operator with an explicit generator: the end value (the upper bound) has to be one of the generated numbers for the list to stop at it. Otherwise, it will build an infinite list:

my $nums = (0, { $_ + 4 } ... 10);
say $nums[0..5];     # -> (0 4 8 12 16 20)

As you can see in this example, the generator “jumps over the end point” (it goes beyond 10), and the list is in fact infinite. This is usually not a problem in terms of the computer resources, since it is a lazy infinite list, but it is probably a bug if you expected the list not to run above 10. In this specific case, it is very easy to compute an end point that will be matched (e.g., 8 or 12), but it may be more complicated to find a valid end point. For example, it is not obvious to figure out what the largest Fibonacci number less than 10,000 might be without computing first the series of such numbers until the first one beyond 10,000.

In such cases where it is difficult to predict what the end point should be, we can define another code block to test whether the sequence should stop or continue. The sequence will stop if the block returns a true value. For example, to compute Fibonacci numbers until 100, we could use this under the REPL:

> my @fibo = 0, 1, -> $a, $b { $a + $b } ... -> $c { $c > 100}
[0 1 1 2 3 5 8 13 21 34 55 89 144]

This is better, as it does stop the series of numbers, but not quite where we wanted: we really wanted it to stop at the last Fibonacci under 100, and we’re getting one more. It would be quite easy to remove or filter out the last generated Fibonacci number, but it’s even better not to generate it at all. A slight change in the syntax will do that for us:

> my @fibo = 0, 1, -> $a, $b { $a + $b } ...^ -> $c { $c > 100}
[0 1 1 2 3 5 8 13 21 34 55 89]

Switching from ... to ...^ means the resulting list does not include the first element for which the termination test returned true.

Similarly, we can limit the whatever closure syntax seen above as follows:

> say 0, 1, * + * ...^ * > 100;
(0 1 1 2 3 5 8 13 21 34 55 89)

14.6  Currying and the Whatever Operator

Currying (or partial application) is a basic technique of functional programming, especially in pure functional programming languages such as Haskell. The “curry” name comes from the American mathematician Haskell Curry, one of the founders (with Alonzo Church) of logical mathematical theories, including lambda-calculus and others. (And, as you might have guessed, the Haskell programming language derived its name from Curry’s first name.)

To curry a function having several arguments means replacing it with a function having only one argument and returning another function (often a closure) whose role is to process the other arguments.

In some pure functional programming languages, a function can only take one argument and return one result. Currying is a technique aimed at coping with this apparent limitation. Perl does not have such a limitation, but currying can still be very useful to reduce and simplify the arguments lists in subroutine calls, notably in cases of repeated recursive calls.

14.6.1  Creating a Curried Subroutine

The standard example is an add function. Suppose we have an add mathematical function, add(x, y), taking two arguments and returning their sum.

In Perl, defining the add subroutine is very simple:

sub add (Numeric $x, Numeric $y) {return $x + $y}

A curried version of it would be another function add_y(x) returning a function adding y to its argument.

This could be done with a closure looking like this:

sub make-add (Numeric $added-val) {
    return sub ($param) {$param + $added-val;}    
    # or: return sub {$^a + $added-val;}
}
my &add_2 = make-add 2;
say add_2(3);           # -> 5
say add_2(4.5);         # -> 6.5

The &add_2 code reference is a curried version of our mathematical add function. It takes only one argument and returns a value equal to the argument plus two.

We can of course create other curried subroutines using make-add with other arguments:

my &add_3 = make-add 3;
say &add_3(6);           # -> 9

There is not much new here: the &add_2 and &add_3 are just closures that memorize the increment value passed to the make-add subroutine. This can be useful when some functions are called many times (or recursively) with many arguments, some of which are always the same: currying them makes it possible to simplify the subroutine calls.

14.6.2  Currying an Existing Subroutine with the assuming Method

If a subroutine already exists, there is often no need to create a new closure with the help of a “function factory” (such as make-add) as we’ve done just above. It is possible to curry the existing function, using the assuming method on it:

sub add (Numeric $x, Numeric $y) {return $x + $y}   
my &add_2 = &add.assuming(2);                       
add_2(5);              # -> 7                                     

The assuming method returns a callable object that implements the same behavior as the original subroutine, but has the values passed to assuming already bound to the corresponding parameters.

It is also possible to curry built-in functions. For example, the substr built-in takes normally three arguments: the string on which to operate, the start position, and the length of the substring to be extracted. You might need to make a number of extractions on the same very long string. You can create a curried version of substr always working on the same string:

my $str = "Cogito, ergo sum";                     
my &string-start-chars = &substr.assuming($str, 0);
say &string-start-chars($_) for 6, 13, 16; 

This will print:

Cogito
Cogito, ergo
Cogito, ergo sum

Note that we have “assumed” two parameters here, so that the curried subroutine “remembers” the first two arguments and only the third argument needs be passed to &string-start-chars.

You can even curry Perl 6 operators (or your own) if you wish:

my &add_2 = &infix:<+>.assuming(2);

14.6.3  Currying with the Whatever Star Parameter

A more flexible way to curry a subroutine or an expression is to use the whatever star (*) argument:

my &third = * / 3; 
say third(126);          # -> 42

The whatever star (*) is a placeholder for an argument, so that the expression returns a closure.

It can be used in a way somewhat similar to the $_ topical variable (except that it does not have to exist when the declaration is made):

> say map 'foo' x * , (1, 3, 2);
(foo foofoofoo foofoo)

It is also possible to use multiple whatever terms in the same expression. For example, the add subroutine could be rewritten as a whatever expression with two parameters:

my $add = * + *;
say $add(4, 5);          # -> 9

or:

my &add = * + *;
say add(4, 5);           # -> 9

You might even do the same with the multiplication operator:

my $mult = * * *;
say $mult(6, 7);         # -> 42

The compiler won’t get confused and will figure out correctly that the first and third asterisks are whatever terms and that the second asterisk is the multiplication operator; in other words that this is more or less equivalent to:

my $mult = { $^a * $^b };
say $mult(6, 7);         # -> 42

or to:

my $mult = -> $a, $b { $a * $b }
say $mult(6, 7);         # -> 42  

To tell the truth, the compiler doesn’t get confused, but the user might, unless she or he has been previously exposed to some functional programming languages that commonly use this type of syntactic construct.

These ideas are powerful, but you are advised to pay attention so you don’t to fall into the trap of code obfuscation.

That being said, the functional programming paradigm is extremely expressive and can make your code much shorter. And, overall, shorter code, provided it remains clear and easy to understand, is very likely to have fewer bugs than longer code.

14.7  Using a Functional Programming Style

In this chapter, we have seen how to use techniques derived from functional programming to make our code simpler and more expressive. In a certain way, though, we haven’t fully applied functional programming. All of the techniques we have seen stem from functional programming and are a crucial part of it, but the true essence of functional programming isn’t really about using higher-order functions, list processing and pipeline programming, anonymous subroutines and closures, lazy lists and currying, and so on. The true essence of functional programming is a specific mindset that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.

Instead of simply using techniques derived from functional programming, we can go one step further and actually write code in functional programming style. If we are going to avoid changing-state and mutable data, this means that we will no longer use variables (or at least not change them, and treat them as immutable data) and do things differently.

14.7.1  The Merge Sort Algorithm

Consider the example of a classical and efficient sorting technique called the merge sort, invented by John von Neumann in 1945. It is based on the fact that if you have two sorted arrays, it is significantly faster to merge the two arrays into a single sorted array, by reading each array in parallel and picking up the appropriate item from either of the arrays, than it would be to blindly sort the data of the two arrays.

Merge sort is a “divide and conquer” algorithm which consists of recursively splitting the input unsorted array into smaller and smaller sublists, until each sublist contains only one item (at which point the sublist is sorted, by definition), and then merging the sublists back into a sorted array.

To avoid adding unnecessary complexity, we will discuss here implementations that simply sort numbers in ascending numeric order.

14.7.2  A Non-Functional Implementation of Merge Sort

Here’s how we could try to implement a merge sort algorithm using purely imperative/procedural programming:

# ATTENTION: buggy code
sub merge-sort (@out, @to-be-sorted, $start = 0, $end = @to-be-sorted.end) {
    return if $end - $start  < 2;
    my $middle = ($end + $start) div 2;
    my @first = merge-sort(@to-be-sorted, @out, $start, $middle);
    my $second = merge-sort(@to-be-sorted, @out, $middle, $end);
    merge-lists(@out, @to-be-sorted, $start, $middle, $end);
}
sub merge-lists (@in, @out, $start, $middle, $end) {
    my $i = $start;
    my $j = $middle;
    for $start..$end  -> $k {
        if $i < $middle and ($j >= $end or @in[$i] <= @in[$j]) {
            @out[$k] = @in[$i];
            $i++;
        } else {
            @out[$k] = @in[$j];
            $j++;
        } 
    }
}
my @array = reverse 1..10;
my @output = @array;
merge-sort2  @output, @array;
say @output;

This program always works on the full array (and its copy) and the sublists are not extracted; the extraction is simulated by the use of subscript ranges.

This code is not very long, but nonetheless fairly complicated. If you try to run it, you’ll find that there is a bug: the last item of the original array is improperly sorted. For example, if you try to run it on the list of 10 consecutive integers in reverse order (i.e., ordered from 10 to 1) used in the test at the end of the above code, you’ll get the following output array:

[2 3 4 5 6 7 8 9 10 1]

As an exercise, try fixing the bug before reading any further. (The fix is explained next.)

It is likely that you’ll find that identifying and correcting the bug is quite difficult, although this bug is actually relatively simple (when I initially wrote this code, I encountered some more complicated bugs before arriving at this one). It is quite hard to properly use the array subscripts and insert the data items in the right place, avoiding off-by-one and other errors.

Here’s a corrected version:

sub merge-sort (@out, @to-be-sorted, $start = 0, $end = @to-be-sorted.elems) {
    return if $end - $start < 2;
    my $middle = ($end + $start) div 2;
    merge-sort(@to-be-sorted, @out, $start, $middle);
    merge-sort(@to-be-sorted, @out, $middle, $end);
    merge-lists(@out, @to-be-sorted, $start, $middle, $end);
}
sub merge-lists (@in, @out, $start, $middle, $end) {
    my $i = $start;
    my $j = $middle;
    for $start..$end - 1  -> $k {
        if $i < $middle and ($j >= $end or @in[$i] <= @in[$j]) {
            @out[$k] = @in[$i];
            $i++;
        } else {
            @out[$k] = @in[$j];
            $j++;
        } 
    }
}
my @array = pick 20, 1..100;
my @output = @array;
merge-sort2  @output, @array;
say @output;

The main change is in the signature of the merge-sort subroutine: the default value for the $end parameter is the size (number of items) of the array, and no longer the subscript of the last elements of the array (so, the bug was an off-by-one error). Making this correction also makes it necessary to change the pointy block (for $start..$end - 1 -> ...) in the merge-lists subroutine.

For 20 randoms integers between 1 and 100, this prints out something like the following:

[11 13 14 15 19 24 25 29 39 46 52 57 62 68 81 83 89 92 94 99]

The point is that it is difficult to understand the detailed implementation of the algorithm, and fairly hard to debug, even using the Perl debugger presented in section ??.

14.7.3  A Functional Implementation of Merge Sort

Rather than modifying the entire array at each step through the process (and being confused in the management of subscripts), we can split recursively the data into actual sublists and work on these sublists.

This can lead to the following implementation:

sub merge-sort (@to-be-sorted) {
    return @to-be-sorted if @to-be-sorted < 2;
    my $middle = @to-be-sorted.elems div 2;
    my @first = merge-sort(@to-be-sorted[0 .. $middle - 1]);
    my @second = merge-sort(@to-be-sorted[$middle .. @to-be-sorted.end]);
    return merge-lists(@first, @second);
}
sub merge-lists (@one, @two) {
    my @result;
    loop {
        return @result.append(@two) unless @one;
        return @result.append(@one) unless @two;
        push @result, @one[0] < @two[0] ?? shift @one !! shift @two;
    }
} 

The code is shorter than the previous implementation, but that’s not the main point.

The merge-sort subroutine is somewhat similar to the previous implementation, except that it recursively creates the sublists and then merge the sublists.

It is the merge-lists subroutine (which does the bulk of the work in both implementations) that is now much simpler: it receives two sublists and merges them. Most of this work is done in the last code line; the two lines before it are only taking care of returning the merged list when one of the input sublists ends up being empty.

This functional version of the program captures the essence of the merge sort algorithm:

  • If the array has less than two items, it is already sorted, so return it immediately (this is the base case stopping the recursion).
  • Else, pick the middle position of the array to divide it into two sublists, and call merge-sort recursively on them;
  • Merge the sorted sublist thus generated.
  • Return the merged list to the caller.

I hope that you can see how much clearer and simpler the functional style implementation is. To give you an idea, writing and debugging this latter program took me about 15 minutes, i.e., about 10 times less than the nonfunctional version. If you don’t believe me, try to implement these two versions for yourself. (It’s a good exercise even if you do believe me.)

The exercise section of this chapter (section ?? will provide another (and probably even more telling) example of the simplicity of functional programming compared to more imperative or procedural approaches.

14.8  Debugging

This time, we will not really talk about debugging proper, but about a quite closely related activity, testing.

Testing code is an integral part of software development. In Perl 6, the standard Test module (shipped and installed together with Rakudo) provides a testing framework which enables automated, repeatable verification of code behavior.

The testing functions emit output conforming to the Test Anything Protocol or TAP (http://testanything.org/), a standardized testing format which has implementations in Perl, C, C++, C#, Ada, Lisp, Erlang, Python, Ruby, Lua, PHP, Java, Go, JavaScript, and other languages.

A typical test file looks something like this:

use v6;
use Test;      # a Standard module included with Rakudo
use lib 'lib';

# ...

plan $num-tests;

# .... tests

done-testing;  # optional with 'plan'

We ensure that we’re using Perl 6, via the use of the v6 pragma, then we load the Test module and specify where our libraries are. We then specify how many tests we plan to run (such that the testing framework can tell us if more or fewer tests were run than we expected) and when finished with the tests, we use done-testing to tell the framework we are done.

We have already seen a short example of the use of the Test module in Section ?? (solution to the exercise of Section ??).

The Test module exports various functions that check the return value of a given expression, and produce standardized test output accordingly.

In practice, the expression will often be a call to a function or method that you want to unit-test. You may want to check:

  • Truthfulness:
    ok($value, $description?); 
    nok($condition, $description?);
    

    The ok function marks a test as passed if the given $value evaluates to true in a Boolean context. Conversely, the nok function marks a test as passed if the given value evaluates to false. Both functions accept an optional $description of the test. For example:

    ok  $response.success, 'HTTP response was successful';
    nok $query.error,      'Query completed without error';
    
  • String comparison:
    is($value, $expected, $description?)
    

    The is function marks a test as passed if $value and $expected compare positively with the eq operator. The function accepts an optional description of the test.

  • Approximate numeric comparison:
    is-approx($value, $expected, $description?)
    

    is-approx marks a test as passed if the $value and $expected numerical values are approximately equal to each other. The subroutine can be called in numerous ways that let you test using relative or absolute tolerance of different values. (If no tolerance is set, it will default to an absolute tolerance of 10−5.)

  • Regex:
    like($value, $expected-regex, $description?)
    unlike($value, $expected-regex, $description?)
    

    The like function marks a test as passed if the $value matches the $expected-regex. Since we are speaking about regexes, “matches”, in the previous sentence, really means “smart matches”. The unlike function marks a test as passed if the $value does not match the $expected-regex.

    For example:

    like 'foo', /fo/, 'foo looks like fo';
    unlike 'foo', /bar/, 'foo does not look like bar';
    
  • And many other functions which you can study in the following documentation: https://docs.perl6.org/language/testing.html.

In principle you could use ok for every kind of comparison test, by including the comparison in the expression passed as a value:

ok factorial(4) == 24, 'Factorial - small integer';

However, it is better (where possible) to use one of the specialized comparison test functions, because they can print more helpful diagnostics output in case the comparison fails.

If a test fails although it appears to be successful, and you don’t understand why it fails, you may want to use the diag function to get additional feed back. For example, assume that the test:

ok $foo, 'simple test';

is failing and that you don’t have enough feedback to understand why; you may try:

diag "extensive feedback" unless
    ok $foo, 'simple test';

This might give you the additional information you need.

Suppose we want to test a subroutine to determine whether a given string is a palindrome (as discussed in several chapters in this book, see for example Exercise ?? and Subsection ??). You could perform that test by writing something like this:

# file is-palindrome.p6
use v6;

sub is-palindrome($s) { $s eq $s.flip }

multi sub MAIN( $input ) {
    if is-palindrome( $input ) {
        say "'$input' is palindrome.";
    }
    else {
        say "'$input' is not palindrome.";
    }
}

multi sub MAIN(:$test!) {
    use Test;
    plan 4;
    ok is-palindrome(''), 'empty string';
    ok is-palindrome('aba'), 'odd-sized example';
    ok is-palindrome('abba'), 'even-sized example';
    nok is-palindrome('blabba'), 'counter example';
}

Usually, tests are stored in different files placed in a “t” subdirectory. Here, for this short test, everything is in the same file, but two multi MAIN subroutines are supplied to either test whether a passed parameter is a palindrome, or to run the test plan. See Section ?? (p. ?? and Subsection ?? (p. ??) if you need a refresher on the MAIN subroutine.

You can run these tests as follows:

$ perl6 is-palindrome.p6 abba
'abba' is palindrome.
$ perl6 is-palindrome.p6 abbaa
'abbaa' is not palindrome.
$
$ perl6 is-palindrome.p6 --test
1..4
ok 1 - empty string
ok 2 - odd-sized example
ok 3 - even-sized example
ok 4 - counter example

Try this example, play with it, change some lines, add new tests, and see what happens.

Writing such unit tests may appear to be tedious work. The truth, though, is that it is manual testing that is somewhat tedious and, it you try, you’ll find that writing and using such test scenarios makes the testing work much less cumbersome. You usually write the tests once, and run them very often. And you will be surprised how many bugs you will find even if you are sure your code is correct! Also, once you’ve written a test suite for something, you might still be using it years later, for example for nonregression testing after a software change. This can be not only a time saver, but also a guarantee that you’re supplying good quality software.

Many organizations actually write their tests even before writing the programs. This process is called test-driven development and there are many areas where it is quite successful. In fact, the Rakudo/Perl 6 compiler had a very large test suite (more than 40,000 tests) long before the compiler was ready. In a way, the test suite even became the true specification of the project, so that you could use the same test suite for verifying another implementation of Perl 6.

An additional advantage of such an approach is that measuring the ratio of tests that pass may often be a better metric of software completion than the usual “wet finger in the wind” estimates, such as, say, a ratio of the number of code lines written versus the estimate of the final number of code lines.

14.9  Glossary

First-class object:
An object that can be passed around as an argument to or as a return value from a subroutine. In Perl, subroutines are first-class objects (also called first-class citizens).
Callback function
A function or subroutine that is passed as an argument to another function.
higher-order function:
A function or subroutine that takes another subroutine (or a simple code block) as an argument. The map, grep, reduce, and sort built-in functions are examples of higher-order functions.
Anonymous subroutine
A subroutine that has no name. Also commonly called a lambda. Although they are not exactly the same thing, pointy blocks can also be assimilated to anonymous subroutines.
Closure
A function that can access to variables that are lexically available where the function is defined, even if those variables are no longer in scope where the function is called.
pipeline programming:
A programming model in which pieces of data (usually lists) undergo successive transformations as in a pipeline or an assembly line.
Reduction
A process through which a list of values is reduced to a single value. For example, a list of numbers can be reduced, for example, to an average, a maximum value, or a median. Some languages call this process folding.
Metaoperator
An operator that acts on another operator to provide new functionality.
Algorithmic complexity
A rough measure of the number of computing operations (and time) needed to perform some computing on relatively large data sets, and, more precisely, a measure of how an algorithm scales when the data set grows.
Laziness
A process of delayed evaluation whereby, for example, one populates a list or processes the items of a list only on demand, when required, to avoid unnecessary processing.
Iterator
A piece of code that returns values on demand and keeps track of where it has arrived, so as to be able to know what the next value to be provided should be.
Cache
To cache a value is to store it in memory (in a variable, an array, a hash, etc.) in order to avoid the need to compute it again, thereby hopefully saving some computation time.
Currying
Currying a function that takes several arguments means to create another function that takes fewer arguments (where the missing arguments are stored within the new curried function).
test-driven development:
A development methodology where the tests are written from the specifications before the actual program, so that it becomes easier to check that the program complies with the specifications.

14.10  Exercise: Quick Sort

Exercise 1   Quick sort is a “divide and conquer” sorting algorithm invented by Tony Hoare in 1959. It relies on partitioning the array to be sorted. To partition an array, an element called a pivot is selected. All elements smaller than the pivot are moved before it and all greater elements are moved after it. The lesser and greater sublists are then recursively sorted through the same process and finally reassembled together.

One of the difficulties is to select the right pivot. Ideally it should be the median value of the array items, since this would give partitions of approximately equal sizes, thereby making the algorithm optimally efficient, but finding the median for each partition would take some time and ultimately penalize the performance. Various variants of the quick sort have been tried, with different strategies to (usually arbitrarily) select a pivot. Here, we select an element at or near the middle of the partition.

The following is a typical nonfunctional implementation of the quick sort algorithm.

[fontshape=up]
sub quicksort(@input) {
    sub swap ($x, $y) {
        (@input[$x], @input[$y]) = @input[$y], @input[$x];
    }
    sub qsort ($left, $right) {
        my $pivot = @input[($left + $right) div 2];
        my $i = $left;
        my $j = $right;
        while $i < $j {
            $i++ while @input[$i] < $pivot;
            $j-- while @input[$j] > $pivot;
            if $i <= $j {
                swap $i, $j;
                $i++;
                $j--;
            }
        }
        qsort($left, $j) if $left < $j;
        qsort($i, $right) if $j < $right;
    }
    qsort(0, @input.end)
}
my @array = pick 20, 1..100;
quicksort @array;
say @array;

The array is modified in place (which has the advantage of requiring limited memory), which means that the original array is modified.

For functional programming, internal data is immutable, so that you’re copying data fragments into new lists, rather than modifying them “in place.”

In the same spirit as what we’ve done in section ?? for the merge sort algorithm, try to write a functional style implementation of the quick sort algorithm. Hint: this can be done in about half a dozen lines of code.

Solution: ??.


1
Merge sort is presented in some details in section ??.
2
Quick sort is presented in ??

Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.


Think DSP

Think Java

Think Bayes

Think Python 2e

Think Stats 2e

Think Complexity


Previous Up Next