This HTML version of Think Perl 6 is provided for convenience, but it is not the best format of the book. You might prefer to read the PDF version. Chapter 8 Case Study: Word PlayThis chapter is intended to let you practice and consolidate the knowledge you have acquired so far, rather than introducing new concepts. To help you gain experience with programming, we will cover a case study that involves solving word puzzles by searching for words that have certain properties. For example, we’ll find the longest palindromes in English and search for words whose letters appear in alphabetical order. And I will present another program development plan: reduction to a previously solved problem. 8.1 Reading from and Writing to FilesFor the exercises in this chapter, we will need our programs to read text from files. In many programming languages, this often means that we need a statement to open a file, then a statement or group of statements to read the file’s content, and finally a statement to close the file (although this last operation may be performed automatically in some circumstances). We are interested here in text files that are usually made of lines separated by logical new line characters; depending on your operating system, such logical new line characters consist of either one (Linux, Mac) or two (Windows) physical characters (bytes). The Perl built-in function open takes the path and name of the file as a parameter and returns a file handle (IO::Handle object) which you can use to read the file (or to write to it): my $fh = open("path/to/myfile.txt", :r); my $data = $fh.slurp-rest; $fh.close; The :r option is the file mode (read). $fh is a common name for a file handle. The file object provides methods for reading, such as slurp-rest which returns the full content of the file from the current position to the end (and the entire content of the file if we’ve just opened it). This is the traditional way of opening and reading files in most languages. However, Perl’s IO role (in simple terms, a role is a collection of related methods) offers simpler methods which can open a file and read it all in one single instruction (i.e., without having to first open a file handle and then close it): my $text = slurp "path/to/myfile.txt"; # or: my $text = "path/to/myfile.txt".IO.slurp; slurp takes care of opening and closing the file for you. We can also read the file line by line, which is very practical if each line contains a logical entity such as a record, and is especially useful for very large files that might not fit into memory: for 'path/to/hugefile.txt'.IO.lines -> $line { # Do something with $line } By default, the .lines method will remove the trailing newline characters from each line, so that you don’t have to worry about them. We haven’t studied arrays yet, but you can also read all
lines of a file into an array, with each line of the file
becoming an array item. For example, you can load the
myfile.txt file into the my @lines = "myfile.txt".IO.lines; Accessing any line can then be done with the bracket operator and an index. For example, to print the first and the seventh line: say @lines[0]; say @lines[6]; To write data to a file, it is possible to open a file just as when wanting to read from it, except that the :w (write) option should be used: my $fh = open("path/to/myfile.txt", :w); $fh.say("line to be written to the file"); $fh.close; Be careful with this. If the file already existed, any existing content will be clobbered. So watch out when you want to open a file in write mode. It is also possible to open the file in append mode, using the :a option. New data will then be added after the existing content. Writing to a file can be simplified using the spurt function, which opens the file, writes the data to it, and closes it: spurt "path/to/myfile.txt", "line to be written to the file\n"; Used this way, spurt will clobber any existing content in the file. It may also be used in append mode with the :append option: spurt "path/to/myfile.txt", "line to be added\n", :append; 8.2 Reading Word ListsFor the exercises in this chapter we need a list of English words. There are lots of word lists available on the web, but one of the most suitable for our purpose is one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project (see http://wikipedia.org/wiki/Moby_Project). It is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from https://github.com/LaurentRosenfeld/thinkperl6/tree/master/Supplementary/words.txt. This file is in plain text (with each word of the list on its own line), so you can open it with a text editor, but you can also read it from Perl. Let’s do so in the interactive mode (with the REPL): > my $fh = open("words.txt", :r); IO::Handle<words.txt>(opened, at octet 0) > my $line = get $fh; aa > say "<<$line>>"; <<aa>> The get function reads one line from the file handle. The first word in this particular list is “aa” (a kind of lava). Printing the The file handle keeps track of what has been read from the file and what it should read next, so if you call get again, you obtain the next line: > my $line = get $fh; aah The next word is “aah,” which is a perfectly legitimate word, so stop looking at me like that. This is good and fine if we want to explore the first few lines of the words.txt file, but we are not going to read the 113 k-lines of the file this way. We need a loop to do it for us. We could insert the above get instruction into a while loop, but we have seen above an easier and more efficient way of doing it using a for loop and the IO.lines method, without the hassle of having to open or close the file: for 'words.txt'.IO.lines -> $line { say $line; } This code reads the file line by line, and prints each line to the screen. We’ll soon do more interesting things than just displaying the lines on the screen. 8.3 ExercisesThis case study consists mainly of exercises and solutions to them within the body of the chapter because solving the exercises is the main teaching material of this chapter. Therefore, solutions to these exercises are in the following sections of this chapter, not in the appendix. You should at least attempt each one before you read the solutions. Exercise 1
Write a program that reads words.txt and prints only the
words with more than 20 characters.
Exercise 2 In 1939 Ernest Vincent Wright published a 50,000-word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do. In fact, it is difficult to construct a solitary thought without using that most common letter. Such a writing in which one letter is avoided is sometimes called a lipogram. Write a subroutine called Modify your program from the previous exercise to print only the words that have no “e” and compute the percentage of the words in the list that have no “e”. (The word dictionary we are using, words.txt, is entirely in lower-case letters, so you don’t need to worry about any uppercase “E.”) Exercise 3 Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters. Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them. Can you find a combination of five forbidden letters that excludes the smallest number of words? Exercise 4 Write a subroutine named Exercise 5 Write a subroutine named Exercise 6
Write a function called 8.4 SearchMost of the exercises in the previous section have something in common; they can be solved with the search pattern (and the index function we saw in Section ?? (p. ??). Most can also be solved using regexes. 8.4.1 Words Longer Than 20 Characters (Solution)The solution to the simplest exercise —printing all the words of words.txt that are longer than 20 characters— is: for 'words.txt'.IO.lines -> $line { say $line if $line.chars > 20 } Because the code is so simple, this is a typical example of a possible and useful one-liner (as described in Section ??). Assuming you want to know the words that are longer than 20 characters, you don’t even need to write a script, save it, and run it. You can simply type this at your operating system prompt: $ perl6 -n -e '$_.say if $_.chars > 20;' words.txt The “-e” option tells Perl that the script to be run
comes next on the command line between quotation marks. The
“-n” asks Perl to read line by line the file after the
end of the command line, to store each line in the To simplify a bit, the two options $ perl6 -ne '.say if .chars > 20' words.txt Remember that, if you’re trying this under Windows, you need to replace the single quotes with double quotes (and vice-versa if the scripts itself contains double quotes): C:\Users\Laurent>perl6 -ne ".say if .chars > 20" words.txt 8.4.2 Words with No “e” (Solution)A subroutine that returns True for words that have no “e” (Exercise ??) is also quite straight forward: sub has-no-e (Str $word) { return True unless defined index $word, "e"; return False; } The subroutine simply returns True if the index function did not find any “e” in the word received as a parameter and False otherwise. Note that this works correctly because the word list used words.txt) is entirely in lowercase letters. The above subroutine would need to be modified if it might be called with words containing uppercase letters. Since the defined function returns a Boolean value, we could shorten our subroutine to this:: sub has-no-e (Str $word) { not defined index $word, "e"; } We could also have used a regex for testing the presence of a “e” on the second line of this subroutine: return True unless $word ~~ /e/; This fairly concise syntax is appealing, but when looking for an exact literal match, the index function is likely to be slightly more efficient (faster) than a regex. Looking for the words without “e” in our word list and counting them is not very difficult: sub has-no-e (Str $word) { not defined index $word, "e"; } my $total-count = 0; my $count-no-e = 0; for 'words.txt'.IO.lines -> $line { $total-count++; if has-no-e $line { $count-no-e++; say $line; } } say "=" x 24; say "Total word count: $total-count"; say "Words without 'e': $count-no-e"; printf "Percentage of words without 'e': %.2f %%\n", 100 * $count-no-e / $total-count; The above program will display the following at the end of its output: ======================== Total word count: 113809 Words without 'e': 37641 Percentage of words without 'e': 33.07 % So, less than one third of the words of our list have no “e.” 8.4.3 Avoiding Other Letters (Solution)
The avoids subroutine is a more general version of
sub avoids (Str $word, Str $forbidden) { for 0..$forbidden.chars - 1 -> $idx { my $letter = substr $forbidden, $idx, 1; return False if defined index $word, $letter; } True; } We can return False as soon as we find a forbidden letter; if we get to the end of the loop, we return True. Since a subroutine returns the last evaluated expression, we don’t need to explicitly use a return True statement in the last code line above. I used this feature here as an example; you might find it clearer to explicitly return values, except perhaps for very simple one-line subroutines. Note that we have here implicitly two nested loops. We could reverse the outer and the inner loops: sub avoids (Str $word, Str $forbidden) { for 0..$words.chars - 1 -> $idx { my $letter = substr $words, $idx, 1; return False if defined index $forbidden, $letter; } True; } The main code calling the above subroutine is similar to the code calling has-no-e and might look like this: my $total-count = 0; my $count-no-forbidden = 0; for 'words.txt'.IO.lines -> $line { $total-count++; $count-no-forbidden++ if avoids $line, "eiou"; } say "=" x 24; say "Total word count: $total-count"; say "Words without forbidden: $count-no-forbidden"; printf "Percentage of words with no forbidden letter: %.2f %%\n", 100 * $count-no-forbidden / $total-count; 8.4.4 Using Only Some Letters (Solution)
sub uses-only (Str $word, Str $available) { for 0..$word.chars - 1 -> $idx { my $letter = substr $word, $idx, 1; return False unless defined index $available, $letter; } True; } Instead of a list of forbidden letters, we have a list of available letters. If we find a letter in word that is not in available, we can return False. And we return True if we reach the loop end. 8.4.5 Using All Letters of a List (Solution)
sub uses-all(Str $word, Str $required) { for 0..$required.chars - 1 -> $idx { my $letter = substr $word, $idx, 1; return False unless defined index $word, $letter; } return True; } Instead of traversing the letters in $word, the loop traverses the required letters. If any of the required letters do not appear in the word, we can return False. If you were really thinking like a computer scientist, you would
have recognized that sub uses-all ($word, $required) { return uses-only $required, $word; } This is an example of a program development plan called reduction to a previously solved problem, which means that you recognize the problem you are working on as an instance of a solved problem and apply an existing solution. 8.4.6 Alphabetic Order (Solution)For sub is_abecedarian ($word) { for 1..$word.chars - 1 -> $idx { my $curr-letter = substr $word, $idx, 1; return False if $curr-letter lt substr $word, $idx - 1, 1; } return True } An alternative is to use recursion: sub is-abecedarian (Str $word) { return True if $word.chars <= 1; return False if substr($word, 0, 1) gt substr($word, 1, 1); return is-abecedarian substr $word, 1; } Another option is to use a while loop: sub is_abecedarian (Str $word): my $i = 0; while $i < $word.chars -1 { if substr($word, $i, 1) gt substr ($word, $i+1, 1) { return False; } $i++; } return True; } The loop starts at $i=0 and ends when i=$word.chars -1. Each time through the loop, it compares the ith character (which you can think of as the current character) to the i+1th character (which you can think of as the next). If the next character is less than (alphabetically before) the current one, then we have discovered a break in the abecedarian trend, and we return False. If we get to the end of the loop without finding a fault, then the
word passes the test. To convince yourself that the loop ends
correctly, consider an example like 8.4.7 Another Example of Reduction to a Previously Solved Problem
Here is a version of sub one-char (Str $string, $idx) { return substr $string, $idx, 1; } sub is-palindrome (Str $word) { my $i = 0; my $j = $word.chars - 1; while $i < $j { return False if one-char($word, $i) ne one-char($word, $j); $i++; $j--; } return True; } Or we could reduce to a previously solved problem and write: sub is-palindrome (Str $word) { return is-reverse($word, $word) } using 8.5 DebuggingTesting programs is hard. The functions in this chapter are relatively easy to test because you can check the results by hand. Even so, it is somewhere between difficult and impossible to choose a set of words that test for all possible errors. Taking Within each case, there are some less obvious subcases. Among the words that have an “e”, you should test words with an “e” at the beginning, the end, and somewhere in the middle. You should test long words, short words, and very short words, like an empty string. The empty string is an example of a special case, which is one of the nonobvious cases where errors often lurk. In addition to the test cases you generate, you can also test your program with a word list like words.txt. By scanning the output, you might be able to catch errors, but be careful: you might catch one kind of error (words that should not be included, but are) and not another (words that should be included, but aren’t). In general, testing can help you find bugs, but it is not easy to generate a good set of test cases, and even if you do, you can’t be sure your program is correct. According to a legendary computer scientist: Program testing can be used to show the presence of bugs, but never to show their absence! 8.6 Glossary
8.7 ExercisesExercise 7
This question is based on a Puzzler that was broadcast on the radio program Car Talk (http://www.cartalk.com/content/puzzlers): Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the “i” that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word? Write a program to find it. Solution: ??. Exercise 8
Here’s another Car Talk
Puzzler (http://www.cartalk.com/content/puzzlers):
“I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300,000 miles, for example, I’d see 3-0-0-0-0-0. Write a program that tests all the six-digit numbers and prints any numbers that satisfy these requirements. Solution: ??. Exercise 9
Here’s another Car Talk Puzzler you can solve with a
search (http://www.cartalk.com/content/puzzlers):
Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer. Write a Perl program that searches for solutions to this Puzzler. Hint: you might find the string formatting method sprintf useful. Solution: ??. |
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|