Case Study: Word Play

This HTML version of Think Perl 6 is provided for convenience, but it is not the best format of the book. You might prefer to read the PDF version.

Chapter 8 Case Study: Word Play

This chapter is intended to let you practice and consolidate the knowledge you have acquired so far, rather than introducing new concepts. To help you gain experience with programming, we will cover a case study that involves solving word puzzles by searching for words that have certain properties. For example, we’ll find the longest palindromes in English and search for words whose letters appear in alphabetical order. And I will present another program development plan: reduction to a previously solved problem.

8.1 Reading from and Writing to Files

For the exercises in this chapter, we will need our programs to read text from files. In many programming languages, this often means that we need a statement to open a file, then a statement or group of statements to read the file’s content, and finally a statement to close the file (although this last operation may be performed automatically in some circumstances).

We are interested here in text files that are usually made of lines separated by logical new line characters; depending on your operating system, such logical new line characters consist of either one (Linux, Mac) or two (Windows) physical characters (bytes).

The Perl built-in function open takes the path and name of the file as a parameter and returns a file handle (IO::Handle object) which you can use to read the file (or to write to it):

The :r option is the file mode (read). $fh is a common name for a file handle. The file object provides methods for reading, such as slurp-rest which returns the full content of the file from the current position to the end (and the entire content of the file if we’ve just opened it).

However, Perl’s IO role (in simple terms, a role is a collection of related methods) offers simpler methods which can open a file and read it all in one single instruction (i.e., without having to first open a file handle and then close it):

We can also read the file line by line, which is very practical if each line contains a logical entity such as a record, and is especially useful for very large files that might not fit into memory:

By default, the .lines method will remove the trailing newline characters from each line, so that you don’t have to worry about them.

We haven’t studied arrays yet, but you can also read all lines of a file into an array, with each line of the file becoming an array item. For example, you can load the myfile.txt file into the @lines array:

Accessing any line can then be done with the bracket operator and an index. For example, to print the first and the seventh line:

To write data to a file, it is possible to open a file just as when wanting to read from it, except that the :w (write) option should be used:

Be careful with this. If the file already existed, any existing content will be clobbered. So watch out when you want to open a file in write mode.

It is also possible to open the file in append mode, using the :a option. New data will then be added after the existing content.

Writing to a file can be simplified using the spurt function, which opens the file, writes the data to it, and closes it:

Used this way, spurt will clobber any existing content in the file. It may also be used in append mode with the :append option:

8.2 Reading Word Lists

For the exercises in this chapter we need a list of English words. There are lots of word lists available on the web, but one of the most suitable for our purpose is one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project (see http://wikipedia.org/wiki/Moby_Project). It is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from https://github.com/LaurentRosenfeld/thinkperl6/tree/master/Supplementary/words.txt.

This file is in plain text (with each word of the list on its own line), so you can open it with a text editor, but you can also read it from Perl. Let’s do so in the interactive mode (with the REPL):

Printing the $line variable between angle brackets within a string shows us that the get function removed implicitly the trailing newline characters, in this case a \r\n (carriage return and newline) character combination, since this file was apparently prepared under Windows.

The file handle keeps track of what has been read from the file and what it should read next, so if you call get again, you obtain the next line:

The next word is “aah,” which is a perfectly legitimate word, so stop looking at me like that.

This is good and fine if we want to explore the first few lines of the words.txt file, but we are not going to read the 113 k-lines of the file this way.

We need a loop to do it for us. We could insert the above get instruction into a while loop, but we have seen above an easier and more efficient way of doing it using a for loop and the IO.lines method, without the hassle of having to open or close the file:

This code reads the file line by line, and prints each line to the screen. We’ll soon do more interesting things than just displaying the lines on the screen.

8.3 Exercises

This case study consists mainly of exercises and solutions to them within the body of the chapter because solving the exercises is the main teaching material of this chapter. Therefore, solutions to these exercises are in the following sections of this chapter, not in the appendix. You should at least attempt each one before you read the solutions.

Exercise 1 Write a program that reads words.txt and prints only the words with more than 20 characters.

Exercise 2

In 1939 Ernest Vincent Wright published a 50,000-word novel called Gadsby that does not contain the letter “e”. Since “e” is the most common letter in English, that’s not easy to do. In fact, it is difficult to construct a solitary thought without using that most common letter. Such a writing in which one letter is avoided is sometimes called a lipogram.

Write a subroutine called has-no-e that returns True if the given word doesn’t have the letter “e” in it.

Modify your program from the previous exercise to print only the words that have no “e” and compute the percentage of the words in the list that have no “e”.

(The word dictionary we are using, words.txt, is entirely in lower-case letters, so you don’t need to worry about any uppercase “E.”)

Exercise 3

Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters.

Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them. Can you find a combination of five forbidden letters that excludes the smallest number of words?

Exercise 4

Write a subroutine named uses-only that takes a word and a string of letters, and that returns True if the word contains only letters in the list. Can you make a sentence using only the letters acefhlo? Other than “Hoe alfalfa?”

Exercise 5

Write a subroutine named uses-all that takes a word and a string of required letters, and returns True if the word uses all the required letters at least once. How many words are there that use all the vowels aeiou? How about aeiouy?

Exercise 6

Write a function called is_abecedarian that returns True if the letters in a word appear in alphabetical order (double letters are ok). How many abecedarian words are there?

8.4 Search

Most of the exercises in the previous section have something in common; they can be solved with the search pattern (and the index function we saw in Section ?? (p. ??). Most can also be solved using regexes.

8.4.1 Words Longer Than 20 Characters (Solution)

The solution to the simplest exercise —printing all the words of words.txt that are longer than 20 characters— is:

Because the code is so simple, this is a typical example of a possible and useful one-liner (as described in Section ??). Assuming you want to know the words that are longer than 20 characters, you don’t even need to write a script, save it, and run it. You can simply type this at your operating system prompt:

The “-e” option tells Perl that the script to be run comes next on the command line between quotation marks. The “-n” asks Perl to read line by line the file after the end of the command line, to store each line in the $_ topical variable, and to apply the content of the script to each line. And the one-liner script is just printing the content of $_ if its length is greater than 20.

To simplify a bit, the two options -n and -e may be grouped together as perl6 -ne. In addition, the $_ topical variable can be omitted in method calls (in other words, if a method has no invocant, the invocant will be defaulted to $_). Finally, the trailing semi-colon may also be removed. The one-liner above can thus be made somewhat simpler and shorter:

Remember that, if you’re trying this under Windows, you need to replace the single quotes with double quotes (and vice-versa if the scripts itself contains double quotes):

8.4.2 Words with No “e” (Solution)

A subroutine that returns True for words that have no “e” (Exercise ??) is also quite straight forward:

The subroutine simply returns True if the index function did not find any “e” in the word received as a parameter and False otherwise.

Note that this works correctly because the word list used words.txt) is entirely in lowercase letters. The above subroutine would need to be modified if it might be called with words containing uppercase letters.

Since the defined function returns a Boolean value, we could shorten our subroutine to this::

We could also have used a regex for testing the presence of a “e” on the second line of this subroutine:

This fairly concise syntax is appealing, but when looking for an exact literal match, the index function is likely to be slightly more efficient (faster) than a regex.

Looking for the words without “e” in our word list and counting them is not very difficult:

8.4.3 Avoiding Other Letters (Solution)

The avoids subroutine is a more general version of has_no_e, but it has the same structure:

We can return False as soon as we find a forbidden letter; if we get to the end of the loop, we return True. Since a subroutine returns the last evaluated expression, we don’t need to explicitly use a return True statement in the last code line above. I used this feature here as an example; you might find it clearer to explicitly return values, except perhaps for very simple one-line subroutines.

Note that we have here implicitly two nested loops. We could reverse the outer and the inner loops:

The main code calling the above subroutine is similar to the code calling has-no-e and might look like this:

8.4.4 Using Only Some Letters (Solution)

uses-only is similar to our avoids subroutine except that the sense of the condition is reversed:

Instead of a list of forbidden letters, we have a list of available letters. If we find a letter in word that is not in available, we can return False. And we return True if we reach the loop end.

8.4.5 Using All Letters of a List (Solution)

uses-all is similar to the previous two subroutines, except that we reverse the role of the word and the string of letters:

Instead of traversing the letters in $word, the loop traverses the required letters. If any of the required letters do not appear in the word, we can return False.

If you were really thinking like a computer scientist, you would have recognized that uses-all was an instance of a previously solved problem in reverse: if word A uses all letters of word B, then word B uses only letters of word A. So, we can call the uses-only subroutine and write:

This is an example of a program development plan called reduction to a previously solved problem, which means that you recognize the problem you are working on as an instance of a solved problem and apply an existing solution.

8.4.6 Alphabetic Order (Solution)

For is_abecedarian we have to compare adjacent letters. Each time in our for loop, we define one letter as our current letter and compare it with the previous one:

The loop starts at $i=0 and ends when i=$word.chars -1. Each time through the loop, it compares the ith character (which you can think of as the current character) to the i+1th character (which you can think of as the next).

If the next character is less than (alphabetically before) the current one, then we have discovered a break in the abecedarian trend, and we return False.

If we get to the end of the loop without finding a fault, then the word passes the test. To convince yourself that the loop ends correctly, consider an example like 'flossy'. The length of the word is 6, so the last time the loop runs is when $i is 4, which is the index of the second-to-last character. On the last iteration, it compares the second-to-last character (the second “s”) to the last (the “y”), which is what we want.

8.4.7 Another Example of Reduction to a Previously Solved Problem

Here is a version of is_palindrome (see Exercise ??) that uses two indices; one starts at the beginning and goes up, while the other starts at the end and goes down:

using is-reverse from Section ?? (but you should probably choose the corrected version of the is-reversed subroutine given in the appendix: see Subsection ??).

8.5 Debugging

Testing programs is hard. The functions in this chapter are relatively easy to test because you can check the results by hand. Even so, it is somewhere between difficult and impossible to choose a set of words that test for all possible errors.

Taking has_no_e as an example, there are two obvious cases to check: words that have an “e” should return False, and words that don’t should return True. You should have no trouble coming up with one of each.

Within each case, there are some less obvious subcases. Among the words that have an “e”, you should test words with an “e” at the beginning, the end, and somewhere in the middle. You should test long words, short words, and very short words, like an empty string. The empty string is an example of a special case, which is one of the nonobvious cases where errors often lurk.

In addition to the test cases you generate, you can also test your program with a word list like words.txt. By scanning the output, you might be able to catch errors, but be careful: you might catch one kind of error (words that should not be included, but are) and not another (words that should be included, but aren’t).

In general, testing can help you find bugs, but it is not easy to generate a good set of test cases, and even if you do, you can’t be sure your program is correct.

8.6 Glossary

8.7 Exercises

Exercise 7

This question is based on a Puzzler that was broadcast on the radio program Car Talk (http://www.cartalk.com/content/puzzlers):

Give me a word with three consecutive double letters. I’ll give you a couple of words that almost qualify, but don’t. For example, the word committee, c-o-m-m-i-t-t-e-e. It would be great except for the “i” that sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could take out those i’s it would work. But there is a word that has three consecutive pairs of letters and to the best of my knowledge this may be the only word. Of course there are probably 500 more but I can only think of one. What is the word?

Write a program to find it.

Solution: ??.

Exercise 8 Here’s another Car Talk Puzzler (http://www.cartalk.com/content/puzzlers):

“I was driving on the highway the other day and I happened to notice my odometer. Like most odometers, it shows six digits, in whole miles only. So, if my car had 300,000 miles, for example, I’d see 3-0-0-0-0-0.
“Now, what I saw that day was very interesting. I noticed that the last 4 digits were palindromic; that is, they read the same forward as backward. For example, 5-4-4-5 is a palindrome, so my odometer could have read 3-1-5-4-4-5.
“One mile later, the last 5 numbers were palindromic. For example, it could have read 3-6-5-4-5-6. One mile after that, the middle 4 out of 6 numbers were palindromic. And you ready for this? One mile later, all 6 were palindromic!
“The question is, what was on the odometer when I first looked?”

Write a program that tests all the six-digit numbers and prints any numbers that satisfy these requirements.

Solution: ??.

Exercise 9 Here’s another Car Talk Puzzler you can solve with a search (http://www.cartalk.com/content/puzzlers):

Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer.
When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, how old am I now?

Write a Perl program that searches for solutions to this Puzzler. Hint: you might find the string formatting method sprintf useful.

Solution: ??.

Chapter 8 Case Study: Word Play

8.1 Reading from and Writing to Files

8.2 Reading Word Lists

8.3 Exercises

8.4 Search

8.4.1 Words Longer Than 20 Characters (Solution)

8.4.2 Words with No “e” (Solution)

8.4.3 Avoiding Other Letters (Solution)

8.4.4 Using Only Some Letters (Solution)

8.4.5 Using All Letters of a List (Solution)

8.4.6 Alphabetic Order (Solution)

8.4.7 Another Example of Reduction to a Previously Solved Problem

8.5 Debugging

8.6 Glossary

8.7 Exercises

Are you using one of our books in a class?