This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

Chapter 17 Sorting

Computer science departments have an unhealthy obsession with sort algorithms. Based on the amount of time CS students spend on the topic, you would think that choosing sort algorithms is the cornerstone of modern software engineering. Of course, the reality is that software developers can go years, or entire careers, without thinking about how sorting works. For almost all applications, they use whatever general-purpose algorithm is provided by the language or libraries they use. And usually that’s just fine.

So if you skip this chapter and learn nothing about sort algorithms, you can still be an excellent developer. But there are a few reasons you might want to do it anyway:

Although there are general-purpose algorithms that work well for the vast majority of applications, there are two special-purpose algorithms you might need to know about: radix sort and bounded heap sort.
One sort algorithm, merge sort, makes an excellent teaching example because it demonstrates an important and useful strategy for algorithm design, called “divide-conquer-glue”. Also, when we analyze its performance, you will learn about an order of growth we have not seen before, linearithmic. Finally, some of the most widely-used algorithms are hybrids that include elements of merge sort.
One other reason to learn about sort algorithms is that technical interviewers love to ask about them. If you want to get hired, it helps if you can demonstrate CS cultural literacy.

So, in this chapter we’ll analyze insertion sort, you will implement merge sort, I’ll tell you about radix sort, and you will write a simple version of a bounded heap sort.

17.1 Insertion sort

We’ll start with insertion sort, mostly because it is simple to describe and implement. It is not very efficient, but it has some redeeming qualities, as we’ll see.

Rather than explain the algorithm here, I suggest you read the insertion sort Wikipedia page at http://thinkdast.com/insertsort, which includes pseudocode and animated examples. Come back when you get the general idea.

Here’s an implementation of insertion sort in Java:

public class ListSorter<T> {

    public void insertionSort(List<T> list, Comparator<T> comparator) {

        for (int i=1; i < list.size(); i++) {
            T elt_i = list.get(i);
            int j = i;
            while (j > 0) {
                T elt_j = list.get(j-1);
                if (comparator.compare(elt_i, elt_j) >= 0) {
                    break;
                }
                list.set(j, elt_j);
                j--;
            }
            list.set(j, elt_i);
        }
    }
}

I define a class, ListSorter, as a container for sort algorithms. By using the type parameter, T, we can write methods that work on lists containing any object type.

insertionSort takes two parameters, a List of any kind and a Comparator that knows how to compare type T objects. It sorts the list “in place”, which means it modifies the existing list and does not have to allocate any new space.

The following example shows how to call this method with a List of Integer objects:

        List<Integer> list = new ArrayList<Integer>(
            Arrays.asList(3, 5, 1, 4, 2));

        Comparator<Integer> comparator = new Comparator<Integer>() {
            @Override
            public int compare(Integer elt1, Integer elt2) {
                return elt1.compareTo(elt2);
            }
        };

        ListSorter<Integer> sorter = new ListSorter<Integer>();
        sorter.insertionSort(list, comparator);
        System.out.println(list);

insertionSort has two nested loops, so you might guess that its run time is quadratic. In this case, that turns out to be correct, but before you jump to that conclusion, you have to check that the number of times each loop runs is proportional to n, the size of the array.

The outer loop iterates from 1 to list.size(), so it is linear in the size of the list, n. The inner loop iterates from i to 0, so it is also linear in n. Therefore, the total number of times the inner loop runs is quadratic.

If you are not sure about that, here’s the argument:

The first time through, i=1 and the inner loop runs at most once.
The second time, i=2 and the inner loop runs at most twice.
The last time, i=n−1 and the inner loop runs at most n−1 times.

So the total number of times the inner loop runs is the sum of the series 1, 2, … , n−1, which is n (n−1) / 2. And the leading term of that expression (the one with the highest exponent) is n².

In the worst case, insertion sort is quadratic. However:

If the elements are already sorted, or nearly so, insertion sort is linear. Specifically, if each element is no more than k locations away from where it should be, the inner loop never runs more than k times, and the total run time is O(kn).
Because the implementation is simple, the overhead is low; that is, although the run time is a n², the coefficient of the leading term, a, is probably small.

So if we know that the array is nearly sorted, or is not very big, insertion sort might be a good choice. But for large arrays, we can do better. In fact, much better.

17.2 Exercise 14

Merge sort is one of several algorithms whose run time is better than quadratic. Again, rather than explaining the algorithm here, I suggest you read about it on Wikipedia at http://thinkdast.com/mergesort. Once you get the idea, come back and you can test your understanding by writing an implementation.

In the repository for this book, you’ll find the source files for this exercise:

ListSorter.java
ListSorterTest.java

Run ant build to compile the source files, then run ant ListSorterTest. As usual, it should fail, because you have work to do.

In ListSorter.java, I’ve provided an outline of two methods, mergeSortInPlace and mergeSort:

    public void mergeSortInPlace(List<T> list, Comparator<T> comparator) {
        List<T> sorted = mergeSortHelper(list, comparator);
        list.clear();
        list.addAll(sorted);
    }

    private List<T> mergeSort(List<T> list, Comparator<T> comparator) {
       // TODO: fill this in!
       return null;
    }

These two methods do the same thing but provide different interfaces. mergeSort takes a list and returns a new list with the same elements sorted in ascending order. mergeSortInPlace is a void method that modifies an existing list.

Your job is to fill in mergeSort. Before you write a fully recursive version of merge sort, start with something like this:

Split the list in half.
Sort the halves using Collections.sort or insertionSort.
Merge the sorted halves into a complete sorted list.

This will give you a chance to debug the merge code without dealing with the complexity of a recursive method.

Next, add a base case (see http://thinkdast.com/basecase). If you are given a list with only one element, you could return it immediately, since it is already sorted, sort of. Or if the length of the list is below some threshold, you could sort it using Collections.sort or insertionSort. Test the base case before you proceed.

Finally, modify your solution so it makes two recursive calls to sort the halves of the array. When you get it working, testMergeSort and testMergeSortInPlace should pass.

17.3 Analysis of merge sort

To classify the run time of merge sort, it helps to think in terms of levels of recursion and how much work is done on each level. Suppose we start with a list that contains n elements. Here are the steps of the algorithm:

Make two new arrays and copy half of the elements into each.
Sort the two halves.
Merge the halves.

Figure 17.1 shows these steps.

Figure 17.1: Representation of merge sort showing one level of recursion.

The first step copies each of the elements once, so it is linear. The third step also copies each element once, so it is also linear. Now we need to figure out the complexity of step 2. To do that, it helps to looks at a different picture of the computation, which shows the levels of recursion, as in Figure 17.2.

Figure 17.2: Representation of merge sort showing all levels of recursion.

At the top level, we have 1 list with n elements. For simplicity, let’s assume n is a power of 2. At the next level there are 2 lists with n/2 elements. Then 4 lists with n/4 elements, and so on until we get to n lists with 1 element.

On every level we have a total of n elements. On the way down, we have to split the arrays in half, which takes time proportional to n on every level. On the way back up, we have to merge a total of n elements, which is also linear.

If the number of levels is h, the total amount of work for the algorithm is O(nh). So how many levels are there? There are two ways to think about that:

How many times do we have to cut n in half to get to 1?
Or, how many times do we have to double 1 before we get to n?

Another way to ask the second question is “What power of 2 is n?”

2^h = n

Taking the log₂ of both sides yields

h = log₂ n

So the total time is O(n logn). I didn’t bother to write the base of the logarithm because logarithms with different bases differ by a constant factor, so all logarithms are in the same order of growth.

Algorithms in O(n logn) are sometimes called “linearithmic”, but most people just say “n log n”.

It turns out that O(n logn) is the theoretical lower bound for sort algorithms that work by comparing elements to each other. That means there is no “comparison sort” whose order of growth is better than n logn. See http://thinkdast.com/compsort.

But as we’ll see in the next section, there are non-comparison sorts that take linear time!

17.4 Radix sort

During the 2008 United States Presidential Campaign, candidate Barack Obama was asked to perform an impromptu algorithm analysis when he visited Google. Chief executive Eric Schmidt jokingly asked him for “the most efficient way to sort a million 32-bit integers.” Obama had apparently been tipped off, because he quickly replied, “I think the bubble sort would be the wrong way to go.” You can watch the video at http://thinkdast.com/obama.

Obama was right: bubble sort is conceptually simple but its run time is quadratic; and even among quadratic sort algorithms, its performance is not very good. See http://thinkdast.com/bubble.

The answer Schmidt was probably looking for is “radix sort”, which is a non-comparison sort algorithm that works if the size of the elements is bounded, like a 32-bit integer or a 20-character string.

To see how this works, imagine you have a stack of index cards where each card contains a three-letter word. Here’s how you could sort the cards:

Make one pass through the cards and divide them into buckets based on the first letter. So words starting with a should be in one bucket, followed by words starting with b, and so on.
Divide each bucket again based on the second letter. So words starting with aa should be together, followed by words starting with ab, and so on. Of course, not all buckets will be full, but that’s OK.
Divide each bucket again based on the third letter.

At this point each bucket contains one element, and the buckets are sorted in ascending order. Figure 17.3 shows an example with three-letter words.

Figure 17.3: Example of radix sort with three-letter words.

The top row shows the unsorted words. The second row shows what the buckets look like after the first pass. The words in each bucket begin with the same letter.

After the second pass, the words in each bucket begin with the same two letters. After the third pass, there can be only one word in each bucket, and the buckets are in order.

During each pass, we iterate through the elements and add them to buckets. As long as the buckets allow addition in constant time, each pass is linear.

The number of passes, which I’ll call w, depends on the “width” of the words, but it doesn’t depend on the number of words, n. So the order of growth is O(wn), which is linear in n.

There are many variations on radix sort, and many ways to implement each one. You can read more about them at http://thinkdast.com/radix. As an optional exercise, consider writing a version of radix sort.

17.5 Heap sort

In addition to radix sort, which applies when the things you want to sort are bounded in size, there is one other special-purpose sorting algorithm you might encounter: bounded heap sort. Bounded heap sort is useful if you are working with a very large dataset and you want to report the “Top 10” or “Top k” for some value of k much smaller than n.

For example, suppose you are monitoring a Web service that handles a billion transactions per day. At the end of each day, you want to report the k biggest transactions (or slowest, or any other superlative). One option is to store all transactions, sort them at the end of the day, and select the top k. That would take time proportional to n logn, and it would be very slow because we probably can’t fit a billion transactions in the memory of a single program. We would have to use an “out of core” sort algorithm. You can read about external sorting at http://thinkdast.com/extsort.

Using a bounded heap, we can do much better! Here’s how we will proceed:

I’ll explain (unbounded) heap sort.
You’ll implement it.
I’ll explain bounded heap sort and analyze it.

To understand heap sort, you have to understand a heap, which is a data structure similar to a binary search tree (BST). Here are the differences:

In a BST, every node, x, has the “BST property”: all nodes in the left subtree of x are less than x and all nodes in the right subtree are greater than x.
In a heap, every node, x, has the “heap property”: all nodes in both subtrees of x are greater than x.
Heaps are like balanced BSTs; when you add or remove elements, they do some extra work to rebalance the tree. As a result, they can be implemented efficiently using an array of elements.

The smallest element in a heap is always at the root, so we can find it in constant time. Adding and removing elements from a heap takes time proportional to the height of the tree h. And because the heap is always balanced, h is proportional to logn. You can read more about heaps at http://thinkdast.com/heap.

The Java PriorityQueue is implemented using a heap. PriorityQueue provides the methods specified in the Queue interface, including offer and poll:

offer: Adds an element to the queue, updating the heap so that every node has the “heap property”. Takes logn time.
poll: Removes the smallest element in the queue from the root and updates the heap. Takes logn time.

Given a PriorityQueue, you can easily sort of a collection of n elements like this:

Add all elements of the collection to a PriorityQueue using offer.
Remove the elements from the queue using poll and add them to a List.

Because poll returns the smallest element remaining in the queue, the elements are added to the List in ascending order. This way of sorting is called heap sort (see http://thinkdast.com/heapsort).

Adding n elements to the queue takes n logn time. So does removing n elements. So the run time for heap sort is O(n logn).

In the repository for this book, in ListSorter.java you’ll find the outline of a method called heapSort. Fill it in and then run ant ListSorterTest to confirm that it works.

17.6 Bounded heap

A bounded heap is a heap that is limited to contain at most k elements. If you have n elements, you can keep track of the k largest elements like this:

Initially, the heap is empty. For each element, x:

Branch 1: If the heap is not full, add x to the heap.
Branch 2: If the heap is full, compare x to the smallest element in the heap. If x is smaller, it cannot be one of the largest k elements, so you can discard it.
Branch 3: If the heap is full and x is greater than the smallest element in the heap, remove the smallest element from the heap and add x.

Using a heap with the smallest element at the top, we can keep track of the largest k elements. Let’s analyze the performance of this algorithm. For each element, we perform one of:

Branch 1: Adding an element to the heap is O(logk).
Branch 2: Finding the smallest element in the heap is O(1).
Branch 3: Removing the smallest element is O(logk). Adding x is also O(logk).

In the worst case, if the elements appear in ascending order, we always run Branch 3. In that case, the total time to process n elements is O(n logk), which is linear in n.

In ListSorter.java you’ll find the outline of a method called topK that takes a List, a Comparator, and an integer k. It should return the k largest elements in the List in ascending order. Fill it in and then run ant ListSorterTest to confirm that it works.

17.7 Space complexity

Until now we have talked a lot about run time analysis, but for many algorithms we are also concerned about space. For example, one of the drawbacks of merge sort is that it makes copies of the data. In our implementation, the total amount of space it allocates is O(n logn). With a more clever implementation, you can get the space requirement down to O(n).

In contrast, insertion sort doesn’t copy the data because it sorts the elements in place. It uses temporary variables to compare two elements at a time, and it uses a few other local variables. But its space use doesn’t depend on n.

Our implementation of heap sort creates a new PriorityQueue to store the elements, so the space is O(n); but if you are allowed to sort the list in place, you can run heap sort with O(1) space.

One of the benefits of the bounded heap algorithm you just implemented is that it only needs space proportional to k (the number of elements we want to keep), and k is often much smaller than n.

Software developers tend to pay more attention to run time than space, and for many applications, that’s appropriate. But for large datasets, space can be just as important or more so. For example:

If a dataset doesn’t fit into the memory of one program, the run time often increases dramatically, or it might not run at all. If you choose an algorithm that needs less space, and that makes it possible to fit the computation into memory, it might run much faster. In the same vein, a program that uses less space might make better use of CPU caches and run faster (see http://thinkdast.com/cache).
On a server that runs many programs at the same time, if you can reduce the space needed for each program, you might be able to run more programs on the same server, which reduces hardware and energy costs.

So those are some reasons you should know at least a little bit about the space needs of algorithms.

Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.

Think Data Structures