This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

Chapter 2 Analysis of Algorithms

As we saw in the previous chapter, Java provides two implementations of the List interface, ArrayList and LinkedList. For some applications LinkedList is faster; for other applications ArrayList is faster.

To decide which one is better for a particular application, one approach is to try them both and see how long they take. This approach, which is called “profiling”, has a few problems:

Before you can compare the algorithms, you have to implement them both.
The results might depend on what kind of computer you use. One algorithm might be better on one machine; the other might be better on a different machine.
The results might depend on the size of the problem or the data provided as input.

We can address some of these problems using analysis of algorithms. When it works, algorithm analysis makes it possible to compare algorithms without having to implement them. But we have to make some assumptions:

To avoid dealing with the details of computer hardware, we usually identify the basic operations that make up an algorithm — like addition, multiplication, and comparison of numbers — and count the number of operations each algorithm requires.
To avoid dealing with the details of the input data, the best option is to analyze the average performance for the inputs we expect. If that’s not possible, a common alternative is to analyze the worst case scenario.
Finally, we have to deal with the possibility that one algorithm works best for small problems and another for big ones. In that case, we usually focus on the big ones, because for small problems the difference probably doesn’t matter, but for big problems the difference can be huge.

This kind of analysis lends itself to simple classification of algorithms. For example, if we know that the run time of Algorithm A tends to be proportional to the size of the input, n, and Algorithm B tends to be proportional to n², we expect A to be faster than B, at least for large values of n.

Most simple algorithms fall into just a few categories.

Constant time: An algorithm is “constant time” if the run time does not depend on the size of the input. For example, if you have an array of n elements and you use the bracket operator ([]) to access one of the elements, this operation takes the same number of operations regardless of how big the array is.
Linear: An algorithm is “linear” if the run time is proportional to the size of the input. For example, if you add up the elements of an array, you have to access n elements and perform n−1 additions. The total number of operations (element accesses and additions) is 2n−1, which is proportional to n.
Quadratic: An algorithm is “quadratic” if the run time is proportional to n². For example, suppose you want to check whether any element in a list appears more than once. A simple algorithm is to compare each element to all of the others. If there are n elements and each is compared to n−1 others, the total number of comparisons is n² −n, which is proportional to n² as n grows.

2.1 Selection sort

For example, here’s an implementation of a simple algorithm called selection sort (see http://thinkdast.com/selectsort):

public class SelectionSort {

    /**
     * Swaps the elements at indexes i and j.
     */
    public static void swapElements(int[] array, int i, int j) {
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }

    /**
     * Finds the index of the lowest value
     * starting from the index at start (inclusive)
     * and going to the end of the array.
     */
    public static int indexLowest(int[] array, int start) {
        int lowIndex = start;
        for (int i = start; i < array.length; i++) {
            if (array[i] < array[lowIndex]) {
                lowIndex = i;
            }
        }
        return lowIndex;
    }

    /**
     * Sorts the elements (in place) using selection sort.
     */
    public static void selectionSort(int[] array) {
        for (int i = 0; i < array.length; i++) {
            int j = indexLowest(array, i);
            swapElements(array, i, j);
        }
    }
}

The first method, swapElements, swaps two elements of the array. Reading and writing elements are constant time operations, because if we know the size of the elements and the location of the first, we can compute the location of any other element with one multiplication and one addition, and those are constant time operations. Since everything in swapElements is constant time, the whole method is constant time.

The second method, indexLowest, finds the index of the smallest element of the array starting at a given index, start. Each time through the loop, it accesses two elements of the array and performs one comparison. Since these are all constant time operations, it doesn’t really matter which ones we count. To keep it simple, let’s count the number of comparisons.

If start is 0, indexLowest traverses the entire array, and the total number of comparisons is the length of the array, which I’ll call n.
If start is 1, the number of comparisons is n−1.
In general, the number of comparisons is n - start, so indexLowest is linear.

The third method, selectionSort, sorts the array. It loops from 0 to n−1, so the loop executes n times. Each time, it calls indexLowest and then performs a constant time operation, swapElements.

The first time indexLowest is called, it performs n comparisons. The second time, it performs n−1 comparisons, and so on. The total number of comparisons is

n + n−1 + n−2 + ... + 1 + 0

The sum of this series is n(n+1)/2, which is proportional to n²; and that means that selectionSort is quadratic.

To get to the same result a different way, we can think of indexLowest as a nested loop. Each time we call indexLowest, the number of operations is proportional to n. We call it n times, so the total number of operations is proportional to n².

2.2 Big O notation

All constant time algorithms belong to a set called O(1). So another way to say that an algorithm is constant time is to say that it is in O(1). Similarly, all linear algorithms belong to O(n), and all quadratic algorithms belong to O(n²). This way of classifying algorithms is called “big O notation”.

NOTE: I am providing a casual definition of big O notation. For a more mathematical treatment, see http://thinkdast.com/bigo.

This notation provides a convenient way to write general rules about how algorithms behave when we compose them. For example, if you perform a linear time algorithm followed by a constant algorithm, the total run time is linear. Using ∈ to mean “is a member of”:

If f ∈ O(n) and g ∈ O(1), f+g ∈ O(n).

If you perform two linear operations, the total is still linear:

If f ∈ O(n) and g ∈ O(n), f+g ∈ O(n).

In fact, if you perform a linear operation any number of times, k, the total is linear, as long as k is a constant that does not depend on n.

If f ∈ O(n) and k is a constant, kf ∈ O(n).

But if you perform a linear operation n times, the result is quadratic:

If f ∈ O(n), nf ∈ O(n²).

In general, we only care about the largest exponent of n. So if the total number of operations is 2n + 1, it belongs to O(n). The leading constant, 2, and the additive term, 1, are not important for this kind of analysis. Similarly, n² + 100n + 1000 is in O(n²). Don’t be distracted by the big numbers!

“Order of growth” is another name for the same idea. An order of growth is a set of algorithms whose run times are in the same big O category; for example, all linear algorithms belong to the same order of growth because their run times are in O(n).

In this context, an “order” is a group, like the Order of the Knights of the Round Table, which is a group of knights, not a way of lining them up. So you can imagine the Order of Linear Algorithms as a set of brave, chivalrous, and particularly efficient algorithms.

2.3 Exercise 2

The exercise for this chapter is to implement a List that uses a Java array to store the elements.

In the code repository for this book (see Section 0.1), you’ll find the source files you’ll need:

MyArrayList.java contains a partial implementation of the List interface. Four of the methods are incomplete; your job is to fill them in.
MyArrayListTest.java contains JUnit tests you can use to check your work.

You’ll also find the Ant build file build.xml. From the code directory, you should be able to run ant MyArrayList to run MyArrayList.java, which contains a few simple tests. Or you can run ant MyArrayListTest to run the JUnit test.

When you run the tests, several of them should fail. If you examine the source code, you’ll find four TODO comments indicating the methods you should fill in.

Before you start filling in the missing methods, let’s walk through some of the code. Here are the class definition, instance variables, and constructor.

public class MyArrayList<E> implements List<E> {
    int size;                    // keeps track of the number of elements
    private E[] array;           // stores the elements
    
    public MyArrayList() {
        array = (E[]) new Object[10];
        size = 0;
    }
}

As the comments indicate, size keeps track of how many elements are in MyArrayList, and array is the array that actually contains the elements.

The constructor creates an array of 10 elements, which are initially null, and sets size to 0. Most of the time, the length of the array is bigger than size, so there are unused slots in the array.

One detail about Java: you can’t instantiate an array using a type parameter; for example, the following will not work:

        array = new E[10];

To work around this limitation, you have to instantiate an array of Object and then typecast it. You can read more about this issue at http://thinkdast.com/generics.

Next we’ll look at the method that adds elements to the list:

    public boolean add(E element) {
        if (size >= array.length) {
            // make a bigger array and copy over the elements
            E[] bigger = (E[]) new Object[array.length * 2];
            System.arraycopy(array, 0, bigger, 0, array.length);
            array = bigger;
        } 
        array[size] = element;
        size++;
        return true;
    }

If there are no unused spaces in the array, we have to create a bigger array and copy over the elements. Then we can store the element in the array and increment size.

It might not be obvious why this method returns a boolean, since it seems like it always returns true. As always, you can find the answer in the documentation: http://thinkdast.com/colladd. It’s also not obvious how to analyze the performance of this method. In the normal case, it’s constant time, but if we have to resize the array, it’s linear. I’ll explain how to handle this in Section 3.2.

Finally, let’s look at get; then you can get started on the exercises.

    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        return array[index];
    }

Actually, get is pretty simple: if the index is out of bounds, it throws an exception; otherwise it reads and returns an element of the array. Notice that it checks whether the index is less than size, not array.length, so it’s not possible to access the unused elements of the array.

In MyArrayList.java, you’ll find a stub for set that looks like this:

    public T set(int index, T element) {
        // TODO: fill in this method.
        return null;
    }

Read the documentation of set at http://thinkdast.com/listset, then fill in the body of this method. If you run MyArrayListTest again, testSet should pass.

HINT: Try to avoid repeating the index-checking code.

Your next mission is to fill in indexOf. As usual, you should read the documentation at http://thinkdast.com/listindof so you know what it’s supposed to do. In particular, notice how it is supposed to handle null.

I’ve provided a helper method called equals that compares an element from the array to a target value and returns true if they are equal (and it handles null correctly). Notice that this method is private because it is only used inside this class; it is not part of the List interface.

When you are done, run MyArrayListTest again; testIndexOf should pass now, as well as the other tests that depend on it.

Only two more methods to go, and you’ll be done with this exercise. The next one is an overloaded version of add that takes an index and stores the new value at the given index, shifting the other elements to make room, if necessary.

Again, read the documentation at http://thinkdast.com/listadd, write an implementation, and run the tests for confirmation.

HINT: Avoid repeating the code that makes the array bigger.

Last one: fill in the body of remove. The documentation is at http://thinkdast.com/listrem. When you finish this one, all tests should pass.

Once you have your implementation working, compare it to mine, which you can read at http://thinkdast.com/myarraylist.