This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

Chapter 4 LinkedList

This chapter presents solutions to the previous exercise and continues the discussion of analysis of algorithms.

4.1 Classifying `MyLinkedList` methods

My implementation of indexOf is below. Read through it and see if you can identify its order of growth before you read the explanation.

    public int indexOf(Object target) {
        Node node = head;
        for (int i=0; i<size; i++) {
            if (equals(target, node.data)) {
                return i;
            }
            node = node.next;
        }
        return -1;
    }

Initially node gets a copy of head, so they both refer to the same Node. The loop variable, i, counts from 0 to size-1. Each time through the loop, we use equals to see if we’ve found the target. If so, we return i immediately. Otherwise we advance to the next Node in the list.

Normally we would check to make sure the next Node is not null, but in this case it is safe because the loop ends when we get to the end of the list (assuming size is consistent with the actual number of nodes in the list).

If we get through the loop without finding the target, we return -1.

So what is the order of growth for this method?

Each time through the loop we invoke equals, which is constant time (it might depend on the size of target or data, but it doesn’t depend on the size of the list). The other operations in the loop are also constant time.
The loop might run n times, because in the worse case, we might have to traverse the whole list.

So the run time of this method is proportional to the length of the list.

Next, here is my implementation of the two-parameter add method. Again, you should try to classify it before you read the explanation.

    public void add(int index, E element) {
        if (index == 0) {
            head = new Node(element, head);
        } else {
            Node node = getNode(index-1);
            node.next = new Node(element, node.next);
        }
        size++;
    }

If index==0, we’re adding the new Node at the beginning, so we handle that as a special case. Otherwise, we have to traverse the list to find the element at index-1. We use the helper method getNode:

    private Node getNode(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        Node node = head;
        for (int i=0; i<index; i++) {
            node = node.next;
        }
        return node;
    }

getNode checks whether index is out of bounds; if so, it throws an exception. Otherwise it traverses the list and returns the requested Node.

Jumping back to add, once we find the right Node, we create the new Node and put it between node and node.next. You might find it helpful to draw a diagram of this operation to make sure you understand it.

So, what’s the order of growth for add?

getNode is similar to indexOf, and it is linear for the same reason.
In add, everything before and after getNode is constant time.

So all together, add is linear.

Finally, let’s look at remove:

    public E remove(int index) {
        E element = get(index);
        if (index == 0) {
            head = head.next;
        } else {
            Node node = getNode(index-1);
            node.next = node.next.next;
        }
        size--;
        return element;
    }

remove uses get to find and store the element at index. Then it removes the Node that contained it.

If index==0, we handle that as a special case again. Otherwise we find the node at index-1 and modify it to skip over node.next and link directly to node.next.next. This effectively removes node.next from the list, and it can be garbage collected.

Finally, we decrement size and return the element we retrieved at the beginning.

So, what’s the order of growth for remove? Everything in remove is constant time except get and getNode, which are linear. So remove is linear.

When people see two linear operations, they sometimes think the result is quadratic, but that only applies if one operation is nested inside the other. If you invoke one operation after the other, the run times add. If they are both in O(n), the sum is also in O(n).

4.2 Comparing `MyArrayList` and `MyLinkedList`

The following table summarizes the differences between MyLinkedList and MyArrayList, where 1 means O(1) or constant time and n means O(n) or linear.


	MyArrayList	MyLinkedList

add (at the end)	1	n
add (at the beginning)	n	1
add (in general)	n	n
get / set	1	n
indexOf / lastIndexOf	n	n
isEmpty / size	1	1
remove (from the end)	1	n
remove (from the beginning)	n	1
remove (in general)	n	n

The operations where MyArrayList has an advantage are adding at the end, removing from the end, getting and setting.

The operations where MyLinkedList has an advantage are adding at the beginning and removing from the beginning.

For the other operations, the two implementations are in the same order of growth.

Which implementation is better? It depends on which operations you are likely to use the most. And that’s why Java provides more than one implementation, because it depends.

4.3 Profiling

For the next exercise I provide a class called Profiler that contains code that runs a method with a range of problem sizes, measures run times, and plots the results.

You will use Profiler to classify the performance of the add method for the Java implementations of ArrayList and LinkedList.

Here’s an example that shows how to use the profiler:

    public static void profileArrayListAddEnd() {
        Timeable timeable = new Timeable() {
            List<String> list;

            public void setup(int n) {
                list = new ArrayList<String>();
            }

            public void timeMe(int n) {
                for (int i=0; i<n; i++) {
                    list.add("a string");
                }
            }
        };

        String title = "ArrayList add end";
        Profiler profiler = new Profiler(title, timeable);

        int startN = 4000;
        int endMillis = 1000;
        XYSeries series = profiler.timingLoop(startN, endMillis);
        profiler.plotResults(series);
    }

This method measures the time it takes to run add on an ArrayList, which adds the new element at the end. I’ll explain the code and then show the results.

In order to use Profiler, we need to create a Timeable object that provides two methods: setup and timeMe. The setup method does whatever needs to be done before we start the clock; in this case it creates an empty list. Then timeMe does whatever operation we are trying to measure; in this case it adds n elements to the list.

The code that creates timeable is an anonymous class that defines a new implementation of the Timeable interface and creates an instance of the new class at the same time. If you are not familiar with anonymous classes, you can read about them here: http://thinkdast.com/anonclass.

But you don’t need to know much for the next exercise; even if you are not comfortable with anonymous classes, you can copy and modify the example code.

The next step is to create the Profiler object, passing the Timeable object and a title as parameters.

The Profiler provides timingLoop which uses the Timeable object stored as an instance variable. It invokes the timeMe method on the Timeable object several times with a range of values of n. timingLoop takes two parameters:

startN is the value of n the timing loop should start at.
endMillis is a threshold in milliseconds. As timingLoop increases the problem size, the run time increases; when the run time exceeds this threshold, timingLoop stops.

When you run the experiments, you might have to adjust these parameters. If startN is too low, the run time might be too short to measure accurately. If endMillis is too low, you might not get enough data to see a clear relationship between problem size and run time.

This code is in ProfileListAdd.java, which you’ll run in the next exercise. When I ran it, I got this output:

4000, 3
8000, 0
16000, 1
32000, 2
64000, 3
128000, 6
256000, 18
512000, 30
1024000, 88
2048000, 185
4096000, 242
8192000, 544
16384000, 1325

The first column is problem size, n; the second column is run time in milliseconds. The first few measurements are pretty noisy; it might have been better to set startN around 64000.

The result from timingLoop is an XYSeries that contains this data. If you pass this series to plotResults, it generates a plot like the one in Figure 4.1.

Figure 4.1: Profiling results: run time versus problem size for adding n elements to the end of an ArrayList.

The next section explains how to interpret it.

4.4 Interpreting results

Based on our understanding of how ArrayList works, we expect the add method to take constant time when we add elements to the end. So the total time to add n elements should be linear.

To test that theory, we could plot total run time versus problem size, and we should see a straight line, at least for problem sizes that are big enough to measure accurately. Mathematically, we can write the function for that line:

runtime = a + b n

where a is the intercept of the line and b is the slope.

On the other hand, if add is linear, the total time for n adds would be quadratic. If we plot run time versus problem size, we expect to see a parabola. Or mathematically, something like:

runtime = a + b n + c n²

With perfect data, we might be able to tell the difference between a straight line and a parabola, but if the measurements are noisy, it can be hard to tell. A better way to interpret noisy measurements is to plot run time and problem size on a log-log scale.

Why? Let’s suppose that run time is proportional to n^k, but we don’t know what the exponent k is. We can write the relationship like this:

runtime = a + b n + … + c n^k

For large values of n, the term with the largest exponent is the most important, so:

runtime ≈ c n^k

where ≈ means “approximately equal”. Now, if we take the logarithm of both sides of this equation:

log(runtime) ≈ log(c) + k log(n)

This equation implies that if we plot runtime versus n on a log-log scale, we expect to see a straight line with intercept log(c) and slope k. We don’t care much about the intercept, but the slope indicates the order of growth: if k=1, the algorithm is linear; if k=2, it’s quadratic.

Looking at the figure in the previous section, you can estimate the slope by eye. But when you call plotResults it computes a least squares fit to the data and prints the estimated slope. In this example:

Estimated slope = 1.06194352346708

which is close to 1; and that suggests that the total time for n adds is linear, so each add is constant time, as expected.

One important point: if you see a straight line on a graph like this, that does not mean that the algorithm is linear. If the run time is proportional to n^k for any exponent k, we expect to see a straight line with slope k. If the slope is close to 1, that suggests the algorithm is linear. If it is close to 2, it’s probably quadratic.

4.5 Exercise 4

In the repository for this book you’ll find the source files you need for this exercise:

Profiler.java contains the implementation of the Profiler class described above. You will use this class, but you don’t have to know how it works. But feel free to read the source.
ProfileListAdd.java contains starter code for this exercise, including the example, above, which profiles ArrayList.add. You will modify this file to profile a few other methods.

Also, in the code directory , you’ll find the Ant build file build.xml.

Run ant ProfileListAdd to run ProfileListAdd.java. You should get results similar to Figure 4.1, but you might have to adjust startN or endMillis. The estimated slope should be close to 1, indicating that performing n add operations takes time proportional to n raised to the exponent 1; that is, it is in O(n).

In ProfileListAdd.java, you’ll find an empty method named profileArrayListAddBeginning. Fill in the body of this method with code that tests ArrayList.add, always putting the new element at the beginning. If you start with a copy of profileArrayListAddEnd, you should only have to make a few changes. Add a line in main to invoke this method.

Run ant ProfileListAdd again and interpret the results. Based on our understanding of how ArrayList works, we expect each add operation to be linear, so the total time for n adds should be quadratic. If so, the estimated slope of the line, on a log-log scale, should be near 2. Is it?

Now let’s compare that to the performance of LinkedList. Fill in the body of profileLinkedListAddBeginning and use it to classify LinkedList.add when we put the new element at the beginning. What performance do you expect? Are the results consistent with your expectations?

Finally, fill in the body of profileLinkedListAddEnd and use it to classify LinkedList.add when we put the new element at the end. What performance do you expect? Are the results consistent with your expectations?

I’ll present results and answer these questions in the next chapter.