   This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

# Chapter 4  LinkedList

This chapter presents solutions to the previous exercise and continues the discussion of analysis of algorithms.

## 4.1  Classifying `MyLinkedList` methods

My implementation of `indexOf` is below. Read through it and see if you can identify its order of growth before you read the explanation.

```    public int indexOf(Object target) {
Node node = head;
for (int i=0; i<size; i++) {
if (equals(target, node.data)) {
return i;
}
node = node.next;
}
return -1;
}
```

Initially `node` gets a copy of `head`, so they both refer to the same `Node`. The loop variable, `i`, counts from 0 to `size-1`. Each time through the loop, we use `equals` to see if we’ve found the target. If so, we return `i` immediately. Otherwise we advance to the next `Node` in the list.

Normally we would check to make sure the next `Node` is not `null`, but in this case it is safe because the loop ends when we get to the end of the list (assuming `size` is consistent with the actual number of nodes in the list).

If we get through the loop without finding the target, we return `-1`.

So what is the order of growth for this method?

1. Each time through the loop we invoke `equals`, which is constant time (it might depend on the size of `target` or `data`, but it doesn’t depend on the size of the list). The other operations in the loop are also constant time.
2. The loop might run n times, because in the worse case, we might have to traverse the whole list.

So the run time of this method is proportional to the length of the list.

Next, here is my implementation of the two-parameter `add` method. Again, you should try to classify it before you read the explanation.

```    public void add(int index, E element) {
if (index == 0) {
head = new Node(element, head);
} else {
Node node = getNode(index-1);
node.next = new Node(element, node.next);
}
size++;
}
```

If `index==0`, we’re adding the new `Node` at the beginning, so we handle that as a special case. Otherwise, we have to traverse the list to find the element at `index-1`. We use the helper method `getNode`:

```    private Node getNode(int index) {
if (index < 0 || index >= size) {
throw new IndexOutOfBoundsException();
}
Node node = head;
for (int i=0; i<index; i++) {
node = node.next;
}
return node;
}
```

`getNode` checks whether `index` is out of bounds; if so, it throws an exception. Otherwise it traverses the list and returns the requested Node.

Jumping back to `add`, once we find the right `Node`, we create the new `Node` and put it between `node` and `node.next`. You might find it helpful to draw a diagram of this operation to make sure you understand it.

So, what’s the order of growth for `add`?

1. `getNode` is similar to `indexOf`, and it is linear for the same reason.
2. In `add`, everything before and after `getNode` is constant time.

So all together, `add` is linear.

Finally, let’s look at `remove`:

```    public E remove(int index) {
E element = get(index);
if (index == 0) {
head = head.next;
} else {
Node node = getNode(index-1);
node.next = node.next.next;
}
size--;
return element;
}
```

`remove` uses `get` to find and store the element at `index`. Then it removes the `Node` that contained it.

If `index==0`, we handle that as a special case again. Otherwise we find the node at `index-1` and modify it to skip over `node.next` and link directly to `node.next.next`. This effectively removes `node.next` from the list, and it can be garbage collected.

Finally, we decrement `size` and return the element we retrieved at the beginning.

So, what’s the order of growth for `remove`? Everything in `remove` is constant time except `get` and `getNode`, which are linear. So `remove` is linear.

When people see two linear operations, they sometimes think the result is quadratic, but that only applies if one operation is nested inside the other. If you invoke one operation after the other, the run times add. If they are both in O(n), the sum is also in O(n).

## 4.2  Comparing `MyArrayList` and `MyLinkedList`

The following table summarizes the differences between `MyLinkedList` and `MyArrayList`, where `1` means O(1) or constant time and n means O(n) or linear.

 MyArrayList MyLinkedList add (at the end) 1 n add (at the beginning) n 1 add (in general) n n get / set 1 n indexOf / lastIndexOf n n isEmpty / size 1 1 remove (from the end) 1 n remove (from the beginning) n 1 remove (in general) n n

The operations where `MyArrayList` has an advantage are adding at the end, removing from the end, getting and setting.

The operations where `MyLinkedList` has an advantage are adding at the beginning and removing from the beginning.

For the other operations, the two implementations are in the same order of growth.

Which implementation is better? It depends on which operations you are likely to use the most. And that’s why Java provides more than one implementation, because it depends.

## 4.3  Profiling

For the next exercise I provide a class called `Profiler` that contains code that runs a method with a range of problem sizes, measures run times, and plots the results.

You will use `Profiler` to classify the performance of the `add` method for the Java implementations of `ArrayList` and `LinkedList`.

Here’s an example that shows how to use the profiler:

```    public static void profileArrayListAddEnd() {
Timeable timeable = new Timeable() {
List<String> list;

public void setup(int n) {
list = new ArrayList<String>();
}

public void timeMe(int n) {
for (int i=0; i<n; i++) {
list.add("a string");
}
}
};

String title = "ArrayList add end";
Profiler profiler = new Profiler(title, timeable);

int startN = 4000;
int endMillis = 1000;
XYSeries series = profiler.timingLoop(startN, endMillis);
profiler.plotResults(series);
}
```

This method measures the time it takes to run `add` on an `ArrayList`, which adds the new element at the end. I’ll explain the code and then show the results.

In order to use `Profiler`, we need to create a `Timeable` object that provides two methods: `setup` and `timeMe`. The `setup` method does whatever needs to be done before we start the clock; in this case it creates an empty list. Then `timeMe` does whatever operation we are trying to measure; in this case it adds n elements to the list.

The code that creates `timeable` is an anonymous class that defines a new implementation of the `Timeable` interface and creates an instance of the new class at the same time. If you are not familiar with anonymous classes, you can read about them here: http://thinkdast.com/anonclass.

But you don’t need to know much for the next exercise; even if you are not comfortable with anonymous classes, you can copy and modify the example code.

The next step is to create the `Profiler` object, passing the `Timeable` object and a title as parameters.

The `Profiler` provides `timingLoop` which uses the `Timeable` object stored as an instance variable. It invokes the `timeMe` method on the `Timeable` object several times with a range of values of n. `timingLoop` takes two parameters:

• `startN` is the value of n the timing loop should start at.
• `endMillis` is a threshold in milliseconds. As `timingLoop` increases the problem size, the run time increases; when the run time exceeds this threshold, `timingLoop` stops.

When you run the experiments, you might have to adjust these parameters. If `startN` is too low, the run time might be too short to measure accurately. If `endMillis` is too low, you might not get enough data to see a clear relationship between problem size and run time.

This code is in `ProfileListAdd.java`, which you’ll run in the next exercise. When I ran it, I got this output:

```4000, 3
8000, 0
16000, 1
32000, 2
64000, 3
128000, 6
256000, 18
512000, 30
1024000, 88
2048000, 185
4096000, 242
8192000, 544
16384000, 1325
```

The first column is problem size, n; the second column is run time in milliseconds. The first few measurements are pretty noisy; it might have been better to set `startN` around 64000.

The result from `timingLoop` is an `XYSeries` that contains this data. If you pass this series to `plotResults`, it generates a plot like the one in Figure 4.1. Figure 4.1: Profiling results: run time versus problem size for adding n elements to the end of an `ArrayList`.

The next section explains how to interpret it.

## 4.4  Interpreting results

Based on our understanding of how `ArrayList` works, we expect the `add` method to take constant time when we add elements to the end. So the total time to add n elements should be linear.

To test that theory, we could plot total run time versus problem size, and we should see a straight line, at least for problem sizes that are big enough to measure accurately. Mathematically, we can write the function for that line:

 runtime = a + b n

where a is the intercept of the line and b is the slope.

On the other hand, if `add` is linear, the total time for n adds would be quadratic. If we plot run time versus problem size, we expect to see a parabola. Or mathematically, something like:

 runtime = a + b n + c n2

With perfect data, we might be able to tell the difference between a straight line and a parabola, but if the measurements are noisy, it can be hard to tell. A better way to interpret noisy measurements is to plot run time and problem size on a log-log scale.

Why? Let’s suppose that run time is proportional to nk, but we don’t know what the exponent k is. We can write the relationship like this:

 runtime = a + b n + … + c nk

For large values of n, the term with the largest exponent is the most important, so:

 runtime ≈ c nk

where ≈ means “approximately equal”. Now, if we take the logarithm of both sides of this equation:

 log(runtime) ≈ log(c) + k log(n)

This equation implies that if we plot runtime versus n on a log-log scale, we expect to see a straight line with intercept log(c) and slope k. We don’t care much about the intercept, but the slope indicates the order of growth: if k=1, the algorithm is linear; if k=2, it’s quadratic.

Looking at the figure in the previous section, you can estimate the slope by eye. But when you call `plotResults` it computes a least squares fit to the data and prints the estimated slope. In this example:

```Estimated slope = 1.06194352346708
```

which is close to 1; and that suggests that the total time for n adds is linear, so each add is constant time, as expected.

One important point: if you see a straight line on a graph like this, that does not mean that the algorithm is linear. If the run time is proportional to nk for any exponent k, we expect to see a straight line with slope k. If the slope is close to 1, that suggests the algorithm is linear. If it is close to 2, it’s probably quadratic.

## 4.5  Exercise 4

In the repository for this book you’ll find the source files you need for this exercise:

1. `Profiler.java` contains the implementation of the `Profiler` class described above. You will use this class, but you don’t have to know how it works. But feel free to read the source.
2. `ProfileListAdd.java` contains starter code for this exercise, including the example, above, which profiles `ArrayList.add`. You will modify this file to profile a few other methods.

Also, in the `code` directory , you’ll find the Ant build file `build.xml`.

Run `ant ProfileListAdd` to run `ProfileListAdd.java`. You should get results similar to Figure 4.1, but you might have to adjust `startN` or `endMillis`. The estimated slope should be close to 1, indicating that performing n add operations takes time proportional to n raised to the exponent 1; that is, it is in O(n).

In `ProfileListAdd.java`, you’ll find an empty method named `profileArrayListAddBeginning`. Fill in the body of this method with code that tests `ArrayList.add`, always putting the new element at the beginning. If you start with a copy of `profileArrayListAddEnd`, you should only have to make a few changes. Add a line in `main` to invoke this method.

Run `ant ProfileListAdd` again and interpret the results. Based on our understanding of how `ArrayList` works, we expect each add operation to be linear, so the total time for n adds should be quadratic. If so, the estimated slope of the line, on a log-log scale, should be near 2. Is it?

Now let’s compare that to the performance of `LinkedList`. Fill in the body of `profileLinkedListAddBeginning` and use it to classify `LinkedList.add` when we put the new element at the beginning. What performance do you expect? Are the results consistent with your expectations?

Finally, fill in the body of `profileLinkedListAddEnd` and use it to classify `LinkedList.add` when we put the new element at the end. What performance do you expect? Are the results consistent with your expectations?

I’ll present results and answer these questions in the next chapter.

#### Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.

Think Data Structures   Think DSP   Think Java   Think Bayes   Think Python 2e   Think Stats 2e   Think Complexity      