   This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

# Chapter 3  ArrayList

This chapter kills two birds with one stone: I present solutions to the previous exercise and demonstrate a way to classify algorithms using amortized analysis.

## 3.1  Classifying MyArrayList methods

For many methods, we can identify the order of growth by examining the code. For example, here’s the implementation of `get` from `MyArrayList`:

```    public E get(int index) {
if (index < 0 || index >= size) {
throw new IndexOutOfBoundsException();
}
return array[index];
}
```

Everything in `get` is constant time, so `get` is constant time. No problem.

Now that we’ve classified `get`, we can classify `set`, which uses it. Here is our implementation of `set` from the previous exercise:

```    public E set(int index, E element) {
E old = get(index);
array[index] = element;
return old;
}
```

One slightly clever part of this solution is that it does not check the bounds of the array explicitly; it takes advantage of `get`, which raises an exception if the index is invalid.

Everything in `set`, including the invocation of `get`, is constant time, so `set` is also constant time.

Next we’ll look at some linear methods. For example, here’s my implementation of `indexOf`:

```    public int indexOf(Object target) {
for (int i = 0; i<size; i++) {
if (equals(target, array[i])) {
return i;
}
}
return -1;
}
```

Each time through the loop, `indexOf` invokes `equals`, so we have to classify `equals` first. Here it is:

```    private boolean equals(Object target, Object element) {
if (target == null) {
return element == null;
} else {
return target.equals(element);
}
}
```

This method invokes `target.equals`; the run time of this method might depend on the size of `target` or `element`, but it probably doesn’t depend on the size of the array, so we consider it constant time for purposes of analyzing `indexOf`.

Getting back to `indexOf`, everything inside the loop is constant time, so the next question we have to consider is: how many times does the loop execute?

If we get lucky, we might find the target object right away and return after testing only one element. If we are unlucky, we might have to test all of the elements. On average, we expect to test half of the elements, so this method is considered linear (except in the unlikely case that we know the target element is at the beginning of the array).

The analysis of `remove` is similar. Here’s my implementation:

```    public E remove(int index) {
E element = get(index);
for (int i=index; i<size-1; i++) {
array[i] = array[i+1];
}
size--;
return element;
}
```

It uses `get`, which is constant time, and then loops through the array, starting from `index`. If we remove the element at the end of the list, the loop never runs and this method is constant time. If we remove the first element, we loop through all of the remaining elements, which is linear. So, again, this method is considered linear (except in the special case where we know the element is at the end or a constant distance from the end).

## 3.2  Classifying `add`

Here’s a version of `add` that takes an index and an element as parameters:

```    public void add(int index, E element) {
if (index < 0 || index > size) {
throw new IndexOutOfBoundsException();
}
// add the element to get the resizing
add(element);

// shift the other elements
for (int i=size-1; i>index; i--) {
array[i] = array[i-1];
}
// put the new one in the right place
array[index] = element;
}
```

This two-parameter version, called `add(int, E)`, uses the one-parameter version, called `add(E)`, which puts the new element at the end. Then it shifts the other elements to the right, and puts the new element in the correct place.

Before we can classify the two-parameter `add(int, E)`, we have to classify the one-parameter `add(E)`:

```    public boolean add(E element) {
if (size >= array.length) {
// make a bigger array and copy over the elements
E[] bigger = (E[]) new Object[array.length * 2];
System.arraycopy(array, 0, bigger, 0, array.length);
array = bigger;
}
array[size] = element;
size++;
return true;
}
```

The one-parameter version turns out to be hard to analyze. If there is an unused space in the array, it is constant time, but if we have to resize the array, it’s linear because `System.arraycopy` takes time proportional to the size of the array.

So is add constant time or linear? We can classify this method by thinking about the average number of operations per add over a series of n adds. For simplicity, assume we start with an array that has room for 2 elements.

• The first time we call add, it finds unused space in the array, so it stores 1 element.
• The second time, it finds unused space in the array, so it stores 1 element.
• The third time, we have to resize the array, copy 2 elements, and store 1 element. Now the size of the array is 4.
• The fourth time stores 1 element.
• The fifth time resizes the array, copies 4 elements, and stores 1 element. Now the size of the array is 8.
• The next 3 adds store 3 elements.
• The next add copies 8 and stores 1. Now the size is 16.
• The next 7 adds store 7 elements.

And so on. Adding things up:

• After 4 adds, we’ve stored 4 elements and copied 2.
• After 8 adds, we’ve stored 8 elements and copied 6.
• After 16 adds, we’ve stored 16 elements and copied 14.

By now you should see the pattern: to do n adds, we have to store n elements and copy n−2. So the total number of operations is n + n − 2, which is 2n−2.

To get the average number of operations per add, we divide the total by n; the result is 2 − 2/n. As n gets big, the second term, 2/n, gets small. Invoking the principle that we only care about the largest exponent of n, we can think of `add` as constant time.

It might seem strange that an algorithm that is sometimes linear can be constant time on average. The key is that we double the length of the array each time it gets resized. That limits the number of times each element gets copied. Otherwise — if we add a fixed amount to the length of the array, rather than multiplying by a fixed amount — the analysis doesn’t work.

This way of classifying an algorithm, by computing the average time in a series of invocations, is called amortized analysis. You can read more about it at http://thinkdast.com/amort. The key idea is that the extra cost of copying the array is spread, or “amortized”, over a series of invocations.

Now, if `add(E)` is constant time, what about `add(int, E)`? After calling `add(E)`, it loops through part of the array and shifts elements. This loop is linear, except in the special case where we are adding at the end of the list. So `add(int, E)` is linear.

## 3.3  Problem Size

The last example we’ll consider is `removeAll`; here’s the implementation in `MyArrayList`:

```    public boolean removeAll(Collection<?> collection) {
boolean flag = true;
for (Object obj: collection) {
flag &= remove(obj);
}
return flag;
}
```

Each time through the loop, `removeAll` invokes `remove`, which is linear. So it is tempting to think that `removeAll` is quadratic. But that’s not necessarily the case.

In this method, the loop runs once for each element in `collection`. If `collection` contains m elements and the list we are removing from contains n elements, this method is in O(nm). If the size of `collection` can be considered constant, `removeAll` is linear with respect to n. But if the size of the collection is proportional to n, `removeAll` is quadratic. For example, if `collection` always contains 100 or fewer elements, `removeAll` is linear. But if `collection` generally contains 1% of the elements in the list, `removeAll` is quadratic.

When we talk about problem size, we have to be careful about which size, or sizes, we are talking about. This example demonstrates a pitfall of algorithm analysis: the tempting shortcut of counting loops. If there is one loop, the algorithm is often linear. If there are two loops (one nested inside the other), the algorithm is often quadratic. But be careful! You have to think about how many times each loop runs. If the number of iterations is proportional to n for all loops, you can get away with just counting the loops. But if, as in this example, the number of iterations is not always proportional to n, you have to give it more thought.

## 3.4  Linked Data Structures

For the next exercise I provide a partial implementation of the `List` interface that uses a linked list to store the elements. If you are not familiar with linked lists, you can read about them at http://thinkdast.com/linkedlist, but this section provides a brief introduction.

A data structure is “linked” if it is made up of objects, often called “nodes”, that contain references to other nodes. In a linked list, each node contains a reference to the next node in the list. Other linked structures include trees and graphs, in which nodes can contain references to more than one other node.

Here’s a class definition for a simple node:

```public class ListNode {

public Object data;
public ListNode next;

public ListNode() {
this.data = null;
this.next = null;
}

public ListNode(Object data) {
this.data = data;
this.next = null;
}

public ListNode(Object data, ListNode next) {
this.data = data;
this.next = next;
}

public String toString() {
return "ListNode(" + data.toString() + ")";
}
}
```

The `ListNode` object has two instance variables: `data` is a reference to some kind of `Object`, and `next` is a reference to the next node in the list. In the last node in the list, by convention, `next` is `null`.

`ListNode` provides several constructors, allowing you to provide values for `data` and `next`, or initialize them to the default value, `null`.

You can think of each `ListNode` as a list with a single element, but more generally, a list can contain any number of nodes. There are several ways to make a new list. A simple option is to create a set of `ListNode` objects, like this:

```        ListNode node1 = new ListNode(1);
ListNode node2 = new ListNode(2);
ListNode node3 = new ListNode(3);
```

And then link them up, like this:

```        node1.next = node2;
node2.next = node3;
node3.next = null;
```

Alternatively, you can create a node and link it at the same time. For example, if you want to add a new node at the beginning of a list, you can do it like this:

```        ListNode node0 = new ListNode(0, node1);
```

After this sequence of instructions, we have four nodes containing the `Integer`s 0, 1, 2, and 3 as data, linked up in increasing order. In the last node, the `next` field is `null`. Figure 3.1: Object diagram of a linked list.

Figure 3.1 is an object diagram that shows these variables and the objects they refer to. In an object diagram, variables appear as names inside boxes, with arrows that show what they refer to. Objects appear as boxes with their type on the outside (like `ListNode` and `Integer`) and their instance variables on the inside.

## 3.5  Exercise 3

In the repository for this book, you’ll find the source files you need for this exercise:

• `MyLinkedList.java` contains a partial implementation of the `List` interface using a linked list to store the elements.
• `MyLinkedListTest.java` contains JUnit tests for `MyLinkedList`.

Run `ant MyArrayList` to run `MyArrayList.java`, which contains a few simple tests.

Then you can run `ant MyArrayListTest` to run the JUnit tests. Several of them should fail. If you examine the source code, you’ll find three `TODO` comments indicating the methods you should fill in.

Before you start, let’s walk through some of the code. Here are the instance variables and the constructor for `MyLinkedList`:

```public class MyLinkedList<E> implements List<E> {

private int size;            // keeps track of the number of elements
private Node head;           // reference to the first node

public MyLinkedList() {
head = null;
size = 0;
}
}
```

As the comments indicate, `size` keeps track of how many elements are in `MyLinkedList`; `head` is a reference to the first `Node` in the list or `null` if the list is empty.

Storing the number of elements is not necessary, and in general it is risky to keep redundant information, because if it’s not updated correctly, it creates opportunities for error. It also takes a little bit of extra space.

But if we store `size` explicitly, we can implement the `size` method in constant time; otherwise, we would have to traverse the list and count the elements, which requires linear time.

Because we store `size` explicitly, we have to update it each time we add or remove an element, so that slows down those methods a little, but it doesn’t change their order of growth, so it’s probably worth it.

The constructor sets `head` to `null`, which indicates an empty list, and sets `size` to 0.

This class uses the type parameter `E` for the type of the elements. If you are not familiar with type parameters, you might want to read this tutorial: http://thinkdast.com/types.

The type parameter also appears in the definition of `Node`, which is nested inside `MyLinkedList`:

```    private class Node {
public E data;
public Node next;

public Node(E data, Node next) {
this.data = data;
this.next = next;
}
}
```

Other than that, `Node` is similar to `ListNode` above.

Finally, here’s my implementation of `add`:

```    public boolean add(E element) {
if (head == null) {
head = new Node(element);
} else {
Node node = head;
// loop until the last node
for ( ; node.next != null; node = node.next) {}
node.next = new Node(element);
}
size++;
return true;
}
```

This example demonstrates two patterns you’ll need for your solutions:

1. For many methods, we have to handle the first element of the list as a special case. In this example, if we are adding the first element of a list, we have to modify `head`. Otherwise, we traverse the list, find the end, and add the new node.
2. This method shows how to use a `for` loop to traverse the nodes in a list. In your solutions, you will probably write several variations on this loop. Notice that we have to declare `node` before the loop so we can access it after the loop.

Now it’s your turn. Fill in the body of `indexOf`. As usual, you should read the documentation, at http://thinkdast.com/listindof, so you know what it is supposed to do. In particular, notice how it’s supposed to handle `null`.

As in the previous exercise, I provide a helper method called `equals` that compares an element from the array to a target value and checks whether they are equal — and it handles `null` correctly. This method is private because it is used inside this class but it is not part of the `List` interface.

When you are done, run the tests again; `testIndexOf` should pass now, as well as the other tests that depend on it.

Next, you should fill in the two-parameter version of `add`, which takes an index and stores the new value at the given index. Again, read the documentation at http://thinkdast.com/listadd, write an implementation, and run the tests for confirmation.

Last one: fill in the body of `remove`. The documentation is here: http://thinkdast.com/listrem. When you finish this one, all tests should pass.

Once you have your implementation working, compare it to the version in the `solution` directory of the repository.

## 3.6  A note on garbage collection

In `MyArrayList` from the previous exercise, the array grows if necessary, but it never shrinks. The array never gets garbage collected, and the elements don’t get garbage collected until the list itself is destroyed.

One advantage of the linked list implementation is that it shrinks when elements are removed, and the unused nodes can get garbage collected immediately.

Here is my implementation of the `clear` method:

```    public void clear() {
head = null;
size = 0;
}
```

When we set `head` to `null`, we remove a reference to the first `Node`. If there are no other references to that `Node` (and there shouldn’t be), it will get garbage collected. At that point, the reference to the second `Node` is removed, so it gets garbage collected, too. This process continues until all nodes are collected.

So how should we classify `clear`? The method itself contains two constant time operations, so it sure looks like it’s constant time. But when you invoke it, you make the garbage collector do work that’s proportional to the number of elements. So maybe we should consider it linear!

This is an example of what is sometimes called a performance bug: a program that is correct in the sense that it does the right thing, but it doesn’t belong to the order of growth we expected. In languages like Java that do a lot of work, like garbage collection, behind the scenes, this kind of bug can be hard to find.

#### Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.

Think Data Structures   Think DSP   Think Java   Think Bayes   Think Python 2e   Think Stats 2e   Think Complexity      