   This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

# Chapter 13  Binary search tree

This chapter presents solutions to the previous exercise, then tests the performance of the tree-backed map. I present a problem with the implementation and explain how Java’s `TreeMap` solves it.

## 13.1  A simple `MyTreeMap`

In the previous exercise I gave you the outline of `MyTreeMap` and asked you to fill in the missing methods. Now I’ll present a solution, starting with `findNode`:

```private Node findNode(Object target) {
// some implementations can handle null as a key, but not this one
if (target == null) {
throw new IllegalArgumentException();
}

// something to make the compiler happy
@SuppressWarnings("unchecked")
Comparable<? super K> k = (Comparable<? super K>) target;

// the actual search
Node node = root;
while (node != null) {
int cmp = k.compareTo(node.key);
if (cmp < 0)
node = node.left;
else if (cmp > 0)
node = node.right;
else
return node;
}
return null;
}
```

`findNode` is a private method used by `containsKey` and `get`; it is not part of the `Map` interface. The parameter `target` is the key we’re looking for. I explained the first part of this method in the previous exercise:

• In this implementation, `null` is not a legal value for a key.
• Before we can invoke `compareTo` on `target`, we have to typecast it to some kind of `Comparable`. The “type wildcard” used here is as permissive as possible; that is, it works with any type that implements `Comparable` and whose `compareTo` method accepts `K` or any supertype of `K`.

After all that, the actual search is relatively simple. We initialize a loop variable `node` so it refers to the root node. Each time through the loop, we compare the target to `node.key`. If the target is less than the current key, we move to the left child. If it’s greater, we move to the right child. And if it’s equal, we return the current node.

If we get to the bottom of the tree without finding the target, we conclude that it is not in the tree and return `null`.

## 13.2  Searching for values

As I explained in the previous exercise, the run time of `findNode` is proportional to the height of the tree, not the number of nodes, because we don’t have to search the whole tree. But for `containsValue`, we have to search the values, not the keys; the BST property doesn’t apply to the values, so we have to search the whole tree.

My solution is recursive:

```public boolean containsValue(Object target) {
return containsValueHelper(root, target);
}

private boolean containsValueHelper(Node node, Object target) {
if (node == null) {
return false;
}
if (equals(target, node.value)) {
return true;
}
if (containsValueHelper(node.left, target)) {
return true;
}
if (containsValueHelper(node.right, target)) {
return true;
}
return false;
}
```

`containsValue` takes the target value as a parameter and immediately invokes `containsValueHelper`, passing the root of the tree as an additional parameter.

Here’s how `containsValueHelper` works:

• The first `if` statement checks the base case of the recursion. If `node` is `null`, that means we have recursed to the bottom of the tree without finding the `target`, so we should return `false`. Note that this only means that the target did not appear on one path through the tree; it is still possible that it will be found on another.
• The second case checks whether we’ve found what we’re looking for. If so, we return `true`. Otherwise, we have to keep going.
• The third case makes a recursive call to search for `target` in the left subtree. If we find it, we can return `true` immediately, without searching the right subtree. Otherwise, we keep going.
• The fourth case searches the right subtree. Again, if we find what we are looking for, we return `true`. Otherwise, having searched the whole tree, we return `false`.

This method “visits” every node in the tree, so it takes time proportional to the number of nodes.

## 13.3  Implementing put

The `put` method is a little more complicated than `get` because it has to deal with two cases: (1) if the given key is already in the tree, it replaces and returns the old value; (2) otherwise it has to add a new node to the tree, in the right place.

In the previous exercise, I provided this starter code:

```public V put(K key, V value) {
if (key == null) {
throw new IllegalArgumentException();
}
if (root == null) {
root = new Node(key, value);
size++;
return null;
}
return putHelper(root, key, value);
}
```

And asked you to fill in `putHelper`. Here’s my solution:

```private V putHelper(Node node, K key, V value) {
Comparable<? super K> k = (Comparable<? super K>) key;
int cmp = k.compareTo(node.key);

if (cmp < 0) {
if (node.left == null) {
node.left = new Node(key, value);
size++;
return null;
} else {
return putHelper(node.left, key, value);
}
}
if (cmp > 0) {
if (node.right == null) {
node.right = new Node(key, value);
size++;
return null;
} else {
return putHelper(node.right, key, value);
}
}
V oldValue = node.value;
node.value = value;
return oldValue;
}
```

The first parameter, `node`, is initially the root of the tree, but each time we make a recursive call, it refers to a different subtree. As in `get`, we use the `compareTo` method to figure out which path to follow through the tree. If `cmp < 0`, the key we’re adding is less than `node.key`, so we want to look in the left subtree. There are two cases:

• If the left subtree is empty, that is, if `node.left` is `null`, we have reached the bottom of the tree without finding `key`. At this point, we know that `key` isn’t in the tree, and we know where it should go. So we create a new node and add it as the left child of `node`.
• Otherwise we make a recursive call to search the left subtree.

If `cmp > 0`, the key we’re adding is greater than `node.key`, so we want to look in the right subtree. And we handle the same two cases as in the previous branch. Finally, if `cmp == 0`, we found the key in the tree, so we replace and return the old value.

I wrote this method recursively to make it more readable, but it would be straightforward to rewrite it iteratively, which you might want to do as an exercise.

## 13.4  In-order traversal

The last method I asked you to write is `keySet`, which returns a `Set` that contains the keys from the tree in ascending order. In other implementations of `Map`, the keys returned by `keySet` are in no particular order, but one of the capabilities of the tree implementation is that it is simple and efficient to sort the keys. So we should take advantage of that.

Here’s my solution:

```public Set<K> keySet() {
Set<K> set = new LinkedHashSet<K>();
addInOrder(root, set);
return set;
}

private void addInOrder(Node node, Set<K> set) {
if (node == null) return;
addInOrder(node.left, set);
set.add(node.key);
addInOrder(node.right, set);
}
```

In `keySet`, we create a `LinkedHashSet`, which is a `Set` implementation that keeps the elements in order (unlike most other `Set` implementations). Then we call `addInOrder` to traverse the tree.

The first parameter, `node`, is initially the root of the tree, but as you should expect by now, we use it to traverse the tree recursively. `addInOrder` performs a classic “in-order traversal” of the tree.

If `node` is `null`, that means the subtree is empty, so we return without adding anything to `set`. Otherwise we:

1. Traverse the left subtree in order.
2. Add `node.key`.
3. Traverse the right subtree in order.

Remember that the BST property guarantees that all nodes in the left subtree are less than `node.key`, and all nodes in the right subtree are greater. So we know that `node.key` has been added in the correct order.

Applying the same argument recursively, we know that the elements from the left subtree are in order, as well as the elements from the right subtree. And the base case is correct: if the subtree is empty, no keys are added. So we can conclude that this method adds all keys in the correct order.

Because this method visits every node in the tree, like `containsValue`, it takes time proportional to n.

## 13.5  The logarithmic methods

In `MyTreeMap`, the methods `get` and `put` take time proportional to the height of the tree, h. In the previous exercise, we showed that if the tree is full — if every level of the tree contains the maximum number of nodes — the height of the tree is proportional to logn.

And I claimed that `get` and `put` are logarithmic; that is, they take time proportional to logn. But for most applications, there’s no guarantee that the tree is full. In general, the shape of the tree depends on the keys and what order they are added.

To see how this works out in practice, we’ll test our implementation with two sample datasets: a list of random strings and a list of timestamps in increasing order.

Here’s the code that generates random strings:

```Map<String, Integer> map = new MyTreeMap<String, Integer>();

for (int i=0; i<n; i++) {
String uuid = UUID.randomUUID().toString();
map.put(uuid, 0);
}
```

`UUID` is a class in the `java.util` package that can generate a random “universally unique identifier”. UUIDs are useful for a variety of applications, but in this example we’re taking advantage of an easy way to generate random strings.

I ran this code with `n=16384` and measured the run time and the height of the final tree. Here’s the output:

```Time in milliseconds = 151
Final size of MyTreeMap = 16384
log base 2 of size of MyTreeMap = 14.0
Final height of MyTreeMap = 33
```

I included “log base 2 of size of MyTreeMap” to see how tall the tree would be if it were full. The result indicates that a full tree with height 14 would contain 16,384 nodes.

The actual tree of random strings has height 33, which is substantially more than the theoretical minimum, but not too bad. To find one key in a collection of 16,384, we only have to make 33 comparisons. Compared to a linear search, that’s almost 500 times faster.

This performance is typical with random strings or other keys that are added in no particular order. The final height of the tree might be 2-3 times the theoretical minimum, but it is still proportional to logn, which is much less than n. In fact, logn grows so slowly as n increases, it can be difficult to distinguish logarithmic time from constant time in practice.

However, binary search trees don’t always behave so well. Let’s see what happens when we add keys in increasing order. Here’s an example that measures timestamps in nanoseconds and uses them as keys:

```MyTreeMap<String, Integer> map = new MyTreeMap<String, Integer>();

for (int i=0; i<n; i++) {
String timestamp = Long.toString(System.nanoTime());
map.put(timestamp, 0);
}
```

`System.nanoTime` returns an integer with type `long` that indicates elapsed time in nanoseconds. Each time we call it, we get a somewhat bigger number. When we convert these timestamps to strings, they appear in increasing alphabetical order.

And let’s see what happens when we run it:

```Time in milliseconds = 1158
Final size of MyTreeMap = 16384
log base 2 of size of MyTreeMap = 14.0
Final height of MyTreeMap = 16384
```

The run time is more than seven times longer than in the previous case. longer. If you wonder why, take a look at the final height of the tree: 16384! Figure 13.1: Binary search trees, balanced (left) and unbalanced (right).

If you think about how `put` works, you can figure out what’s going on. Every time we add a new key, it’s larger than all of the keys in the tree, so we always choose the right subtree, and always add the new node as the right child of the rightmost node. The result is an “unbalanced” tree that only contains right children.

The height of this tree is proportional to n, not logn, so the performance of `get` and `put` is linear, not logarithmic.

Figure 13.1 shows an example of a balanced and unbalanced tree. In the balanced tree, the height is 4 and the total number of nodes is 24−1 = 15. In the unbalanced tree with the same number of nodes, the height is 15.

## 13.6  Self-balancing trees

There are two possible solutions to this problem:

• You could avoid adding keys to the `Map` in order. But this is not always possible.
• You could make a tree that does a better job of handling keys if they happen to be in order.

The second solution is better, and there are several ways to do it. The most common is to modify `put` so that it detects when the tree is starting to become unbalanced and, if so, rearranges the nodes. Trees with this capability are called “self-balancing”. Common self-balancing trees include the AVL tree (“AVL” are the initials of the inventors), and the red-black tree, which is what the Java `TreeMap` uses.

In our example code, if we replace `MyTreeMap` with the Java `TreeMap`, the run times are about the same for the random strings and the timestamps. In fact, the timestamps run faster, even though they are in order, probably because they take less time to hash.

In summary, a binary search tree can implement `get` and `put` in logarithmic time, but only if the keys are added in an order that keeps the tree sufficiently balanced. Self-balancing trees avoid this problem by doing some additional work each time a new key is added.

You can read more about self-balancing trees at http://thinkdast.com/balancing.

## 13.7  One more exercise

In the previous exercise you didn’t have to implement `remove`, but you might want to try it. If you remove a node from the middle of the tree, you have to rearrange the remaining nodes to restore the BST property. You can probably figure out how to do that on your own, or you can read the explanation at http://thinkdast.com/bstdel.

Removing a node and rebalancing a tree are similar operations: if you do this exercise, you will have a better idea of how self-balancing trees work.

#### Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.

Think Data Structures   Think DSP   Think Java   Think Bayes   Think Python 2e   Think Stats 2e   Think Complexity      