This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book.
In particular, some of the symbols are not rendered correctly.
You might prefer to read the PDF version.
Or you can buy this book on Amazon.com.
Chapter 12 TreeMap
This chapter presents the binary search tree, which is an efficient
implementation of the Map interface that is particularly useful
if we want to keep the elements sorted.
12.1 What’s wrong with hashing?
At this point you should be familiar with the Map interface and
the HashMap implementation provided by Java. And by making your
own Map using a hash table, you should understand how
HashMap works and why we expect its core methods to be constant
time.
Because of this performance, HashMap is widely used, but it is
not the only implementation of Map . There are a few reasons you
might want another implementation: - Hashing can be slow, so even though
HashMap operations are
constant time, the “constant” might be big. - Hashing works well if the hash function distributes the keys evenly
among the sub-maps. But designing good hash functions is not easy,
and if too many keys end up in the same sub-map, the performance of
the
HashMap may be poor. - The keys in a hash table are not stored in any particular order; in
fact, the order might change when the table grows and the keys are
rehashed. For some applications, it is necessary, or at least useful,
to keep the keys in order.
It is hard to solve all of these problems at the same
time, but Java provides an implementation called TreeMap that
comes close: - It doesn’t use a hash function, so it avoids the cost of hashing
and the difficulty of choosing a hash function.
- Inside the
TreeMap , the keys are are stored in a
binary search tree, which makes it possible to traverse the
keys, in order, in linear time. - The run time of the core methods is proportional to logn,
which is not quite as good as constant time, but it is still
very good.
In the next section, I’ll explain how binary search trees work and then you
will use one to implement a Map . Along the way, we’ll analyze
the performance of the core map methods when implemented using a tree.
12.2 Binary search tree
A binary search tree (BST) is a tree where each node contains a key, and
every node has the “BST property”: - If
node has a left child, all keys in the left subtree must
be less than the key in node . - If
node has a right child, all keys in the right subtree must
be greater than the key in node .
Figure 12.1: Example of a binary search tree. |
Figure 12.1
shows a tree of integers that has this property.
This figure is from the Wikipedia page on binary search trees at
http://thinkdast.com/bst, which you
might find useful while you work on this exercise. The key in the root is 8, and you can confirm that all keys to the left
of the root are less than 8, and all keys to the right are greater. You
can also check that the other nodes have this property.
Looking up a key in a binary search tree is fast because we don’t have
to search the entire tree. Starting at the root, we can use the
following algorithm: - Compare the key you are looking for,
target , to the key in
the current node. If they are equal, you are done. - If
target is less than the current key, search the left tree.
If there isn’t one, target is not in the tree. - If
target is greater than the current key, search the right
tree. If there isn’t one, target is not in the tree.
At each level of the tree, you only have to search one child. For
example, if you look for target = 4 in the previous diagram,
you start at the root, which contains the key 8 . Because
target is less than 8 , you go left. Because
target is greater than 3 you go right. Because
target is less than 6 , you go left. And then you find
the key you are looking for. In this example, it takes four comparisons to find the target,
even though the tree contains nine keys. In general, the number of
comparisons is proportional to the height of the tree, not the number of
keys in the tree.
So what can we say about the relationship between the height of the
tree, h , and the number of nodes, n? Starting small
and working up: - If
h=1 , the tree only contains one node, so n=1 . - If
h=2 , we can add two more nodes, for a total of
n=3 . - If
h=3 , we can add up to four more nodes, for a total
of n=7 . - If
h=4 , we can add up to eight more nodes, for a total
of n=15 .
By now you might see the pattern. If we number the levels of the tree from
1 to h , the level with index i can have up to
2i−1 nodes. And the total number of nodes in h levels is 2h−1.
If we have we can take the logarithm base 2 of both sides: which means that the height of the tree is proportional to
logn, if the tree is full; that is, if each level contains the
maximum number of nodes. So we expect that we can look up a key in a binary search tree in time
proportional to logn. This is true if the tree is full, and
even if the tree is only partially full. But it is not always true, as
we will see.
An algorithm that takes time proportional to logn is called
“logarithmic” or “log time”, and it belongs to the order of growth
O(logn).
12.3 Exercise 10
For this exercise you will write an implementation of
the Map interface using a binary search tree.
Here’s the beginning of an implementation, called MyTreeMap : public class MyTreeMap<K, V> implements Map<K, V> {
private int size = 0;
private Node root = null;
The instance variables are size , which keeps track of the
number of keys, and root , which is a reference to the root node
in the tree. When the tree is empty, root is null and
size is 0. Here’s the definition of Node , which is defined inside
MyTreeMap : protected class Node {
public K key;
public V value;
public Node left = null;
public Node right = null;
public Node(K key, V value) {
this.key = key;
this.value = value;
}
}
Each node contains a key-value pair and references to two child nodes,
left and right . Either or both of the child nodes can
be null . Some of the Map methods are easy to implement, like
size and clear : public int size() {
return size;
}
public void clear() {
size = 0;
root = null;
}
size is clearly constant time.
clear appears to be constant time, but consider this: when
root is set to null , the garbage collector reclaims
the nodes in the tree, which takes linear time. Should work done by the
garbage collector count? I think so.
In the next section, you’ll fill in some of the other methods, including
the most important ones, get and put .
12.4 Implementing a TreeMap
In the repository for this book, you’ll find these source files: MyTreeMap.java contains the code from the previous section
with outlines for the missing methods.MyTreeMapTest.java contains the unit tests for
MyTreeMap .
Run ant build to compile the source files. Then run ant MyTreeMapTest . Several tests should fail, because you have some
work to do!
I’ve provided outlines for get and containsKey . Both of
them use findNode , which is a private method I defined; it is
not part of the Map interface. Here’s how it starts: private Node findNode(Object target) {
if (target == null) {
throw new IllegalArgumentException();
}
@SuppressWarnings("unchecked")
Comparable<? super K> k = (Comparable<? super K>) target;
// TODO: FILL THIS IN!
return null;
}
The parameter target is the key we’re looking for. If
target is null , findNode throws an exception.
Some implementations of Map can handle null as a key,
but in a binary search tree, we need to be able to compare keys, so
dealing with null is problematic. To keep things simple,
this implementation does not allow null as a key. The next lines show how we can compare target to a key in the
tree. From the signature of get and containsKey , the
compiler considers target to be an Object . But we need
to be able to compare keys, so we typecast target to
Comparable<? super K> , which means that
it is comparable to an instance of type K , or any superclass of
K . If you are not familiar with this use of “type
wildcards”, you can read more at
http://thinkdast.com/gentut.
Fortunately, dealing with Java’s type system is not the point of this
exercise. Your job is to fill in the rest of findNode . If it
finds a node that contains target as a key, it should return
the node. Otherwise it should return null . When you get this
working, the tests for get and containsKey should
pass. Note that your solution should only search one path through the tree, so
it should take time proportional to the height of the tree. You should
not search the whole tree!
Your next task is to fill in containsValue . To get you started,
I’ve provided a helper method, equals , that compares
target and a given key. Note that the values in the tree (as
opposed to the keys) are not necessarily comparable, so we can’t use
compareTo ; we have to invoke equals on target .
Unlike your previous solution for findNode , your solution for
containsValue does have to search the whole tree, so its
run time is proportional to the number of keys, n, not the
height of the tree, h . The next method you should fill in is put . I’ve provided
starter code that handles the simple cases: public V put(K key, V value) {
if (key == null) {
throw new IllegalArgumentException();
}
if (root == null) {
root = new Node(key, value);
size++;
return null;
}
return putHelper(root, key, value);
}
private V putHelper(Node node, K key, V value) {
// TODO: Fill this in.
}
If you try to put null as a key, put throws an
exception. If the tree is empty, put creates a new node and initializes
the instance variable root .
Otherwise, it calls putHelper , which is a private method I
defined; it is not part of the Map interface. Fill in putHelper so it searches the tree and: - If
key is already in the tree, it replaces the old value with
the new, and returns the old value. - If
key is not in the tree, it creates a new node, finds the
right place to add it, and returns null .
Your implementation of put should take time proportional to the
height of the tree, h, not the number of elements, n. Ideally you
should search the tree only once, but if you find it easier to search
twice, you can do that; it will be slower, but it doesn’t change the
order of growth.
Finally, you should fill in the body of keySet . According to
the documentation at http://thinkdast.com/mapkeyset, this method
should return a Set that iterates the keys in order; that is,
in increasing order according to the compareTo method. The
HashSet implementation of Set , which we used in
Section 8.3, doesn’t maintain the order of the keys, but
the LinkedHashSet implementation does. You can read about it
at http://thinkdast.com/linkedhashset. I’ve provided an outline of keySet that creates and returns a
LinkedHashSet : public Set<K> keySet() {
Set<K> set = new LinkedHashSet<K>();
return set;
}
You should finish off this method so it adds the keys from the tree to
set in ascending order. HINT: you might want to write a helper
method; you might want to make it recursive; and you might want to
read about in-order tree traversal at
http://thinkdast.com/inorder.
When you are done, all tests should pass. In the next chapter, I’ll go
over my solutions and test the performance of the core methods.
|