This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book.
In particular, some of the symbols are not rendered correctly.
You might prefer to read the PDF version.
Or you can buy this book on Amazon.com.
Chapter 9 The Map interface
In the next few exercises, I present several implementations of the
Map interface. One of them is based on a hash table,
which is arguably the most magical data structure ever
invented. Another, which is similar to TreeMap , is not quite as
magical, but it has the added capability that it can iterate the
elements in order.
You will have a chance to implement these data structures, and then we
will analyze their performance. But before we can explain hash tables, we’ll start with a simple
implementation of a Map using a List of key-value
pairs.
9.1 Implementing MyLinearMap
As usual, I provide starter code and you will fill in the missing
methods. Here’s the beginning of the MyLinearMap class
definition: public class MyLinearMap<K, V> implements Map<K, V> {
private List<Entry> entries = new ArrayList<Entry>();
This class uses two type parameters, K , which is the type of
the keys, and V , which is the type of the values.
MyLinearMap implements Map , which means it has to
provide the methods in the Map interface.
A MyLinearMap object has a single instance variable,
entries , which is an ArrayList of Entry
objects. Each Entry contains a key-value pair. Here is the
definition: public class Entry implements Map.Entry<K, V> {
private K key;
private V value;
public Entry(K key, V value) {
this.key = key;
this.value = value;
}
@Override
public K getKey() {
return key;
}
@Override
public V getValue() {
return value;
}
}
There’s not much to it; an Entry is just a container for a key
and a value. This definition is nested inside MyLinearList , so
it uses the same type parameters, K and V .
That’s all you need to do the exercise, so let’s get started.
9.2 Exercise 7
In the repository for this book,
you’ll find the source files for this exercise: MyLinearMap.java contains starter code for the first part
of the exercise.MyLinearMapTest.java contains the unit tests for
MyLinearMap .
You’ll also find the Ant build file
build.xml .
Run ant build to compile the source files. Then run ant MyLinearMapTest ; several tests should fail, because you have some
work to do!
First, fill in the body of findEntry . This is a helper method
that is not part of the Map interface, but once you get it
working, you can use it for several methods. Given a target key, it
should search through the entries and return the entry that contains
the target (as a key, not a value) or null if it’s not
there. Notice that I provide an equals method that
compares two keys and handles null correctly.
You can run ant MyLinearMapTest again, but even if your
findEntry is correct, the tests won’t pass because put
is not complete.
Fill in put . You should read the documentation of
Map.put at http://thinkdast.com/listput so you know what
it is supposed to do. You might want to start with a version of
put that always adds a new entry and does not modify an
existing entry; that way you can test the simple case first. Or if
you feel more confident, you can write the whole thing at once.
Once you’ve got put working, the test for containsKey
should pass. Read the documentation of Map.get at
http://thinkdast.com/listget
and then fill in the method. Run the tests again. Finally, read the documentation of Map.remove at
http://thinkdast.com/maprem
and fill in the method. At this point, all tests should pass. Congratulations!
9.3 Analyzing MyLinearMap
In this section I present a solution to the previous exercise and
analyze the performance of the core methods. Here are
findEntry and equals : private Entry findEntry(Object target) {
for (Entry entry: entries) {
if (equals(target, entry.getKey())) {
return entry;
}
}
return null;
}
private boolean equals(Object target, Object obj) {
if (target == null) {
return obj == null;
}
return target.equals(obj);
}
The run time of equals might depend on the size of the
target and the keys, but does not generally depend on
the number of entries, n. So equals is constant time.
In findEntry , we might get lucky and find the key we’re looking
for at the beginning, but we can’t count on it. In general, the number
of entries we have to search is proportional to n, so
findEntry is linear.
Most of the core methods in MyLinearMap use findEntry ,
including put , get , and remove . Here’s what
they look like: public V put(K key, V value) {
Entry entry = findEntry(key);
if (entry == null) {
entries.add(new Entry(key, value));
return null;
} else {
V oldValue = entry.getValue();
entry.setValue(value);
return oldValue;
}
}
public V get(Object key) {
Entry entry = findEntry(key);
if (entry == null) {
return null;
}
return entry.getValue();
}
public V remove(Object key) {
Entry entry = findEntry(key);
if (entry == null) {
return null;
} else {
V value = entry.getValue();
entries.remove(entry);
return value;
}
}
After put calls findEntry , everything else is constant
time. Remember that entries is an ArrayList , so adding
an element at the end is constant time, on average. If the key is
already in the map, we don’t have to add an entry, but we have to call
entry.getValue and entry.setValue , and those are both
constant time. Putting it all together, put is linear.
By the same reasoning, get is also linear. remove is slightly more complicated because
entries.remove might have to remove an element from the
beginning or middle of the ArrayList , and that takes linear
time. But that’s OK: two linear operations are still linear.
In summary, the core methods are all linear, which is why we called this
implementation MyLinearMap (ta-da!). If we know that the number of entries will be small, this implementation
might be good enough, but we can do better. In fact, there is an
implementation of Map where all of the core methods are
constant time. When you first hear that, it might not seem possible.
What we are saying, in effect, is that you can find a needle in a
haystack in constant time, regardless of how big the haystack is. It’s
magic.
I’ll explain how it works in two steps: - Instead of storing entries in one big
List , we’ll break them
up into lots of short lists. For each key, we’ll use a hash
code (explained in the next section) to determine which list to use. - Using lots of short lists is faster than using just one, but as I’ll
explain, it doesn’t change the order of growth; the core operations
are still linear. But there is one more trick: if we increase the
number of lists to limit the number of entries per list, the result is
a constant-time map. You’ll see the details in the next exercise, but
first: hashing!
In the next chapter, I’ll present a solution, analyze the performance
of the core Map methods, and introduce a more efficient
implementation.
|