This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Or you can buy this book on Amazon.com.

Chapter 9 The Map interface

In the next few exercises, I present several implementations of the Map interface. One of them is based on a hash table, which is arguably the most magical data structure ever invented. Another, which is similar to TreeMap, is not quite as magical, but it has the added capability that it can iterate the elements in order.

You will have a chance to implement these data structures, and then we will analyze their performance.

But before we can explain hash tables, we’ll start with a simple implementation of a Map using a List of key-value pairs.

9.1 Implementing `MyLinearMap`

As usual, I provide starter code and you will fill in the missing methods. Here’s the beginning of the MyLinearMap class definition:

public class MyLinearMap<K, V> implements Map<K, V> {

    private List<Entry> entries = new ArrayList<Entry>();

This class uses two type parameters, K, which is the type of the keys, and V, which is the type of the values. MyLinearMap implements Map, which means it has to provide the methods in the Map interface.

A MyLinearMap object has a single instance variable, entries, which is an ArrayList of Entry objects. Each Entry contains a key-value pair. Here is the definition:

    public class Entry implements Map.Entry<K, V> {
        private K key;
        private V value;
        
        public Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
        
        @Override
        public K getKey() {
            return key;
        }
        @Override
        public V getValue() {
            return value;
        }
    }

There’s not much to it; an Entry is just a container for a key and a value. This definition is nested inside MyLinearList, so it uses the same type parameters, K and V.

That’s all you need to do the exercise, so let’s get started.

9.2 Exercise 7

In the repository for this book, you’ll find the source files for this exercise:

MyLinearMap.java contains starter code for the first part of the exercise.
MyLinearMapTest.java contains the unit tests for MyLinearMap.

You’ll also find the Ant build file build.xml.

Run ant build to compile the source files. Then run ant MyLinearMapTest; several tests should fail, because you have some work to do!

First, fill in the body of findEntry. This is a helper method that is not part of the Map interface, but once you get it working, you can use it for several methods. Given a target key, it should search through the entries and return the entry that contains the target (as a key, not a value) or null if it’s not there. Notice that I provide an equals method that compares two keys and handles null correctly.

You can run ant MyLinearMapTest again, but even if your findEntry is correct, the tests won’t pass because put is not complete.

Fill in put. You should read the documentation of Map.put at http://thinkdast.com/listput so you know what it is supposed to do. You might want to start with a version of put that always adds a new entry and does not modify an existing entry; that way you can test the simple case first. Or if you feel more confident, you can write the whole thing at once.

Once you’ve got put working, the test for containsKey should pass.

Read the documentation of Map.get at http://thinkdast.com/listget and then fill in the method. Run the tests again.

Finally, read the documentation of Map.remove at http://thinkdast.com/maprem and fill in the method.

At this point, all tests should pass. Congratulations!

9.3 Analyzing `MyLinearMap`

In this section I present a solution to the previous exercise and analyze the performance of the core methods. Here are findEntry and equals:

private Entry findEntry(Object target) {
    for (Entry entry: entries) {
        if (equals(target, entry.getKey())) {
            return entry;
        }
    }
    return null;
}

private boolean equals(Object target, Object obj) {
    if (target == null) {
        return obj == null;
    }
    return target.equals(obj);
}

The run time of equals might depend on the size of the target and the keys, but does not generally depend on the number of entries, n. So equals is constant time.

In findEntry, we might get lucky and find the key we’re looking for at the beginning, but we can’t count on it. In general, the number of entries we have to search is proportional to n, so findEntry is linear.

Most of the core methods in MyLinearMap use findEntry, including put, get, and remove. Here’s what they look like:

public V put(K key, V value) {
    Entry entry = findEntry(key);
    if (entry == null) {
        entries.add(new Entry(key, value));
        return null;
    } else {
        V oldValue = entry.getValue();
        entry.setValue(value);
        return oldValue;
    }
}

public V get(Object key) {
    Entry entry = findEntry(key);
    if (entry == null) {
        return null;
    }
    return entry.getValue();
}

public V remove(Object key) {
    Entry entry = findEntry(key);
    if (entry == null) {
        return null;
    } else {
        V value = entry.getValue();
        entries.remove(entry);
        return value;
    }
}

After put calls findEntry, everything else is constant time. Remember that entries is an ArrayList, so adding an element at the end is constant time, on average. If the key is already in the map, we don’t have to add an entry, but we have to call entry.getValue and entry.setValue, and those are both constant time. Putting it all together, put is linear.

By the same reasoning, get is also linear.

remove is slightly more complicated because entries.remove might have to remove an element from the beginning or middle of the ArrayList, and that takes linear time. But that’s OK: two linear operations are still linear.

In summary, the core methods are all linear, which is why we called this implementation MyLinearMap (ta-da!).

If we know that the number of entries will be small, this implementation might be good enough, but we can do better. In fact, there is an implementation of Map where all of the core methods are constant time. When you first hear that, it might not seem possible. What we are saying, in effect, is that you can find a needle in a haystack in constant time, regardless of how big the haystack is. It’s magic.

I’ll explain how it works in two steps:

Instead of storing entries in one big List, we’ll break them up into lots of short lists. For each key, we’ll use a hash code (explained in the next section) to determine which list to use.
Using lots of short lists is faster than using just one, but as I’ll explain, it doesn’t change the order of growth; the core operations are still linear. But there is one more trick: if we increase the number of lists to limit the number of entries per list, the result is a constant-time map. You’ll see the details in the next exercise, but first: hashing!

In the next chapter, I’ll present a solution, analyze the performance of the core Map methods, and introduce a more efficient implementation.