|
This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly. You might prefer to read the PDF version. Or you can buy this book on Amazon.com. Chapter 10 HashingIn this chapter, I define
10.1 HashingTo improve the performance of Here’s the beginning of the class definition: public class MyBetterMap<K, V> implements Map<K, V> {
protected List<MyLinearMap<K, V>> maps;
public MyBetterMap(int k) {
makeMaps(k);
}
protected void makeMaps(int k) {
maps = new ArrayList<MyLinearMap<K, V>>(k);
for (int i=0; i<k; i++) {
maps.add(new MyLinearMap<K, V>());
}
}
}
The instance variable, Now, the key to making this work is that we need some way to look at a
key and decide which of the embedded maps it should go into. When we
One possibility is to choose one of the sub-maps at random and keep
track of where we put each key. But how should we keep track? It might
seem like we could use a A better approach is to use a hash function, which takes an
In Java, every Here’s a helper method that chooses the right sub-map for a given key: protected MyLinearMap<K, V> chooseMap(Object key) {
int index = 0;
if (key != null) {
index = Math.abs(key.hashCode()) % maps.size();
}
return maps.get(index);
}
If We use Here’s my implementation of public V put(K key, V value) {
MyLinearMap<K, V> map = chooseMap(key);
return map.put(key, value);
}
public V get(Object key) {
MyLinearMap<K, V> map = chooseMap(key);
return map.get(key);
}
Pretty simple, right? In both methods, we use If there are n entries split up among k sub-maps, there will be n/k entries per map, on average. When we look up a key, we have to compute its hash code, which takes some time, then we search the corresponding sub-map. Because the entry lists in
10.2 How does hashing work?The fundamental requirement for a hash function is that the same object should produce the same hash code every time. For immutable objects, that’s relatively easy. For objects with mutable state, we have to think harder. As an example of an immutable object, I’ll define a class called
public class SillyString {
private final String innerString;
public SillyString(String innerString) {
this.innerString = innerString;
}
public String toString() {
return innerString;
}
This class is not very useful, which is why it’s called
@Override
public boolean equals(Object other) {
return this.toString().equals(other.toString());
}
@Override
public int hashCode() {
int total = 0;
for (int i=0; i<innerString.length(); i++) {
total += innerString.charAt(i);
}
return total;
}
Notice that
This hash function satisfies the requirement: if two
This works correctly, but it might not yield good performance,
because it returns the same hash code for many different strings. If two
strings contain the same letters in any order, they will have the same
hash code. And even if they don’t contain the same letters, they might
yield the same total, like If many objects have the same hash code, they end up in the same sub-map. If some sub-maps have more entries than others, the speedup when we have k maps might be much less than k. So one of the goals of a hash function is to be uniform; that is, it should be equally likely to produce any value in the range. You can read more about designing good hash functions at http://thinkdast.com/hash. 10.3 Hashing and mutation
But let’s see what happens with a mutable object. Here’s a definition
for public class SillyArray {
private final char[] array;
public SillyArray(char[] array) {
this.array = array;
}
public String toString() {
return Arrays.toString(array);
}
@Override
public boolean equals(Object other) {
return this.toString().equals(other.toString());
}
@Override
public int hashCode() {
int total = 0;
for (int i=0; i<array.length; i++) {
total += array[i];
}
System.out.println(total);
return total;
}
public void setChar(int i, char c) {
this.array[i] = c;
}
Now suppose we create a SillyArray array1 = new SillyArray("Word1".toCharArray());
map.put(array1, 1);
The hash code for this array is 461. Now if we modify the contents of the array and then try to look it up, like this: array1.setChar(0, 'C'); Integer value = map.get(array1); the hash code after the mutation is 441. With a different hash code, there’s a good chance we’ll go looking in the wrong sub-map. In that case, we won’t find the key, even though it is in the map. And that’s bad. In general, it is dangerous to use mutable objects as keys in data
structures that use hashing, which includes 10.4 Exercise 8In this exercise, you will finish off the implementation of
As usual, you should run Review the implementation of Fill in the body of Like |
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|