|
This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly. You might prefer to read the PDF version. Or you can buy this book on Amazon.com. Chapter 4 LinkedListThis chapter presents solutions to the previous exercise and continues the discussion of analysis of algorithms. 4.1 Classifying
|
||||||||||||||||||||||||||||||||||||||||||||||
| MyArrayList | MyLinkedList | |
| add (at the end) | 1 | n |
| add (at the beginning) | n | 1 |
| add (in general) | n | n |
| get / set | 1 | n |
| indexOf / lastIndexOf | n | n |
| isEmpty / size | 1 | 1 |
| remove (from the end) | 1 | n |
| remove (from the beginning) | n | 1 |
| remove (in general) | n | n |
The operations where MyArrayList has an advantage are adding at
the end, removing from the end, getting and setting.
The operations where MyLinkedList has an advantage are adding
at the beginning and removing from the beginning.
For the other operations, the two implementations are in the same order of growth.
Which implementation is better? It depends on which operations you are likely to use the most. And that’s why Java provides more than one implementation, because it depends.
For the next exercise I provide a class called Profiler that
contains code that runs a method with a range of problem sizes,
measures run times, and plots the results.
You will use Profiler to classify the performance
of the add method for the Java implementations of
ArrayList and LinkedList.
Here’s an example that shows how to use the profiler:
public static void profileArrayListAddEnd() {
Timeable timeable = new Timeable() {
List<String> list;
public void setup(int n) {
list = new ArrayList<String>();
}
public void timeMe(int n) {
for (int i=0; i<n; i++) {
list.add("a string");
}
}
};
String title = "ArrayList add end";
Profiler profiler = new Profiler(title, timeable);
int startN = 4000;
int endMillis = 1000;
XYSeries series = profiler.timingLoop(startN, endMillis);
profiler.plotResults(series);
}
This method measures the time it takes to run add on an
ArrayList, which adds the new element at the end. I’ll explain
the code and then show the results.
In order to use Profiler, we need to create a Timeable
object that provides two methods: setup and timeMe.
The setup method does whatever needs to be done before we start
the clock; in this case it creates an empty list. Then timeMe
does whatever operation we are trying to measure; in this case it adds
n elements to the list.
The code that creates timeable is an anonymous class that
defines a new implementation of the Timeable interface and
creates an instance of the new class at the same time. If you are not
familiar with anonymous classes, you can read about them here:
http://thinkdast.com/anonclass.
But you don’t need to know much for the next exercise; even if you are not comfortable with anonymous classes, you can copy and modify the example code.
The next step is to create the Profiler object, passing the
Timeable object and a title as parameters.
The Profiler provides timingLoop which uses the
Timeable object stored as an instance variable. It invokes the
timeMe method on the Timeable object several times
with a range of values of n. timingLoop takes two
parameters:
startN is the value of n the timing loop should
start at.endMillis is a threshold in milliseconds. As
timingLoop increases the problem size, the run time increases;
when the run time exceeds this threshold, timingLoop stops.When you run the experiments, you might have to adjust these
parameters. If startN is too low, the run time might be too
short to measure accurately. If endMillis is too low, you might
not get enough data to see a clear relationship between problem size and
run time.
This code is in ProfileListAdd.java, which you’ll run in the next
exercise. When I ran it, I got this output:
4000, 3 8000, 0 16000, 1 32000, 2 64000, 3 128000, 6 256000, 18 512000, 30 1024000, 88 2048000, 185 4096000, 242 8192000, 544 16384000, 1325
The first column is problem size, n; the second column is
run time in milliseconds. The first few measurements are pretty noisy; it
might have been better to set startN around 64000.
The result from timingLoop is an XYSeries that
contains this data. If you pass this series to plotResults, it
generates a plot like the one in Figure 4.1.
The next section explains how to interpret it.
Based on our understanding of how ArrayList works, we expect
the add method to take constant time when we add elements to
the end. So the total time to add n elements should be linear.
To test that theory, we could plot total run time versus problem size, and we should see a straight line, at least for problem sizes that are big enough to measure accurately. Mathematically, we can write the function for that line:
| runtime = a + b n |
where a is the intercept of the line and b is the slope.
On the other hand, if add is linear, the total time for
n adds would be quadratic. If we plot run time versus problem
size, we expect to see a parabola. Or mathematically, something like:
| runtime = a + b n + c n2 |
With perfect data, we might be able to tell the difference between a straight line and a parabola, but if the measurements are noisy, it can be hard to tell. A better way to interpret noisy measurements is to plot run time and problem size on a log-log scale.
Why? Let’s suppose that run time is proportional to nk, but we don’t know what the exponent k is. We can write the relationship like this:
| runtime = a + b n + … + c nk |
For large values of n, the term with the largest exponent is the most important, so:
| runtime ≈ c nk |
where ≈ means “approximately equal”. Now, if we take the logarithm of both sides of this equation:
| log(runtime) ≈ log(c) + k log(n) |
This equation implies that if we plot runtime versus n on a log-log scale, we expect to see a straight line with intercept log(c) and slope k. We don’t care much about the intercept, but the slope indicates the order of growth: if k=1, the algorithm is linear; if k=2, it’s quadratic.
Looking at the figure in the previous section, you can estimate the
slope by eye. But when you call plotResults it computes a least
squares fit to the data and prints the estimated slope. In this
example:
Estimated slope = 1.06194352346708
which is close to 1; and that suggests that the total time for n adds is linear, so each add is constant time, as expected.
One important point: if you see a straight line on a graph like this, that does not mean that the algorithm is linear. If the run time is proportional to nk for any exponent k, we expect to see a straight line with slope k. If the slope is close to 1, that suggests the algorithm is linear. If it is close to 2, it’s probably quadratic.
In the repository for this book you’ll find the source files you need for this exercise:
Profiler.java contains the implementation of the
Profiler class described above. You will use this class, but
you don’t have to know how it works. But feel free to read the source.ProfileListAdd.java contains starter code for this exercise,
including the example, above, which profiles
ArrayList.add. You will modify this file to profile a few
other methods.Also, in the code directory , you’ll find the Ant build file
build.xml.
Run ant ProfileListAdd to run ProfileListAdd.java. You should
get results similar to Figure 4.1, but you might have to adjust
startN or endMillis. The estimated slope should be close
to 1, indicating that performing n add operations takes time
proportional to n raised to the exponent 1; that is, it is
in O(n).
In ProfileListAdd.java, you’ll find an empty method named
profileArrayListAddBeginning. Fill in the body of this method
with code that tests ArrayList.add, always putting the new
element at the beginning. If you start with a copy of
profileArrayListAddEnd, you should only have to make a few
changes. Add a line in main to invoke this method.
Run ant ProfileListAdd again and interpret the results. Based on
our understanding of how ArrayList works, we expect each add
operation to be linear, so the total time for n adds should be
quadratic. If so, the estimated slope of the line, on a log-log scale,
should be near 2. Is it?
Now let’s compare that to the performance of LinkedList. Fill
in the body of profileLinkedListAddBeginning and use it to
classify LinkedList.add when we put the new element at the
beginning. What performance do you expect? Are the results consistent
with your expectations?
Finally, fill in the body of profileLinkedListAddEnd and use it
to classify LinkedList.add when we put the new element at the
end. What performance do you expect? Are the results consistent with
your expectations?
I’ll present results and answer these questions in the next chapter.