This HTML version of Think Data Structures is provided for convenience, but it is not the best format of the book.
In particular, some of the symbols are not rendered correctly.
You might prefer to read the PDF version.
Or you can buy this book on Amazon.com.
Chapter 5 Doubly-linked list
This chapter reviews results from the previous exercise and introduces yet
another implementation of the List interface, the doubly-linked
list.
5.1 Performance profiling results
In the previous exercise, we used Profiler.java to run various
ArrayList and LinkedList operations with a range of
problem sizes. We plotted run time versus problem size on a log-log
scale and estimated the slope of the resulting curve, which indicates
the leading exponent of the relationship between run time and problem
size.
For example, when we used the add method to add elements
to the end of an ArrayList , we found that the total time to
perform n adds was proportional to n; that is, the
estimated slope was close to 1. We concluded that performing n
adds is in O(n), so on average the time for a single add is
constant time, or O(1), which is what we expect based on algorithm
analysis.
The exercise asks you to fill in the body of
profileArrayListAddBeginning , which tests the performance of
adding new elements at the beginning of an ArrayList . Based on our
analysis, we expect each add to be linear, because it has to shift the
other elements to the right; so we expect n adds to be
quadratic.
Here’s a solution, which you can find in
the solution directory of the repository. public static void profileArrayListAddBeginning() {
Timeable timeable = new Timeable() {
List<String> list;
public void setup(int n) {
list = new ArrayList<String>();
}
public void timeMe(int n) {
for (int i=0; i<n; i++) {
list.add(0, "a string");
}
}
};
int startN = 4000;
int endMillis = 10000;
runProfiler("ArrayList add beginning", timeable, startN, endMillis);
}
This method is almost identical to profileArrayListAddEnd . The
only difference is in timeMe , which uses the two-parameter
version of add to put the new element at index 0. Also, we
increased endMillis to get one additional data point. Here are the timing results (problem size on the left, run time in
milliseconds on the right): 4000, 14
8000, 35
16000, 150
32000, 604
64000, 2518
128000, 11555
Figure 5.1
shows the graph of run time versus problem size.
Figure 5.1: Profiling results: run time versus problem size for adding
n elements at the beginning of an ArrayList . |
Remember that a straight line on this graph does not mean that
the algorithm is linear. Rather, if the run time is proportional to
nk for any exponent, k, we expect to see a straight
line with slope k. In this case, we expect the total time for
n adds to be proportional to n2, so we expect a
straight line with slope 2. In fact, the estimated slope is 1.992, which
is so close I would be afraid to fake data this good.
5.2 Profiling LinkedList methods
In the previous exercise you also profiled the performance of adding new
elements at the beginning of a LinkedList . Based on our
analysis, we expect each add to take constant time, because in
a linked list, we don’t have to shift the existing elements; we can just
add a new node at the beginning. So we expect the total time for
n adds to be linear.
Here’s a solution: public static void profileLinkedListAddBeginning() {
Timeable timeable = new Timeable() {
List<String> list;
public void setup(int n) {
list = new LinkedList<String>();
}
public void timeMe(int n) {
for (int i=0; i<n; i++) {
list.add(0, "a string");
}
}
};
int startN = 128000;
int endMillis = 2000;
runProfiler("LinkedList add beginning", timeable, startN, endMillis);
}
We only had a make a few changes, replacing
ArrayList with LinkedList and adjusting
startN and endMillis to get a good range of data.
The measurements were noisier than the previous batch; here
are the results: 128000, 16
256000, 19
512000, 28
1024000, 77
2048000, 330
4096000, 892
8192000, 1047
16384000, 4755
Figure 5.2
shows the graph of these results.
Figure 5.2: Profiling results: run time versus problem size for adding
n elements at the beginning of a LinkedList . |
It’s not a very straight line, and the slope is not exactly 1; the slope
of the least squares fit is 1.23. But these results indicate that the
total time for n adds is at least approximately O(n),
so each add is constant time.
5.3 Adding to the end of a LinkedList
Adding elements at the beginning is one of the operations where we
expect LinkedList to be faster than ArrayList . But for
adding elements at the end, we expect LinkedList to be slower.
In my implementation, we have to traverse the entire list to add an
element to the end, which is linear. So we expect the total time for
n adds to be quadratic.
Well, it’s not. Here’s the code: public static void profileLinkedListAddEnd() {
Timeable timeable = new Timeable() {
List<String> list;
public void setup(int n) {
list = new LinkedList<String>();
}
public void timeMe(int n) {
for (int i=0; i<n; i++) {
list.add("a string");
}
}
};
int startN = 64000;
int endMillis = 1000;
runProfiler("LinkedList add end", timeable, startN, endMillis);
}
Here are the results: 64000, 9
128000, 9
256000, 21
512000, 24
1024000, 78
2048000, 235
4096000, 851
8192000, 950
16384000, 6160
Figure 5.3
shows the graph of these results.
Figure 5.3: Profiling results: runtime versus problem size for adding
n elements at the end of a LinkedList . |
Again, the measurements are noisy and the line is not perfectly
straight, but the estimated slope is 1.19, which is close to what
we got adding elements at the beginning, and not very close to 2, which
is what we expected based on our analysis. In fact, it is closer to 1,
which suggests that adding elements at the end is constant time.
What’s going on?
5.4 Doubly-linked list
My implementation of a linked list, MyLinkedList , uses a
singly-linked list; that is, each element contains a link to the next,
and the MyArrayList object itself has a link to the first node.
But if you read the documentation of LinkedList at
http://thinkdast.com/linked,
it says:
Doubly-linked list implementation of the List and Deque
interfaces. […] All of the operations perform as
could be expected for a doubly-linked list. Operations that
index into the list will traverse the list from the beginning or the
end, whichever is closer to the specified index.
If you are not familiar with doubly-linked lists, you can read more
about them at http://thinkdast.com/doublelist,
but the short version is: -
Each node contains a link to the next node and a link to the previous
node.
- The
LinkedList object contains links to the first and last
elements of the list.
So we can start at either end of the list and traverse it in either
direction. As a result, we can add and remove elements from the
beginning and the end of the list in constant time!
The following table summarizes the performance we expect from
ArrayList , MyLinkedList (singly-linked), and
LinkedList (doubly-linked): |
| MyArrayList | MyLinkedList | LinkedList
|
|
add (at the end) | 1 | n | 1
|
add (at the beginning) | n | 1 | 1
|
add (in general) | n | n | n
|
get / set | 1 | n | n
|
indexOf / lastIndexOf | n | n | n
|
isEmpty / size | 1 | 1 | 1
|
remove (from the end) | 1 | n | 1
|
remove (from the beginning) | n | 1 | 1
|
remove (in general) | n | n | n
|
|
5.5 Choosing a Structure
The doubly-linked implementation is better than ArrayList for
adding and removing at the beginning, and just as good as
ArrayList for adding and removing at the end. So the only
advantage of ArrayList is for get and set ,
which require linear time in a linked list, even if it is doubly-linked.
If you know that the run time of your application depends on the time it
takes to get and set elements, an ArrayList
might be the better choice. If the run time depends on adding and
removing elements near the beginning or the end, LinkedList
might be better.
But remember that these recommendations are based on the order of growth
for large problems. There are other factors to consider: - If these operations don’t take up a substantial fraction of the
run time for your application — that is, if your applications spends
most of its time doing other things — then your choice of a
List implementation won’t matter very much. - If the lists you are working with are not very big, you might not get
the performance you expect. For small problems, a quadratic algorithm
might be faster than a linear algorithm, or linear might be faster
than constant time. And for small problems, the difference probably
doesn’t matter.
- Also, don’t forget about space. So far we have focused on run time, but
different implementations require different amounts of space. In an
ArrayList , the elements are stored side-by-side in a single
chunk of memory, so there is little wasted space, and computer
hardware is often faster with contiguous chunks. In a linked list,
each element requires a node with one or two links. The links take up
space (sometimes more than the data!), and with nodes scattered
around in memory, the hardware might be less efficient.
In summary, analysis of algorithms provides some guidance for choosing
data structures, but only if - The run time of your application is important,
- The run time of your application depends on your choice of data
structure, and
- The problem size is large enough that the order of growth actually
predicts which data structure is better.
You could have a long career as a software engineer without ever finding
yourself in this situation.
|