This HTML version of Think Complexity, 2nd Edition is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.
You might prefer to read the PDF version.
Chapter 2 Graphs
The first three chapters of this book are about models that describe systems that are made up of components and connections between components. For example, in a social network, the components are people and connections represent friendships, business relationships, etc. In an ecological food web, the components are species and the connections represent predator-prey relationships.
In this chapter, I introduce NetworkX, a Python package for building and studying these models. We start with the Erdős-Rényi model, which has interesting mathematical properties. In the next chapter we move on to models that are more useful for explaining real-world systems.
The code for this chapter is in chap02.ipynb in the repository for this book. More information about working with the code is in Section ??.
2.1 What is a graph?
For example, you could represent a road map with a node for each city and an edge for each road between cities. Or you could represent a social network using a node for each person, with an edge between two people if they are friends and no edge otherwise.
In some graphs, edges have attributes like length, cost, or weight. For example, in a road map, the length of an edge might represent the distance between two cities, or the travel time. In a social network there might be different kinds of edges to represent different kinds of relationships: friends, business associates, etc.
Edges may be directed or undirected, depending on whether the relationships they represent are asymmetric or symmetric. In a road map, you might represent a one-way street with a directed edge and a two-way street with an undirected edge. In some social networks, like Facebook, friendship is symmetric: if A is friends with B then B is friends with A. But on Twitter, for example, the “follows” relationship is not symmetric; if A follows B, that doesn’t imply that B follows A. So you might use undirected edges to represent a Facebook network and directed edges for Twitter.
Graphs are also useful, because there are many real world problems that can be solved using graph algorithms. For example, Dijkstra’s shortest path algorithm is an efficient way to find the shortest path from a node to all other nodes in a graph. A path is a sequence of nodes with an edge between each consecutive pair.
Graphs are usually drawn with squares or circles for nodes and lines for edges. For example, the directed graph in Figure ?? might represent three people who follow each other on Twitter. The thick part of the line indicates edge direction. In this example, Alice and Bob follow each other and both follow Chuck, but Chuck follows no one.
The undirected graph in Figure ?? shows four cities in the north-east United States; the labels on the edges indicate driving time in hours. In this example the placement of the nodes corresponds roughly to the geography of the cities, but in general the layout of a graph is arbitrary.
To represent graphs, we’ll use a package called NetworkX, which is the most commonly used network library in Python. You can read more about it at https://networkx.github.io/, but I’ll explain it as we go along.
We can create a directed graph by importing NetworkX (usually imported as
At this point,
Now we can use the
Adding edges works pretty much the same way:
And we can use
NetworkX provides several functions for drawing graphs;
And that’s the code I use to generate Figure ??.
To generate Figure ??, I start with a dictionary that maps from each city name to its approximate longitude and latitude:
Since this is an undirected graph, I instantiate
Then I can use
Next I’ll make a dictionary that maps from each edge to the corresponding driving time:
Now I can use
Now instead of
To add the edge labels, we use
2.3 Random graphs
A random graph is just what it sounds like: a graph with nodes and edges generated at random. Of course, there are many random processes that can generate graphs, so there are many kinds of random graphs.
An Erdős-Rényi graph (ER graph) is characterized by two parameters: n is the number of nodes and p is the probability that there is an edge between any two nodes. See http://en.wikipedia.org/wiki/Erdos-Renyi_model.
One of the properties that displays this kind of transition is connectivity. An undirected graph is connected if there is a path from every node to every other node.
In an ER graph, the probability that the graph is connected is very low when p is small and nearly 1 when p is large. Between these two regimes, there is a rapid transition at a particular value of p, denoted p*.
Erdős and Rényi showed that this critical value is p* = lnn / n, where n is the number of nodes. A random graph, G(n, p), is unlikely to be connected if p < p* and very likely to be connected if p > p*.
To test this claim, we’ll develop algorithms to generate random graphs and check whether they are connected.
2.4 Generating graphs
I’ll start by generating a complete graph, which is a graph where every node is connected to every other.
Here’s a generator function that takes a list of nodes and enumerates all distinct pairs. If you are not familiar with generator functions, you can read about them at http://intermediatepythonista.com/python-generators.
We can use
The following code makes a complete graph with 10 nodes and draws it.
Figure ?? shows the result. Soon we will modify this code to generate ER graphs, but first we’ll develop functions to check whether a graph is connected.
2.5 Connected graphs
A graph is connected if there is a path from every node to every other node (see http://en.wikipedia.org/wiki/Connectivity_(graph_theory)).
For many applications involving graphs, it is useful to check whether a graph is connected. Fortunately, there is a simple algorithm that does it.
You can start at any node and check whether you can reach all other nodes. If you can reach a node, v, you can reach any of the neighbors of v, which is any node connected by v by an edge.
Suppose we start at node s. We can mark s as “seen”, then we can mark its neighbors. Then we mark the neighbor’s neighbors, and so on, until we can’t reach any more nodes. If all nodes are seen, the graph is connected.
Here’s what that looks like in Python:
Initially the set,
Now, each time through the loop, we
When the stack is empty, we can’t reach any more nodes, so we
break out of the loop and return
As an example, we can find all nodes in the complete graph that are reachable from node 0:
Initially, the stack contains node 0 and
The next time through the loop,
Notice that the same node can appear more than once in the stack; in fact, a node with k neighbors will be added to the stack k times. Later we will look for ways to make this algorithm more efficient.
We can use
A complete graph is, not surprisingly, connected:
In the next section we will generate ER graphs and check whether they are connected.
2.6 Generating ER graphs
The ER graph G(n, p) contains n nodes, and each pair of nodes is connected by an edge with probability p. Generating an ER graph is similar to generating a complete graph.
The following generator function enumerates all possible edges and
uses a helper function,
Here’s an example with
Figure ?? shows the result. This graph turns out to be connected; in fact, most ER graphs with n=10 and p=0.3 are connected. In the next section, we’ll see how many.
2.7 Probability of connectivity
For given values of n and p, we would like to know the probability that G(n, p) is connected. We can estimate it by generating a large number of random graphs and counting how many are connected. Here’s how:
Out of 10000 ER graphs with these parameters, 6498 are connected, so we estimate that 65% of them are connected. I chose 0.3 because it is close to the critical value where the probability of connectivity goes from near 0 to near 1. According to Erdős and Rényi, p* = lnn / n = 0.23.
We can get a clearer view of the transition by estimating the probability of connectivity for a range of values of p:
This is the first example we’ve seen using NumPy. Following convention,
I import NumPy as np. The function
Figure ?? shows the results, with a vertical line at p*. The transition from 0 to 1 occurs near the predicted critical value, 0.23. With p on a log scale, the transition is roughly symmetric.
Figure ?? shows similar results for larger values of n. As n increases, the critical value gets smaller and the transition gets more abrupt.
These experiments are consistent with the results Erdős and Rényi proved in their papers.
2.8 Analysis of graph algorithms
In this chapter I presented an algorithm for checking whether a graph is connected; in the next few chapters, we will see other graph algorithms. And we will analyze the performance of those algorithms, figuring out how their run times grow as the size of the graphs increases.
If you are not already familiar with analysis of algorithms, you should read Appendix ?? before you continue.
As an example, let’s analyze
Each time through the loop, we pop a node off the stack; by default,
Next we check whether the node is in
If the node is not already in
To express the run time in terms of n and m, we can add up
the total number of times each node is added
Each node is only added to
But nodes might be added to
So the total number
of additions to
If we know the relationship between n and m, we can simplify
this expression. For example, in a complete graph the number of edges
is n(n−1)/2, which is in O(n2). So for a complete graph,
The code for this chapter is in
Exercise 1 Launch
Exercise 2 In Section ?? we analyzed the performance of
Exercise 3 In my implementation of
There are actually two kinds of ER graphs. The one we generated in this chapter, G(n, p), is characterized by two parameters, the number of nodes and the probability of an edge between nodes.
An alternative definition, denoted G(n, m), is also characterized by two parameters: the number of nodes, n, and the number of edges, m. Under this definition, the number of edges is fixed, but their location is random.
Repeat the experiments we did in this chapter using this alternative definition. Here are a few suggestions for how to proceed:
1. Write a function called
2. Write a function called
3. Make a version of
4. Compute the probability of connectivity for a range of values of m.
How do the results of this experiment compare to the results using the first type of ER graph?
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.