Case study: Directed graphs and knots

Imagine that it’s 2 a.m. during finals week, and you’re scrambling to finish a research paper on topica obscura. Your adrenaline jumps when you find a relevant Wikipedia article, with links to more Wikipedia articles! You start clicking away, jumping from page to page in search of facts. An hour later, you realize you’re still clicking, but these are pages you’ve already visited. No matter which link you click, you can’t seem to discover any new pages!

If this has happened to you, you’ve unknowingly (and unfortunately) stumbled upon a knot in Wikipedia. Knots are a unique property of directed graphs. To understand them, it is necessary to have a basic understanding of directed graphs. The graphs in the rest of this book are undirected. In an undirected graph, an edge represents a symmetric relationship between vertices. This abstraction is appropriate for some applications, such as acquaintance networks and transportation networks; but other applications involve asymmetric relationships, for example, links between pages in the World Wide Web.

Wikipedia links are also asymmetric. Page A might link to page B, but page B doesn’t have to include any links to page A. To fully describe the relationship between pages A and B, we need two bits of information: whether A links to B, and whether B links to A . This is the essence of a directed graph. In a directed graph, a directed edge, e = (A,B), is an edge from A to B.

Knots are a unique (and sometimes cruel) property of directed graphs. A knot is a collection of vertices and edges with the property that every vertex in the knot has outgoing edges, and all outgoing edges from vertices in the knot terminate at other vertices in the knot. Thus it is impossible to leave the knot while following any path along its directed edges.

13.2 Implementation

In the scenario above, we imagine finding a knot in Wikipedia; that is, a set of articles with links to each other, but no links to other articles.

We wondered whether such a thing was even possible. Given that there are 3,811,000+ articles on Wikipedia and they all link to different pages, it seemed unlikely that there would be a knot, but we decided to investigate.

We started by extending Graph.py, from Section 2.2, to support directed graphs. The result is DirectedGraph.py, which you can download from thinkcomplex.com/DirectedGraph.py.

When we were designing DirectedGraph, we wanted to preserve the order of growth of Vertex and Arc lookups as they were in Graph. However, we needed to keep track of edges into a vertex and edges from a vertex. It made sense to keep track of these in separate dictionaries. Having two dictionaries takes twice as much memory, but this is a reasonable price to pay for maintaining the speed of lookups.

13.3 Detecting knots

To detect knots in a directed graph, we developed an algorithm based on a breadth-first-search. _bfsknots searches for all the vertices that can be reached from a given starting vertex, and returns these vertices as a set.

We run _bfsknots for every vertex in the graph, and build a dictionary that maps from a vertex to the set of vertices it can reach. That is, for each vertex, V, we know the set of reachable vertices, S_V.

If there is a knot in the graph, then for every vertex, V, in the knot the reachable set S_V is exactly the set of vertices in the knot.

The function has_knot iterates through each vertex in a graph, and returns True if this condition holds. If it checks the whole graph and does not find a knot, it returns False.

Finally, _knot_at_v checks whether all vertices reachable from V have the reachable set S_V.

Exercise 1

Download thinkcomplex.com/DirectedGraph.py. Read through the file to make yourself familiar with the terminology. Note that edges in DirectedGraph are represented by Arcs.

Determine the order of growth of has_knots experimentally. You can follow the example in Section 3.5. Below are some hints to help you out.

Use DirectedGraph.add_random_edges to generate graphs of different sizes.
For each graph, time how long it takes to check if it has a knot.
On a log-log scale, plot run time versus the number of vertices and run time versus the number of edges.

Exercise 2 Find all the knots in a directed graph.

Write a function, all_knots, that returns a list of all knots in a graph, where each knot is a set of vertices.
Write a function named entry_points that returns a list of all the vertices that serve as entry points into knots.

13.4 Knots in Wikipedia

To find knots in Wikipedia, we selected 558 disjoint subsets of Wikipedia articles, organized by index. For example, the index of articles about neurobiology was used to define one subset, and the index of articles about Zimbabwe was used to define another subset. The 558 subsets contain about 321,000 articles, or 10% of Wikipedia.

Of these subsets we examined, 38% contained at least one knot. So if you are restricted to articles listed on a single index page, there is a substantial chance you will eventually find a knot.

For the complete Wikipedia, the probability is lower because articles can link to articles in other indices. Based on these results, we cannot say, yet, whether the complete Wikipedia has a knot.

When you started your hypothetical research on topica obscura, you might have used journal articles instead of Wikipedia articles. Instead of links, you would have used citations to find other papers.

In that case, you would never find a knot. Why not? Because science articles are published sequentially, and a paper cannot cite a paper that will be created in the future. So the directed graph that represents papers and their citations cannot have a knot.

Chapter 13 Case study: Directed graphs and knots

13.1 Directed Graphs

13.2 Implementation

13.3 Detecting knots

13.4 Knots in Wikipedia

Like this book?

Are you using one of our books in a class?