Chapter 13 Case study: Directed graphs and knotsRachel Bobbins and Noam Rubin 13.1 Directed GraphsImagine that it’s 2 a.m. during finals week, and you’re scrambling to finish a research paper on topica obscura. Your adrenaline jumps when you find a relevant Wikipedia article, with links to more Wikipedia articles! You start clicking away, jumping from page to page in search of facts. An hour later, you realize you’re still clicking, but these are pages you’ve already visited. No matter which link you click, you can’t seem to discover any new pages! If this has happened to you, you’ve unknowingly (and unfortunately) stumbled upon a knot in Wikipedia. Knots are a unique property of directed graphs. To understand them, it is necessary to have a basic understanding of directed graphs. The graphs in the rest of this book are undirected. In an undirected graph, an edge represents a symmetric relationship between vertices. This abstraction is appropriate for some applications, such as acquaintance networks and transportation networks; but other applications involve asymmetric relationships, for example, links between pages in the World Wide Web. Wikipedia links are also asymmetric. Page A might link to page B, but page B doesn’t have to include any links to page A. To fully describe the relationship between pages A and B, we need two bits of information: whether A links to B, and whether B links to A . This is the essence of a directed graph. In a directed graph, a directed edge, e = (A,B), is an edge from A to B. Knots are a unique (and sometimes cruel) property of directed graphs. A knot is a collection of vertices and edges with the property that every vertex in the knot has outgoing edges, and all outgoing edges from vertices in the knot terminate at other vertices in the knot. Thus it is impossible to leave the knot while following any path along its directed edges. 13.2 ImplementationIn the scenario above, we imagine finding a knot in Wikipedia; that is, a set of articles with links to each other, but no links to other articles. We wondered whether such a thing was even possible. Given that there are 3,811,000+ articles on Wikipedia and they all link to different pages, it seemed unlikely that there would be a knot, but we decided to investigate. We started by extending When we were designing 13.3 Detecting knotsTo detect knots in a directed graph, we developed an algorithm based
on a breadth-first-search. def _bfsknots(self, s): # initialize the queue with the start vertex queue = [s] visited = set() on_first_vertex = True while queue: # get the next vertex v = queue.pop(0) # skip it if it's already marked if v in visited: continue # if we're on the first vertex, we're not actually visting if v != s or not on_first_vertex: visited.add(v) on_first_vertex = False for x in self.out_vertices(v): #if its out vertices have been cached, update visited if x in self._knot_cache.keys(): visited.update(self._knot_cache[x]) visited.add(x) #otherwise add it to the queue elif x not in self._knot_cache.keys(): queue.append(x) return visited We run If there is a knot in the graph, then for every vertex, V, in the knot the reachable set SV is exactly the set of vertices in the knot. The function def has_knot(self): """ Returns true if directed graph has a knot. """ self._knot_cache = {} #build the cache of which vertices are accessible from which for v in self: self._knot_cache[v] = self._bfsknots(v) #searches for knot for v in self: if self._knot_at_v(v): return True return False Finally, Exercise 1 Download thinkcomplex.com/DirectedGraph.py. Read through the file to make yourself familiar with the terminology. Note that edges in DirectedGraph are represented by Arcs. Determine the order of growth of
Exercise 2
Find all the knots in a directed graph.
13.4 Knots in WikipediaTo find knots in Wikipedia, we selected 558 disjoint subsets of Wikipedia articles, organized by index. For example, the index of articles about neurobiology was used to define one subset, and the index of articles about Zimbabwe was used to define another subset. The 558 subsets contain about 321,000 articles, or 10% of Wikipedia. Of these subsets we examined, 38% contained at least one knot. So if you are restricted to articles listed on a single index page, there is a substantial chance you will eventually find a knot. For the complete Wikipedia, the probability is lower because articles can link to articles in other indices. Based on these results, we cannot say, yet, whether the complete Wikipedia has a knot. When you started your hypothetical research on topica obscura, you might have used journal articles instead of Wikipedia articles. Instead of links, you would have used citations to find other papers. In that case, you would never find a knot. Why not? Because science articles are published sequentially, and a paper cannot cite a paper that will be created in the future. So the directed graph that represents papers and their citations cannot have a knot. |
Like this book?
Are you using one of our books in a class?We'd like to know about it. Please consider filling out this short survey.
|