The Connectivity Problem
Algorithmically Speaking - #13: Explaining the foundations of the most present real-world graph application.
Hello there, and welcome to a new edition of Algorithmically Speaking!
In the modern world, we are used to technology telling us how to travel from our current position to places we have never been. It usually lets us know several ways to get there, including the fastest or cheapest.
In todayβs discussion, I want to present the two most basic algorithms used in real-life scenarios that support a massive part of what our current delivery, location, and transportation services use. Though the mechanisms used in modern-day apps are much more sophisticated, they stand on the shoulders of these two basic but effective methods.
In order, this is our agenda for today:
π Breadth-First Search β a graph traversal that simulates how you would explore a new city.
π₯ Depth-First Search β a graph traversal that would allow you to escape from any maze.
π₯ Connected Components β a graph term to call your hometown if you knew how to get to any place from any location.
This article is part of a chapter on Fundamental Definitions of graph theory in my upcoming book, The Competitive Programmer Graphs Handbook. You can read all the details here:
You can support the book by preordering your digital, DRM-free copy, with early and frequent updates, including all future editions.
Given a graph G = < V, E > and two vertices s and t, the goal is to determine whether there exists a path that starts at s and ends at t. This problem is known as the connectivity problem.
Below are two algorithms for naturally solving this problem: Breadth-First Search and Depth-First Search. The workings and implementation of these algorithms will be explained in detail in the following sections.
π Breadth-First Search
This algorithm is the simplest one for solving the connectivity problem between a pair of vertices. It involves starting from s, exploring all directions, and adding the discovered vertices one layer at a time. The first layer consists of vertex s, and the second layer contains all vertices adjacent to s. If this process continues, the current layer will always consist of adjacent vertices from the previous layer. The algorithm terminates when there are no more vertices to discover.
This algorithm can be interpreted as a wave starting at vertex s and expanding to adjacent vertices as soon as possible. The layer assigned to a vertex corresponds to when this wave reaches it.
Formally, if the layers constructed by the BFS algorithm are enumerated as C1, C2, C3, . . ., then:
Layer C1 consists of all vertices that are adjacent to s. For convenience, the layer that consists of only vertex s will be denoted as C0.
Assuming the layers C1, . . . , Cj have been constructed, layer Cj+1 consists of all vertices that do not belong to any previous layer and share an edge with some vertex in layer Cj.
This algorithm is beneficial because it provides a lot of information. Suppose any vertex v in the graph does not belong to any of the layers generated by the BFS traversal. In that case, it means that there is no path from s to v. Furthermore, this traversal, assuming the definition that the distance between two vertices is the minimum number of edges in any path between them, groups the vertices according to their distance from s.
Proposition 1:
For each j β₯ 1, the layer Cj generated by the BFS algorithm consists of all vertices at a distance j from s. A path exists from s to t if and only if t appears in one of the layers.
Another important property of the BFS traversal is that it produces a tree with its root at vertex s and containing all the vertices reachable from s. Every vertex v distinct from s is added to this tree when it is first discovered.
When analyzing layer Cj, if there is a vertex u in this layer with an edge connecting it to a node v that has not yet been discovered, then the edge (u, v) is added to the tree such that u is the parent of v. This signifies that u was responsible for completing the path from s to v. This tree is known as the BFS tree.
The image below illustrates how the BFS tree is determined in an example graph.
π₯ Depth-First Search
The other method to find the set of vertices reachable from s simulates the natural way of behaving if graph G were a maze of interconnected rooms. You would start at s and visit one of its adjacent rooms. Once in this new room, you would repeat the previous process until reaching a point where all adjacent rooms to the current room have been explored. At this point, you backtrack along the same path until you find a room with unexplored neighbors and resume the traversal from one of them.
The method described is known as Depth-First Search (DFS), and it receives this name because it explores G by traversing as profoundly as possible before backtracking only when necessary.
This method differs from BFS in the way it finds unvisited vertices. Although both algorithms find the same set of vertices reachable from an initial node, they generally do so in a very different order. The DFS traversal explores very long paths, potentially moving quite far from s before returning to explore new paths.
The DFS method also produces a tree with its root at the initial vertex s. This tree is constructed similarly to the BFS tree: s is denoted as the root, and v is designated as the parent of u if v discovered u. The tree produced by this algorithm is known as the DFS tree, and although it is built under the same criteria as the BFS tree, it can differ significantly in structure due to the nature of the traversals.
The most characteristic difference is that the BFS tree usually has paths from the root to the leaves that are as short as possible, whereas the tree produced by the DFS algorithm is typically narrow and deep.
Proposition 2:
Let T be the DFS tree produced by the DFS algorithm on a graph G. Given a recursive call of DFS(u), every vertex marked as visited between the invocation and the end of this recursive call is a descendant of u in T.
Using the proposition above, it can be proven that
Proposition 3:
Let T be a DFS tree, u and v be vertices of T, and (u, v) an edge of G that does not belong to T. Then u is an ancestor of v, or v is an ancestor of u in T.
Hereβs how you would go about proving this proposition:
Proof. Without loss of generality, suppose that u is reached before v in the DFS traversal. When the edge (u, v) is examined during the call DFS(u), it is not added to T; therefore, v is marked as visited. Since v was not marked as visited at the time DFS(u) was invoked, it is a node that was discovered between the invocation of this recursive call and its end. According to proposition 2, v is a descendant of u. β
π₯ Connected Components
The set of vertices discovered by the BFS and DFS algorithms contains all those reachable from s. This set is, by definition, the connected component of G that includes s. To solve the connectivity problem between two vertices, s and t, one must check if t belongs to this component.
The BFS and DFS algorithms are some of the ways to generate the connected component that a vertex belongs to. More generally, the following procedure can be used to discover the set of vertices reachable from s:
Start with a set D = {s}, and at each step, check if an edge in G exists, say (u, v), such that one of its endpoints belongs to D and the other does not. In this case, add the new vertex. The algorithm terminates when no more such edges exist.
As can be seen, the described algorithm is a generic way to find a connected component. BFS and DFS algorithms would be specific methods for finding edges with one endpoint in the current set and the other outside of it.
This article concludes the series covering the chapter on Fundamental Definitions of my upcoming book, The Competitive Programmer Graphs Handbook.
The following articles will explore the first of the three main topics I plan to write about in the book: Bipartite Graphs. Expect everything from the basic definitions around them to the context in which these graphs are usually utilized in real-life scenarios to covering specific examples of problems that require their efficient use.
Here are the previous posts in this series, in case you want to take a look:
And thatβs it for today! If you are finding this newsletter valuable, consider doing any of these:
1) βοΈ Subscribe to the newsletterΒ β if you arenβt already, consider becoming a free or paid subscriber.
2) π Get early access to my bookβ thanks to all of you who have contributed to support my writing. The book I'm writing covers this and similar topics in more depth.
3)Β π»Β Read with your friendsΒ β this newsletter grows because readers share. I donβt use social media too much to advertise my writing. Iβm counting on you!
Have a great day! βοΈ
Alberto
Nice article. I liked your exploration of BFS but I feel you can have more detail in the explanation of how DFS works. Also it would be nice to see an application of these search techniques to a problem