Breadth First Search (BFS) and Depth First Search (DFS)

Why are they useful?

tree vs graph

BFS and DFS are two algorithms that traverse a graph or tree data structure. They traverse it in a systematic way, visiting all the nodes and edges.
They can be used to:

Search for a specific node or value in a graph or tree . You can use BFS and DFS to search for a specific node or value by starting at the root node and exploring the graph or tree until you find the target node.

Find the shortest path between two nodes in the graph. (BFS is particularly useful for this as it always expands the frontier of the search space in the shortest possible way).

Topological sorting. DFS can be used to perform a topological ordering of the nodes in a directed acyclic graph (DAG), that find an ordering s.t for every edge (u,v), node u comes before node v in the ordering.

Testing if a graph is connected. You can use BFS and DFS to test if a graph is connected by starting at a node and exploring the entire graph to make sure that all the nodes are reachable from the starting node.

Finding strongly connected components in a graph. A set of nodes are strongly connected if they are all connected to each other.

Testing if a graph is bipartite. BFS and DFS can be used to check if a graph can be divided into two sets s.t no two nodes in the same set are connected by an edge.

What are stacks and queue?

A stack and a queue are both data structures that are used to store and organize data. However, they have different properties and are used for different purposes.

A stack is a data structure that follows the principle of "last in, first out" (LIFO). This means that the last element that was added to the stack is the first one to be removed. Stacks are often used to store data temporarily and are used in a variety of applications, such as evaluating expressions, undo/redo operations, and function calls.

A queue is a data structure that follows the principle of "first in, first out" (FIFO). This means that the first element that was added to the queue is the first one to be removed. Queues are often used to store data that needs to be processed in a specific order, and are used in a variety of applications, such as scheduling tasks, handling events, and storing data for later processing.

Here are some key differences between stacks and queues:

Order of insertion and removal: In a stack, the last element that was added is the first one to be removed. In a queue, the first element that was added is the first one to be removed.

Accessibility: In a stack, you can only access the top element. In a queue, you can only access the front (head) and rear

1) BFS

Pseudo-Code, time and space complexity

pseudo
BFS(root):

  create an empty queue q
  create an empty set s

  add root to q and s

  while q is not empty:

    dequeue a node n from q

    for each neighbor m of n:
      if m is not in s:
        add m to s
        add m to q

At the end of the algorithm, every nodes are explored and s will contain all the nodes of the graph.
If there are N nodes and E edges, the algorithm runs in O(E+N) since every nodes and edges are visited only once. Space is also O(E+N) because it stores the nodes that need to be visited and the size of the queue can be at most N+E.

Implementation in Python

  # implementation of BFS algo in Python 
from collections import deque

def BFS(root):
  q = deque() # deque is a list-like sequence optimized for data accesses near its endpoint. 
  s = set()
  q.append(root)
  s.add(root)
  while len(q) > 0:
    n = q.popleft()
    for m in n.neighbors:
      if m not in s:
        s.add(m)
        q.append(m)
      

2) DFS

Pseudo-Code, time and space complexity


DFS(root):

  create an empty stack s
  create an empty set visited

  add root to s and visited

  while s is not empty:

    pop a node n from s

    for each neighbor m of n:
      if m is not in visited:
        add m to visited
        add m to s

As for BFS, the time complexity is O(N+E) as you visit each node and edge only once. However the DFS space complexity can be O(N) if it is implemented iteratively (O(N+E) if implemented recursively because you need to store the nodes that are currently being visited in the stack, as well as the nodes and edges that has been visited.)

Implementation in Python

def DFS(root):
  s = []
  visited = set()
  s.append(root)
  visited.add(root)
  while len(s) > 0:
    n = s.pop()
    for m in n.neighbors:
      if m not in visited:
        visited.add(m)
        s.append(m)
      

Application to the problems listed above

We first create a function that create random graphs that will be useful to test our solutions.

import random
import networkx as nx # to visualize the graph
import matplotlib as plt 

# create the graph

# Define a class for the nodes in the graph
class Node:
  def __init__(self, value):
    self.value = value
    self.neighbors = set()

# class Random Graph 
class Random_graph:

  def __init__(self, size):
    self.nodes = []
    
    for i in range(size):
      node = Node(str(i))
      self.nodes.append(node)

    # create random edges:
    
    for i in range(size):
      node = self.nodes[i]
      num_neighbors = random.randint(1,size-1)

      for j in range(num_neighbors):
        neighbor = self.nodes[random.randint(0,size-1)]

        if neighbor != node:                  # to avoid a node, to have an edge to itself
          node.neighbors.add(neighbor)
          neighbor.neighbors.add(node)


  def visualize(self):
  
    # Create a NetworkX graph from the given graph
    G = nx.Graph()

    # Add the nodes to the graph
    for node in self.nodes:
      G.add_node(node.value)

    # Add the edges to the graph
    for node in self.nodes:
      for neighbor in node.neighbors:
        G.add_edge(node.value, neighbor.value)

    # Draw the graph
    nx.draw(G, with_labels=True)
  
    

graph = Random_graph(6)
graph.visualize()

for i in graph.nodes[0].neighbors:
    print(i.value)

A) Search for a specific node or value

To check if a node is in a graph, we just need to add a if condition to check if the node is the targeted node, return True if we meet the node, False otherwise.
I will implement the algorithms for the case where we search multiple nodes (and then the above case will just be a special case.)

# Using BFS
from collections import deque

def BFS_search(root, target_value):

  # initialize the dictionary that stores the False as long as we havent seen the targeted value 
    sol = {}
    for i in range(len(target_value)):
        sol[target_value[i]] = False


    q = deque([root]) # deque is a list-like sequence optimized for data accesses near its endpoint. 
    
    visited = set()
    visited.add(root)

    

    while len(q) > 0:

        n = q.popleft()
        for m in n.neighbors:
            if m not in visited:
            
                if m.value in target_value:
                    sol[m.value] = True

                visited.add(m)
                q.append(m)

    # check if the value is in target_value
    if False in sol.values():
        return False
    else:
        return True 

        

# applying the algorithm to the random graph of size 6 

print(BFS_search(graph.nodes[0],["2","5"]))
print(BFS_search(graph.nodes[0],["2","5","7"]))

True
False

What is the time and space complexity of the algorithm?

We need to add to the standard BFS time complexity the time taken to initialize the dictionnary, which is O(Q) where Q is the lenght of the target_value list. We also need to add the time taken to check if the node is in the target_value list which take O(Q). So in total O((E+N)Q) which will be O(E+N) in general as Q will be very small compared to N or E.
For the space complexity we just need to add the space required to store the dictionnay which is negligable compared to the space required to store the queue.

B) Find the shortest path between two nodes in the graph

We need to have a dictionnary that stores the distance of nodes as they are discovered from the root node. We use a dictionnary to be able to get the distance of any node from the root node in O(1)

from collections import deque

def BFS_shortest_path(node1, node2):

    q = deque([node1]) # deque is a list-like sequence optimized for data accesses near its endpoint. 
    visited = set()

    visited.add(node1)

    # initialize the distance from node 1
    distance = 0

    # create a dictionnary that will store the distance of every node from node 1
    dist_node = {}
    dist_node[node1] = 0 

    while len(q) > 0:

        n = q.popleft()

        for m in n.neighbors:
      
            if m == node2:
                return dist_node.get(n) + 1

            if m not in visited:
                visited.add(m)
                q.append(m)
                dist_node[m] = dist_node.get(n) + 1
        
    return "the target node is not in the graph"

print(BFS_shortest_path(graph.nodes[0],graph.nodes[5]))
print(BFS_shortest_path(graph.nodes[0],graph.nodes[4]))

1
3

C) Testing if a graph is bipartite (using DFS)

A graph is bipartite if we can color all the nodes such that no node having the same color are directly connected by an edge.
We apply DFS as follow: start with a node and color the first one Red, then Blue, then Red, until we reach as deep as we can in the graph. Explore the graph like above until either we find

def DFS_bipartite(root):

    colors = {root: 0}
    s = []

    s.append(root)

    while len(s) > 0:
        n = s.pop()
        for m in n.neighbors:

            if m not in colors:

                colors[m] = 1-colors[n]
                s.append(m)

            else:
                if colors[m] == colors[n]:
                    return False
                
    return True 
                
            

# Create a bipartite graph with 4 nodes
a = Node("A")
b = Node("B")
c = Node("C")
d = Node("D")

# Connect the nodes with edges
a.neighbors = [b, c]
b.neighbors = [a, d]
c.neighbors = [a, d]
d.neighbors = [b, c]

# test on a bipartite graph 
print(DFS_bipartite(a))
# test on a non-bipartite graph
print(DFS_bipartite(graph.nodes[0]))

True
False

Conclusion

This was the opportunity for me to review two of the most famous and widely used algorithm, BFS and DFS and see some problems that can be solved using them.