Graphs

Applications

Google knowledge graph | six degrees of separation

google map (select the shortest, cheapest or most scenic way to a particular place)

Terminology

A graph G=(V, E) consists of a set of vertices V and a set of edges E, such that each edge in E is a connection between a pair of vertices in V.
A graph with relatively few edges is called sparse.
A graph with many edges is called dense.
Two edges are parallel if they connect the same pair of vertices.
Cardinality of a graph: the number of vertices; e.g., |G| = 6.
A subgraph is a subset of a graph's edges (and associated vertices) that constitutes a graph.
A path in a graph is a sequence of vertices connected by edges; a path is simple if all vertices on the path are distinct.
A cycle is a path (with at least one edge) whose first and last vertices are the same; a cycle is simple if the path is simple, except for the first and last vertices being the same.
The length of a path or a cycle is its number of edges.
A graph is connected if there is a path from every vertex to every other vertex.
A graph that is not connected consists of a set of connected components, which are maximal connected subgraphs.
An acyclic graph is a graph with no cycles.
A tree is an acyclic connected graph (a connected graph that has no cycle); a free tree is a connected, undirected graph with no simple cycles (i.e., a free tree is connected and has |V| - 1 edges).
Undirected graph (digraph): a graph whose edges are not directed.
- We say one vertex is connected to another if there exists a path that contains both of them.
- Degree of a vertex: the number of edges incident on the vertex. E.g., deg(v1) = 3.
Directed graph (digraph): a graph with edges directed from one vertex to another.
- The outdegree of a vertex is the number of edges pointing from it. The indegree of a vertex is the number of edges pointing to it.
- We say that a vertex w is reachable from a vertex v if there exists a directed path from v to w.
- A directed acyclic graph (or DAG) is a digraph with no directed cycles.
Labeled graph
Weighted graph vs unweighted graph: A weighted graph is a graph in which each edge is given a numerical weight. A weighted graph is therefore a special type of labeled graph in which the labels are numbers (which are usually taken to be positive).
Two vertices are adjacent if they are joined by an edge; they are also called neighbors.

Graph representations: adjacency matrix, adjacency list, edge list

Adjacency matrix

(A)                               (B)
  | 1  2  3  4  5                  | 1  2  3  4  5
--|--------------                --|--------------
1 |    1        1                1 |    1        1 
2 |       1                      2 | 1     1     1
3 |          1                   3 |    1     1
4 |             1                4 |       1     1
5 |    1                         5 | 1  1     1

Adjacency list

(A)                               (B)
Vertex Adjacency-list             Vertex  Adjacency-list
1      [2 5]                      1       [2 5]
2      [3]                        2       [1 3 5]
3      [4]                        3       [2 4]
4      [5]                        4       [3 5]
5      [2]                        5       [1 2 4]

Edge list

Other representations:

Object Oriented Representations
v.neighbors = Adj[v]

Implicit Representations
Adj(u) is a function
v.neighbors() is a method
(the above is useful when there are enormous numbers of vertices, e.g. possible distinct states of a complex system, etc.)

Graph visualization

Graphviz (the DOT language) (webgraphviz)
- Hello World: echo "digraph G {Hello->World}" | dot -Tpdf >hello.pdf
- See a family tree
- See a co-authorship graph
- Figure 11.11 (left: manually drawn; right: generated using graphviz)
Cytoscape
Isomorphic graphs: Two graphs which contain the same number of graph vertices connected in the same way are said to be isomorphic.

Graph applications

internet: packet routing
social networking: e.g. "people you may know" suggestions in LinkedIn
web "crawling": cataloging hypertext
solving games & puzzles
garbage collection in programming languages
network analysis for (design of resilient infrastructure networks)
theorem proof, etc.

Graph implementations

An ADT for graphs | An implementation of the ADT using adjacency list

Graph traversal (walk through a graph)

Graph traversal: visit the vertices of a graph in some specific order based on the graph's topology.
(Recall that tree traversals visit every node exactly once, in some specified order such as preorder, inorder, or postorder)

Graph traversal algorithm typically begins with a start vertex and then visits the remaining vertices from the start vertex.
Be aware of the following cases:
1) It may not be possible to reach all vertices from the start vertex (when the graph is not connected).
2) The graph may contain cycles, and cycles may cause the algorithm to go into an infinite loop if not carefully considered.
Solutions: to maintain a mark bit for each vertex on the graph and set the bit when the vertex is visited.

Depth-First Search (DFS)

Whenever a vertex V is visited during the search, DFS will recursively visit all of V's unvisited neighbors. Equivalently, DFS will add all edges leading out of v (to an unvisited vertex) to a stack. The next vertex to be visited is determined by popping the stack and following that edge. The DFS process can be used to define a depth-first search tree, which is composed of the edges that were followed to any unvisited vertex during the traversal, and leaves out the edges that lead to already visited vertices.

	
	public void DFS(int v) {
		preVisit(v); 
  		//e.g., preNum.add(v) for preVisit enumeration of the nodes
		//when v is first "discovered" (aizu)
		setVisited(v);
		List<Integer> neighbors = getNeighbors(v);
		for(int i = 0; i < neighbors.size(); i ++) {
			int v1 = neighbors.get(i);
			if(ifVisited(v1) == false) DFS(v1); //use recursion!
		}
		postVisit(v); 
		//e.g., postNum.add(v) for postVisit enumeration of the nodes
		//when search finishes examining v's neighbors 
	}

Cost of the DFS? DFS processes each edge once in a directed graph. In an undirected graph, DFS processes each edge from both directions. Each vertex must be visited, but only once, so the total cost is $\Theta(|V| + |E|)$.

preVisit() vs postVisit() functions; E.g., postVisit enumeration of the nodes for topological sort.

How does DFS work?

the algorithm is recursive
it's an algorithm that can be used to solve a maze, i.e.
walk through a maze leaving "bread crumbs" behind
at each node, keep track of:
- which edges have already been visited
- which edges still have to be visited
backtrack if necessary
do not repeat visiting vertices

Breath-First Search (BFS)

BFS examines all vertices connected to the start vertex before visiting vertices further away. Similarly, the BFS process results in a breadth-first search tree
BFS is implemented similarly to DFS, except that a queue replaces the recursion stack.

How does it work?

Start from a node s in the graph. (1 node visited in 0 steps)
Visit all nodes that can be reached by only one edge. ( Adj[s] nodes visited in 1 step)
From each node visited in (1.), visit all nodes that can be reached by only one edge.
Continue until you visit all nodes that are reachable from the node s.

Avoid duplicates--don't visit a node that has been visited before!
This will work in $O(|V|+|E|)$ time.

	public void BFS(int s) {
		ArrayList<Integer> toVisit = new ArrayList<Integer>();
		toVisit.add(s);
		while(toVisit.size() > 0) {
			//first-in, first-visit; if queue is used, use its dequeue() function
			int v = toVisit.remove(0); 
			setVisited(v);
			List<Integer> neighbors = getNeighbors(v);
			for(int i = 0; i < neighbors.size(); i ++) {
				int v1 = neighbors.get(i);
				if((ifVisited(v1) == false) && (toVisit.contains(v1) == false)) {
					toVisit.add(v1);
				}
			}
		}
	}

Topological Sort

Topological sorting is to lay out the vertices of a directed acylic graph (DAG) in a linear order to meet the prerequisite rules. In other words, a topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a DAG is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering.
Applications of topological sort: Scheduling a sequence of jobs, classes, or tasks based on their dependencies; determining the order of compilation tasks to perform in makefiles; etc

Figure 11.14 (Shaffer): An example graph for topological sort. Seven tasks have dependencies as shown by the directed graph.
Algorithms:
(1) Algorithm 1 (DFS-based): when visit a vertex using DFS, add the vertex to a list when the recursion pops back to that vertex (post visit action); reverse the list results a topological sort.
(2) Algorithm 2 (use a queue of eligible vertices):
- Set list of visited vertext to {}, place all vertices that have no incoming edges into a queue of eligible vertices.
- Remove a vertex from the queue of eligible vertices, add it to the list of visited, "remove" edges from this vertex to its neighbors; if any of the neighbors becomes an eligible vertex (no incoming edges), add it to the queue.
- Repeat step 2) unitl the queue becomes empty.
- If all vertices are found in the list of visitied, the topological sort is found; otherwise, the graph is cyclic and there is no solution to the topological sort problem.

Implementations: Recursive topological sort (algorithm 1) | Queue-based topological sort (algorithm 2)

	//topological sort using recursion (DFS)
	public void topSortRec() {
		postEnum = new Vector<Integer>();
		walk("DFS"); //do depth-first search
		System.out.println("topological sort: ");
		//topological order is the reverse order of the post-visit enumeration of the nodes
		for(int i = postEnum.size() - 1; i >= 0; i --)
			System.out.print(nodeList.elementAt(postEnum.elementAt(i)) + " ");
	}

//Pseudocode for the queue-based topological sorting algorithm
L <- a list that will contain sorted vertices
C <- a list contains indegrees for each of the vertices
Q <- a queue that contains eligible vertices (vertices that have no incoming edges)
while Q is nonempty do
    remove a vertex u from Q
    add u to the end of L
    for each neighbor v of u do
        decrement C[v] by one
        if C[v] == 0 (v has no other incoming edges)
            push v to Q (queue of eligible vertices)
if L contains all the vertices 
    L is the solution
else
    solution not found

Shortest-Paths Problems

Single-Source Shortest Paths
All-Pairs Shortest Paths

Minimum-Cost Spanning Trees

Prim's Algorithm
Kruskal's Algorithm