Graphs
Applications
Google knowledge graph | six degrees of separation
google map (select the shortest, cheapest or most scenic way to a particular place)
Terminology
- A graph G=(V, E) consists of a set of vertices V and a set of edges
E, such that each edge in E is a connection between a pair of vertices
in V.
- A graph with relatively few edges is called sparse.
- A graph with many edges is called dense.
- Two edges are parallel if they connect the same pair of vertices.
- Cardinality of a graph: the number of vertices; e.g., |G| = 6.
- A subgraph is a subset of a graph's edges (and associated vertices) that constitutes a graph.
- A path in a graph is a sequence of vertices connected by edges; a path is simple if all vertices on the path are distinct.
- A cycle is a path (with at least one edge) whose first and last vertices are the same; a cycle is simple if the path is simple, except for the first and last vertices being the same.
- The length of a path or a cycle is its number of edges.
- A graph is connected if there is a path from every vertex to every other vertex.
- A graph that is not connected consists of a set of connected components, which are maximal connected subgraphs.
- An acyclic graph is a graph with no cycles.
- A tree is an acyclic connected graph (a connected graph that has no cycle); a free tree is a connected, undirected graph with no simple cycles (i.e., a free tree is connected and has |V| - 1 edges).
- Undirected graph (digraph): a graph whose edges are not directed.
- We say one vertex is connected to another if there exists a path that contains both of them.
- Degree of a vertex: the number of edges incident on the vertex. E.g., deg(v1) = 3.
- Directed graph (digraph): a graph with edges directed from one vertex to another.
- The outdegree of a vertex is the number of edges pointing from it. The indegree of a vertex is the number of edges pointing to it.
- We say that a vertex w is reachable from a vertex v if there exists a directed path from v to w.
- A directed acyclic graph (or DAG) is a digraph with no directed cycles.
- Labeled graph
- Weighted graph vs unweighted graph: A weighted graph is a graph in which each edge is given a numerical weight. A weighted graph is therefore a special type of labeled graph in which the labels are numbers (which are usually taken to be positive).
- Two vertices are adjacent if they are joined by an edge; they are also called neighbors.
Graph representations: adjacency matrix, adjacency list, edge list
Adjacency matrix(A) (B) | 1 2 3 4 5 | 1 2 3 4 5 --|-------------- --|-------------- 1 | 1 1 1 | 1 1 2 | 1 2 | 1 1 1 3 | 1 3 | 1 1 4 | 1 4 | 1 1 5 | 1 5 | 1 1 1Adjacency list
(A) (B) Vertex Adjacency-list Vertex Adjacency-list 1 [2 5] 1 [2 5] 2 [3] 2 [1 3 5] 3 [4] 3 [2 4] 4 [5] 4 [3 5] 5 [2] 5 [1 2 4]Edge list
1 2 1 5 2 3 3 4 5 2
Other representations:
Object Oriented Representationsv.neighbors = Adj[v]
Implicit Representations
Adj(u) is a function
v.neighbors() is a method
(the above is useful when there are enormous numbers of vertices, e.g. possible distinct states of a complex system, etc.)
Graph visualization
- Graphviz (the DOT language) (webgraphviz)
- Hello World: echo "digraph G {Hello->World}" | dot -Tpdf >hello.pdf
- See a family tree
- See a co-authorship graph
- Figure 11.11 (left: manually drawn; right: generated using graphviz)
- Cytoscape
- Isomorphic graphs: Two graphs which contain the same number of graph vertices connected in the same way are said to be isomorphic.
Graph applications
- internet: packet routing
- social networking: e.g. "people you may know" suggestions in LinkedIn
- web "crawling": cataloging hypertext
- solving games & puzzles
- garbage collection in programming languages
- network analysis for (design of resilient infrastructure networks)
- theorem proof, etc.
Graph implementations
An ADT for graphs | An implementation of the ADT using adjacency listGraph traversal (walk through a graph)
Graph traversal: visit the vertices of a graph in some specific order based on the graph's topology.(Recall that tree traversals visit every node exactly once, in some specified order such as preorder, inorder, or postorder)
Graph traversal algorithm typically begins with a start vertex and then visits the remaining vertices from the start vertex.
Be aware of the following cases:
1) It may not be possible to reach all vertices from the start vertex (when the graph is not connected).
2) The graph may contain cycles, and cycles may cause the algorithm to go into an infinite loop if not carefully considered.
Solutions: to maintain a mark bit for each vertex on the graph and set the bit when the vertex is visited.
Depth-First Search (DFS)
Whenever a vertex V is visited during the search, DFS will recursively visit all of V's unvisited neighbors. Equivalently, DFS will add all edges leading out of v (to an unvisited vertex) to a stack. The next vertex to be visited is determined by popping the stack and following that edge. The DFS process can be used to define a depth-first search tree, which is composed of the edges that were followed to any unvisited vertex during the traversal, and leaves out the edges that lead to already visited vertices.public void DFS(int v) { preVisit(v); //e.g., preNum.add(v) for preVisit enumeration of the nodes //when v is first "discovered" (aizu) setVisited(v); List<Integer> neighbors = getNeighbors(v); for(int i = 0; i < neighbors.size(); i ++) { int v1 = neighbors.get(i); if(ifVisited(v1) == false) DFS(v1); //use recursion! } postVisit(v); //e.g., postNum.add(v) for postVisit enumeration of the nodes //when search finishes examining v's neighbors }Cost of the DFS? DFS processes each edge once in a directed graph. In an undirected graph, DFS processes each edge from both directions. Each vertex must be visited, but only once, so the total cost is $\Theta(|V| + |E|)$.
preVisit() vs postVisit() functions; E.g., postVisit enumeration of the nodes for topological sort.
How does DFS work?
- the algorithm is recursive
- it's an algorithm that can be used to solve a maze, i.e.
walk through a maze leaving "bread crumbs" behind - at each node, keep track of:
- which edges have already been visited
- which edges still have to be visited
- backtrack if necessary
- do not repeat visiting vertices
Breath-First Search (BFS)
BFS examines all vertices connected to the start vertex before visiting vertices further away. Similarly, the BFS process results in a breadth-first search treeBFS is implemented similarly to DFS, except that a queue replaces the recursion stack.
How does it work?
- Start from a node s in the graph. (1 node visited in 0 steps)
-
Visit all nodes that can be reached by only one edge. ( Adj[s] nodes visited in 1 step)
- From each node visited in (1.), visit all nodes that can be reached by only one edge.
- Continue until you visit all nodes that are reachable from the node s.
This will work in $O(|V|+|E|)$ time.
public void BFS(int s) { ArrayList<Integer> toVisit = new ArrayList<Integer>(); toVisit.add(s); while(toVisit.size() > 0) { //first-in, first-visit; if queue is used, use its dequeue() function int v = toVisit.remove(0); setVisited(v); List<Integer> neighbors = getNeighbors(v); for(int i = 0; i < neighbors.size(); i ++) { int v1 = neighbors.get(i); if((ifVisited(v1) == false) && (toVisit.contains(v1) == false)) { toVisit.add(v1); } } } }
Topological Sort
- Topological sorting is to lay out the vertices of a directed acylic graph (DAG) in a linear order
to meet the prerequisite rules. In other words, a topological sort
(sometimes abbreviated topsort or toposort) or topological ordering of a
DAG is a linear ordering of its vertices such that for every directed
edge uv from vertex u to vertex v, u comes before v in the ordering.
- Applications of topological sort: Scheduling a sequence of
jobs, classes, or tasks based on their dependencies; determining the
order of compilation tasks to perform in makefiles; etc
Figure 11.14 (Shaffer): An example graph for topological sort. Seven tasks have dependencies as shown by the directed graph.
- Algorithms:
(1) Algorithm 1 (DFS-based): when visit a vertex using DFS, add the vertex to a list when the recursion pops back to that vertex (post visit action); reverse the list results a topological sort.
(2) Algorithm 2 (use a queue of eligible vertices):
- Set list of visited vertext to {}, place all vertices that have no incoming edges into a queue of eligible vertices.
- Remove a vertex from the queue of eligible vertices, add it to the list of visited, "remove" edges from this vertex to its neighbors; if any of the neighbors becomes an eligible vertex (no incoming edges), add it to the queue.
- Repeat step 2) unitl the queue becomes empty.
- If all vertices are found in the list of visitied, the topological sort is found; otherwise, the graph is cyclic and there is no solution to the topological sort problem.
- Set list of visited vertext to {}, place all vertices that have no incoming edges into a queue of eligible vertices.
- Implementations: Recursive topological sort (algorithm 1) | Queue-based topological sort (algorithm 2)
//topological sort using recursion (DFS) public void topSortRec() { postEnum = new Vector<Integer>(); walk("DFS"); //do depth-first search System.out.println("topological sort: "); //topological order is the reverse order of the post-visit enumeration of the nodes for(int i = postEnum.size() - 1; i >= 0; i --) System.out.print(nodeList.elementAt(postEnum.elementAt(i)) + " "); }
//Pseudocode for the queue-based topological sorting algorithm L <- a list that will contain sorted vertices C <- a list contains indegrees for each of the vertices Q <- a queue that contains eligible vertices (vertices that have no incoming edges) while Q is nonempty do remove a vertex u from Q add u to the end of L for each neighbor v of u do decrement C[v] by one if C[v] == 0 (v has no other incoming edges) push v to Q (queue of eligible vertices) if L contains all the vertices L is the solution else solution not found
Shortest-Paths Problems
- Single-Source Shortest Paths
- All-Pairs Shortest Paths
Minimum-Cost Spanning Trees
- Prim's Algorithm
- Kruskal's Algorithm