Disjoint-set data structure (union–find data structure)

Basics

A disjoint-set data structure, also called a union–find data structure, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It provides near-constant-time operations (bounded by the inverse Ackermann function $\alpha(n)$, which is less than 5 for any practical input size n; read more about Ackermann function) for adding new sets, merging sets and finding if two elements are in the same disjoint subset.

set(x) -- set x as a disjoint subset, containing x only (parent[x] = x; //set a node point to itself)
find(x) -- find the "representative" element of the distinct set that contains x. In tree based implementation, the representative element is the root node. If find(x) == find(y), we know x and y belong to the same disjoint set.
union(x, y) -- if x and y belong to distinct disjoint sets, merge them into one

Question: how can we use these functions to turn a graph into a disjoint-set data structure?

Practical uses

Are two nodes in a graph connected? (Related problem: connected component; see AIZU problem)
Kruskal's algorithm: to check if adding an edge creates a cycle

Implementation: tree based

Basic idea: represent each disjoint set as a tree, in which each node (element) only has a pointer to its parent. The node that has no parent (or it is its own parent) is the root of the tree (the representative element of the disjoint set). The parent pointer implementation stores precisely the information for finding which disjoint set a node belongs to. The pointers can be stored as an array (parent).
```
void union(x, y):
    r1 = find(x)
    r2 = find(y)
    if(r1 != r2) parent[r1] = r2
int find(x):
    curr = x
    while(parent[curr] != curr) curr = parent[curr]
    return curr
```
To make find() as efficient as possible, the distance from each node to the root of its respective tree should be as small as possible. Ideally, each tree should have all nodes pointing directly to the root. There are some techniques that can be used to achieve this goal.
- Weighted union rule: when merging two trees, make the tree with fewer nodes point to the bigger tree.
- Path compression: when find the root for a given node x, make all its ancestors point to the root. Note path compression happens in find() operation instead of union.
```
int find(x): //with path compression
    curr = x
    if(parent[curr] == curr) return curr
    parent[curr] = find(parent[curr])  
    return parent[curr]
```