The Concept
A disjoint set (also called a union-find data structure) maintains a collection of non-overlapping sets. Each element belongs to exactly one set, and the structure answers two questions in near-constant time:
- find(x) — which set does element x belong to?
- union(a, b) — merge the sets containing a and b.
The core idea is a forest of trees. Each element starts as its own root. A find walks parent pointers until it reaches the root; the root is the set's representative. A union links one tree's root under the other's, so both sets share a single representative afterward.
Why two optimisations matter
Without any tricks, a chain of unions can produce a tree that is effectively a linked list — making find O(n).
Path compression fixes this during find: while walking to the root, point every visited node directly at the root (or at the root's parent — a variant called path halving). Future finds on those nodes now take O(1).
Union by rank prevents deep trees from the start: always attach the shorter tree under the taller one. The rank array tracks an upper bound on height; when two trees of equal rank merge, the combined rank grows by one. Without further path compression the height would grow as O(log n); with path compression it barely grows at all.
Together, the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function — effectively constant for any n you will ever encounter.
Core Operations
find — O(α(n)) amortized
Walk parent pointers until parent[x] == x (the root). Use path halving (jumping two levels at a time) to flatten the tree as a by-product: parent[x] = parent[parent[x]] before advancing x.
union — O(α(n)) amortized
Call find on both elements. If they share a root they are already connected — return False (useful for cycle detection). Otherwise, attach the lower-rank root under the higher-rank one. If ranks are equal, break the tie arbitrarily and increment the winner's rank.
Initialisation — O(n)
Each of the n elements starts as its own parent (parent[i] = i) and has rank 0 (rank[i] = 0). Allocating two arrays is the entire setup.
Code Implementation
Complexity Summary
| Operation | Time | Space |
|---|---|---|
| find | O(α(n)) amortized | — |
| union | O(α(n)) amortized | — |
| Initialisation | O(n) | O(n) |
α(n) is the inverse Ackermann function. For all practical values of n (up to 10⁸⁰, the estimated atoms in the observable universe), α(n) ≤ 4. Treat it as constant.
Where Union-Find Shows Up
Counting connected components. Initialise one element per node, then call union for every edge. When done, count the number of distinct roots — that is the number of connected components.
Cycle detection in an undirected graph. Process edges one by one. Before calling union(a, b), call find(a) and find(b). If they share the same root, adding this edge would close a cycle — a direct consequence of the fact that both endpoints are already in the same connected component. This is the cleanest cycle-detection approach for undirected graphs when you are processing edges in arbitrary order.
Kruskal's Minimum Spanning Tree. Sort all edges by weight, then iterate. For each edge, union its endpoints only if they are in different components (union returns True). Skip edges that would form a cycle. After V−1 successful unions the MST is complete. The union-find handles the connectivity check in near-O(1), so the algorithm's bottleneck is the O(E log E) sort.
Interactive Labs
Watch union and find operations build and flatten the parent forest step by step.
See how repeated unions partition a graph into its connected components.
Observe how union-find detects a cycle the moment two already-connected nodes are joined.
Key Takeaways
- Forest of trees, roots are representatives.
findreturns the root; two elements in the same set always share the same root. - Path compression flattens on the fly. After a
find, visited nodes point near the root, making future finds trivially fast. - Union by rank keeps trees shallow. Attaching the shorter tree under the taller one bounds the height growth before path compression even kicks in.
- O(α(n)) per operation — effectively O(1). The combined optimisations yield the inverse Ackermann bound, the tightest known for this problem.
- Three canonical applications. Connected components, cycle detection in undirected graphs, and Kruskal's MST are the standard union-find interview patterns.