Graph Matching: Hopcroft-Karp Algorithm

- - posted in algorithm


Hopcroft-Karp algorithm also utilizes the augmenting path. The difference between the simple augmenting path algorithm is, instead of searching augmenting path one by one, it looks for many paths in the same time. The paths found at each iteration are in fact vertex disjoint path. By doing so, the number of iterations can be cut down, since there cannot be too many disjoing paths.

The major observation is that the length of augementing path grows at each step. Thus, if we keep finding a set of vertex disjoint augmenting paths, the algorithm is guaranteed to stop eventually and faster than finding the path one by one.

Finding Maximal Set of Vertex Disjoint Paths

Let \(G=(U \cup V, E)\) be the bipartite graph and \(G_M\) be the directed graph w.r.t to matching \(M\). The layered graph is contructed out of \(G_M\), where the distance at a vertex \(v\) is defined as the length of shortest path from some vertices in \(U\).

This can done by simply running a modified BFS on \(G_M\): we start by enqueing all the free vertices \(U'\) in \(U\) and label them as 0 (distance), propagate and label until one or more free vertices in \(V\) are reached. Let the label (distance) be \(k\), and denote this graph as \(L\).

We can then run a modified DFS on \(L\), starting from any of the vertices in \(U'\). Whenever we reach a free vertices in \(V\), we delete the path \(p\) from \(L\), and repeat. It can be proven that all the paths found are shortest vertex disjoint paths of length \(k\). This set of paths are the maximal set we are looking for.

Algorithm Outline

  • MAXIMAL-SET-OF-PATHS(\(G=(U \cup V, E), M\))
    • \(P \gets \varnothing\)
    • \(L \gets \) Run modified BFS on \(G_M\)
    • \(U' \gets \) free vertices in \(U\)
    • for \(u \in U'\)
      • \(p \gets \) PARTIAL-DFS(\(G, v, T\))
      • if \(p \neq \varnothing \)
        • \(P = P \cup p\)
        • Remove \(p\) from \(L\)
    • return \(P\)

    • \(M \gets \varnothing\)
    • repeat
      • \(P \gets \) MAXIMAL-SET-OF-PATHS(\(G, M\))
      • if \(P \neq \varnothing\)
        • \(M \gets M \oplus P\)
    • until \(P = \varnothing\)
    • return \(M\)


It can be shown that the loop will be executed at most \(O(\sqrt{|V|})\) times, and running the DFS and BFS requires \(O(|E|)\) time. Thus the overall run time is \(O(\sqrt{|V|} |E|)\).