336 — BB-Graph: A New Subgraph Isomorphism Algorithm for Efficiently Querying Big Graph Databases
Read on 22 July 2018The subgraph isomorphism problem — determining where (if anywhere) a graph can be embedded in another larger graph — is an extremely common applied graph theory task. But it is also unfortunately a very computationally expensive task (NP hard, in fact) and the current state-of-the-art solutions still take a great deal of time to run even on pretty serious hardware.
Although it’s impossible (assuming P≠NP) to reduce the time-complexity of this algorithm much further, even very small improvements to the efficiency of graph-matching can have enormous impact.
That’s why the authors — not content with existing systems like GraphQL or Cypher (Neo4j’s implementation) — wrote BB-Graph.
Most subisomorphism algorithms in use today start by identifying a large set of candidate “seed” nodes in the graph, and the begin filtering by checking each node as a candidate in the subgraph.
This filtering step is crucial: The wall-clock runtime of the algorithm is superlinear with regards to this count.
BB-Graph uses this same “branch-and-bound” technique, but further reduces the start-node count by using some cleverer filters. And because this algorithm supports backtracking — the ability to establish a ‘dead end’ when iteratively querying for a subgraph and double back to an earlier decision point — it requires dramatically less memory than other methods as well.
The runtime of BB-Graph is still $O(\vert V_G\vert + \vert V_Q\vert + \vert E_Q\vert \times max(deg_G))$ where $G = \langle V, E\rangle$ is the “large” graph and $Q = \langle V,E\rangle$ is the query graph (and $deg_G$ is the map of degree of nodes in graph $G$), but this means that it has consistently shorter runtimes than Cypher and GraphQL in almost all tested cases (and comparable performance in the other cases).