257 — Dynamic Structural Similarity on Graphs

Castrillo et al (1805.01419)

Read on 04 May 2018
#graphs  #graph-theory  #similarity  #graph-structure  #NP  #networks 

When using networks to represent complex systems, one of the many difficult-to-perform analyses is that of determining network similarities. For example, given two graphs $A$ and $B$, how similar are these graphs? For small graphs, you can simply check every possible configuration of graph $B$ and see if it overlays nicely on top of graph $A$. But this is still an NP-hard challenge that scales supramultiplicatively with the complexity of $A$ as well as $B$, and if you have a sufficiently large $A$ or $B$, many graph-similarity algorithms become prohibitively expensive to run.

There are many methods that instead look at local features of a graph. These methods include local degree centrality, closeness centrality, betweenness, bridgeness, Jaccard similarity, Cosine, Dice…

These systems usually incorporate information about a single vertex and its immediate neighborhood. Other systems, like PageRank or SimRank, use more complex and less local metrics.

The authors of this paper propose a new method to determine the similarity index of two vertices given that they are ① connected, and ② share a structurally similar neighborhood.

Given a graph $G = (V, E)$ and a similarity metric of either $f : V \mapsto \mathbb{R}$ or $f : E \mapsto \mathbb{R}$, the neighborhood $N(u)$ is the neighborhood of the node $u \in V$, and $N[u]$ is the neighborhood of $u$ including $u$ itself. The Dynamic Structural Similarity of a node is a similarity function $f$ that is defined recursively on an edge $(u, v)$ as:

\[DSS(u, v) = \frac{ \sum\limits_{x\in N[u]\cap N[v]} DSS(u, x) + DSS(v, x) }{ \sqrt{ \sum\limits_{x\in N(u)} DSS(u, x) \times \sum\limits_{y\in N(v)} DSS(v, y) } }\]

In other words, the similarity of $u$ and $v$ is defined not only by the direct DSS of $(u, v)$, but also by the cumulative DSS of all shared edges ($x$). This means that the similarity of the neighborhoods of $u$ and $v$ contribute positively to the DSS score, and the total self-similarity of each of the vertices’ neighborhoods negatively contributes, which has the effect of controlling for vertices that are not common neighbors between $u$ and $v$ (i.e. one of $(u, z)$ or $(z, v) \notin E$).

This recursive definition means that the DSS is progressively refined through iteration, which in turn means that it can be used at varying levels of confidence or computational complexity, as needed by a particular graph similarity task.