104 — Knowledge Graph Embedding with Iterative Guidance from Soft Rules

Guo et al (1711/11231)

Read on 02 December 2017
#knowledge-graph  #graphs  #vector-embedding  #inference 

Knowledge graphs (KGs) are one data structure commonly used to represent large, complex datasets of highly structured data. For example, Freebase or YAGO are two examples of public, freely available KGs. I explained KGs first in this post.

One disadvantage of KGs — at least when storing dynamic datasets — is that every connection must be hand-entered: it is conventionally very difficult to “guess” connections in a KG. For example, when adding a new entity, such as “Brooklyn,” to an otherwise complete KG describing bridges between boroughs of New York City, you must ALSO add all relations that connect Brooklyn to its neighbors, or you break the “completeness” of the graph. Furthermore, there is no high-confidence way to guess these connections. They must be added manually.

In contrast, n-dimensional vector embedding is a data structure strategy which relies on a latent, fundamental organization to a dataset. word2vec may be the most popular example of this type of data structure.

Naturally, it is very useful to be able to convert KGs to vector embeddings, or at least assign an embedding to the graph itself. Although it’s a poor metaphor, I like envisioning this as drawing a graph with force-directed positioning of nodes, where the dimension space expands until all edges can be exactly the right length. (A four node graph with four equilateral triangles cannot be represented in 2D, but can easily be represented as a pyramid in 3D.)

While many methods exist to perform this conversion, few are able to use the graph itself to dynamically recompute an embedding (important if the dataset changes) and even fewer can use “soft” rules: that is, they rely on manual triple generation.

This paper presents RUGE, a RUle-Guided Embedding. RUGE uses the rules of which it can be sure, but then tries to extrapolate new “soft” rules by inverting rules it knows to be true. This larger ruleset can then be used to generate a more informed embedding even though no more original data were provided.

This graph can then be subsampled in order to determine the effectiveness of the embedding. By removing links and trying to predict them using only the embedding, RUGE appears to outperform other state of the art embedding algorithms on Freebase and YAGO KGs.

This work means that existing systems that leverage KGs can be automatically expanded by converting to vector embedding form, either maintaining the graph structure or switching to embedding alone depending on the use case.