218 — Tell Me Why Is It So? Explaining Knowledge Graph Relationships by Finding Descriptive Support Passages

Bhatia, Dwivedi, & Kaur (1803.06555)

Read on 26 March 2018
#machine-learning  #knowledge-graph  #wikidata  #wikipedia  #explainability  #unsupervised  #group:IBM 

Knowledge graphs are a great way of representing information, but the graph structure itself carries little information about the source data that was used to generate the graph. Usually, this is a strength: The graph is a distilled version of the source data and presents a cleaner interface to access knowledge.

But in some applications, this can be a weakness: Not being able to explain or point to primary sources in healthcare knowledge graphs can be a deal-breaker.

Bhatia, Dwivedi, & Kaur present a system that, given a source text corpus and a source KG, produces an ordered, probablistic list of the text passages in the corpus most likely to substantiate a fact. This process is unsupervised, and can be extended to other KGs and text corpuses easily.

It’s easy to see how a knowledge graph derived from a medical textbook would benefit greatly from this work: A machine-generated diagnosis could be looked up for substantiating evidence and documentation.

Prior systems were often confused by rarely-occurring terms: If a query included a rare word, many existing systems were likely to find that word in the corpus and return those passages with high confidence, whether or not it actually substantiated the fact.

This system doesn’t generally suffer from this same shortcoming, but the authors note it sometimes has the opposite issue: Occasionally, the text passage it returns is too abstractly relevant, and while it may demonstrate usefulness when understanding the fact, it doesn’t necessarily explain it. One example of this from the paper is asking if a person is a film producer, and receiving a list of Hollywood awards and accomplishments instead of a simple sentence containing this fact.