143 — SemMedDB: a PubMed-scale repository of biomedical semantic predications

Kilicoglu et al. (10.1093/bioinformatics/bts591)

Read on 10 January 2018
#NLP  #medicine  #healthcare  #database  #knowledge-graph  #semantics 

SemMedDB is a database of semantic predications that have been automatically extracted from the titles and abstracts of PubMed-listed articles. For example, it might read this paper's title and decide that (SemMedDB DATABASE_OF biomedical semantic predications)… admittedly an underwhelming example. A better example is this one, from the paper:

Infection-CAUSES-Guillain-Barre Syndrome

from the sentence

Infections can trigger GBS.

SemRep, the engine used to extract semantics from free text, looks for common predicates such as INHIBITS, TREATS, and STIMULATES to fill its knowledge graph (KG). The database includes gene-disease relationships, pharmacogenomics, synonymization, and more, and achieves mid-fifties to low-eighties precision and recall when compared against human responses. Of the data used to construct the KG, almost all are parsed sentences; the other data sources include lists of citations (suggesting associations between topics) and summary statistics computed over the KG itself.
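To make the precision/recall figures concrete, here is a minimal sketch of how extracted predications could be scored against a human-annotated set. The `Predication` type, the `precision_recall` helper, and the Aspirin triples are my own illustrative inventions, not anything from the paper; the paper's actual evaluation protocol may well differ (for instance in how partial matches are counted).

```python
from typing import Iterable, NamedTuple, Tuple


class Predication(NamedTuple):
    """A subject-PREDICATE-object triple (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


def precision_recall(
    extracted: Iterable[Predication], gold: Iterable[Predication]
) -> Tuple[float, float]:
    """Score extracted triples against a gold set using exact matching."""
    extracted_set, gold_set = set(extracted), set(gold)
    true_positives = len(extracted_set & gold_set)
    # Precision: what fraction of the extracted triples are correct?
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: what fraction of the gold triples were found?
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy data: the first triple is the paper's example, the rest are made up.
gold = [
    Predication("Infection", "CAUSES", "Guillain-Barre Syndrome"),
    Predication("Aspirin", "TREATS", "Headache"),
]
extracted = [
    Predication("Infection", "CAUSES", "Guillain-Barre Syndrome"),
    Predication("Aspirin", "INHIBITS", "Headache"),  # wrong predicate
]

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is the strictest possible scoring: getting the predicate wrong counts as both a false positive and a missed gold triple.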

This system is stored and queryable as a SQL database (which is interesting, since there are so many graph databases available now); it can be reached at https://skr3.nlm.nih.gov/SemMedDB/.
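As a rough illustration of why a relational store works fine for this kind of data, here is a sketch that puts triples in a single table and answers a "what causes X?" question with plain SQL. It uses an in-memory SQLite database; the table name, column names, and `find_causes` helper are hypothetical and are not the real SemMedDB schema, and the PMIDs are placeholders.

```python
import sqlite3

# In-memory stand-in for a predication table (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predication (
        pmid         TEXT,  -- source citation (placeholder values below)
        subject_name TEXT,
        predicate    TEXT,
        object_name  TEXT
    )
    """
)

# Toy rows: each row is one predication extracted from one sentence.
rows = [
    ("PMID-A", "Infection", "CAUSES", "Guillain-Barre Syndrome"),
    ("PMID-B", "Infection", "CAUSES", "Guillain-Barre Syndrome"),
    ("PMID-C", "Drug X", "TREATS", "Guillain-Barre Syndrome"),
]
conn.executemany("INSERT INTO predication VALUES (?, ?, ?, ?)", rows)


def find_causes(connection: sqlite3.Connection, object_name: str):
    """Return (subject, supporting-row count) pairs for CAUSES predications."""
    cursor = connection.execute(
        """
        SELECT subject_name, COUNT(*) AS n
        FROM predication
        WHERE predicate = 'CAUSES' AND object_name = ?
        GROUP BY subject_name
        ORDER BY n DESC, subject_name
        """,
        (object_name,),
    )
    return cursor.fetchall()


print(find_causes(conn, "Guillain-Barre Syndrome"))
```

Single-hop lookups like this are just filters and aggregates, which SQL handles well; it is the multi-hop path queries (A affects B, B affects C) where a graph database would start to look more attractive.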

This sort of platform is very interesting for automating medical insights. The low precision and recall mean that it’s not yet reliable enough to be useful in a clinical setting, but improvements to SemRep (or swapping it out for another NLP parser) could bring big gains in automatable literature search and understanding.