143 — SemMedDB: a PubMed-scale repository of biomedical semantic predications
Kilicoglu et al (10.1093/bioinformatics/bts591)
Read on 10 January 2018
SemMedDB is a database of semantic predications that have been automatically extracted from the titles and abstracts of PubMed-listed articles. For example, it might read this paper's own title and decide that (SemMedDB DATABASE_OF biomedical semantic predications)… admittedly an underwhelming example. A better example is this one, from the paper:
Infection-CAUSES-Guillain-Barre Syndrome
from the sentence
Infections can trigger GBS.
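To make the subject-predicate-object shape concrete, here is a minimal Python sketch that splits the hyphenated notation used above into its three parts. The `Predication` type and `parse_predication` helper are my own illustrative names, not anything from SemRep or SemMedDB, and the parser assumes the predicate is the first all-uppercase token between hyphens.

```python
import re
from typing import NamedTuple


class Predication(NamedTuple):
    """A subject-predicate-object triple (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Assumption: predicates are written in upper case (CAUSES, TREATS, ...),
# which lets us locate them even when an argument itself contains a hyphen,
# as in "Guillain-Barre Syndrome".
_PATTERN = re.compile(r"(?P<subject>.+?)-(?P<predicate>[A-Z_]+)-(?P<obj>.+)")


def parse_predication(text: str) -> Predication:
    """Parse 'Subject-PREDICATE-Object' notation into a Predication."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in Subject-PREDICATE-Object form: {text!r}")
    return Predication(match["subject"], match["predicate"], match["obj"])


example = parse_predication("Infection-CAUSES-Guillain-Barre Syndrome")
print(example)
```

This is only a toy: a subject that itself contains an all-uppercase hyphenated token would be split in the wrong place.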
SemRep, the engine used to extract semantics from free text, looks for common predicates such as INHIBITS, TREATS, and STIMULATES to fill its knowledge graph (KG). The database includes gene-disease relationships, pharmacogenomics, synonymization, and more, and achieves precision and recall ranging from the mid-fifties to the low eighties (percent) when compared against human responses. Of the data used to construct the KG, almost all are parsed sentences; the other data sources are lists of citations (suggesting associations between topics) and summary statistics computed over the KG itself.
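For reference, precision and recall over extracted triples can be computed as in the sketch below. The triples in both sets are invented toy data, not taken from SemMedDB or from the paper's evaluation, and `precision_recall` is a hypothetical helper name.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def precision_recall(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Precision = correct / extracted; recall = correct / human-annotated."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy data: what the extractor produced...
predicted = {
    ("Infection", "CAUSES", "Guillain-Barre Syndrome"),
    ("Aspirin", "TREATS", "Headache"),
    ("Aspirin", "CAUSES", "Headache"),  # a wrong extraction
}
# ...and what a human annotator marked as correct.
gold = {
    ("Infection", "CAUSES", "Guillain-Barre Syndrome"),
    ("Aspirin", "TREATS", "Headache"),
    ("Ibuprofen", "TREATS", "Fever"),    # missed by the extractor
    ("Insulin", "TREATS", "Diabetes"),   # missed by the extractor
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three extracted triples are correct (precision of about 0.67), and two of the four human-annotated triples were found (recall of 0.50).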
This system is stored and queryable as a SQL database (which is interesting, since there are so many graph databases available now); it can be reached at https://skr3.nlm.nih.gov/SemMedDB/.
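As a rough illustration of how a graph-style question maps onto SQL, the sketch below builds a tiny in-memory SQLite table of triples and asks "what causes Guillain-Barre Syndrome?". The table name, column names, and rows (including the PMIDs 1 to 3) are hypothetical placeholders; the real SemMedDB schema and SQL dialect will differ.

```python
import sqlite3

# Hypothetical, simplified layout: one row per extracted predication.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predication (
        pmid         INTEGER,  -- citation the sentence came from (placeholder IDs)
        subject_name TEXT,
        predicate    TEXT,
        object_name  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO predication VALUES (?, ?, ?, ?)",
    [
        (1, "Infection", "CAUSES", "Guillain-Barre Syndrome"),
        (2, "Infection", "CAUSES", "Guillain-Barre Syndrome"),
        (3, "Aspirin", "TREATS", "Headache"),
    ],
)


def causes_of(connection: sqlite3.Connection, object_name: str):
    """Return (subject, number of supporting citations) for CAUSES triples."""
    cursor = connection.execute(
        """
        SELECT subject_name, COUNT(DISTINCT pmid) AS n_citations
        FROM predication
        WHERE predicate = 'CAUSES' AND object_name = ?
        GROUP BY subject_name
        ORDER BY n_citations DESC
        """,
        (object_name,),
    )
    return cursor.fetchall()


result = causes_of(conn, "Guillain-Barre Syndrome")
print(result)
```

A one-hop question like this is just a filter over a table of triples, which may be part of why a relational store is workable here; multi-hop traversals would need self-joins and are where a graph database would likely feel more natural.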
This sort of platform is very interesting for automating medical insights. The low precision/recall means that it's probably not yet useful in a clinical setting, but improvements to SemRep (or swapping it out for another NLP parser) could greatly improve automated literature search and understanding.