315 — A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis

Razzaki et al. (arXiv:1806.10698)

Read on 01 July 2018
#medicine  #diagnosis  #triage  #AI  #machine-learning  #graph-model  #PGM  #BabylonAI 

We’re a long way from automating doctors out of the art of medicine. Even if a machine could easily emulate diagnostic capability, there’s more to medicine than comparing patients against textbooks.

But.

Much of the mental heavy lifting of medicine could be aided by a diagnostic artificial intelligence. One system that aims to reduce the mental acrobatics of medical practice is Babylon AI. Babylon is built on a probabilistic graphical model, which means its inference is deterministic and, more or less, explainable. The authors compared Babylon’s triage and diagnostic ability to that of physicians.
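
To make the “deterministic and explainable” point concrete, here’s a minimal sketch of exact inference in a toy diagnostic Bayesian network. Everything in it is made up for illustration: the naive-Bayes structure, the three conditions, and all probabilities are mine, not Babylon’s (their network is far larger and not public). The point is that exact inference over a fixed model gives the same answer every time, and each factor in that answer can be inspected.

```python
# Toy diagnostic PGM: one condition node with a prior, and symptom nodes
# assumed conditionally independent given the condition (naive Bayes).
# All numbers below are illustrative, NOT Babylon's actual model.

PRIOR = {"flu": 0.05, "cold": 0.20, "healthy": 0.75}

# P(symptom present | condition)
LIKELIHOOD = {
    "flu":     {"fever": 0.90, "cough": 0.80, "fatigue": 0.90},
    "cold":    {"fever": 0.10, "cough": 0.70, "fatigue": 0.40},
    "healthy": {"fever": 0.01, "cough": 0.05, "fatigue": 0.10},
}

def posterior(evidence: dict[str, bool]) -> dict[str, float]:
    """Exact inference by enumeration: P(condition | observed symptoms).

    Deterministic (no sampling) and explainable: every multiplied
    factor can be traced back to a prior or a likelihood entry.
    """
    unnormalised = {}
    for condition, prior in PRIOR.items():
        p = prior
        for symptom, present in evidence.items():
            p_present = LIKELIHOOD[condition][symptom]
            p *= p_present if present else (1.0 - p_present)
        unnormalised[condition] = p
    z = sum(unnormalised.values())
    return {c: p / z for c, p in unnormalised.items()}

# A patient reporting fever and cough but no fatigue:
print(posterior({"fever": True, "cough": True, "fatigue": False}))
```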

Clinical “vignettes”, designed by hand by physicians, were presented to both human doctors and the Babylon AI. The seven physicians achieved a mean F1 score of 57% (on an average of N = 56.6 vignettes each); Babylon achieved an F1 of 57.1% (N = 100 vignettes).

So Babylon’s assessment of these vignettes is comparable to the average of several physicians in terms of precision and recall (though the AI scored a bit lower on a subjective assessment judged by expert evaluators).
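
For reference, here’s how an F1 of that kind falls out of precision and recall. The per-vignette scoring below is my assumption: I’m treating each differential diagnosis as a set of conditions matched against the vignette’s gold-standard conditions, which may not match the paper’s exact protocol.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Score a predicted differential diagnosis against gold-standard
    conditions. Assumed scoring scheme, not the paper's exact protocol."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# One correct condition plus one spurious one in the differential:
p, r = precision_recall({"flu", "sinusitis"}, {"flu"})
print(f1_score(p, r))  # 0.667 — recall is perfect, precision is halved
```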