315 — A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis
Read on 01 July 2018
We're a long way from automating doctors out of the art of medicine. Even if a robot could easily emulate diagnostic capability, there's still more to medicine than comparing patients with textbooks.
But.
Much of the mental heavy lifting of medicine could be aided by a diagnostic artificial intelligence. One such system that aims to reduce the mental acrobatics of medical practice is Babylon AI. Babylon is built on a probabilistic graphical model, which means its inference is deterministic and, more or less, explainable: the same evidence always yields the same posterior, and each step can be traced back to the model's factors. The authors of Babylon compared its triage and diagnostic ability to that of physicians.
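To make that concrete, here's a minimal sketch of inference in a toy probabilistic graphical model. This is not Babylon's actual model; the single disease, the two symptoms, and all probabilities are invented for illustration. It just shows why exact inference in such a model is deterministic and inspectable.

```python
# Toy diagnostic Bayesian network: Disease -> Fever, Disease -> Cough.
# All numbers are invented for illustration, not taken from Babylon.

P_DISEASE = 0.01                      # prior P(disease present)
P_FEVER = {True: 0.80, False: 0.05}   # P(fever | disease present / absent)
P_COUGH = {True: 0.70, False: 0.10}   # P(cough | disease present / absent)

def posterior_disease(fever: bool, cough: bool) -> float:
    """Exact P(disease | evidence), by enumerating both disease states."""
    def joint(disease: bool) -> float:
        prior = P_DISEASE if disease else 1.0 - P_DISEASE
        like_fever = P_FEVER[disease] if fever else 1.0 - P_FEVER[disease]
        like_cough = P_COUGH[disease] if cough else 1.0 - P_COUGH[disease]
        return prior * like_fever * like_cough

    # Marginalise out the disease node to normalise.
    return joint(True) / (joint(True) + joint(False))

# Same evidence in, same posterior out, and every factor above is inspectable.
print(f"P(disease | fever, cough) = {posterior_disease(True, True):.3f}")
```

A real triage model would have many diseases and symptoms and would swap this brute-force enumeration for a proper inference algorithm, but the determinism and traceability work the same way.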
Clinical “vignettes” — hand-written by physicians — were presented both to human doctors and to the Babylon AI. The seven physicians achieved a mean F1 score of 57% (each answered an average of N=56.6 vignettes). Babylon achieved an F1 of 57.1% across all N=100 vignettes.
So on these vignettes, Babylon's precision and recall are comparable to the average of several physicians (though the AI scored a bit lower on a subjective evaluation by expert judges).
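For reference, F1 is the harmonic mean of precision and recall, so the headline figures above compress two error modes into one score. A quick sketch of the arithmetic with made-up counts, not the study's data:

```python
# F1 = harmonic mean of precision and recall.
# The counts below are invented to show the arithmetic, not from the study.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)   # of the diagnoses made, how many were right
    recall = tp / (tp + fn)      # of the true diagnoses, how many were found
    return 2 * precision * recall / (precision + recall)

# E.g. 57 correct diagnoses, 43 spurious ones, 43 missed ones -> F1 = 0.57.
print(f"F1 = {f1_score(tp=57, fp=43, fn=43):.3f}")
```

With symmetric errors, as here, F1 equals both precision and recall; in general it drags the score toward whichever of the two is worse.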