42 — “Why Should I Trust You?” Explaining the Predictions of Any Classifier
Read on 02 October 2017
A common issue with machine-learning models is trust: ML algorithms are commonly black boxes, and it is generally hard to evaluate the decisions or explain the predictions of a classifier. This paper proposes LIME (Local Interpretable Model-agnostic Explanations), an explanation technique for (the authors claim) any classifier. This lets users understand how a classifier “thinks” and whether it is trustworthy. They also demonstrate SP-LIME, which picks a small set of representative exemplars and gives explanations for them to “address the ‘trusting the model’ problem”.
The paper cites Kaufman et al. as an example case where LIME would help determine the “trustworthiness” of a model: in that paper, the authors discuss a medical study where patient ID happened to correlate with target class (perhaps the control subjects were all loaded first and had smaller ID numbers). This would be very hard to find “in the wild” and would likely result in bad classifications, but LIME would show patient ID as a strong predictor of the target class, and a human evaluator could decide that the model was invalid.
LIME does this by perturbing the instance under scrutiny (e.g. select a single post from a large list of newsgroup posts; LIME generates variants of it with random words removed), asking the classifier to label each variant, and weighting the variants by how close they are to the original. It then fits a simple, interpretable model (such as a sparse linear model) to those weighted predictions and presents that local model to the human user. In the case of text-based SVMs, this system might return a list of words that contribute most strongly to a decision; in an image (and there’s a really great figure of this in the paper) LIME might show masked regions of the original image, highlighting a floofy ear where a model recognizes a dog, or highlighting a fretboard where the model sees a guitar.
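To make the idea concrete for myself, here is a minimal sketch of a local explanation for text, in the spirit of LIME but not the authors' implementation. Everything named here is an assumption for illustration: `black_box` is a made-up keyword classifier standing in for a real model, `explain` is a hypothetical helper, and I use a weighted ridge regression where the paper uses a sparse linear model. The approach: randomly drop words from one post, ask the classifier about each variant, weight variants by closeness to the original, and fit a linear model whose coefficients serve as the explanation.

```python
import numpy as np

def black_box(texts):
    """Hypothetical stand-in classifier: P('hockey') rises with keyword hits."""
    keywords = {"puck", "goalie"}
    scores = []
    for t in texts:
        hits = sum(w in keywords for w in t.split())
        scores.append(1.0 / (1.0 + np.exp(-(2.0 * hits - 1.0))))
    return np.array(scores)

def explain(text, predict_fn, num_samples=500, kernel_width=0.75, ridge=1e-3, seed=0):
    """Return (word, weight) pairs from a local linear surrogate, largest first."""
    rng = np.random.default_rng(seed)
    words = text.split()
    d = len(words)

    # Binary masks: 1 keeps a word, 0 drops it. Row 0 is the original text.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1
    variants = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    y = predict_fn(variants)

    # Proximity: cosine similarity between a mask and the all-ones mask is
    # sqrt(kept / d); turn the distance into a weight with an exponential kernel.
    kept = masks.sum(axis=1)
    distance = 1.0 - np.sqrt(kept / d)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression with an unpenalised intercept column.
    X = np.hstack([np.ones((num_samples, 1)), masks])
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0
    A = X.T @ (X * weights[:, None]) + penalty
    b = X.T @ (weights * y)
    beta = np.linalg.solve(A, b)

    pairs = [(w, float(c)) for w, c in zip(words, beta[1:])]
    return sorted(pairs, key=lambda p: -abs(p[1]))

if __name__ == "__main__":
    for word, weight in explain("the goalie stopped a puck tonight", black_box):
        print(f"{word:>8s}  {weight:+.3f}")
```

With this toy classifier, the two keywords should come out with the largest weights and the filler words near zero, which is the kind of list a human reviewer would inspect. The same recipe would flag a leaked feature like patient ID: it would simply show up with a large weight.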
Though this has some great uses in NLP or other ML tasks, I see its greatest utility in medical imaging, where physicians input a scan, get back a ‘positive’ result, and can’t currently see why: LIME could highlight the responsible pixels for a radiologist and clearly label what made the algorithm pick positive/negative.