51 — A Neural Clickbait Detection Engine: The Tuna Clickbait Detector at the Clickbait Challenge 2017

Gairola et al (1710.01507)

Read on 11 October 2017
#clickbait  #internet  #culture  #machine-learning  #neural-network  #nlp 

Gairola et al present a machine-learning approach to detect clickbait headlines. Citing the spread of clickbait-like channels to grab clicks on Facebook and Google or other advertising behemoths, the authors combine a series of data-embeddings to form one high-dimensional embedding for a post, and use this vector to learn a scalar clickbaitiness level.

This paper combines word2vec, doc2vec, and pre-trained imagenet classifier features to develop a joint image and text classifier for a headline and a post’s image. This means that, unlike most prior work, this technique can glean extra information from image content.

Among other (primarily text-driven) evaluations, this technique leverages the appropriateness of an image given the headline text to determine if a post’s image corresponds in any way to its headline.

I think this type of system has a lot of potential: Adding image classification and evaluation to a net seems to improve the results of the classifier dramatically. I’d be interested to see if the net could perform any better using a “memory” of images it’s seen before; clickbait articles so commonly recycle old or stock images, and I suspect that better results come from what is currently a very human interpretation of “stock-photoiness” or “recycledness” of image content.