114 — FlagIt: A System for Minimally Supervised Human Trafficking Indicator Mining

Kejriwal et al. (1712.03086)

Read on 12 December 2017
#sex-trafficking  #nlp  #web-crawler 

A major challenge in stopping human sex trafficking is combing through vast quantities of mostly harmless webpages in search of intentionally masked or hidden messages.

The authors created FlagIt, a text-tagging system trained on datasets collected by participants in the DARPA MEMEX program. It automatically flags potential sex trafficking websites, advertisements, or text for further inspection by humans.

To parse and interpret a page, FlagIt first extracts its text using the Readability Text Extractor (RTE). The text is then passed to the Lightweight Expert System (LES), which uses basic pattern matching to identify strong indications that the webpage is involved in illicit activity. Finally, high-probability words are passed to a collection of machine learning modules for further interpretation. This streaming-text pipeline is highly scalable and improves upon the state of the art.
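To make the three stages concrete, here is a minimal Python sketch of an extract, pattern-match, then score pipeline. It is not the authors' code: the standard-library HTML parser stands in for RTE, the regex in `RULES` is a neutral placeholder rather than a real indicator, and `toy_score` with its `TOY_WEIGHTS` and `THRESHOLD` is a hypothetical stand-in for the machine learning modules. All of these names are my own assumptions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents.
    A crude stand-in for RTE, not the real extractor."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Stage 1: pull readable text out of raw HTML."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Stage 2 rules (hypothetical placeholder, NOT a real indicator pattern).
RULES = {
    "indicator_a": re.compile(r"\bexample phrase\b", re.IGNORECASE),
}


def apply_rules(text):
    """Stage 2: return the names of all rules whose pattern matches."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


# Stage 3 stand-in: hand-picked word weights instead of a trained model.
TOY_WEIGHTS = {"example": 0.6, "sample": 0.4}
THRESHOLD = 0.5  # assumed cut-off, chosen only for illustration


def toy_score(text):
    """Stage 3: sum the weights of distinct known words, capped at 1.0."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return min(1.0, sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens))


def flag_page(html):
    """Run all three stages and decide whether a human should look."""
    text = extract_text(html)
    rule_hits = apply_rules(text)
    score = toy_score(text)
    return {
        "text": text,
        "rule_hits": rule_hits,
        "score": score,
        "flagged": bool(rule_hits) or score >= THRESHOLD,
    }
```

Calling `flag_page(raw_html)` returns the extracted text, any rule hits, the toy score, and a `flagged` boolean. Each stage is a pure function of one page, so pages can be processed independently, which is one plausible reading of why a streaming design like this scales.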

The system is already in use in anonymous search engines used by US law enforcement, which is very encouraging: it suggests that the system is already catching criminals with some reliability.