298 — CrisisMMD: Multimodal Twitter Datasets from Natural Disasters

Alam, Ofli, & Imran (1805.00713)

Read on 14 June 2018
#dataset  #twitter  #society  #NLP  #imagery  #humanitarian-aid  #public-health 

Twitter is a hugely useful resource for humanitarian aid during natural disasters and other crises. But extracting usable information from a natural language corpus such as Twitter is extremely difficult.

This work looked at several recent disasters, including the 2017 Sri Lankan floods, hurricanes Harvey and Maria, wildfires, and earthquakes. The researchers polled Twitter for “multimodal” content, keeping only posts that contained both text and imagery. The dataset that they produced aims to satisfy three main goals. First, it provides “humanitarian” categories: each image is categorized by the type of damage or recovery that it shows (such as “vehicle damage”, “rescue”, “missing / found people”, etc). Second, it provides a level of informativeness, as gauged by human annotators (useful vs not-useful markings). Third, it provides an assessment of the severity of the damage shown in the picture, graded against a rubric. All annotations were performed by crowd-sourced human annotators, with more than one annotator grading each image.
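
To make the three annotation layers concrete, here is a minimal Python sketch of how one annotated post might be represented. Every field name, the `majority_label` helper, and the majority-vote rule are my own assumptions for illustration. The summary above only says that more than one annotator graded each image, not how their grades were combined, and this is not the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedPost:
    # Hypothetical field names mirroring the three annotation layers.
    tweet_text: str
    image_path: str
    informative: bool                # useful vs not-useful marking
    humanitarian_category: str       # e.g. "vehicle damage", "rescue"
    damage_severity: Optional[str]   # rubric grade, None if not graded


def majority_label(votes: List[str]) -> Optional[str]:
    """Combine several annotators' labels by simple majority vote.

    Returns None when there are no votes or the top labels are tied.
    This aggregation rule is an assumption, not the paper's procedure.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear majority
    return ranked[0][0]


def informative_only(posts: List[AnnotatedPost]) -> List[AnnotatedPost]:
    """Keep only the posts that annotators marked as useful."""
    return [p for p in posts if p.informative]
```

A downstream filter for responders could then call `informative_only` first and group the remaining posts by `humanitarian_category`.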

This dataset, dubbed CrisisMMD, is publicly available for training humans or machines to respond to disasters more quickly. The aim is to reduce harm to people by improving the response speed and filtering efficacy of digital humanitarian efforts.