77 — Framework for evaluation of sound event detection in web videos

Badlani et al. (1711.00804)

Read on 06 November 2017
#sound  #signal-processing  #video  #machine-learning 

Badlani et al. explore a system for indexing websites based on video content. Because videos are rarely annotated in a machine-readable way, this task falls to a machine-intelligence system: the proposed framework crawls videos using sound queries, essentially acting as a reverse search for sound information.
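One way to picture this reverse search is an inverted index that maps sound labels to the videos (and time segments) where they were detected. The sketch below is illustrative only: the `SoundIndex` class, its methods, and the `Segment` representation are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical (video, start, end) span where a sound was detected."""
    video_id: str
    start_s: float
    end_s: float


class SoundIndex:
    """Minimal inverted index: sound label -> segments where it was heard."""

    def __init__(self) -> None:
        self._postings: dict[str, list[Segment]] = defaultdict(list)

    def add(self, label: str, segment: Segment) -> None:
        # Normalise labels so "Dog bark" and "dog bark" share one posting list.
        self._postings[label.strip().lower()].append(segment)

    def search(self, query: str) -> list[Segment]:
        # A sound query returns every indexed segment carrying that label;
        # .get() avoids creating empty entries for unknown queries.
        return list(self._postings.get(query.strip().lower(), []))


index = SoundIndex()
index.add("Dog bark", Segment("vid_001", 12.0, 14.5))
index.add("siren", Segment("vid_002", 0.0, 3.0))
index.add("dog bark", Segment("vid_003", 40.0, 41.0))
print([s.video_id for s in index.search("dog bark")])  # ['vid_001', 'vid_003']
```

Queries that no detector has produced simply return an empty list, so new labels can be added as the crawler discovers them.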

The presented system takes a three-step approach:

Crawl: The sounds are first pulled from a small downloaded corpus.

Hear: The sounds are categorized using features computed by various feature-extraction libraries.

Feedback: Human responses are used to refine the labels assigned to the sounds.
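The three steps can be read as one pass of a loop. This is a minimal sketch under stated assumptions: `crawl` fabricates clip IDs instead of downloading videos, `hear` tags each clip with the query label and a random confidence score instead of running real feature extraction and classification, and `feedback` keeps a label only if a strict majority of (simulated) human votes confirm it. All of these names are hypothetical and do not come from the paper.

```python
import random
from typing import Callable


def crawl(query: str, n: int) -> list[str]:
    """Hypothetical crawler: pretend to fetch n clip IDs for a text query."""
    return [f"{query.replace(' ', '_')}_{i}" for i in range(n)]


def hear(clip_id: str, query: str, rng: random.Random) -> tuple[str, float]:
    """Hypothetical detector: tag the clip with the query label and a score.

    A real system would extract audio features and run a trained classifier.
    """
    return query, rng.random()


def feedback(votes: list[bool]) -> bool:
    """Keep a label only if a strict majority of human votes confirm it."""
    return sum(votes) * 2 > len(votes)


def run_iteration(
    query: str,
    n_clips: int,
    threshold: float,
    ask_humans: Callable[[str, str], list[bool]],
    seed: int = 0,
) -> dict[str, str]:
    """One crawl -> hear -> feedback pass; returns clip_id -> accepted label."""
    rng = random.Random(seed)
    accepted: dict[str, str] = {}
    for clip in crawl(query, n_clips):
        label, score = hear(clip, query, rng)
        if score < threshold:
            continue  # low-confidence detections never reach human review
        if feedback(ask_humans(clip, label)):
            accepted[clip] = label
    return accepted


# Simulated reviewers: all three agree with every label except on "dog_bark_3".
def fake_reviewers(clip: str, label: str) -> list[bool]:
    return [clip != "dog_bark_3"] * 3


result = run_iteration("dog bark", n_clips=5, threshold=0.0, ask_humans=fake_reviewers)
print(sorted(result))  # ['dog_bark_0', 'dog_bark_1', 'dog_bark_2', 'dog_bark_4']
```

In this sketch the confidence threshold decides how much work is sent to human reviewers; the paper's actual labeling and feedback policy may differ.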

Current state-of-the-art annotation algorithms (such as YouTube's) are not openly available, which makes it difficult to benchmark this algorithm against them. However, the authors demonstrate that the algorithm achieves high agreement with human proofreaders. This work can be used for content aggregation, as well as for quick parsing of video content that would otherwise require expensive and time-consuming human intervention.
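This note does not say how agreement with the proofreaders was measured. Purely as an illustration, the sketch below computes two standard measures between machine labels and human labels for the same clips: raw percent agreement, and Cohen's kappa, which corrects for chance agreement as $(p_o - p_e) / (1 - p_e)$. Whether the authors used either metric is an assumption, and the example labels are made up.

```python
from collections import Counter


def percent_agreement(machine: list[str], human: list[str]) -> float:
    """Fraction of items where the machine label matches the human label."""
    if len(machine) != len(human) or not machine:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(m == h for m, h in zip(machine, human)) / len(machine)


def cohens_kappa(machine: list[str], human: list[str]) -> float:
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(machine, human)
    n = len(machine)
    m_counts, h_counts = Counter(machine), Counter(human)
    # Expected chance agreement from each rater's marginal label frequencies.
    p_e = sum(m_counts[k] * h_counts[k] for k in m_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


machine = ["dog", "dog", "siren", "car", "siren", "dog"]
human = ["dog", "dog", "siren", "siren", "siren", "car"]
print(round(percent_agreement(machine, human), 3))  # 0.667
print(round(cohens_kappa(machine, human), 3))  # 0.478
```

Kappa is usually the more informative of the two when one label dominates, since two raters can agree often purely by chance.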