105 — Now Playing: Continuous low-power music recognition

Agüera y Arcas et al (1711.10958)

Read on 03 December 2017
#music  #audio  #signal-processing  #fingerprinting  #group:google 

The current state of the art music recognition systems generally transmit the fingerprint of the audio they receive to a remote server which contains a database of fingerprints against which it can be compared. This naturally requires a robust internet connection, and also suffers from the shortcoming that the server and database provider has access to a list of songs (and perhaps other metadata) relevant to each user.

The authors of this paper present a local processing pipeline that uses a secondary processor (a DSP) that only wakes the primary processor when music is detected. This is done using a computed log Mel feature vector of a windowed subset of the audio, which is trained on the same database used by the matching system.

Once music is detected, the digital signal processor sends a signal to wake the primary processor, which is then used to match the encountered fingerprint.

Aside from the obvious privacy advantages of this method, I also see this sort of technology being useful for domain-specific applications such as detecting the species of bird by its call or decoding aquatic mammal sounds by looking them up in an index in an environment where no internet connection is available.