281 — Deep Watershed Detector for Music Object Recognition

Tuggener et al. (arXiv:1805.10548)

Read on 28 May 2018
#OMR  #typography  #neural-network  #machine-learning  #watershed  #computer-vision  #CV  #MUSCIMA++  #DeepScores  #music  #sheet-music  #notation 

Recognizing music from drawn notation is something that humans are… well… honestly, we’re really bad at it. Especially under a time constraint, it’s a notoriously difficult task. (You probably know it by the name “sight-reading.”) And machines aren’t much better, especially when the notation is handwritten: handwritten notation is much harder to parse because it doesn’t follow the exact typesetting conventions (think “kerning” or “ligatures”) that engraving software does.

This looks like a job for deep learning!

The detector runs a watershed transform over an energy surface that the network is trained to predict (the ground-truth energy function itself is hand-engineered). Metaphorically, the energy surface “hoists” object regions of the input onto “dry land”: in the final segmentation, these islands don’t get flooded into a watershed basin and come out as separate detections, while everything left at sea level acts as the boundary/background class that keeps them apart.
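To make the metaphor concrete, here’s a minimal sketch of a marker-controlled watershed on an energy map, using scikit-image. This is not the paper’s pipeline (which predicts the energy map, object class, and bounding box with a CNN); the synthetic energy surface, threshold values, and function name here are all made-up illustrative choices.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def detect_from_energy(energy, marker_level=0.7, flood_level=0.2):
    """Toy marker-controlled watershed on a predicted energy surface.

    Assumes `energy` is high near object centers ("dry land") and near
    zero over the background. Both thresholds are illustrative values,
    not settings from the paper.
    """
    # One marker per connected high-energy island (candidate object center).
    markers, num_objects = ndi.label(energy > marker_level)

    # Flood the negated energy surface outward from the markers, but only
    # over pixels with some object energy; everything below `flood_level`
    # stays label 0 and acts as the separating background/boundary.
    labels = watershed(-energy, markers, mask=energy > flood_level)
    return labels, num_objects

# Tiny synthetic example: two Gaussian "islands" on a flat background.
yy, xx = np.mgrid[0:64, 0:64]
energy = (np.exp(-((xx - 20) ** 2 + (yy - 32) ** 2) / 50.0)
          + np.exp(-((xx - 44) ** 2 + (yy - 32) ** 2) / 50.0))
labels, n = detect_from_energy(energy)
print(n, np.unique(labels))  # two islands -> two separate detections
```

In the paper the energy surface comes from the network’s prediction on the score image; the watershed/thresholding step only has to separate the islands it produces.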

Tested on the MUSCIMA++ and DeepScores datasets, the system achieves state-of-the-art performance, with the greatest difficulty detecting note heads along ledger lines (which I imagine some targeted training data could fix).