281 — Deep Watershed Detector for Music Object Recognition
Read on 28 May 2018
Recognizing music from written notation is something that humans are… well… honestly, we’re really bad at it. Especially under a time constraint, it’s a notoriously difficult task. (You probably know it by the name “sight-reading.”) Machines aren’t much better, especially when the notation is hand-written: hand-written notation is much harder to parse because it doesn’t follow the exact typesetting conventions (think “kerning” or “ligatures”) that engraving software does.
This looks like a job for deep learning!
The detector applies a watershed transform to an energy surface that a deep network is trained to predict: metaphorically, the energy surface “hoists” certain regions of the input onto “dry land.” In the final segmentation, those regions never fall into a catchment basin and are instead assigned to a boundary class.
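To make the idea concrete, here is a minimal sketch of the inference step (not the paper’s implementation), assuming a 2-D energy map whose values are high near object centres and low elsewhere: seed one marker per energy peak, then flood the negative energy with scikit-image’s watershed so each high-energy region becomes one detected object. The function name, thresholds, and the high-at-centre convention are my own illustrative assumptions.

```python
import numpy as np
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def detect_objects(energy, min_distance=5, energy_threshold=0.2):
    """Turn a predicted energy surface into labelled object regions.

    Assumes `energy` is a 2-D float array that is high near object
    centres and low on the background/boundary (illustrative convention,
    not necessarily the paper's exact energy definition).
    """
    # One marker per local energy peak = one candidate object centre.
    peaks = peak_local_max(energy, min_distance=min_distance,
                           threshold_abs=energy_threshold)
    markers = np.zeros(energy.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

    # Flood the *negative* energy: high-energy regions become catchment
    # basins around their markers; everything below the threshold keeps
    # label 0, i.e. the background / boundary class.
    return watershed(-energy, markers, mask=energy > energy_threshold)
```

In the real system the energy map comes from the trained network, of course; this sketch only shows how a watershed turns such a surface into per-object regions.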
Tested on the MUSCIMA++ and DeepScores datasets, the system achieves state-of-the-art performance, with the greatest difficulty being noteheads along ledger lines (which I imagine some hand-tooled training could fix).