236 — STAIR Actions: A Video Dataset of Everyday Home Actions

Yoshikawa, Lin, & Takeuchi (1804.04326)

Read on 13 April 2018
#dataset  #home  #action  #video  #labels  #recognition  #moments  #annotation 

I rarely read about datasets, but I decided to make an exception today, both because this has been on my mind recently and because it is 10:43 at night and I’ve already had a very strong Rickety Cricket at Sugarvale and I’m not emotionally prepared to read another paper.

This dataset contains 100 action categories across more than 100,000 video clips, each roughly 5 seconds long. Each action is a common activity that takes place at home. Unlike similar datasets such as Moments in Time (#145), this dataset distinguishes between labels even when they share a verb: for example, “OPEN a door” and “OPEN a beer” are different actions, even though they share the verb “OPEN”.
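The verb+object labeling is the part I actually care about, because it means the label set itself carries the distinctions a downstream model has to learn. As a rough sketch of how I’d index a dataset like this for training — and to be clear, the directory layout here (one folder of ~5 s clips per label) is my own assumption, not anything the paper specifies:

```python
# Minimal sketch, not the official loader: assumes a hypothetical layout of
# "stair_actions/<verb_object_label>/<clip>.mp4" and builds a (path, label_id)
# list plus a label-name-to-id mapping from the folder names.
from pathlib import Path

def index_clips(root: str):
    """Map each clip path to an integer label id, keyed by folder name."""
    root = Path(root)
    label_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_to_id = {name: i for i, name in enumerate(label_names)}
    samples = []
    for name in label_names:
        for clip in sorted((root / name).glob("*.mp4")):
            samples.append((str(clip), label_to_id[name]))
    return samples, label_to_id

if __name__ == "__main__":
    # "stair_actions" is a placeholder path for wherever the clips live.
    samples, label_to_id = index_clips("stair_actions")
    print(f"{len(label_to_id)} labels, {len(samples)} clips")
```

From there, feeding the (path, label_id) pairs into whatever video classifier you like is the easy part; the point is that “open a door” and “open a beer” end up as distinct ids rather than collapsing onto a shared verb.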

This has been on my mind lately because I’m counting down the days until I get my pre-ordered Amazon DeepLens. This camera seems only sorta useful — until you pair it with a comprehensive database of common actions for it to recognize. I’m not sure that 1000 videos-per-label is a high enough $n$ to accurately represent an action taking place in any environment in any house, but I think that it gets systems like DeepLens (or DIY versions using home computers or raspberry pis) dramatically closer to being a useful home observer.