TY - GEN
T1 - Sensor substitution for video-based action recognition
AU - Rupprecht, Christian
AU - Lea, Colin
AU - Tombari, Federico
AU - Navab, Nassir
AU - Hager, Gregory D.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/11/28
Y1 - 2016/11/28
AB - There are many applications where domain-specific sensing, such as accelerometers, kinematics, or force sensing, provides unique and important information for control or for analysis of motion. However, these sensors cannot always be deployed or accessed beyond laboratory environments. For example, it is possible to instrument humans or robots to measure motion in the laboratory in ways that cannot be replicated in the wild. An alternative, which we explore in this paper, is to address situations where accurate sensing is available while training an algorithm, but only video is available at deployment. We present two examples of this sensory substitution methodology. The first variant trains a convolutional neural network to regress real-valued signals, including robot end-effector pose, from video. The second regresses binary signals, derived from accelerometer data, that signify when specific objects are in motion. We evaluate these on the JIGSAWS dataset for robotic surgery training assessment and the 50 Salads dataset for modeling complex structured cooking tasks. We evaluate the trained models for video-based action recognition and show that they provide information comparable to the sensory signals they replace.
UR - http://www.scopus.com/inward/record.url?scp=85006513295&partnerID=8YFLogxK
U2 - 10.1109/IROS.2016.7759769
DO - 10.1109/IROS.2016.7759769
M3 - Conference contribution
AN - SCOPUS:85006513295
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 5230
EP - 5237
BT - IROS 2016 - 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016
Y2 - 9 October 2016 through 14 October 2016
ER -