TY - GEN
T1 - Emotion sensitive speech control for human-robot interaction in minimal invasive surgery
AU - Schuller, Björn
AU - Rigoll, Gerhard
AU - Can, Salman
AU - Feussner, Hubertus
PY - 2008
Y1 - 2008
N2 - Minimally invasive surgery demands utmost precision and reliability in camera control to prevent any harm to the patient during operations. We therefore introduce a robot-driven camera that can be controlled either manually by a joystick or by speech, keeping the surgeon's hands and feet free and reducing cognitive workload. Speech control is chosen as a simple yet highly robust command-and-control application. However, due to high stress and, at times, fatigue, emotional factors can play a life-deciding role in the operating situation. As any misunderstanding of the surgeon's intent can easily lead to patient injury through mis-movement of the camera, emotional factors are integrated into the human-robot interaction. In this work we therefore discuss the recording of a database of 3,035 turns of spontaneous emotional speech in real-life surgical operations. As this task is known to be challenging, we employ a high-dimensional acoustic feature space and subset optimization for the recognition of positive versus negative emotion, serving interaction adaptation, surgeon self-monitoring, and potential adaptation of acoustic models within speech recognition. A promising mean accuracy of 75.5% can be reported on a cross-operation recognition task, given the severe conditions of use in real medical operations.
UR - http://www.scopus.com/inward/record.url?scp=52949090823&partnerID=8YFLogxK
U2 - 10.1109/ROMAN.2008.4600708
DO - 10.1109/ROMAN.2008.4600708
M3 - Conference contribution
AN - SCOPUS:52949090823
SN - 9781424422135
T3 - Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN
SP - 453
EP - 458
BT - Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN
T2 - 17th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN
Y2 - 1 August 2008 through 3 August 2008
ER -