TY - JOUR
T1 - NeuroGrasp
T2 - Multimodal Neural Network With Euler Region Regression for Neuromorphic Vision-Based Grasp Pose Estimation
AU - Cao, Hu
AU - Chen, Guang
AU - Li, Zhijun
AU - Hu, Yingbai
AU - Knoll, Alois
N1 - Publisher Copyright: © 1963-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Grasp pose estimation is a crucial procedure in robotic manipulation. Most current robotic grasp manipulation systems are built on frame-based cameras such as RGB-D cameras. However, traditional frame-based grasp pose estimation methods have encountered challenges in scenarios such as low dynamic range and low power consumption. In this work, a neuromorphic vision sensor, the dynamic and active-pixel vision sensor (DAVIS), is introduced to the field of robotic grasping. DAVIS is an event-based, bio-inspired vision sensor that records asynchronous streams of local pixel-level light intensity changes, called events. The strengths of DAVIS are that it provides high temporal resolution, high dynamic range, low power consumption, and no motion blur. We construct a neuromorphic vision-based robotic grasp dataset with 154 moving objects, named NeuroGrasp, which is, to the best of our knowledge, the first RGB-Event multimodality grasp dataset. This dataset records both RGB frames and the corresponding event streams, providing frame data with rich color and texture information and event streams with high temporal resolution and high dynamic range. Based on the NeuroGrasp dataset, we further develop a multimodal neural network with a specific Euler region regression sub-network (ERRN) to perform grasp pose estimation. By combining frame-based and event-based vision, the proposed method achieves better performance on the NeuroGrasp dataset than methods that take only RGB frames or only event streams as input.
AB - Grasp pose estimation is a crucial procedure in robotic manipulation. Most current robotic grasp manipulation systems are built on frame-based cameras such as RGB-D cameras. However, traditional frame-based grasp pose estimation methods have encountered challenges in scenarios such as low dynamic range and low power consumption. In this work, a neuromorphic vision sensor, the dynamic and active-pixel vision sensor (DAVIS), is introduced to the field of robotic grasping. DAVIS is an event-based, bio-inspired vision sensor that records asynchronous streams of local pixel-level light intensity changes, called events. The strengths of DAVIS are that it provides high temporal resolution, high dynamic range, low power consumption, and no motion blur. We construct a neuromorphic vision-based robotic grasp dataset with 154 moving objects, named NeuroGrasp, which is, to the best of our knowledge, the first RGB-Event multimodality grasp dataset. This dataset records both RGB frames and the corresponding event streams, providing frame data with rich color and texture information and event streams with high temporal resolution and high dynamic range. Based on the NeuroGrasp dataset, we further develop a multimodal neural network with a specific Euler region regression sub-network (ERRN) to perform grasp pose estimation. By combining frame-based and event-based vision, the proposed method achieves better performance on the NeuroGrasp dataset than methods that take only RGB frames or only event streams as input.
KW - Euler region regression sub-network (ERRN)
KW - Grasp pose estimation
KW - Multimodal fusion
KW - Vision-based robotic manipulation
UR - http://www.scopus.com/inward/record.url?scp=85131731350&partnerID=8YFLogxK
U2 - 10.1109/TIM.2022.3179469
DO - 10.1109/TIM.2022.3179469
M3 - Article
AN - SCOPUS:85131731350
SN - 0018-9456
VL - 71
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 2511111
ER -