TY - JOUR
T1 - Efficient Grasp Detection Network With Gaussian-Based Grasp Representation for Robotic Manipulation
AU - Cao, Hu
AU - Chen, Guang
AU - Li, Zhijun
AU - Feng, Qian
AU - Lin, Jianjie
AU - Knoll, Alois
PY - 2023/6/1
Y1 - 2023/6/1
AB - Deep learning methods have achieved excellent results in the field of grasp detection. However, deep learning-based models for general object detection lack a proper balance between accuracy and inference speed, which leads to poor performance in real-time grasping tasks. This work proposes an efficient grasp detection network that takes n-channel images as input for robotic grasping. The proposed network is a lightweight generative architecture that performs grasp detection in a single stage. Specifically, a Gaussian kernel-based grasp representation is introduced to encode the training samples, with the kernel's maximum marking the center point of highest grasp confidence. A receptive field block is plugged into the bottleneck of the network to improve its feature discriminability. In addition, pixel-based and channel-based attention mechanisms are combined into a multidimensional attention fusion network that fuses valuable semantic information by suppressing noisy features and highlighting object features. The proposed method is evaluated on the Cornell, Jacquard, and extended OCID grasp datasets. The experimental results show that it achieves an excellent balance between accuracy and running speed: the network runs at 6 ms per image while reaching 97.8%, 95.6%, and 76.4% grasp detection accuracy on the Cornell, Jacquard, and extended OCID grasp datasets, respectively. Furthermore, an excellent grasp success rate is obtained in physical experiments with a UR5 robot arm.
KW - Efficient grasp detection
KW - Gaussian-based grasp representation (GGR)
KW - fully convolutional neural network
KW - multidimensional attention fusion
KW - receptive field block (RFB)
UR - http://www.scopus.com/inward/record.url?scp=85144743435&partnerID=8YFLogxK
U2 - 10.1109/TMECH.2022.3224314
DO - 10.1109/TMECH.2022.3224314
M3 - Article
AN - SCOPUS:85144743435
SN - 1083-4435
VL - 28
SP - 1384
EP - 1394
JO - IEEE/ASME Transactions on Mechatronics
JF - IEEE/ASME Transactions on Mechatronics
IS - 3
ER -