TY - GEN
T1 - When Regression Meets Manifold Learning for Object Recognition and Pose Estimation
AU - Bui, Mai
AU - Zakharov, Sergey
AU - Albarqouni, Shadi
AU - Ilic, Slobodan
AU - Navab, Nassir
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
AB - In this work, we propose a method for object recognition and pose estimation from depth images using convolutional neural networks. Previous methods addressing this problem rely on manifold learning to learn low-dimensional viewpoint descriptors and use them in a nearest neighbor (NN) search over the estimated descriptor space. In contrast, we create an efficient multi-task learning framework that combines manifold descriptor learning and pose regression. By combining the strengths of manifold learning with a triplet loss and pose regression, we can either estimate the pose directly, reducing complexity compared to NN search, or use the learned descriptors for NN descriptor matching. Through an in-depth experimental evaluation of the novel loss function, we observe that the view descriptors learned by the network are much more discriminative, resulting in an almost 30% increase in relative pose accuracy compared to related work. For directly regressed poses, we obtain a substantial improvement over simple pose regression. By leveraging the advantages of both manifold learning and regression, we improve the current state of the art for object recognition and pose retrieval.
UR - http://www.scopus.com/inward/record.url?scp=85063127690&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2018.8460654
DO - 10.1109/ICRA.2018.8460654
M3 - Conference contribution
AN - SCOPUS:85063127690
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 6140
EP - 6146
BT - 2018 IEEE International Conference on Robotics and Automation, ICRA 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Robotics and Automation, ICRA 2018
Y2 - 21 May 2018 through 25 May 2018
ER -