TY - JOUR
T1 - Multi-path learning for object pose estimation across domains
AU - Sundermeyer, Martin
AU - Durner, Maximilian
AU - Puang, En Yen
AU - Marton, Zoltan Csaba
AU - Vaskevicius, Narunas
AU - Arras, Kai O.
AU - Triebel, Rudolph
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote 'multi-path learning': While the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS dataset. Despite training jointly on multiple objects, our 6D Object Detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches.
AB - We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote 'multi-path learning': While the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS dataset. Despite training jointly on multiple objects, our 6D Object Detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches.
UR - http://www.scopus.com/inward/record.url?scp=85094819513&partnerID=8YFLogxK
U2 - 10.1109/CVPR42600.2020.01393
DO - 10.1109/CVPR42600.2020.01393
M3 - Conference article
AN - SCOPUS:85094819513
SN - 1063-6919
SP - 13913
EP - 13922
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
M1 - 9157613
T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
Y2 - 14 June 2020 through 19 June 2020
ER -