TY - JOUR
T1 - Estimation of 6D Pose of Objects Based on a Variant Adversarial Autoencoder
AU - Huang, Dan
AU - Ahn, Hyemin
AU - Li, Shile
AU - Hu, Yueming
AU - Lee, Dongheui
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/12
Y1 - 2023/12
N2 - The goal of this paper is to estimate an object’s 6D pose based on a texture-less dataset. The pose of each projection view is obtained by rendering the 3D model of each object, and the orientation of the object is then implicitly represented by a latent space obtained from the RGB image. The 3D rotation of the object is estimated by establishing a codebook based on a template-matching architecture. To build the latent space from the RGB images, this paper proposes a network based on a variant of the Adversarial Autoencoder (Makhzani et al. in Computer Science, 2015). The network is trained on a dataset without pose annotations, and the encoder and decoder are not structurally symmetric. The encoder is inspired by existing models (Yang et al. in proceedings of IJCAI, 2018; Yang et al. in proceedings of CVPR, 2019) that extract features from two different streams. Based on this network, a latent feature vector that implicitly represents the orientation of the object is obtained from the RGB image. Experimental results show that the proposed method achieves 6D pose estimation of the object with higher accuracy than the state-of-the-art method (Sundermeyer et al. in proceedings of ECCV, 2018).
AB - The goal of this paper is to estimate an object’s 6D pose based on a texture-less dataset. The pose of each projection view is obtained by rendering the 3D model of each object, and the orientation of the object is then implicitly represented by a latent space obtained from the RGB image. The 3D rotation of the object is estimated by establishing a codebook based on a template-matching architecture. To build the latent space from the RGB images, this paper proposes a network based on a variant of the Adversarial Autoencoder (Makhzani et al. in Computer Science, 2015). The network is trained on a dataset without pose annotations, and the encoder and decoder are not structurally symmetric. The encoder is inspired by existing models (Yang et al. in proceedings of IJCAI, 2018; Yang et al. in proceedings of CVPR, 2019) that extract features from two different streams. Based on this network, a latent feature vector that implicitly represents the orientation of the object is obtained from the RGB image. Experimental results show that the proposed method achieves 6D pose estimation of the object with higher accuracy than the state-of-the-art method (Sundermeyer et al. in proceedings of ECCV, 2018).
KW - 6D pose
KW - Adversarial autoencoder
KW - RGB image
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85149822755&partnerID=8YFLogxK
U2 - 10.1007/s11063-023-11215-2
DO - 10.1007/s11063-023-11215-2
M3 - Article
AN - SCOPUS:85149822755
SN - 1370-4621
VL - 55
SP - 9581
EP - 9596
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 7
ER -