TY - JOUR
T1 - Learning visual overlapping image pairs for SfM via CNN fine-tuning with photogrammetric geometry information
AU - Hou, Qianbao
AU - Xia, Rui
AU - Zhang, Jiahuan
AU - Feng, Yu
AU - Zhan, Zongqian
AU - Wang, Xin
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2023/2
Y1 - 2023/2
AB - Efficient and accurate identification of visual overlapping image pairs is an ongoing challenge for large-scale Structure from Motion (SfM). Recently, CNN-based methods have demonstrated the ability to find visually similar image pairs. Nevertheless, Bag-of-Words (BoW) or Visual Vocabulary tree (VoC) methods with hand-crafted or learning-based local features are still widely used in 3D reconstruction tasks. To explore the differences between these approaches, in this work we fine-tuned several popular CNNs (AlexNet, VGG, ResNet) according to rules tailored for determining visual overlapping image pairs for SfM. More specifically, a new training dataset (called LOIP), consisting of regular photogrammetric images and crowdsourced images from the Internet, was generated by fully considering photogrammetric requirements and 3D mesh models. The local regional overlapping information from paired images was employed in the fine-tuning procedure. To aggregate feature maps from various channels, multiple learnable NetVLADs, one for each piece of regional information, are employed to further improve the retrieval performance. Comprehensive experiments demonstrate that the image retrieval performance is improved and that the time cost of image matching is significantly reduced by using the identified visual overlapping pairs. Furthermore, the SfM results are roughly on par with those of several state-of-the-art CNN-based and VoC methods.
KW - CNN-based fine-tuning
KW - Image retrieval
KW - NetVLAD
KW - Visual overlapping image pairs
UR - http://www.scopus.com/inward/record.url?scp=85144455024&partnerID=8YFLogxK
U2 - 10.1016/j.jag.2022.103162
DO - 10.1016/j.jag.2022.103162
M3 - Review article
AN - SCOPUS:85144455024
SN - 1569-8432
VL - 116
JO - International Journal of Applied Earth Observation and Geoinformation
JF - International Journal of Applied Earth Observation and Geoinformation
M1 - 103162
ER -