Efficient Stereo Matching Using Swin Transformer and Multilevel Feature Consistency in Autonomous Mobile Systems

Xiaojie Su, Shimin Liu, Rui Li, Zhenshan Bing, Alois Knoll

Research output: Contribution to journal › Article › peer-review

Abstract

In this article, we propose a Swin Transformer and multilevel Feature Consistency based Network (STFC-Net), a multilevel cascade stereo matching method that predicts disparity in a coarse-to-fine manner. 1) Existing convolutional neural network (CNN)-based methods have a limited receptive field. To alleviate this problem, and inspired by the ability of transformers to model long-range dependencies, we adopt a multilevel feature extraction module that combines a CNN with a Swin Transformer to capture long-range context information; a multiscale cascaded cost aggregation module then covers different image regions with less memory consumption. 2) To make full use of the hierarchical features, we check the multilevel left-right feature consistency in an unsupervised manner to improve disparity accuracy. The experimental results show that our method outperforms some previous CNN-based methods on the Scene Flow and KITTI datasets with lower computational time complexity. Moreover, it generalizes well to some unseen and challenging real-world scenarios.
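
The abstract does not spell out how the left-right feature consistency check is computed. The following is only a minimal sketch of the general idea, under stated assumptions: the right-view features are warped toward the left view using the predicted disparity (a left pixel at column x is compared with the right view at column x - d), and the mean absolute difference serves as an unsupervised penalty that is summed over pyramid levels. The function names, the L1 penalty, the border clamping, the per-level weights, and the use of single-channel scanlines instead of full feature maps are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Scanline = List[float]


def warp_right_to_left(right: Scanline, disparity: Scanline) -> Scanline:
    """Sample the right scanline at x - d(x) using linear interpolation."""
    width = len(right)
    warped = []
    for x, d in enumerate(disparity):
        # Clamp the sampling position to the image border (an assumption).
        src = min(max(x - d, 0.0), width - 1.0)
        x0 = int(src)  # floor, valid because src >= 0
        x1 = min(x0 + 1, width - 1)
        w = src - x0
        warped.append((1.0 - w) * right[x0] + w * right[x1])
    return warped


def consistency_loss(left: Scanline, right: Scanline, disparity: Scanline) -> float:
    """Mean absolute difference between left features and warped right features."""
    warped = warp_right_to_left(right, disparity)
    return sum(abs(l - r) for l, r in zip(left, warped)) / len(left)


def multilevel_consistency_loss(
    levels: Sequence[Tuple[Scanline, Scanline, Scanline]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-level losses; each level is (left, right, disparity)."""
    return sum(w * consistency_loss(l, r, d) for w, (l, r, d) in zip(weights, levels))
```

In this sketch a correct disparity drives the penalty toward zero because the warped right features line up with the left features, while a wrong disparity leaves a residual. No ground-truth disparity is needed, which is what makes such a check usable as an unsupervised signal.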

Original language: English
Pages (from-to): 7957-7965
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Volume: 20
Issue number: 5
Publication status: Published - 1 May 2024
