TY - GEN
T1 - CloudAttention
T2 - 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
AU - Saleh, Mahdi
AU - Wang, Yige
AU - Navab, Nassir
AU - Busam, Benjamin
AU - Tombari, Federico
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Processing 3D data efficiently has always been a challenge. Spatial operations on large-scale point clouds, stored as sparse data, require extra cost. Attracted by the success of transformers, researchers are using multi-head attention for vision tasks. However, attention calculations in transformers come with quadratic complexity in the number of inputs and miss spatial intuition on sets like point clouds. We redesign set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We propose our local attention unit, which captures features in a spatial neighborhood. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. Finally, to mitigate the non-heterogeneity of point clouds, we propose an efficient Multi-Scale Tokenization (MST), which extracts scale-invariant tokens for attention operations. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations. Our proposed architecture predicts segmentation labels with around half the latency and parameter count of the previous most efficient method with comparable performance. The code is available at https://github.com/YigeWang-WHU/CloudAttention.
UR - https://www.scopus.com/pages/publications/85146314770
U2 - 10.1109/IROS47612.2022.9982276
DO - 10.1109/IROS47612.2022.9982276
M3 - Conference contribution
AN - SCOPUS:85146314770
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 1986
EP - 1992
BT - 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 October 2022 through 27 October 2022
ER -