TY - GEN
T1 - FLAR-SVD
T2 - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
AU - Thoma, Moritz
AU - Villasante, Jorge
AU - Aghajanzadeh, Emad
AU - Sampath, Shambhavi Balamuthu
AU - Mori, Pierpaolo
AU - Groetzinger, Maximilian
AU - Dylkin, Daniil
AU - Vemparala, Manoj Rohit
AU - Fasfous, Nael
AU - Frickenstein, Alexander
AU - Mueller-Gritschneder, Daniel
AU - Schlichtmann, Ulf
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Advanced deep learning architectures have achieved exceptional prediction performance but come with significant computational demands, posing challenges for deployment on resource-constrained edge devices. While pruning techniques offer a way to reduce model complexity, they often lead to substantial accuracy loss and can require extensive retraining. Alternatively, Singular Value Decomposition (SVD) provides a promising solution by decomposing model weights into lower-dimensional representations, thus maintaining a closer representation of the original features and preserving accuracy. Despite progress in this domain, approaches targeting vision model architectures typically rely on uniform compression or slow, computationally expensive rank search methods that do not account for latency improvements. In this paper, we introduce Fast, Latency-Aware Rank Singular Value Decomposition (FLAR-SVD), a novel approach that leverages inherent SVD properties to accelerate the rank search process and incorporates latency tuning to further optimize performance for hardware targets. We demonstrate the capability of our approach across CNN, ViT, and Mamba architectures on both server and edge hardware. For DeiT, we achieve 81.0% accuracy on ImageNet with only 1 epoch of fine-tuning, while reducing latency by 30% over the baseline. Our code is available at https://github.com/MoritzTho/FLAR-SVD.
KW - convolutional neural networks
KW - deep learning
KW - edge computing
KW - latency-aware optimization
KW - model pruning
KW - singular value decomposition
UR - https://www.scopus.com/pages/publications/105017845776
U2 - 10.1109/CVPRW67362.2025.00178
DO - 10.1109/CVPRW67362.2025.00178
M3 - Conference contribution
AN - SCOPUS:105017845776
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 1889
EP - 1898
BT - Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
PB - IEEE Computer Society
Y2 - 11 June 2025 through 12 June 2025
ER -