Flar-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression

  • Moritz Thoma
  • Jorge Villasante
  • Emad Aghajanzadeh
  • Shambhavi Balamuthu Sampath
  • Pierpaolo Mori
  • Maximilian Groetzinger
  • Daniil Dylkin
  • Manoj Rohit Vemparala
  • Nael Fasfous
  • Alexander Frickenstein
  • Daniel Mueller-Gritschneder
  • Ulf Schlichtmann

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Advanced deep learning architectures have achieved exceptional prediction performance but come with significant computational demands, posing challenges for deployment on resource-constrained platforms such as edge devices. While pruning techniques offer a way to reduce model complexity, they often lead to substantial accuracy loss and can require extensive retraining. Alternatively, Singular Value Decomposition (SVD) provides a promising solution by decomposing model weights into lower-dimensional representations, thus maintaining a closer representation of the original features and preserving accuracy. Despite progress in this domain, approaches targeting vision model architectures typically rely on uniform compression or on slow, computationally expensive rank search methods that do not account for latency improvements. In this paper, we introduce Fast, Latency-Aware Rank Singular Value Decomposition (FLAR-SVD), a novel approach that leverages inherent SVD properties to accelerate the rank search process and incorporates latency tuning to further optimize performance for hardware targets. We demonstrate the capability of our approach across CNN, ViT and Mamba architectures on both server and edge hardware. For DeiT, we achieve 81.0 % accuracy on ImageNet with only 1 epoch of fine-tuning, while reducing latency by 30 % over the baseline. Our code is available at https://github.com/MoritzTho/FLAR-SVD.
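
The low-rank factorization that the abstract refers to can be illustrated with a minimal NumPy sketch. This is not the FLAR-SVD rank search or its latency tuning: the energy-threshold criterion, the function names, and the synthetic weight matrix below are assumptions chosen purely for illustration.

```python
import numpy as np


def truncated_svd_factors(weight: np.ndarray, rank: int):
    """Return (A, B) such that A @ B is the best rank-`rank` approximation of `weight`."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_features, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, in_features)
    return a, b


def rank_for_energy(singular_values: np.ndarray, energy: float) -> int:
    """Smallest rank whose squared singular values retain `energy` of the total.

    Illustrative criterion only; it is not the rank search described in the paper.
    """
    squared = singular_values ** 2
    cumulative = np.cumsum(squared) / np.sum(squared)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, len(singular_values))  # guard against rounding at energy ~ 1.0


# Hypothetical layer weight: approximately rank 8, plus a small amount of noise.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))
weight += 0.01 * rng.standard_normal((64, 128))

s = np.linalg.svd(weight, compute_uv=False)
rank = rank_for_energy(s, 0.99)
a, b = truncated_svd_factors(weight, rank)

rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
params_before = weight.size
params_after = a.size + b.size
print(f"rank={rank}, relative error={rel_error:.4f}, "
      f"parameters {params_before} -> {params_after}")
```

Replacing one dense layer by the two factors reduces parameters only when the chosen rank is below `out_features * in_features / (out_features + in_features)`. Whether a lower parameter count also lowers latency depends on the target hardware, which is the gap the abstract says latency-aware tuning is meant to address.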

Original language: English
Title of host publication: Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Publisher: IEEE Computer Society
Pages: 1889-1898
Number of pages: 10
ISBN (Electronic): 9798331599942
DOIs
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025 - Nashville, United States
Duration: 11 Jun 2025 to 12 Jun 2025

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Country/Territory: United States
City: Nashville
Period: 11/06/25 to 12/06/25

Keywords

  • convolutional neural networks
  • deep learning
  • edge computing
  • latency-aware optimization
  • model pruning
  • singular value decomposition

