TY - GEN
T1 - Contextformer
T2 - 17th European Conference on Computer Vision, ECCV 2022
AU - Koyuncu, A. Burakhan
AU - Gao, Han
AU - Boev, Atanas
AU - Gaikov, Georgii
AU - Alshina, Elena
AU - Steinbach, Eckehard
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adaptivity. Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.
AB - Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adaptivity. Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.
KW - Context model
KW - Learned image compression
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85142707477&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19800-7_26
DO - 10.1007/978-3-031-19800-7_26
M3 - Conference contribution
AN - SCOPUS:85142707477
SN - 9783031197994
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 447
EP - 463
BT - Computer Vision – ECCV 2022 - 17th European Conference, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 October 2022 through 27 October 2022
ER -