TY - GEN
T1 - GRAtt-VIS
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
AU - Hannan, Tanveer
AU - Koner, Rajat
AU - Bernhard, Maximilian
AU - Shit, Suprosanna
AU - Menze, Bjoern
AU - Tresp, Volker
AU - Schubert, Matthias
AU - Seidl, Thomas
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Video Instance Segmentation (VIS) has seen a growing reliance on query propagation-based methods to model complex and lengthy videos. While these methods dominate in performance, they do not explicitly model discrete events, e.g., occlusion, disappearance, and reappearance. Such events often result in degraded object features over time. We believe that learning these events end-to-end with the propagation network would prevent this degradation. To this end, we propose a novel propagation method that models these discrete events with a gating mechanism. First, the gate identifies degraded object features caused by these events. Second, we apply a residual configuration to rectify the feature degradation, alleviating the need for a conventional memory bank. Third, we restrict interaction between relevant and degraded objects with a novel gated self-attention. The gated residual configuration and self-attention form the GRAtt block, which can easily be integrated into existing propagation frameworks. GRAtt-VIS performs on par with state-of-the-art methods on the YTVIS-19, -21, and -22 datasets and the challenging OVIS dataset, significantly improving performance over previous methods. The code is available in the supplementary material.
AB - Video Instance Segmentation (VIS) has seen a growing reliance on query propagation-based methods to model complex and lengthy videos. While these methods dominate in performance, they do not explicitly model discrete events, e.g., occlusion, disappearance, and reappearance. Such events often result in degraded object features over time. We believe that learning these events end-to-end with the propagation network would prevent this degradation. To this end, we propose a novel propagation method that models these discrete events with a gating mechanism. First, the gate identifies degraded object features caused by these events. Second, we apply a residual configuration to rectify the feature degradation, alleviating the need for a conventional memory bank. Third, we restrict interaction between relevant and degraded objects with a novel gated self-attention. The gated residual configuration and self-attention form the GRAtt block, which can easily be integrated into existing propagation frameworks. GRAtt-VIS performs on par with state-of-the-art methods on the YTVIS-19, -21, and -22 datasets and the challenging OVIS dataset, significantly improving performance over previous methods. The code is available in the supplementary material.
KW - Multi-Object Tracking
KW - Video Instance Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85212469804&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78444-6_18
DO - 10.1007/978-3-031-78444-6_18
M3 - Conference contribution
AN - SCOPUS:85212469804
SN - 9783031784439
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 268
EP - 282
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 1 December 2024 through 5 December 2024
ER -