TY - GEN
T1 - One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks
AU - Chen, Yaxiong
AU - Hu, Junjian
AU - Li, Chunlei
AU - Zheng, Zixuan
AU - Hu, Jingliang
AU - Shi, Yilei
AU - Xiong, Shengwu
AU - Zhu, Xiao Xiang
AU - Mou, Lichao
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Video object segmentation is crucial for the efficient analysis of complex medical video data, yet it faces significant challenges in data availability and annotation. We introduce the task of one-shot medical video object segmentation, which requires separating foreground and background pixels throughout a video given only the mask annotation of the first frame. To address this problem, we propose a temporal contrastive memory network comprising image and mask encoders to learn feature representations; a temporal contrastive memory bank that stores these features and explicitly models inter-frame relationships by aligning embeddings from adjacent frames while pushing apart those from distant frames; and a decoder that fuses encoded image features and memory readouts for segmentation. We also collect a diverse, multi-source medical video dataset spanning various modalities and anatomies to benchmark this task. Extensive experiments demonstrate state-of-the-art performance in segmenting both seen and unseen structures from a single exemplar, showing the ability to generalize from scarce labels. This highlights the potential to alleviate annotation burdens for medical video analysis. Code is available at https://github.com/MedAITech/TCMN.
KW - medical imaging
KW - memory network
KW - one-shot learning
KW - temporal contrastive learning
KW - video object segmentation
UR - http://www.scopus.com/inward/record.url?scp=85219187533&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-82007-6_23
DO - 10.1007/978-3-031-82007-6_23
M3 - Conference contribution
AN - SCOPUS:85219187533
SN - 9783031820069
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 241
EP - 251
BT - Applications of Medical Artificial Intelligence - 3rd International Workshop, AMAI 2024, Held in Conjunction with MICCAI 2024, Proceedings
A2 - Wu, Shandong
A2 - Shabestari, Behrouz
A2 - Xing, Lei
PB - Springer Science and Business Media Deutschland GmbH
T2 - 3rd International Workshop on Applications of Medical Artificial Intelligence, AMAI 2024 held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024
Y2 - 6 October 2024
ER -