TY - JOUR
T1 - HumanRF
T2 - High-Fidelity Neural Radiance Fields for Humans in Motion
AU - Işık, Mustafa
AU - Rünz, Martin
AU - Georgopoulos, Markos
AU - Khakhulin, Taras
AU - Starck, Jonathan
AU - Agapito, Lourdes
AU - Nießner, Matthias
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Representing human performance at high-fidelity is an essential building block in diverse applications, such as film production, computer games or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input, and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality novel view synthesis.
AB - Representing human performance at high-fidelity is an essential building block in diverse applications, such as film production, computer games or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input, and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality novel view synthesis.
KW - free-view video synthesis
KW - neural rendering
UR - http://www.scopus.com/inward/record.url?scp=85163934151&partnerID=8YFLogxK
U2 - 10.1145/3592415
DO - 10.1145/3592415
M3 - Article
AN - SCOPUS:85163934151
SN - 0730-0301
VL - 42
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
IS - 4
M1 - 98
ER -