Analysis on Temporal Dimension of Inputs for 3D Convolutional Neural Networks

Okan Köpüklü, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

3D ConvNets provide a dedicated spatiotemporal representation that incorporates motion patterns across video frames. However, compared to 2D convolutions, 3D convolution kernels increase both the number of parameters in the architecture and the floating point operations at inference time, which is of critical importance for real-time applications requiring fast runtime. In this paper, we present a sparse sampling and stacking strategy that spans large time intervals for 3D ConvNet architectures, achieving several times faster inference while sacrificing only a small amount of classification accuracy. The proposed approach is validated on action and gesture recognition tasks using two recent video datasets: Jester and Something-Something.
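The sparse sampling idea in the abstract can be illustrated with a minimal sketch: instead of feeding the 3D ConvNet a block of consecutive frames (dense sampling), a few frames are drawn evenly across the whole video so that a short clip spans a large time interval. The function names and the segment-center sampling rule below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def sparse_sample_indices(num_frames: int, clip_len: int) -> np.ndarray:
    """Pick clip_len frame indices spread evenly over the whole video
    (one index near the center of each of clip_len equal segments)."""
    segment = num_frames / clip_len
    return np.array([int(segment * i + segment / 2) for i in range(clip_len)])

def dense_sample_indices(num_frames: int, clip_len: int, start: int = 0) -> np.ndarray:
    """Pick clip_len consecutive frames starting at `start`, for comparison."""
    start = min(start, max(num_frames - clip_len, 0))
    return np.arange(start, start + clip_len)

# A 100-frame video and an 8-frame input clip for the 3D ConvNet:
sparse = sparse_sample_indices(100, 8)  # spans the full 100-frame interval
dense = dense_sample_indices(100, 8)    # covers only 8 consecutive frames
print(sparse)  # [ 6 18 31 43 56 68 81 93]
print(dense)   # [0 1 2 3 4 5 6 7]
```

The sampled frames would then be stacked along the temporal axis into the network's input tensor; the clip length (and hence the cost of the 3D convolutions) stays fixed while the temporal coverage grows with the video length.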

Original language: English
Title of host publication: IEEE 3rd International Conference on Image Processing, Applications and Systems, IPAS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 79-84
Number of pages: 6
ISBN (Electronic): 9781728102474
DOIs
State: Published - 2 Jul 2018
Event: 3rd IEEE International Conference on Image Processing, Applications and Systems, IPAS 2018 - Sophia Antipolis, France
Duration: 12 Dec 2018 - 14 Dec 2018

Publication series

Name: IEEE 3rd International Conference on Image Processing, Applications and Systems, IPAS 2018

Conference

Conference: 3rd IEEE International Conference on Image Processing, Applications and Systems, IPAS 2018
Country/Territory: France
City: Sophia Antipolis
Period: 12/12/18 - 14/12/18
