TY - GEN
T1 - Multimodal People Detection and Tracking in Crowded Scenes
AU - Spinello, Luciano
AU - Triebel, Rudolph
AU - Siegwart, Roland
N1 - Publisher Copyright: Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2008
Y1 - 2008
N2 - This paper presents a novel people detection and tracking method based on a multi-modal sensor fusion approach that utilizes 2D laser range and camera data. The data points in the laser scans are clustered using a novel graph-based method, and an SVM-based version of the cascaded AdaBoost classifier is trained with a set of geometrical features of these clusters. In the detection phase, the classified laser data is projected into the camera image to define a region of interest for the vision-based people detector. This detector is a fast version of the Implicit Shape Model (ISM) that learns an appearance codebook of local SIFT descriptors from a set of hand-labeled images of pedestrians and uses them in a voting scheme to vote for centers of detected people. The extension consists of a fast and detailed analysis of the spatial distribution of voters per detected person. Each detected person is tracked using a greedy data association method and multiple Extended Kalman Filters that use different motion models. This way, the tracker can cope with a variety of different motion patterns. The tracker is asynchronously updated by the detections from the laser and the camera data. Experiments conducted in real-world outdoor scenarios with crowds of pedestrians demonstrate the usefulness of our approach.
UR - http://www.scopus.com/inward/record.url?scp=78651474495&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78651474495
T3 - Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
SP - 1409
EP - 1414
BT - Proceedings of the 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
PB - AAAI Press
T2 - 23rd AAAI Conference on Artificial Intelligence, AAAI 2008
Y2 - 13 July 2008 through 17 July 2008
ER -