TY - GEN
T1 - Trajectory Similarity using Compression
AU - Dax, Gabriel
AU - Werner, Martin
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6
Y1 - 2021/6
N2 - In this paper, we present a novel approach for trajectory similarity based on Kolmogorov complexity approximated by a lossy compression of the original trajectory data using selected features compressed into a concise memory representation by means of a Bloom filter. Given the importance of trajectory data, a linear-Time distance measure with all theoretical guarantees implied by a proper metric is very powerful if it is capturing enough detail for important trajectory mining tasks. This stack of feature extraction combined with feature embedding in a Bloom filter constitutes a lossy compression for trajectory data which can easily be extended with other discrete data like travel mode. In addition, this compression has the needed properties for efficient calculation of a normalized compression distance (NCD) which approximates Kolmogorov complexity. We evaluate this novel trajectory distance measurement using very simple features and k-nearest-neighbor classification on selected realworld datasets with remarkable classification accuracies. Furthermore, we argue that the distance measure is very suited to geospatial big data applications as each trajectory is first transformed into few bits using the lossy compression stack. At time of comparison, the original trajectory geometry is not needed, instead, the sketches suffice. Despite very compressible parameters (equal or less than 1024 bit per trajectory) and very simple features, we already reach classification accuracies for real-world trajectory classification tasks of more than 80% across various datasets.
AB - In this paper, we present a novel approach for trajectory similarity based on Kolmogorov complexity approximated by a lossy compression of the original trajectory data using selected features compressed into a concise memory representation by means of a Bloom filter. Given the importance of trajectory data, a linear-Time distance measure with all theoretical guarantees implied by a proper metric is very powerful if it is capturing enough detail for important trajectory mining tasks. This stack of feature extraction combined with feature embedding in a Bloom filter constitutes a lossy compression for trajectory data which can easily be extended with other discrete data like travel mode. In addition, this compression has the needed properties for efficient calculation of a normalized compression distance (NCD) which approximates Kolmogorov complexity. We evaluate this novel trajectory distance measurement using very simple features and k-nearest-neighbor classification on selected realworld datasets with remarkable classification accuracies. Furthermore, we argue that the distance measure is very suited to geospatial big data applications as each trajectory is first transformed into few bits using the lossy compression stack. At time of comparison, the original trajectory geometry is not needed, instead, the sketches suffice. Despite very compressible parameters (equal or less than 1024 bit per trajectory) and very simple features, we already reach classification accuracies for real-world trajectory classification tasks of more than 80% across various datasets.
KW - Bloom Filter
KW - Compression Distance
KW - Kolmogorov Complexity
KW - Similarity Measure
KW - Trajectories
UR - http://www.scopus.com/inward/record.url?scp=85112388928&partnerID=8YFLogxK
U2 - 10.1109/MDM52706.2021.00035
DO - 10.1109/MDM52706.2021.00035
M3 - Conference contribution
AN - SCOPUS:85112388928
T3 - Proceedings - IEEE International Conference on Mobile Data Management
SP - 169
EP - 174
BT - Proceedings - 2021 22nd IEEE International Conference on Mobile Data Management, MDM 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Conference on Mobile Data Management, MDM 2021
Y2 - 15 June 2021 through 18 June 2021
ER -