TY - GEN
T1 - A Compact and Efficient Neural Data Structure for Mutual Information Estimation in Large Timeseries
AU - Farokhmanesh, Fatemeh
AU - Neuhauser, Christoph
AU - Westermann, Rüdiger
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/7/10
Y1 - 2024/7/10
N2 - Database systems face challenges when using mutual information (MI) to analyze non-linear relationships between large timeseries, due to high computational and memory requirements. Interactive workflows are especially hindered by long response times. To address these challenges, we present timeseries neural MI fields (TNMIFs), a compact data structure trained to efficiently reconstruct MI across varying time windows and window positions in large timeseries. We demonstrate learning and reconstruction with a large timeseries dataset comprising 1420 timeseries, each storing data at 1639 timesteps. While the learned data structure consumes only 45 megabytes, it answers queries for the MI estimates between the windows of a selected timeseries and the corresponding windows of all other timeseries within 44 milliseconds. Given a measure of similarity between timeseries based on windowed MI estimates, even the matrix of all pairwise timeseries similarities can be computed in less than 32 seconds. To support measuring dependence between lagged timeseries, an extended data structure learns to reconstruct MI for positively (future) and negatively (past) lagged windows. Using a maximum lag of 64 in both directions increases query times by about a factor of 10.
AB - Database systems face challenges when using mutual information (MI) to analyze non-linear relationships between large timeseries, due to high computational and memory requirements. Interactive workflows are especially hindered by long response times. To address these challenges, we present timeseries neural MI fields (TNMIFs), a compact data structure trained to efficiently reconstruct MI across varying time windows and window positions in large timeseries. We demonstrate learning and reconstruction with a large timeseries dataset comprising 1420 timeseries, each storing data at 1639 timesteps. While the learned data structure consumes only 45 megabytes, it answers queries for the MI estimates between the windows of a selected timeseries and the corresponding windows of all other timeseries within 44 milliseconds. Given a measure of similarity between timeseries based on windowed MI estimates, even the matrix of all pairwise timeseries similarities can be computed in less than 32 seconds. To support measuring dependence between lagged timeseries, an extended data structure learns to reconstruct MI for positively (future) and negatively (past) lagged windows. Using a maximum lag of 64 in both directions increases query times by about a factor of 10.
KW - Mutual Information
KW - Neural Data Structures
KW - Timeseries Analysis
UR - http://www.scopus.com/inward/record.url?scp=85205002359&partnerID=8YFLogxK
U2 - 10.1145/3676288.3676295
DO - 10.1145/3676288.3676295
M3 - Conference contribution
AN - SCOPUS:85205002359
T3 - ACM International Conference Proceeding Series
BT - Scientific and Statistical Database Management
A2 - Ibrahim, Shadi
A2 - Byna, Suren
A2 - Allard, Tristan
A2 - Lofstead, Jay
A2 - Zhou, Amelie Chi
A2 - Bouadi, Tassadit
A2 - Boukhobza, Jalil
A2 - Moise, Diana
A2 - Tedeschi, Cedric
A2 - Bez, Jean Luca
PB - Association for Computing Machinery
T2 - 36th International Conference on Scientific and Statistical Database Management, SSDBM 2024
Y2 - 10 July 2024 through 12 July 2024
ER -