TY - GEN
T1 - Waveform signal entropy and compression study of whole-building energy datasets
AU - Kriechbaumer, Thomas
AU - Jorde, Daniel
AU - Jacobsen, Hans Arno
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/6/15
Y1 - 2019/6/15
N2 - Electrical energy consumption has been an ongoing research area since the advent of smart homes and the Internet of Things. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Data analysis together with machine learning models can be utilized to extract valuable information for the benefit of occupants themselves (conserve energy and increase comfort levels), power plants (maintenance), and grid operators (stability). Public energy datasets provide a scientific foundation to develop and benchmark these algorithms and techniques. With datasets exceeding tens of terabytes, we present a novel study of five whole-building energy datasets with high sampling rates, their signal entropy, and how a well-calibrated measurement can have a significant effect on the overall storage requirements. We show that some datasets do not fully utilize the available measurement precision, therefore leaving potential accuracy and space savings untapped. We benchmark a comprehensive list of 365 file formats, transparent data transformations, and lossless compression algorithms. The primary goal is to reduce the overall dataset size while maintaining an easy-to-use file format and access API. We show that with careful selection of file format and encoding scheme, we can reduce the size of some datasets by up to 73%.
AB - Electrical energy consumption has been an ongoing research area since the advent of smart homes and the Internet of Things. Consumption characteristics and usage profiles are directly influenced by building occupants and their interaction with electrical appliances. Data analysis together with machine learning models can be utilized to extract valuable information for the benefit of occupants themselves (conserve energy and increase comfort levels), power plants (maintenance), and grid operators (stability). Public energy datasets provide a scientific foundation to develop and benchmark these algorithms and techniques. With datasets exceeding tens of terabytes, we present a novel study of five whole-building energy datasets with high sampling rates, their signal entropy, and how a well-calibrated measurement can have a significant effect on the overall storage requirements. We show that some datasets do not fully utilize the available measurement precision, therefore leaving potential accuracy and space savings untapped. We benchmark a comprehensive list of 365 file formats, transparent data transformations, and lossless compression algorithms. The primary goal is to reduce the overall dataset size while maintaining an easy-to-use file format and access API. We show that with careful selection of file format and encoding scheme, we can reduce the size of some datasets by up to 73%.
KW - Electricity aggregate
KW - Energy dataset
KW - File format
KW - High sampling rate
KW - Non-intrusive load monitoring
KW - Waveform compression
UR - http://www.scopus.com/inward/record.url?scp=85068670969&partnerID=8YFLogxK
U2 - 10.1145/3307772.3328285
DO - 10.1145/3307772.3328285
M3 - Conference contribution
AN - SCOPUS:85068670969
T3 - e-Energy 2019 - Proceedings of the 10th ACM International Conference on Future Energy Systems
SP - 58
EP - 67
BT - e-Energy 2019 - Proceedings of the 10th ACM International Conference on Future Energy Systems
PB - Association for Computing Machinery, Inc
T2 - 10th ACM International Conference on Future Energy Systems, e-Energy 2019
Y2 - 25 June 2019 through 28 June 2019
ER -