TY - GEN
T1 - Living on the edge
T2 - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
AU - Karlstetter, Roman
AU - Raoofy, Amir
AU - Radev, Martin
AU - Trinitis, Carsten
AU - Hermann, Jakob
AU - Schulz, Martin
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5
Y1 - 2021/5
N2 - Real-time sensor monitoring is critical in many industrial applications and is used, for example, to model and predict operating conditions in order to optimize operations and to prevent damage to machinery and systems. In many cases, this data is generated by a myriad of sensors and stored or transmitted for post-processing by data analysts. Handling this data near its origin, on the edge, poses significant challenges for storage and compression: it must be stored in a format that is suitable for large data analytics algorithms, which in most cases means columnar storage. Furthermore, to allow efficient storage and transmission, such sensor data must be compressed efficiently. However, existing solutions do not address these challenges sufficiently. In this work, we present a holistic approach for fast streaming of large-scale sensor data directly into columnar storage and integrate it with a proven compression scheme. Our approach uses a pipelined scheme for streaming and transposing the data layout, combined with a byte-level transformation of data representation and compression, which we evaluate in comprehensive experiments. As a result, our approach enables transformation of large-scale sensor data streams into an efficient, analytics-friendly format already at the sensor site, i.e., on the edge, at data ingestion time. By implementing our optimized approach in the open and widely used columnar storage format Apache Parquet, which we have already partly upstreamed, we ensure its accessibility to the community.
AB - Real-time sensor monitoring is critical in many industrial applications and is used, for example, to model and predict operating conditions in order to optimize operations and to prevent damage to machinery and systems. In many cases, this data is generated by a myriad of sensors and stored or transmitted for post-processing by data analysts. Handling this data near its origin, on the edge, poses significant challenges for storage and compression: it must be stored in a format that is suitable for large data analytics algorithms, which in most cases means columnar storage. Furthermore, to allow efficient storage and transmission, such sensor data must be compressed efficiently. However, existing solutions do not address these challenges sufficiently. In this work, we present a holistic approach for fast streaming of large-scale sensor data directly into columnar storage and integrate it with a proven compression scheme. Our approach uses a pipelined scheme for streaming and transposing the data layout, combined with a byte-level transformation of data representation and compression, which we evaluate in comprehensive experiments. As a result, our approach enables transformation of large-scale sensor data streams into an efficient, analytics-friendly format already at the sensor site, i.e., on the edge, at data ingestion time. By implementing our optimized approach in the open and widely used columnar storage format Apache Parquet, which we have already partly upstreamed, we ensure its accessibility to the community.
KW - edge computing
KW - sensor data streaming
UR - http://www.scopus.com/inward/record.url?scp=85114872717&partnerID=8YFLogxK
U2 - 10.1109/CCGrid51090.2021.00010
DO - 10.1109/CCGrid51090.2021.00010
M3 - Conference contribution
AN - SCOPUS:85114872717
T3 - Proceedings - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
SP - 1
EP - 10
BT - Proceedings - 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2021
A2 - Lefevre, Laurent
A2 - Patterson, Stacy
A2 - Lee, Young Choon
A2 - Shen, Haiying
A2 - Ilager, Shashikant
A2 - Goudarzi, Mohammad
A2 - Toosi, Adel N.
A2 - Buyya, Rajkumar
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 May 2021 through 13 May 2021
ER -