TY - GEN
T1 - Harnessing Temporal Information for Efficient Edge AI
AU - Sponner, Max
AU - Servadei, Lorenzo
AU - Waschneck, Bernd
AU - Wille, Robert
AU - Kumar, Akash
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep Learning is becoming increasingly relevant in edge and Internet-of-Things applications. However, deploying models on embedded devices is challenging due to their limited resources, requiring trade-offs between accuracy, latency and power consumption. Early Exit Neural Networks address this by dynamically adjusting model depth during inference, optimizing for each input. However, this requires an at-runtime decision mechanism to select the optimal configuration. Unfortunately, current methods do not fully exploit all available information for decision-making. We introduce the Difference Detection and Temporal Patience mechanisms, which leverage temporal correlations in sensor data to improve inference efficiency. We extend the approach to directly operate on intermediate results, making it applicable to traditional neural networks, further reducing inference and implementation costs. Evaluated on representative edge intelligence use-cases, our methods achieved up to a 32% reduction in latency on ECG data, with an accuracy loss of only 0.13 percentage points. When applied to a speech command detection task, latency reduction reached 44.3%, accompanied by a 1.34 percentage point decrease in accuracy.
AB - Deep Learning is becoming increasingly relevant in edge and Internet-of-Things applications. However, deploying models on embedded devices is challenging due to their limited resources, requiring trade-offs between accuracy, latency and power consumption. Early Exit Neural Networks address this by dynamically adjusting model depth during inference, optimizing for each input. However, this requires an at-runtime decision mechanism to select the optimal configuration. Unfortunately, current methods do not fully exploit all available information for decision-making. We introduce the Difference Detection and Temporal Patience mechanisms, which leverage temporal correlations in sensor data to improve inference efficiency. We extend the approach to directly operate on intermediate results, making it applicable to traditional neural networks, further reducing inference and implementation costs. Evaluated on representative edge intelligence use-cases, our methods achieved up to a 32% reduction in latency on ECG data, with an accuracy loss of only 0.13 percentage points. When applied to a speech command detection task, latency reduction reached 44.3%, accompanied by a 1.34 percentage point decrease in accuracy.
KW - Deep Learning
KW - Edge AI
KW - Internet of Things
KW - Low-power design
KW - Neural Networks
UR - http://www.scopus.com/inward/record.url?scp=85208124644&partnerID=8YFLogxK
U2 - 10.1109/FMEC62297.2024.10710223
DO - 10.1109/FMEC62297.2024.10710223
M3 - Conference contribution
AN - SCOPUS:85208124644
T3 - 2024 9th International Conference on Fog and Mobile Edge Computing, FMEC 2024
SP - 5
EP - 13
BT - 2024 9th International Conference on Fog and Mobile Edge Computing, FMEC 2024
A2 - Quwaider, Muhannad
A2 - Alawadi, Sadi
A2 - Jararweh, Yaser
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Fog and Mobile Edge Computing, FMEC 2024
Y2 - 2 September 2024 through 5 September 2024
ER -