Hardware and software innovations in energy-efficient system-reliability monitoring

Vasileios Tenentes, Charles Leech, Graeme M. Bragg, Geoff Merrett, Bashir M. Al-Hashimi, Hussam Amrouch, Jorg Henkel, Shidhartha Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

Many threats that can undermine the reliability of a system can be realized at design, while others only during its online operation. As the availability of system monitoring sensors and run-time software increases in heterogeneous platforms, there is a demand for a novel platform-independent framework that can capture and deliver, in a holistic way, system level self-assessment and adaptation capabilities at run-time. In this paper, two groups from academia and one from industry present the following three contributions. First, system reliability is considered from the perspective of novel timing guardband designs for aging mitigation. Effective timing guardband models are presented from the physical to the system level, while targeting multiple wear-out mechanisms. Second, a technique for correlating complex software and micro-architectural events with power integrity loss is presented. The presented technique uses an embedded voltage noise sensor, a power-network model and a genetic algorithm for identifying workload that triggers power-network resonances which can ultimately lead to system failures. Third, the 'PRiME' cross-layer programming framework is presented that unites available sensors and dynamic-voltage and frequency scaling actuators with learning-based run-time process mapping and scheduling algorithms. Scenarios on exploring the energy efficiency and reliability of heterogeneous platforms using run-time software derived from the developed framework are also reviewed.

Original language: English
Title of host publication: 2017 IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 9781538603628
State: Published - 28 Jun 2017
Externally published: Yes
Event: 13th IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017 - Cambridge, United Kingdom
Duration: 23 Oct 2017 to 25 Oct 2017

Publication series

Name: 2017 IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017
Volume: 2018-January

Conference

Conference: 13th IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2017
Country/Territory: United Kingdom
City: Cambridge
Period: 23/10/17 to 25/10/17
