An Optical Transceiver Reliability Study based on SFP Monitoring and OS-level Metric Data

Paolo Notaro, Qiao Yu, Soroush Haeri, Jorge Cardoso, Michael Gerndt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The increasing demand for cloud computing drives the expansion in scale of datacenters and their internal optical networks, in pursuit of higher bandwidth, higher reliability, and lower latency. Optical transceivers are essential elements of optical networks, yet their reliability has been studied far less than that of other hardware components. In this paper, we leverage large volumes of monitoring data from optical transceivers, together with OS-level metrics, to provide statistical insights into the occurrence of optical transceiver failures. We estimate transceiver failure rates and normal operating ranges for monitored attributes, correlate early-observable patterns with known failure symptoms, and finally develop failure prediction models based on our analyses. Our results enable network administrators to deploy early-warning systems and enact predictive maintenance strategies, such as replacement or traffic re-routing, reducing the number of incidents and their associated costs.
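To make the first two analyses named in the abstract more concrete, here is a minimal Python sketch of two common ways such quantities are expressed. It is an illustration under stated assumptions, not the authors' method: it assumes the failure rate is reported as an annualized failure rate (AFR, failures per device-year of observation) and that a "normal operating range" is an empirical percentile band (1st to 99th percentile by default) of a monitored SFP attribute such as received optical power. The function names and percentile defaults are hypothetical.

```python
import statistics


def annualized_failure_rate(failures: int, device_days: float) -> float:
    """Hypothetical helper: failures per device-year of observed operation."""
    if device_days <= 0:
        raise ValueError("device_days must be positive")
    return failures / (device_days / 365.0)


def normal_operating_range(samples, lower_pct: int = 1, upper_pct: int = 99):
    """Hypothetical helper: empirical percentile band of one monitored attribute.

    statistics.quantiles(..., n=100) returns 99 cut points, so index k - 1
    holds the k-th percentile. Assumes at least two samples.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def out_of_range(value: float, bounds) -> bool:
    """Flag a reading that falls outside the learned band (e.g. for early warning)."""
    low, high = bounds
    return value < low or value > high
```

For example, 20 failures over 365,000 transceiver-days of observation gives `annualized_failure_rate(20, 365000) == 0.02`, i.e. a 2% AFR. A percentile band is used here because it makes no distributional assumption about the readings; a mean ± k·σ band is a common alternative when an attribute is roughly Gaussian.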

Original language: English
Title of host publication: Proceedings - 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2023
Editors: Yogesh Simmhan, Ilkay Altintas, Ana-Lucia Varbanescu, Pavan Balaji, Abhinandan S. Prasad, Lorenzo Carnevale
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-12
Number of pages: 12
ISBN (Electronic): 9798350301199
State: Published - 2023
Event: 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2023 - Bangalore, India
Duration: 1 May 2023 to 4 May 2023

Publication series

Name: Proceedings - 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2023

Conference

Conference: 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2023
Country/Territory: India
City: Bangalore
Period: 1/05/23 to 4/05/23

Keywords

  • cloud computing
  • datacenters
  • failure study
  • hardware reliability
  • optical network
  • optical transceiver
