Rollback-recovery without checkpoints in distributed event processing systems

Boris Koldehofe, Ruben Mayer, Umakishore Ramachandran, Kurt Rothermel, Marco Völz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

32 Scopus citations

Abstract

Reliability is of critical importance to many applications involving distributed event processing systems. The use of stateful operators in particular makes it challenging to provide efficient recovery from failures and to ensure consistent event streams. Even during failure-free execution, state-of-the-art methods for achieving reliability incur significant run-time overhead in computational resources, event traffic, and event detection time. This paper proposes a novel method for rollback-recovery that allows for recovery from multiple simultaneous operator failures while eliminating the need for persistent checkpoints. Instead, the operator state is preserved in savepoints, taken at points in time when the operator's execution depends solely on the state of its incoming event streams, which are reproducible by predecessor operators. We propose an expressive event processing model to determine savepoints, along with algorithms for their coordination in a distributed operator network. Evaluations show that the method achieves very low overhead during failure-free execution compared to other approaches.

Original language: English
Title of host publication: DEBS 2013 - Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems
Pages: 27-38
Number of pages: 12
State: Published - 2013
Externally published: Yes
Event: 7th ACM International Conference on Distributed Event-Based Systems, DEBS 2013 - Arlington, TX, United States
Duration: 29 Jun 2013 to 3 Jul 2013

Publication series

Name: DEBS 2013 - Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems

Conference

Conference: 7th ACM International Conference on Distributed Event-Based Systems, DEBS 2013
Country/Territory: United States
City: Arlington, TX
Period: 29/06/13 to 3/07/13

Keywords

  • Complex event processing
  • Recovery
  • Reliability
