TY - GEN
T1 - Reliable data-center scale computations
AU - Bhatotia, Pramod
AU - Wieder, Alexander
AU - Rodrigues, Rodrigo
AU - Junqueira, Flavio
AU - Reed, Benjamin
PY - 2010
Y1 - 2010
N2 - Neither of the two broad classes of fault models considered by traditional fault tolerance techniques, crash and Byzantine faults, suits the environment of systems that run in today's data centers. On the one hand, assuming Byzantine faults is considered overkill, both because it assumes worst-case adversarial behavior and because other techniques are used to guard against malicious attacks. On the other hand, the crash fault model is insufficient since it does not capture non-crash faults that may result from a variety of unexpected conditions that are commonplace in this setting. In this paper, we present the case for a more practical approach to handling non-crash (but non-adversarial) faults in data-center scale computations. In this context, we discuss how such a problem can be tackled for an important class of data-center scale systems: systems for large-scale processing of data, with a particular focus on the Pig programming framework. Such an approach not only covers a significant fraction of the processing jobs that run in today's data centers, but is potentially applicable to a broader class of applications.
AB - Neither of the two broad classes of fault models considered by traditional fault tolerance techniques, crash and Byzantine faults, suits the environment of systems that run in today's data centers. On the one hand, assuming Byzantine faults is considered overkill, both because it assumes worst-case adversarial behavior and because other techniques are used to guard against malicious attacks. On the other hand, the crash fault model is insufficient since it does not capture non-crash faults that may result from a variety of unexpected conditions that are commonplace in this setting. In this paper, we present the case for a more practical approach to handling non-crash (but non-adversarial) faults in data-center scale computations. In this context, we discuss how such a problem can be tackled for an important class of data-center scale systems: systems for large-scale processing of data, with a particular focus on the Pig programming framework. Such an approach not only covers a significant fraction of the processing jobs that run in today's data centers, but is potentially applicable to a broader class of applications.
KW - Byzantine faults
KW - Pig
KW - data center
KW - data processing
KW - fault detection
KW - non-adversarial faults
UR - http://www.scopus.com/inward/record.url?scp=78649466282&partnerID=8YFLogxK
U2 - 10.1145/1859184.1859186
DO - 10.1145/1859184.1859186
M3 - Conference contribution
AN - SCOPUS:78649466282
SN - 9781450304061
T3 - Proceedings of the 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
SP - 1
EP - 6
BT - Proceedings of the 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
T2 - 4th ACM/SIGOPS Workshop on Large-Scale Distributed Systems and Middleware, LADIS 2010
Y2 - 28 July 2010 through 29 July 2010
ER -