TY - GEN
T1 - Incoop
T2 - 2nd ACM Symposium on Cloud Computing, SOCC 2011
AU - Bhatotia, Pramod
AU - Wieder, Alexander
AU - Rodrigues, Rodrigo
AU - Acar, Umut A.
AU - Pasquini, Rafael
PY - 2011
Y1 - 2011
N2 - Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, require the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity. In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.
AB - Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, require the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity. In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.
KW - Memoization
KW - Self-adjusting computation
KW - Stability
UR - http://www.scopus.com/inward/record.url?scp=82155187187&partnerID=8YFLogxK
U2 - 10.1145/2038916.2038923
DO - 10.1145/2038916.2038923
M3 - Conference contribution
AN - SCOPUS:82155187187
SN - 9781450309769
T3 - Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
BT - Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011
Y2 - 26 October 2011 through 28 October 2011
ER -