Large-scale incremental data processing with change propagation

Pramod Bhatotia, Alexander Wieder, Istemi Ekin Akkuş, Rodrigo Rodrigues, Umut A. Acar

Research output: Contribution to conference › Paper › peer-review

31 Scopus citations

Abstract

Incremental processing of large-scale data is an increasingly important problem, given that many processing jobs run repeatedly with similar inputs, and that the de facto standard programming model (MapReduce) was not designed to efficiently process small updates. As a result, new systems specifically targeting this problem (e.g., Google Percolator, or Yahoo! CBP) have been proposed. Unfortunately, these approaches require the adoption of a new programming model, breaking compatibility with existing programs and increasing the burden on the programmer, who is now required to devise an incremental update mechanism. We claim that automatic incremental processing of large-scale data is possible by leveraging previous results from the algorithms and programming languages communities. As an example, we describe how MapReduce can be improved to efficiently handle small input changes by automatically incrementalizing existing MapReduce computations, without breaking backward compatibility or requiring programmers to adopt a new programming approach.
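The core idea the abstract describes, automatically reusing prior results so that a small input change triggers only a small amount of recomputation, can be sketched with a toy memoization-based MapReduce. This is an illustrative sketch, not the authors' system: the class and function names (`IncrementalMapReduce`, `wc_map`, `wc_reduce`) are hypothetical, and real implementations must also handle stable input partitioning and incremental reduction.

```python
# Illustrative sketch (NOT the paper's implementation): map outputs are
# memoized per input chunk, so a re-run after a small change recomputes
# only the chunks whose contents actually changed.
import hashlib
from collections import defaultdict

class IncrementalMapReduce:
    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._map_cache = {}   # chunk hash -> memoized list of (key, value)
        self.map_calls = 0     # instrumentation: actual map invocations

    def run(self, chunks):
        grouped = defaultdict(list)
        for chunk in chunks:
            h = hashlib.sha256(chunk.encode()).hexdigest()
            if h not in self._map_cache:      # change propagation: only
                self.map_calls += 1           # new/changed chunks are mapped
                self._map_cache[h] = list(self.map_fn(chunk))
            for k, v in self._map_cache[h]:
                grouped[k].append(v)
        return {k: self.reduce_fn(k, vs) for k, vs in grouped.items()}

# Word count written as an ordinary, unmodified MapReduce job.
def wc_map(chunk):
    for word in chunk.split():
        yield word, 1

def wc_reduce(key, values):
    return sum(values)

job = IncrementalMapReduce(wc_map, wc_reduce)
first = job.run(["a b a", "c c"])     # cold run: both chunks are mapped
second = job.run(["a b a", "c c d"])  # only the changed chunk is remapped
```

Note that the job itself (`wc_map`/`wc_reduce`) is plain MapReduce code; all reuse logic lives in the framework, which is the backward-compatibility point the abstract makes.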

Original language: English
State: Published - 2011
Externally published: Yes
Event: 3rd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2011 - Portland, United States
Duration: 14 Jun 2011 - 15 Jun 2011

Conference

Conference: 3rd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2011
Country/Territory: United States
City: Portland
Period: 14/06/11 - 15/06/11
