MapReduce with Deltas

Status
In proceedings of PDPTA 2011

Authors
Ralf Lämmel and David Saile

Abstract
The MapReduce programming model is extended conservatively to deal with deltas for input data such that recurrent MapReduce computations can be more efficient for the case of input data that changes only slightly over time. That is, the extended model enables more frequent re-execution of MapReduce computations and thereby more up-to-date results in practical applications. Deltas can also be pushed through pipelines of MapReduce computations. The achievable speedup is analyzed and found to be highly predictable. The approach has been implemented in Hadoop, and a code distribution is available online. The correctness of the extended programming model relies on a simple algebraic argument.
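
The idea of processing only a delta can be illustrated with a small sketch. This is not the authors' Hadoop implementation; it is a minimal in-memory word count in Python. It assumes, for illustration only, that a delta consists of added and removed documents and that the reduction is summation, so that contributions can be added and cancelled. The names map_reduce and apply_delta are hypothetical.

```python
from collections import Counter

def map_reduce(docs):
    """Word count as a MapReduce: map emits (word, 1), reduce sums per word."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

def apply_delta(old_result, added, removed):
    """Update a previous result from a delta instead of reprocessing all input.

    Assumption: the reduction (integer addition) has inverses, so the
    contribution of removed documents can be subtracted again.
    """
    new_result = Counter(old_result)
    new_result.update(map_reduce(added))      # contribution of inserted documents
    new_result.subtract(map_reduce(removed))  # cancel contribution of deleted documents
    return +new_result                        # unary plus drops words whose count fell to zero

old_input = ["a b a", "b c"]
added, removed = ["c d"], ["b c"]
new_input = ["a b a", "c d"]

old_result = map_reduce(old_input)
incremental = apply_delta(old_result, added, removed)
full = map_reduce(new_input)
```

In this sketch the incremental result equals the full recomputation because summation distributes over the split of the input into unchanged, added, and removed parts; only the delta documents are mapped again.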

Bibtex entry
@misc{LaemmelS11,
  author    = {Ralf L{\"a}mmel and David Saile},
  title     = "{MapReduce with Deltas}",
  year      = {2011},
  note      = "{In proceedings of PDPTA 2011}"
}

Downloads and links