The MapReduce programming model is conservatively extended to handle
deltas of input data, so that recurrent MapReduce computations become
more efficient when the input data changes only slightly over time.
The extended model thereby allows MapReduce computations to be
re-executed more frequently, which yields more up-to-date results in
practical applications. Deltas can also be propagated through
pipelines of MapReduce computations. The achievable speedup is
analyzed and found to be highly predictable.
The approach has been implemented in Hadoop, and a code
distribution is available online. The correctness of the extended
programming model relies on a simple algebraic argument.
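
The idea of processing only a delta and merging it into an earlier result can be illustrated with a small word-count sketch. This is not the Hadoop implementation described above. It is a minimal illustration under two assumptions of ours: a delta is a list of signed records (+1 for inserted text, -1 for deleted text), and the reduce function is integer addition, so deletions can be cancelled by negative counts. The names `map_reduce`, `delta_map_reduce`, and `merge` are hypothetical.

```python
from collections import Counter


def map_reduce(docs):
    """Plain word count over a full input: dict of document id -> text."""
    counts = Counter()
    for text in docs.values():
        counts.update(text.split())
    return counts


def delta_map_reduce(delta):
    """Word count over a delta only.

    Assumption: a delta is a list of (sign, text) pairs, where sign is
    +1 for inserted text and -1 for deleted text.
    """
    change = Counter()
    for sign, text in delta:
        for word in text.split():
            change[word] += sign
    return change


def merge(old_result, change):
    """Combine an earlier result with a delta result, dropping zero counts."""
    merged = Counter(old_result)
    for word, n in change.items():
        merged[word] += n
    return Counter({w: n for w, n in merged.items() if n != 0})


# Document d2 changes from "b c" to "b d"; d1 is untouched.
old_docs = {"d1": "a b a", "d2": "b c"}
new_docs = {"d1": "a b a", "d2": "b d"}
delta = [(-1, "b c"), (+1, "b d")]

incremental = merge(map_reduce(old_docs), delta_map_reduce(delta))
# The incremental result agrees with recomputing from scratch.
assert incremental == map_reduce(new_docs)
```

The final assertion is the algebraic point in miniature: because addition has inverses, running the computation on the delta and merging gives the same result as rerunning it on the whole updated input, while touching only the changed records.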
BibTeX entry
@misc{LaemmelS11,
author = {Ralf L{\"a}mmel and David Saile},
title = {{MapReduce with Deltas}},
year = {2011},
note = {{To appear in proceedings of PDPTA 2011}}
}