Hadoop Map/Reduce / MAPREDUCE-1211

Online aggregation and continuous query support


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: task
    • Labels: None

    Description

      We propose a modified MapReduce architecture that allows data to be pipelined between operators. Pipelining extends the MapReduce programming model beyond batch processing, and it can also reduce completion times and improve system utilization for batch jobs. We have built a modified version of the Hadoop MapReduce framework that supports online aggregation, which lets users see "early returns" from a job while it is still being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop and can run unmodified user-defined MapReduce programs.
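
To make the idea of "early returns" concrete, here is a minimal, language-neutral sketch in Python. It is not HOP code and does not use any Hadoop API. The names `map_words`, `online_word_count`, and `snapshot_every` are hypothetical and chosen only for illustration. The map step hands records to the reduce step one at a time instead of materializing all map output first, and the reduce step publishes a snapshot of its running aggregate at a fixed record interval.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) pairs lazily, one record at a time."""
    for line in lines:
        for word in line.split():
            yield word, 1


def online_word_count(
    lines: Iterable[str], snapshot_every: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Reduce step that consumes map output as it is produced.

    Yields (records_seen, counts_so_far) after every `snapshot_every`
    records, plus one final snapshot when the input ends between
    snapshot boundaries (or is empty).
    """
    counts: Counter = Counter()
    seen = 0
    for word, n in map_words(lines):
        counts[word] += n
        seen += 1
        if seen % snapshot_every == 0:
            # Early return: an estimate based on the input seen so far.
            yield seen, dict(counts)
    # Emit the final answer unless the last record already produced it.
    if seen == 0 or seen % snapshot_every != 0:
        yield seen, dict(counts)


if __name__ == "__main__":
    for seen, counts in online_word_count(["a b a", "b c", "a"], snapshot_every=2):
        print(seen, counts)
```

Each intermediate snapshot is only an estimate over a prefix of the input; the last snapshot equals what a batch job would report. A continuous query can be pictured the same way, with `lines` being an unbounded stream so that snapshots are produced indefinitely.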

      For more information on the HOP design, please see our technical report.
      http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html

      Further details are discussed in the following blog posts.
      http://databeta.wordpress.com/2009/10/18/mapreduce-online/
      http://radar.oreilly.com/2009/10/pipelining-and-real-time-analytics-with-mapreduce-online.html
      http://dbmsmusings.blogspot.com/2009/10/analysis-of-mapreduce-online-paper.html

      The HOP code has been published at the following location.
      http://code.google.com/p/hop/

          People

            Assignee: Unassigned
            Reporter: Tyson Condie (tcondie)
            Votes: 0
            Watchers: 27
