Hadoop Map/Reduce: MAPREDUCE-4868

Allow multiple iterations over the map input

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.3-alpha, 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      Currently, the Mapper class lets advanced users override the "public void run(Context context)" method to gain more control over the execution of the mapper. However, the Context interface limits the operations available on the input data, and access to the data is the foundation of that control.

      One use case: while working on a Hive optimization problem, I want to make two passes over the input data instead of using another job or task (which may slow down the whole process). Each pass does the same thing but with different parameters.

      This is a new paradigm of Map/Reduce usage, and it can be achieved easily by extending the Context interface slightly to give more control over the data, such as resetting the input.
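
      To illustrate the request, here is a minimal, self-contained sketch of what a two-pass run loop could look like. It does not use the Hadoop API: ResettableInput, its reset() method, ListInput, and TwoPassSketch are hypothetical names invented for this example, and the current Context interface offers no such reset operation. The per-pass parameter (a minimum record length) is likewise only a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical view of the record-reading side of a mapper context. */
interface ResettableInput<V> {
    boolean nextValue();   // advance to the next record; false at end of input
    V getCurrentValue();   // record at the current position
    void reset();          // hypothetical: rewind to the start of the input
}

/** In-memory stand-in for an input split, only here to make the sketch runnable. */
final class ListInput<V> implements ResettableInput<V> {
    private final List<V> records;
    private int pos = -1;  // -1 means "before the first record"

    ListInput(List<V> records) {
        this.records = records;
    }

    @Override
    public boolean nextValue() {
        if (pos + 1 < records.size()) {
            pos++;
            return true;
        }
        return false;
    }

    @Override
    public V getCurrentValue() {
        return records.get(pos);
    }

    @Override
    public void reset() {
        pos = -1;
    }
}

final class TwoPassSketch {
    /**
     * Applies the same per-record logic once per parameter value,
     * rewinding the input between passes instead of starting a new job.
     */
    static List<String> run(ResettableInput<String> input, int[] minLengths) {
        List<String> out = new ArrayList<>();
        for (int pass = 0; pass < minLengths.length; pass++) {
            if (pass > 0) {
                input.reset();  // the operation this issue asks for
            }
            while (input.nextValue()) {
                String value = input.getCurrentValue();
                if (value.length() >= minLengths[pass]) {
                    out.add("pass" + pass + ":" + value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ResettableInput<String> input =
                new ListInput<>(Arrays.asList("a", "bb", "ccc"));
        System.out.println(run(input, new int[] {2, 3}));
    }
}
```

      In a real mapper the loop inside run() would call map() for each record, and a reset would presumably have to reinitialize the underlying record reader for the split; how that would be done is left open here.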

          People

            Assignee: Unassigned
            Reporter: Haifeng Chen (jerrychenhf)
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 168h
                Remaining: 168h
                Logged: Not Specified