  1. Hadoop Map/Reduce
  2. MAPREDUCE-5287

Create a generic InputFormat wrapping any other InputFormat, to control the number of map tasks

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mrv1, performance
    • Labels:
      None

      Description

      I wrote a generic InputFormat that wraps any other InputFormat and creates CompositeInputSplits, reducing the number of map tasks in a controllable manner while preserving data locality. A corresponding CompositeRecordReader iterates through the RecordReaders that the underlying InputFormat creates for each raw split.
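
      The grouping and chained-reading idea described above can be sketched roughly as follows. This is not the attached implementation and it uses no Hadoop classes: `RawSplit`, `CompositeSplit`, `group`, and the `maxPerGroup` parameter are hypothetical stand-ins, and locality is simplified to a single preferred host per raw split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class SplitGroupingSketch {

    /** Hypothetical stand-in for one raw split produced by the wrapped InputFormat. */
    static final class RawSplit {
        final String host;          // assumed: one preferred (data-local) host
        final List<String> records; // stand-in for what its RecordReader would yield
        RawSplit(String host, String... records) {
            this.host = host;
            this.records = Arrays.asList(records);
        }
    }

    /** Hypothetical composite split: several raw splits handled by one map task. */
    static final class CompositeSplit {
        final String host;
        final List<RawSplit> parts = new ArrayList<>();
        CompositeSplit(String host) { this.host = host; }

        /** Chains the underlying readers by draining each raw split in turn. */
        Iterator<String> records() {
            final Iterator<RawSplit> outer = parts.iterator();
            return new Iterator<String>() {
                private Iterator<String> current = Collections.emptyIterator();
                @Override public boolean hasNext() {
                    // Advance to the next raw split once the current one is exhausted.
                    while (!current.hasNext() && outer.hasNext()) {
                        current = outer.next().records.iterator();
                    }
                    return current.hasNext();
                }
                @Override public String next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return current.next();
                }
            };
        }
    }

    /** Groups raw splits by host, placing at most maxPerGroup raw splits in each composite. */
    static List<CompositeSplit> group(List<RawSplit> raw, int maxPerGroup) {
        if (maxPerGroup < 1) throw new IllegalArgumentException("maxPerGroup must be >= 1");
        Map<String, CompositeSplit> open = new LinkedHashMap<>(); // composite still filling, per host
        List<CompositeSplit> result = new ArrayList<>();
        for (RawSplit s : raw) {
            CompositeSplit c = open.get(s.host);
            if (c == null || c.parts.size() >= maxPerGroup) {
                c = new CompositeSplit(s.host);
                open.put(s.host, c);
                result.add(c);
            }
            c.parts.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<RawSplit> raw = Arrays.asList(
            new RawSplit("hostA", "r1", "r2"), new RawSplit("hostB", "r3"),
            new RawSplit("hostA", "r4"), new RawSplit("hostA", "r5"));
        for (CompositeSplit c : group(raw, 2)) {
            StringBuilder sb = new StringBuilder(c.host + ":");
            for (Iterator<String> it = c.records(); it.hasNext(); ) sb.append(' ').append(it.next());
            System.out.println(sb);
        }
    }
}
```

      Because raw splits are only ever merged with others on the same host, each composite keeps a single preferred location, and `maxPerGroup` is the knob that trades map-task count against work per task. A real InputFormat would also need to account for split lengths and multiple replica locations.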

      One application is grouping TableSplits when the raw splits come from multiple regions and are filtered by key range. We use this to shard and distribute time-based incremental access to an HBase table.

        Attachments

        1. CompositeRecordReader.java
          3 kB
          nicu marasoiu
        2. CompositeInputSplit.java
          3 kB
          nicu marasoiu
        3. AggregatingInputFormat.java
          6 kB
          nicu marasoiu

          Activity

            People

            • Assignee:
              nmarasoi nicu marasoiu
              Reporter:
              nmarasoi nicu marasoiu
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                4h
                Remaining:
                4h
                Logged:
                Not Specified