  1. Hadoop Map/Reduce
  2. MAPREDUCE-5287

Create a generic InputFormat wrapping any other InputFormat, to control the number of map tasks


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mrv1, performance
    • Labels: None

    Description

      I wrote a generic InputFormat that wraps any other InputFormat and creates CompositeInputSplits, reducing the number of map tasks in a controllable manner while preserving data locality. A corresponding CompositeRecordReader iterates through the RecordReaders that the underlying InputFormat creates for each raw split.
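
      The idea can be sketched without any Hadoop dependency. The code below is not the attached implementation: `CompositeSplitSketch`, `RawSplit`, `CompositeSplit`, `group`, `records` and the `maxPerGroup` knob are all hypothetical names, and plain lists of strings stand in for Hadoop's InputSplit and RecordReader. It assumes locality is preserved by only combining raw splits that report the same preferred host, and that reading a composite split means draining each underlying reader in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-ins only: the attached classes operate on Hadoop's
// InputSplit and RecordReader types, which are not reproduced here.
final class CompositeSplitSketch {

    /** One raw split: an id, its preferred host, and the records it would yield. */
    static final class RawSplit {
        final String id;
        final String host;
        final List<String> records;
        RawSplit(String id, String host, List<String> records) {
            this.id = id; this.host = host; this.records = records;
        }
    }

    /** Several raw splits that share a preferred host and run as one map task. */
    static final class CompositeSplit {
        final String host;
        final List<RawSplit> parts;
        CompositeSplit(String host, List<RawSplit> parts) {
            this.host = host; this.parts = parts;
        }
    }

    /** Buckets raw splits by host, then cuts each bucket into chunks of at most maxPerGroup. */
    static List<CompositeSplit> group(List<RawSplit> raw, int maxPerGroup) {
        if (maxPerGroup < 1) throw new IllegalArgumentException("maxPerGroup must be >= 1");
        Map<String, List<RawSplit>> byHost = new LinkedHashMap<>();
        for (RawSplit s : raw) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<CompositeSplit> out = new ArrayList<>();
        for (Map.Entry<String, List<RawSplit>> e : byHost.entrySet()) {
            List<RawSplit> splits = e.getValue();
            for (int i = 0; i < splits.size(); i += maxPerGroup) {
                int end = Math.min(i + maxPerGroup, splits.size());
                out.add(new CompositeSplit(e.getKey(), new ArrayList<>(splits.subList(i, end))));
            }
        }
        return out;
    }

    /** Chains the parts' records, the way a composite reader would chain underlying readers. */
    static Iterator<String> records(CompositeSplit split) {
        final Iterator<RawSplit> parts = split.parts.iterator();
        return new Iterator<String>() {
            private Iterator<String> current = Collections.emptyIterator();
            @Override public boolean hasNext() {
                // Skip exhausted or empty parts until a record is available.
                while (!current.hasNext() && parts.hasNext()) {
                    current = parts.next().records.iterator();
                }
                return current.hasNext();
            }
            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<RawSplit> raw = Arrays.asList(
            new RawSplit("s1", "hostA", Arrays.asList("a", "b")),
            new RawSplit("s2", "hostB", Arrays.asList("c")),
            new RawSplit("s3", "hostA", Arrays.asList("d")));
        for (CompositeSplit cs : group(raw, 2)) {
            StringBuilder line = new StringBuilder(cs.host + ":");
            for (Iterator<String> it = records(cs); it.hasNext(); ) line.append(' ').append(it.next());
            System.out.println(line);
        }
    }
}
```

      In this sketch a larger `maxPerGroup` yields fewer composite splits and therefore fewer map tasks. How the attached AggregatingInputFormat actually chooses group sizes is not stated in this description.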

      One application is grouping TableSplits when the raw splits come from multiple regions and are filtered by key ranges. We use this to shard and distribute time-based incremental access to an HBase table.
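
      As a rough illustration of the key-range filtering step, the sketch below clips a list of region ranges to a scan range and drops the regions that do not overlap it. It is not HBase's API: `KeyRangeSketch`, `Range`, `intersect` and `clip` are hypothetical names, keys are plain strings rather than byte arrays, and an empty string is assumed to mean "unbounded" on either side. The ranges that survive correspond to the raw splits that a grouping step would then combine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: string keys stand in for HBase row keys (byte arrays).
final class KeyRangeSketch {

    /** Half-open key range [start, end); an empty string means unbounded on that side. */
    static final class Range {
        final String start;
        final String end;
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    /** Returns the overlap of a and b, or null when they do not overlap. */
    static Range intersect(Range a, Range b) {
        // "" sorts before every other string, so the larger start is the tighter bound.
        String start = a.start.compareTo(b.start) >= 0 ? a.start : b.start;
        String end;
        if (a.end.isEmpty()) {
            end = b.end;
        } else if (b.end.isEmpty()) {
            end = a.end;
        } else {
            end = a.end.compareTo(b.end) <= 0 ? a.end : b.end;
        }
        if (!end.isEmpty() && start.compareTo(end) >= 0) {
            return null; // empty overlap
        }
        return new Range(start, end);
    }

    /** Keeps only the parts of each region that fall inside the scan range. */
    static List<Range> clip(List<Range> regions, Range scan) {
        List<Range> out = new ArrayList<>();
        for (Range region : regions) {
            Range overlap = intersect(region, scan);
            if (overlap != null) {
                out.add(overlap);
            }
        }
        return out;
    }
}
```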

      Attachments

        1. CompositeRecordReader.java
          3 kB
          nicu marasoiu
        2. CompositeInputSplit.java
          3 kB
          nicu marasoiu
        3. AggregatingInputFormat.java
          6 kB
          nicu marasoiu

        Activity

          People

            Assignee: nicu marasoiu (nmarasoi)
            Reporter: nicu marasoiu (nmarasoi)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: 4h
                Remaining Estimate: 4h
                Time Spent: Not Specified