Hadoop Map/Reduce
MAPREDUCE-1423

Improve performance of CombineFileInputFormat when multiple pools are configured

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      MAPREDUCE-1423. Improve performance of CombineFileInputFormat when multiple pools are configured. (Dhruba Borthakur via zshao)
    • Tags:
      combinefileinputformat

      Description

      I have a map-reduce job that uses CombineFileInputFormat. It is configured with 10000 pools and 30000 files. Creating the splits takes more than an hour. The reason is that CombineFileInputFormat.getSplits() converts the same path from a String to a Path object multiple times, once for each pool. Similarly, it calls Path.toUri() multiple times. This code can be optimized.
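
      The pattern described above can be illustrated with a minimal sketch. It is not the attached patch and does not use the Hadoop API: java.net.URI stands in for Hadoop's Path and Path.toUri(), a pool is simplified to a path prefix, and the names SplitPathCacheSketch, assignNaive, and assignCached are hypothetical. It only shows why converting each path once, outside the per-pool loop, reduces the work from (pools × files) conversions to one conversion per file.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: java.net.URI is a stand-in for Hadoop's Path / Path.toUri(),
// and a "pool" is reduced to a URI-path prefix. All names are hypothetical.
class SplitPathCacheSketch {
    // Counts string-to-URI conversions so the two strategies can be compared.
    static int conversions = 0;

    static URI convert(String file) {
        conversions++;
        return URI.create(file);
    }

    // Slow shape: every pool converts every file string again.
    static Map<String, List<String>> assignNaive(List<String> files, List<String> pools) {
        Map<String, List<String>> byPool = new HashMap<>();
        for (String pool : pools) {
            List<String> matched = new ArrayList<>();
            for (String file : files) {
                URI uri = convert(file); // repeated once per pool
                if (uri.getPath().startsWith(pool)) {
                    matched.add(file);
                }
            }
            byPool.put(pool, matched);
        }
        return byPool;
    }

    // Faster shape: convert each file once, then reuse the result for all pools.
    static Map<String, List<String>> assignCached(List<String> files, List<String> pools) {
        String[] uriPaths = new String[files.size()];
        for (int i = 0; i < files.size(); i++) {
            uriPaths[i] = convert(files.get(i)).getPath(); // once per file
        }
        Map<String, List<String>> byPool = new HashMap<>();
        for (String pool : pools) {
            List<String> matched = new ArrayList<>();
            for (int i = 0; i < uriPaths.length; i++) {
                if (uriPaths[i].startsWith(pool)) {
                    matched.add(files.get(i));
                }
            }
            byPool.put(pool, matched);
        }
        return byPool;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "hdfs://nn/data/a/part-0",
            "hdfs://nn/data/b/part-0",
            "hdfs://nn/data/b/part-1");
        List<String> pools = Arrays.asList("/data/a", "/data/b");

        conversions = 0;
        Map<String, List<String>> naive = assignNaive(files, pools);
        int naiveCount = conversions;

        conversions = 0;
        Map<String, List<String>> cached = assignCached(files, pools);

        System.out.println("naive conversions:  " + naiveCount);
        System.out.println("cached conversions: " + conversions);
        System.out.println("same assignment:    " + naive.equals(cached));
    }
}
```

      With 3 files and 2 pools, the naive version performs 6 conversions and the cached version 3, with identical assignments. If every pool really does touch every path, the figures in the description would scale to 10000 × 30000 = 300 million conversions versus 30000.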

      1. CombineFileInputFormatPerformance.txt
        6 kB
        dhruba borthakur
      2. CombineFileInputFormatPerformance.txt
        8 kB
        dhruba borthakur

        Activity

        Field            Original Value               New Value

        Tom White made changes -
        Status           Resolved [ 5 ]               Closed [ 6 ]
        Tom White made changes -
        Fix Version/s                                 0.21.0 [ 12314045 ]
        Fix Version/s    0.22.0 [ 12314184 ]
        Zheng Shao made changes -
        Status           Patch Available [ 10002 ]    Resolved [ 5 ]
        Hadoop Flags                                  [Reviewed]
        Release Note                                  MAPREDUCE-1423. Improve performance of CombineFileInputFormat when multiple pools are configured. (Dhruba Borthakur via zshao)
        Fix Version/s                                 0.22.0 [ 12314184 ]
        Resolution                                    Fixed [ 1 ]
        Tags                                          combinefileinputformat
        dhruba borthakur made changes -
        Status           Open [ 1 ]                   Patch Available [ 10002 ]
        dhruba borthakur made changes -
        Status           Patch Available [ 10002 ]    Open [ 1 ]
        dhruba borthakur made changes -
        Status           Open [ 1 ]                   Patch Available [ 10002 ]
        dhruba borthakur made changes -
        Attachment                                    CombineFileInputFormatPerformance.txt [ 12435766 ]
        dhruba borthakur made changes -
        Attachment                                    CombineFileInputFormatPerformance.txt [ 12434728 ]
        dhruba borthakur created issue -

          People

          • Assignee:
            dhruba borthakur
            Reporter:
            dhruba borthakur
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
