Hadoop Common / HADOOP-4565

MultiFileInputSplit can use data locality information to create splits

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Improved MultiFileInputFormat so that multiple blocks from the same node or same rack can be combined into a single split.
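The release note describes combining blocks that share a node or a rack into a single split. The Java sketch below illustrates the idea under stated assumptions: a three-pass strategy (node-local groups first, then rack-local groups, then leftovers with no locality constraint), a maximum split size, and a minimum size below which a partial group is deferred to the next, coarser pass. The `Block` and `Split` records, the `combine` method, and both thresholds are hypothetical illustrations, not the API introduced by the attached patches. In a real job the block locations would come from the file system rather than being passed in by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of locality-aware split combining; not a Hadoop API. */
public class LocalityCombiner {

    /** Hypothetical block: owning file, offset, length in bytes, replica hosts. */
    public record Block(String file, long offset, long length, List<String> hosts) {}

    /** Hypothetical combined split and the locality level it was built at. */
    public record Split(String level, String location, List<Block> blocks) {
        public long size() {
            return blocks.stream().mapToLong(Block::length).sum();
        }
    }

    public static List<Split> combine(List<Block> blocks, Map<String, String> rackOf,
                                      long maxSize, long minSize) {
        List<Split> splits = new ArrayList<>();
        Set<Block> unassigned = new LinkedHashSet<>(blocks);

        // Pass 1: node-local groups. A block is listed under every host holding
        // a replica; TreeMap keeps the host order deterministic.
        Map<String, Set<Block>> byNode = new TreeMap<>();
        for (Block b : blocks) {
            for (String host : b.hosts()) {
                byNode.computeIfAbsent(host, k -> new LinkedHashSet<>()).add(b);
            }
        }
        pack("node", byNode, unassigned, maxSize, minSize, splits);

        // Pass 2: rack-local groups, built only from blocks still unassigned.
        // Hosts missing from rackOf are treated as one assumed default rack.
        Map<String, Set<Block>> byRack = new TreeMap<>();
        for (Block b : unassigned) {
            for (String host : b.hosts()) {
                String rack = rackOf.getOrDefault(host, "/default-rack");
                byRack.computeIfAbsent(rack, k -> new LinkedHashSet<>()).add(b);
            }
        }
        pack("rack", byRack, unassigned, maxSize, minSize, splits);

        // Pass 3: combine leftovers with no locality constraint; minSize 0 keeps every block.
        Map<String, Set<Block>> rest = new TreeMap<>();
        rest.put("any", new LinkedHashSet<>(unassigned));
        pack("any", rest, unassigned, maxSize, 0, splits);
        return splits;
    }

    private static void pack(String level, Map<String, Set<Block>> groups, Set<Block> unassigned,
                             long maxSize, long minSize, List<Split> out) {
        for (Map.Entry<String, Set<Block>> group : groups.entrySet()) {
            List<Block> current = new ArrayList<>();
            long size = 0;
            for (Block b : group.getValue()) {
                if (!unassigned.contains(b)) {
                    continue; // already placed in an earlier split
                }
                current.add(b);
                size += b.length();
                if (size >= maxSize) { // split is full: emit it and start a new one
                    emit(level, group.getKey(), current, unassigned, out);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            // Keep a trailing partial split only if it is large enough; otherwise
            // its blocks stay unassigned for the next, coarser pass.
            if (!current.isEmpty() && size >= minSize) {
                emit(level, group.getKey(), current, unassigned, out);
            }
        }
    }

    private static void emit(String level, String location, List<Block> blocks,
                             Set<Block> unassigned, List<Split> out) {
        out.add(new Split(level, location, List.copyOf(blocks)));
        unassigned.removeAll(blocks);
    }
}
```

Processing nodes before racks means a block is placed rack-locally only when no single node could gather enough data for a split. Because a block is added before the size check, a split can overshoot `maxSize` by less than one block.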

      Description

      The MultiFileInputFormat takes a set of paths and creates splits based on file sizes. Each split contains a few files, and the splits are roughly equal in size. It would be more efficient to extend this InputFormat to create splits such that all the blocks in one split are either node-local or rack-local.
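The first two sentences of the description summarize the existing size-based behavior: files are grouped into splits of roughly equal total size, regardless of where their blocks live. The sketch below shows one way to do that, assuming a single greedy pass over the files in their given order that closes a split each time the running total reaches the next multiple of `total / numSplits`. `FileEntry` and `group` are hypothetical names, and this is not claimed to be the actual MultiFileInputFormat code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-balanced file grouping; not a Hadoop API. */
public class SizeBalancedGrouping {

    /** Hypothetical input record: a file path and its length in bytes. */
    public record FileEntry(String path, long length) {}

    public static List<List<FileEntry>> group(List<FileEntry> files, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        long total = 0;
        for (FileEntry f : files) {
            total += f.length();
        }
        double target = (double) total / numSplits; // ideal bytes per split

        List<List<FileEntry>> splits = new ArrayList<>();
        List<FileEntry> current = new ArrayList<>();
        long cumulative = 0; // bytes placed so far across all splits
        for (FileEntry f : files) {
            current.add(f);
            cumulative += f.length();
            // Close the split once the running total reaches the next multiple of
            // the target. The last split is never closed early, so it absorbs
            // any remainder.
            if (cumulative >= target * (splits.size() + 1) && splits.size() < numSplits - 1) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

For four files of 10, 20, 30 and 40 bytes with `numSplits = 2`, the target is 50 bytes, so the first split takes the first three files (60 bytes) and the second takes the last one (40 bytes). Comparing against cumulative multiples of the target, rather than resetting a per-split counter, keeps one oversized split from shifting every later boundary.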

      Attachments

      1. CombineMultiFile.patch
        19 kB
        dhruba borthakur
      2. CombineMultiFile2.patch
        37 kB
        dhruba borthakur
      3. CombineMultiFile3.patch
        35 kB
        dhruba borthakur
      4. CombineMultiFile4.patch
        36 kB
        dhruba borthakur
      5. CombineMultiFile5.patch
        39 kB
        dhruba borthakur
      6. CombineMultiFile7.patch
        42 kB
        dhruba borthakur
      7. CombineMultiFile8.patch
        47 kB
        dhruba borthakur
      8. CombineMultiFile9.patch
        49 kB
        dhruba borthakur
      9. TestCombine.txt
        21 kB
        dhruba borthakur

            People

            • Assignee:
              dhruba borthakur
            • Reporter:
              dhruba borthakur
            • Votes:
              0
            • Watchers:
              7