Hadoop Common / HADOOP-4565

MultiFileInputSplit can use data locality information to create splits

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Improved MultiFileInputFormat so that multiple blocks from the same node or same rack can be combined into a single split.

      Description

      The MultiFileInputFormat takes a set of paths and creates splits based on file sizes. Each split contains a few files, and the splits are roughly equal in size. It would be more efficient if we could extend this InputFormat to create splits such that all the blocks in one split are either node-local or rack-local.
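
      The idea can be illustrated with a minimal sketch. This is not the code from the attached patches: the class LocalitySplitSketch, its Block and Split types, and createSplits are hypothetical names, and the sketch assumes each block lives on exactly one node (real HDFS blocks usually have several replicas). Blocks on the same node are packed first; whatever is left over is packed per rack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of locality-aware split creation; not Hadoop code. */
class LocalitySplitSketch {

    /** One block of a file. Assumption: a single node and rack per block. */
    static final class Block {
        final String file;
        final long length;
        final String node;
        final String rack;

        Block(String file, long length, String node, String rack) {
            this.file = file;
            this.length = length;
            this.node = node;
            this.rack = rack;
        }
    }

    /** A group of blocks that share a node ("node:<name>") or a rack ("rack:<name>"). */
    static final class Split {
        final String locality;
        final List<Block> blocks = new ArrayList<>();

        Split(String locality) {
            this.locality = locality;
        }

        long length() {
            long total = 0;
            for (Block b : blocks) {
                total += b.length;
            }
            return total;
        }
    }

    /** Pass 1 builds node-local splits; pass 2 builds rack-local splits from the rest. */
    static List<Split> createSplits(List<Block> blocks, long maxSplitSize) {
        List<Split> splits = new ArrayList<>();
        List<Block> leftovers = pack(groupBy(blocks, true), "node:", maxSplitSize, false, splits);
        pack(groupBy(leftovers, false), "rack:", maxSplitSize, true, splits);
        return splits;
    }

    /** Groups blocks by node or by rack, preserving input order. */
    private static Map<String, List<Block>> groupBy(List<Block> blocks, boolean byNode) {
        Map<String, List<Block>> groups = new LinkedHashMap<>();
        for (Block b : blocks) {
            groups.computeIfAbsent(byNode ? b.node : b.rack, k -> new ArrayList<>()).add(b);
        }
        return groups;
    }

    /**
     * Emits a split whenever a group has accumulated at least maxSplitSize.
     * A group's remainder is either emitted as a smaller split (flushRemainder)
     * or returned so that a later pass can place it.
     */
    private static List<Block> pack(Map<String, List<Block>> groups, String prefix,
                                    long maxSplitSize, boolean flushRemainder, List<Split> out) {
        List<Block> leftovers = new ArrayList<>();
        for (Map.Entry<String, List<Block>> e : groups.entrySet()) {
            Split current = new Split(prefix + e.getKey());
            for (Block b : e.getValue()) {
                current.blocks.add(b);
                if (current.length() >= maxSplitSize) {
                    out.add(current);
                    current = new Split(prefix + e.getKey());
                }
            }
            if (!current.blocks.isEmpty()) {
                if (flushRemainder) {
                    out.add(current);
                } else {
                    leftovers.addAll(current.blocks);
                }
            }
        }
        return leftovers;
    }

    public static void main(String[] args) {
        // Made-up sample data: five 64-unit blocks on four nodes in two racks.
        List<Block> blocks = Arrays.asList(
            new Block("a", 64, "n1", "r1"),
            new Block("b", 64, "n1", "r1"),
            new Block("c", 64, "n2", "r1"),
            new Block("d", 64, "n3", "r1"),
            new Block("e", 64, "n4", "r2"));
        for (Split s : createSplits(blocks, 128)) {
            System.out.println(s.locality + " -> " + s.length());
        }
    }
}
```

      With the sample data in main and a maximum split size of 128, the two blocks on n1 form one node-local split, the single blocks on n2 and n3 are combined into a rack-local split for r1, and the block on n4 ends up alone in a rack-local split for r2.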

      1. CombineMultiFile.patch
        19 kB
        dhruba borthakur
      2. CombineMultiFile2.patch
        37 kB
        dhruba borthakur
      3. CombineMultiFile3.patch
        35 kB
        dhruba borthakur
      4. CombineMultiFile4.patch
        36 kB
        dhruba borthakur
      5. CombineMultiFile5.patch
        39 kB
        dhruba borthakur
      6. CombineMultiFile7.patch
        42 kB
        dhruba borthakur
      7. CombineMultiFile8.patch
        47 kB
        dhruba borthakur
      8. CombineMultiFile9.patch
        49 kB
        dhruba borthakur
      9. TestCombine.txt
        21 kB
        dhruba borthakur

          Activity

          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-214 [ MAPREDUCE-214 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-214 [ MAPREDUCE-214 ]
          Owen O'Malley made changes -
          Component/s mapred [ 12310690 ]
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Nigel Daley made changes -
          Fix Version/s 0.21.0 [ 12313563 ]
          Robert Chansler made changes -
          Release Note Multiple blocks from the same node or same rack can be combined into a single split. Improved MultiFileInputFormat so that multiple blocks from the same node or same rack can be combined into a single split.
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Release Note HADOOP-4565. Added CombineFileInputFormat to use data locality information
          to create splits. (dhruba via zshao)
          Multiple blocks from the same node or same rack can be combined into a single split.
          Resolution Fixed [ 1 ]
          dhruba borthakur made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment TestCombine.txt [ 12399403 ]
          dhruba borthakur made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Zheng Shao made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Release Note HADOOP-4565. Added CombineFileInputFormat to use data locality information
          to create splits. (dhruba via zshao)
          Resolution Fixed [ 1 ]
          Hadoop Flags [Reviewed]
          Zheng Shao made changes -
          Fix Version/s 0.20.0 [ 12313438 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile9.patch [ 12398869 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile8.patch [ 12398536 ]
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile7.patch [ 12397945 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile5.patch [ 12397475 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile4.patch [ 12396893 ]
          dhruba borthakur made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Enis Soztutar made changes -
          Link This issue blocks HADOOP-4741 [ HADOOP-4741 ]
          dhruba borthakur made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile3.patch [ 12394713 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile2.patch [ 12394548 ]
          dhruba borthakur made changes -
          Link This issue blocks HIVE-74 [ HIVE-74 ]
          dhruba borthakur made changes -
          Attachment CombineMultiFile.patch [ 12393695 ]
          dhruba borthakur made changes -
          Link This issue is blocked by HADOOP-4567 [ HADOOP-4567 ]
          dhruba borthakur made changes -
          Link This issue is blocked by HADOOP-3293 [ HADOOP-3293 ]
          dhruba borthakur made changes -
          Field Original Value New Value
          Assignee dhruba borthakur [ dhruba ]
          dhruba borthakur created issue -

            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0
              Watchers:
              8
