  1. Hadoop Common
  2. HADOOP-11794

Enable distcp to copy blocks in parallel

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: tools/distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      If a positive value is passed to the command-line switch -blocksperchunk, files with more blocks than this value are split into chunks of `<blocksperchunk>` blocks, transferred in parallel, and reassembled on the destination. By default, `<blocksperchunk>` is 0 and files are transmitted in their entirety without splitting. This switch applies only when the source file system supports getBlockLocations and the target file system supports concat.
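The splitting rule in the release note can be illustrated with a small sketch. This is not the DistCp implementation; the function name `plan_chunks` and its parameters are hypothetical, and it only models the arithmetic described above: a file is split only when it has more blocks than `blocks_per_chunk`, and a value of 0 disables splitting.

```python
from typing import List, Tuple


def plan_chunks(file_length: int, block_size: int,
                blocks_per_chunk: int) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges, one per chunk to copy.

    Hypothetical helper: models the rule from the release note,
    not the actual DistCp code.
    """
    if file_length < 0 or block_size <= 0:
        raise ValueError("file_length must be >= 0 and block_size > 0")

    # Number of blocks in the file, rounded up for a partial last block.
    num_blocks = -(-file_length // block_size)

    # 0 (the default) disables splitting; files that do not have more
    # blocks than blocks_per_chunk are also copied whole.
    if blocks_per_chunk <= 0 or num_blocks <= blocks_per_chunk:
        return [(0, file_length)]

    chunk_bytes = block_size * blocks_per_chunk
    return [
        (offset, min(chunk_bytes, file_length - offset))
        for offset in range(0, file_length, chunk_bytes)
    ]
```

Under these assumptions, a 1 TiB file with 1 GiB blocks and `blocks_per_chunk=128` has 1024 blocks and is planned as 8 chunks, each of which could be copied by a separate task.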

      Description

      The minimum unit of work for a distcp task is a file. We have files larger than 1 TB with a block size of 1 GB. If we use distcp to copy these files, the tasks either take a very long time or eventually fail. A better approach would be for distcp to copy all the source blocks in parallel and then stitch the blocks back into files at the destination via the HDFS Concat API (HDFS-222).
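The copy-in-parallel-then-stitch idea can be sketched on a local file system. This is only an analogy under stated assumptions: the names `copy_range` and `chunked_copy` are hypothetical, threads stand in for distcp tasks, and appending part files stands in for the HDFS concat call, which is performed by the file system itself rather than by the client.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_range(src, part_path, offset, length, buf_size=1 << 20):
    """Copy bytes [offset, offset + length) of src into its own part file."""
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        fin.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(buf_size, remaining))
            if not data:  # source shorter than expected; stop early
                break
            fout.write(data)
            remaining -= len(data)
    return part_path


def chunked_copy(src, dst, chunk_bytes, workers=4):
    """Copy src to dst by copying fixed-size chunks in parallel,
    then stitching the parts together in offset order."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    size = os.path.getsize(src)
    offsets = list(range(0, size, chunk_bytes)) or [0]

    # Each chunk is an independent unit of work, so no single worker
    # has to copy the whole file.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_range, src, f"{dst}.part{i}", off,
                        min(chunk_bytes, size - off))
            for i, off in enumerate(offsets)
        ]
        parts = [f.result() for f in futures]  # preserves chunk order

    # Local stand-in for concat: append the parts in order, then clean up.
    with open(dst, "wb") as out:
        for part in parts:
            with open(part, "rb") as p:
                shutil.copyfileobj(p, out)
            os.remove(part)
```

The point of the sketch is the change in the unit of work: a failure or slow worker affects one chunk rather than an entire multi-terabyte file.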

        Attachments

        1. HADOOP-11794.010.branch2.002.patch
          71 kB
          Yongjun Zhang
        2. HADOOP-11794.010.branch2.patch
          70 kB
          Yongjun Zhang
        3. HADOOP-11794.010.patch
          70 kB
          Yongjun Zhang
        4. HADOOP-11794.009.patch
          70 kB
          Yongjun Zhang
        5. HADOOP-11794.008.patch
          70 kB
          Yongjun Zhang
        6. HADOOP-11794.007.patch
          70 kB
          Yongjun Zhang
        7. HADOOP-11794.006.patch
          63 kB
          Yongjun Zhang
        8. HADOOP-11794.005.patch
          62 kB
          Yongjun Zhang
        9. HADOOP-11794.004.patch
          62 kB
          Yongjun Zhang
        10. HADOOP-11794.003.patch
          61 kB
          Yongjun Zhang
        11. HADOOP-11794.002.patch
          58 kB
          Yongjun Zhang
        12. HADOOP-11794.001.patch
          52 kB
          Yongjun Zhang
        13. MAPREDUCE-2257.patch
          62 kB
          Rosie Li

              People

              • Assignee: Yongjun Zhang (yzhangal)
              • Reporter: dhruba borthakur (dhruba)
              • Votes: 4
              • Watchers: 57
