  1. Hadoop Common
  2. HADOOP-11794

Enable distcp to copy blocks in parallel

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: tools/distcp
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: If a positive value is passed to the command-line switch -blocksperchunk, files with more blocks than this value are split into chunks of `<blocksperchunk>` blocks, transferred in parallel, and reassembled on the destination. By default, `<blocksperchunk>` is 0 and files are transmitted whole, without splitting. This switch applies only when the source file system supports getBlockLocations and the target supports concat.
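
      The splitting rule in the release note can be illustrated with a minimal sketch. This is not the DistCp implementation; the function name `plan_chunks` and its return shape are hypothetical, chosen only to show how a file's block count and the `<blocksperchunk>` value determine the chunks.

```python
from typing import List, Tuple


def plan_chunks(num_blocks: int, blocks_per_chunk: int) -> List[Tuple[int, int]]:
    """Return (first_block_index, block_count) pairs for one file.

    Follows the rule stated in the release note: a non-positive
    blocks_per_chunk, or a file with no more blocks than
    blocks_per_chunk, is left as a single whole-file unit.
    """
    if blocks_per_chunk <= 0 or num_blocks <= blocks_per_chunk:
        return [(0, num_blocks)]
    return [
        # The last chunk may be shorter than blocks_per_chunk.
        (start, min(blocks_per_chunk, num_blocks - start))
        for start in range(0, num_blocks, blocks_per_chunk)
    ]


if __name__ == "__main__":
    # A file of 10 blocks with blocks_per_chunk = 4.
    print(plan_chunks(10, 4))
    # The default value 0 disables splitting.
    print(plan_chunks(10, 0))
```

      Under these assumptions, each returned pair could become an independent copy task, which is what would let a single large file be spread across several workers.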

    Description

      The minimum unit of work for a distcp task is a file. We have files that are larger than 1 TB with a block size of 1 GB. If we use distcp to copy these files, the tasks either take a very long time or eventually fail. A better approach would be for distcp to copy all the source blocks in parallel, and then stitch the blocks back into files at the destination via the HDFS Concat API (HDFS-222).
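
      The copy-in-parallel-then-stitch idea can be sketched with an in-memory simulation. This is only an illustration under stated assumptions: a `bytes` object stands in for the source file, `b"".join` stands in for the concat step, and the names `copy_chunk` and `parallel_copy` are hypothetical. No Hadoop API is called.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_chunk(source: bytes, offset: int, length: int) -> bytes:
    """Stand-in for one copy task: read a single byte range of the source."""
    return source[offset:offset + length]


def parallel_copy(source: bytes, chunk_size: int, workers: int = 4) -> bytes:
    """Copy `source` in chunk_size pieces in parallel, then reassemble them."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offsets = range(0, len(source), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so the pieces
        # come back in file order regardless of which task finishes first.
        pieces = list(
            pool.map(lambda off: copy_chunk(source, off, chunk_size), offsets)
        )
    # Stand-in for the concat step at the destination.
    return b"".join(pieces)


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    assert parallel_copy(data, chunk_size=100) == data
```

      The point the sketch makes is that ordering is restored at reassembly time, so the individual chunk copies do not need to run or finish in sequence.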

      Attachments

        1. HADOOP-11794.001.patch (52 kB, Yongjun Zhang)
        2. HADOOP-11794.002.patch (58 kB, Yongjun Zhang)
        3. HADOOP-11794.003.patch (61 kB, Yongjun Zhang)
        4. HADOOP-11794.004.patch (62 kB, Yongjun Zhang)
        5. HADOOP-11794.005.patch (62 kB, Yongjun Zhang)
        6. HADOOP-11794.006.patch (63 kB, Yongjun Zhang)
        7. HADOOP-11794.007.patch (70 kB, Yongjun Zhang)
        8. HADOOP-11794.008.patch (70 kB, Yongjun Zhang)
        9. HADOOP-11794.009.patch (70 kB, Yongjun Zhang)
        10. HADOOP-11794.010.branch2.002.patch (71 kB, Yongjun Zhang)
        11. HADOOP-11794.010.branch2.patch (70 kB, Yongjun Zhang)
        12. HADOOP-11794.010.patch (70 kB, Yongjun Zhang)
        13. MAPREDUCE-2257.patch (62 kB, Rosie Li)

            People

              Assignee: Yongjun Zhang (yzhangal)
              Reporter: Dhruba Borthakur (dhruba)
              Votes: 4
              Watchers: 59
