Hadoop Common / HADOOP-16018

DistCp won't reassemble chunks when blocks per chunk > 0



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 2.9.2
    • Fix Version/s: 3.0.4, 3.2.1, 2.9.3, 3.1.3
    • Component/s: tools/distcp


      I was investigating why hadoop-distcp-2.9.2 does not reassemble chunks of the same file when blocks per chunk is set to a value greater than 0.

      In CopyCommitter::commitJob, chunk reassembly happens only inside the following branch, so chunks are never reassembled when blocksPerChunk is 0:

      if (blocksPerChunk > 0) {

      In the CopyCommitter constructor, blocksPerChunk is initialised from the job configuration, with a default of 0:

      blocksPerChunk = context.getConfiguration().getInt(
          DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
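      The two pieces above (the constructor reading blocksPerChunk with a default of 0, and the blocksPerChunk > 0 check in commitJob) can be modelled in a few lines. This is a minimal sketch, not DistCp code: the class name CommitterSketch and the key example.blocks.per.chunk are hypothetical, and java.util.Properties stands in for Hadoop's Configuration, so the lookup simply returns the default when the key is absent.

```java
import java.util.Properties;

// Hypothetical model of the committer logic; Properties stands in for
// Hadoop's Configuration. Not DistCp code.
final class CommitterSketch {
    private final int blocksPerChunk;

    // Like the CopyCommitter constructor: read blocksPerChunk under
    // configLabel, defaulting to 0 when the key is absent.
    CommitterSketch(Properties conf, String configLabel) {
        String raw = conf.getProperty(configLabel);
        this.blocksPerChunk = raw == null ? 0 : Integer.parseInt(raw.trim());
    }

    int blocksPerChunk() {
        return blocksPerChunk;
    }

    // Like the check in commitJob: chunks are reassembled only when > 0.
    boolean wouldReassembleChunks() {
        return blocksPerChunk > 0;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.blocks.per.chunk", "4"); // hypothetical key
        // Empty label: the lookup misses, so blocksPerChunk stays 0.
        System.out.println(new CommitterSketch(conf, "").wouldReassembleChunks());
        // Matching non-empty label: the value is found.
        System.out.println(
            new CommitterSketch(conf, "example.blocks.per.chunk").wouldReassembleChunks());
    }
}
```

      Run as-is, it prints false for the empty label and true for the matching key: whether reassembly happens depends entirely on the key the committer is given.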


      However, DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns an empty string here, because the switch is constructed with an empty config label:

      BLOCKS_PER_CHUNK("",
          new Option("blocksperchunk", true, "If set to a positive value, files "
              + "with more blocks than this value will be split into chunks of "
              + "<blocksperchunk> blocks to be transferred in parallel, and "
              + "reassembled on the destination. By default, <blocksperchunk> is "
              + "0 and the files will be transmitted in their entirety without "
              + "splitting. This switch is only applicable when the source file "
              + "system implements getBlockLocations method and the target file "
              + "system implements concat method")),

      As a result, getInt() falls back to the default value of 0 for blocksPerChunk, the check in commitJob fails, and the chunks are never reassembled.
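      The root cause is that the key the committer looks up comes from the switch itself. The following is a hypothetical sketch: OptionSwitchSketch only loosely mirrors DistCpOptionSwitch, java.util.Properties stands in for Configuration, and example.blocks.per.chunk is an invented key, not DistCp's. It shows that a switch built with an empty label makes the lookup use the key "", so a value stored under any other key is ignored and the default 0 wins.

```java
import java.util.Properties;

// Hypothetical model of the lookup path; not DistCp code.
final class LabelLookupSketch {
    // Returns the int stored under key, or defaultValue when the key is absent.
    static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // The committer-side read: the key comes from the switch's config label.
    static int blocksPerChunk(Properties conf) {
        return getInt(conf, OptionSwitchSketch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical key: a value stored anywhere other than "" is never seen.
        conf.setProperty("example.blocks.per.chunk", "4");
        String label = OptionSwitchSketch.BLOCKS_PER_CHUNK.getConfigLabel();
        System.out.println("label = \"" + label + "\"");
        System.out.println("blocksPerChunk = " + blocksPerChunk(conf));
    }
}

// Loosely modelled on DistCpOptionSwitch: each constant pairs a config label
// with a command-line switch name.
enum OptionSwitchSketch {
    // Empty config label, as described in the report.
    BLOCKS_PER_CHUNK("", "blocksperchunk");

    private final String confLabel;
    private final String switchName;

    OptionSwitchSketch(String confLabel, String switchName) {
        this.confLabel = confLabel;
        this.switchName = switchName;
    }

    String getConfigLabel() {
        return confLabel;
    }

    String getSwitchName() {
        return switchName;
    }
}
```

      Run as-is, it prints label = "" and blocksPerChunk = 0. Giving the switch a non-empty label, so that the value is recorded and read under the same real key, is the direction this sketch suggests; it is not taken from the committed patch.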




        Attachments

        1. HADOOP-16018.01.patch (7 kB, Kai Xie)
        2. HADOOP-16018-002.patch (3 kB, Steve Loughran)
        3. HADOOP-16018-branch-2-002.patch (3 kB, Steve Loughran)
        4. HADOOP-16018-branch-2-003.patch (3 kB, Kai Xie)
        5. HADOOP-16018-branch-2-004.patch (0.7 kB, Kai Xie)
        6. HADOOP-16018-branch-2-005.patch (2 kB, Kai Xie)
        7. HADOOP-16018-branch-2-006.patch (3 kB, Kai Xie)




            • Assignee: Kai Xie (kai33)

