Hadoop Common / HADOOP-16018

DistCp won't reassemble chunks when blocks per chunk > 0

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 2.9.2
    • Fix Version/s: 3.0.4, 3.2.1, 2.9.3, 3.1.3
    • Component/s: tools/distcp
    • Labels: None

    Description

      I was investigating why hadoop-distcp-2.9.2 does not reassemble the chunks of a file when blocks per chunk is set to a value greater than 0.

      In CopyCommitter::commitJob, the following guard skips reassembly whenever blocksPerChunk is 0:

      if (blocksPerChunk > 0) {
        concatFileChunks(conf);
      }
      

      Then, in the CopyCommitter constructor, blocksPerChunk is initialised from the config:

      blocksPerChunk = context.getConfiguration().getInt(
          DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
      

       

      But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns an empty string, because the enum constant is constructed without a config label:

      BLOCKS_PER_CHUNK("",
      new Option("blocksperchunk", true, "If set to a positive value, files"
      + "with more blocks than this value will be split into chunks of "
      + "<blocksperchunk> blocks to be transferred in parallel, and "
      + "reassembled on the destination. By default, <blocksperchunk> is "
      + "0 and the files will be transmitted in their entirety without "
      + "splitting. This switch is only applicable when the source file "
      + "system implements getBlockLocations method and the target file "
      + "system implements concat method"))
      

      As a result, blocksPerChunk always falls back to its default value of 0, the guard in commitJob is never satisfied, and the chunks are never reassembled.
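The failure mode can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not the actual DistCp code: the enum OptionSwitch, the map-based getInt helper and the key "distcp.blocks.per.chunk" are all hypothetical stand-ins chosen for illustration, and the sketch assumes the user's value is stored in the job configuration under a non-empty key. It only shows that a lookup keyed by an empty label misses the stored value and returns the default.

```java
import java.util.HashMap;
import java.util.Map;

public class BlocksPerChunkLabelDemo {

  // Hypothetical stand-in for DistCpOptionSwitch: each switch carries a config label.
  enum OptionSwitch {
    // Mirrors the declaration quoted in the issue: an empty config label.
    BLOCKS_PER_CHUNK_UNLABELLED(""),
    // Assumed non-empty label, chosen only for this sketch.
    BLOCKS_PER_CHUNK_LABELLED("distcp.blocks.per.chunk");

    private final String configLabel;

    OptionSwitch(String configLabel) {
      this.configLabel = configLabel;
    }

    String getConfigLabel() {
      return configLabel;
    }
  }

  // Minimal imitation of Configuration.getInt(key, defaultValue) over a plain map.
  static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  // Same lookup shape as the CopyCommitter constructor quoted in the description.
  static int readBlocksPerChunk(Map<String, String> conf, OptionSwitch option) {
    return getInt(conf, option.getConfigLabel(), 0);
  }

  // Same guard as commitJob: chunks are concatenated only when the value is positive.
  static boolean wouldConcat(int blocksPerChunk) {
    return blocksPerChunk > 0;
  }

  // A job configuration in which the user asked for 4 blocks per chunk (assumed key).
  static Map<String, String> sampleConf() {
    Map<String, String> conf = new HashMap<>();
    conf.put("distcp.blocks.per.chunk", "4");
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> conf = sampleConf();
    for (OptionSwitch option : OptionSwitch.values()) {
      int blocksPerChunk = readBlocksPerChunk(conf, option);
      System.out.println(option + ": blocksPerChunk=" + blocksPerChunk
          + ", concat=" + wouldConcat(blocksPerChunk));
    }
  }
}
```

Run as is, the unlabelled switch reads 0 and the concat guard is false, while the labelled switch reads 4 and the guard is true. That is consistent with the diagnosis above: the committer can only see the user's setting if the option switch carries a non-empty config label.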

       

       

      Attachments

        1. HADOOP-16018.01.patch
          7 kB
          Kai Xie
        2. HADOOP-16018-002.patch
          3 kB
          Steve Loughran
        3. HADOOP-16018-branch-2-002.patch
          3 kB
          Steve Loughran
        4. HADOOP-16018-branch-2-002.patch
          3 kB
          Steve Loughran
        5. HADOOP-16018-branch-2-003.patch
          3 kB
          Kai Xie
        6. HADOOP-16018-branch-2-004.patch
          0.7 kB
          Kai Xie
        7. HADOOP-16018-branch-2-004.patch
          0.7 kB
          Kai Xie
        8. HADOOP-16018-branch-2-005.patch
          2 kB
          Kai Xie
        9. HADOOP-16018-branch-2-005.patch
          2 kB
          Kai Xie
        10. HADOOP-16018-branch-2-006.patch
          3 kB
          Kai Xie

            People

              Assignee: Kai Xie (kai33)
              Reporter: Kai Xie (kai33)
              Votes: 0
              Watchers: 8