Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 3.2.0, 2.9.2
- Labels: None
Description
I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the same file when blocksperchunk has been set > 0.
In CopyCommitter::commitJob, this check skips reassembly whenever blocksPerChunk is equal to 0:
if (blocksPerChunk > 0) {
  concatFileChunks(conf);
}
Then in CopyCommitter's constructor, blocksPerChunk is initialised from the config:
blocksPerChunk = context.getConfiguration().getInt(DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always return an empty string, because the enum constant is constructed without a config label:
BLOCKS_PER_CHUNK("",
    new Option("blocksperchunk", true, "If set to a positive value, files"
        + "with more blocks than this value will be split into chunks of "
        + "<blocksperchunk> blocks to be transferred in parallel, and "
        + "reassembled on the destination. By default, <blocksperchunk> is "
        + "0 and the files will be transmitted in their entirety without "
        + "splitting. This switch is only applicable when the source file "
        + "system implements getBlockLocations method and the target file "
        + "system implements concat method"))
As a result, getInt always falls back to the default value 0 for blocksPerChunk, which prevents the chunks from being reassembled.
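To illustrate the failure mode, here is a minimal, self-contained Java sketch (not Hadoop code) that mimics Configuration.getInt: looking a value up under an empty key misses the entry entirely, so the default is returned. The label "distcp.blocks.per.chunk" is used here only as an illustrative config key, not necessarily the one the eventual fix adopted.

```java
import java.util.HashMap;
import java.util.Map;

public class BlocksPerChunkSketch {
    // Stand-in for Hadoop's Configuration.getInt(key, defaultValue):
    // an absent (or empty) key yields the default value.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String v = conf.get(key);
        return (v == null) ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Suppose the parsed -blocksperchunk value was stored under a
        // proper label (hypothetical key name for illustration):
        conf.put("distcp.blocks.per.chunk", "10");

        // Buggy path: getConfigLabel() returns "", so the lookup misses
        // and blocksPerChunk stays 0 -> concatFileChunks() is skipped.
        int buggy = getInt(conf, "", 0);

        // With a real, non-empty label the configured value is found.
        int fixed = getInt(conf, "distcp.blocks.per.chunk", 0);

        System.out.println(buggy); // 0
        System.out.println(fixed); // 10
    }
}
```

This mirrors why the `if (blocksPerChunk > 0)` guard in commitJob can never fire: the committer reads the value back under the empty label, not under the key where it was set.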
Attachments
Issue Links
- is caused by
  - HADOOP-15850 CopyCommitter#concatFileChunks should check that the blocks per chunk is not 0 (Resolved)
- is related to
  - HADOOP-11794 Enable distcp to copy blocks in parallel (Resolved)