  1. Hadoop Common
  2. HADOOP-16147

Allow CopyListing sequence file keys and values to be more easily customized



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1
    • Component/s: tools/distcp
    • Labels: None


      We have encountered a scenario in which, when using the Crunch library to run a distributed copy at the conclusion of a job (CRUNCH-660, CRUNCH-675), we need to dynamically rename target paths to the preferred destination output part file names rather than retain the original source path names.

      A custom CopyListing implementation appears to be the proper solution for this. However, the SimpleCopyListing logic that needs to be adjusted lives in a private method (writeToFileListing), so a relatively large portion of the class would need to be cloned.

      To minimize the amount of code duplication required for such a custom implementation, we propose adding two new protected methods to the CopyListing class that can be used to change the actual keys and/or values written to the copy listing sequence file:

      protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus);
      protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus);

      The SimpleCopyListing class would then be modified to consume these methods as follows:

         fileListWriter.append(
            getFileListingKey(sourcePathRoot, fileStatus),
            getFileListingValue(fileStatus));

      The default implementations would simply preserve the present behavior of the SimpleCopyListing class, and could reside in either CopyListing or SimpleCopyListing, whichever is preferable.

      protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus) {
         return new Text(DistCpUtils.getRelativePath(sourcePathRoot, fileStatus.getPath()));
      }

      protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus) {
         return fileStatus;
      }

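      To illustrate how a subclass might use the proposed hooks, here is a minimal, self-contained sketch of the pattern. It deliberately avoids Hadoop dependencies: BaseListing, RenamingListing, buildListing, and the "/part-NNNNN" naming are hypothetical stand-ins, and plain strings take the place of Text, Path, and CopyListingFileStatus. Only the two hook method names come from the proposal.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CopyListingSketch {

  /** Hypothetical stand-in for SimpleCopyListing: the write loop is fixed, the hooks are overridable. */
  static class BaseListing {

    /** Default key: the source path relative to the source root (assumes the path starts with the root). */
    protected String getFileListingKey(String sourcePathRoot, String sourcePath) {
      return sourcePath.substring(sourcePathRoot.length());
    }

    /** Default value: the source entry, unchanged. */
    protected String getFileListingValue(String sourcePath) {
      return sourcePath;
    }

    /** Stand-in for the listing writer: records one key/value pair per source path. */
    Map<String, String> buildListing(String sourcePathRoot, List<String> sourcePaths) {
      Map<String, String> listing = new LinkedHashMap<>();
      for (String sourcePath : sourcePaths) {
        listing.put(getFileListingKey(sourcePathRoot, sourcePath),
                    getFileListingValue(sourcePath));
      }
      return listing;
    }
  }

  /** Hypothetical custom listing that renames each target to a sequential part file name. */
  static class RenamingListing extends BaseListing {
    private int nextPart = 0;

    @Override
    protected String getFileListingKey(String sourcePathRoot, String sourcePath) {
      // Ignore the source name entirely; only the key (target-relative path) changes.
      return String.format(Locale.ROOT, "/part-%05d", nextPart++);
    }
  }

  public static void main(String[] args) {
    List<String> sources = Arrays.asList("/src/a/x.txt", "/src/b/y.txt");
    System.out.println(new BaseListing().buildListing("/src", sources));
    System.out.println(new RenamingListing().buildListing("/src", sources));
  }
}
```

      In an actual DistCp subclass the overrides would return Text and CopyListingFileStatus as in the signatures above; the point of the sketch is only that the listing loop stays in the base class while a subclass changes the keys.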
      Please let me know if this proposal seems to be on the right track. If so, I can provide a patch.


        1. HADOOP-16147-002.patch
          3 kB
          Andrew Olson
        2. HADOOP-16147-001.patch
          3 kB
          Andrew Olson

        Assignee: Andrew Olson (noslowerdna)
        Reporter: Andrew Olson (noslowerdna)