We have encountered a scenario where, when using the Crunch library to run a distributed copy (CRUNCH-660, CRUNCH-675), we need to dynamically rename target paths at the conclusion of a job to the preferred destination output part file names, rather than retaining the original source path names.
A custom CopyListing implementation appears to be the proper solution for this. However, the current SimpleCopyListing logic that needs to be adjusted lives in a private method (writeToFileListing), so a relatively large portion of the class would need to be cloned.
To minimize the amount of code duplication required for such a custom implementation, we propose adding two new protected methods to the CopyListing class that can be used to change the actual keys and/or values written to the copy listing sequence file.
The SimpleCopyListing class would then be modified to call these methods in writeToFileListing when writing each entry to the copy listing.
The default implementations would simply preserve the present behavior of the SimpleCopyListing class, and could reside in either CopyListing or SimpleCopyListing, whichever is preferable.
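As a rough illustration of the shape of this proposal, the sketch below models the pattern with plain Java stand-ins rather than the real Hadoop classes. The hook names (getFileListingKey, getFileListingValue), the stub types, and the RenamingCopyListing subclass are all hypothetical and chosen only for illustration. The in-memory list stands in for the copy listing sequence file, and the default key is only an approximation of the relative path that SimpleCopyListing writes today.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for CopyListingFileStatus; only the source path is modeled.
final class FileStatusStub {
    final String path;
    FileStatusStub(String path) { this.path = path; }
}

// Stand-in for CopyListing, holding the two hypothetical hooks.
abstract class CopyListingSketch {
    // Hypothetical hook: the key written for an entry.
    // Default (assumed): the path relative to the source root.
    protected String getFileListingKey(String sourceRoot, FileStatusStub status) {
        return status.path.startsWith(sourceRoot)
            ? status.path.substring(sourceRoot.length())
            : status.path;
    }

    // Hypothetical hook: the value written for an entry.
    // Default: the file status, unchanged.
    protected FileStatusStub getFileListingValue(FileStatusStub status) {
        return status;
    }
}

// Stand-in for SimpleCopyListing: the write path calls the hooks
// instead of computing the key and value inline.
class SimpleCopyListingSketch extends CopyListingSketch {
    // Stands in for the sequence file writer.
    final List<Map.Entry<String, FileStatusStub>> listing = new ArrayList<>();

    void writeToFileListing(String sourceRoot, FileStatusStub status) {
        listing.add(new SimpleEntry<>(
            getFileListingKey(sourceRoot, status),
            getFileListingValue(status)));
    }
}

// A custom listing only overrides the hook it cares about;
// here it maps default keys to preferred target names.
class RenamingCopyListing extends SimpleCopyListingSketch {
    private final Map<String, String> renames;

    RenamingCopyListing(Map<String, String> renames) { this.renames = renames; }

    @Override
    protected String getFileListingKey(String sourceRoot, FileStatusStub status) {
        String defaultKey = super.getFileListingKey(sourceRoot, status);
        return renames.getOrDefault(defaultKey, defaultKey);
    }
}
```

With hooks of this kind, a custom implementation reduces to a small override, and entries without a rename mapping fall through to the default behavior.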
Please let me know if this proposal seems to be on the right track. If so, I can provide a patch.