Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
We have encountered a scenario where, when using the Crunch library to run a distributed copy (CRUNCH-660, CRUNCH-675) at the conclusion of a job we need to dynamically rename target paths to the preferred destination output part file names, rather than retaining the original source path names.
A custom CopyListing implementation appears to be the proper solution for this. However the place where the current SimpleCopyListing logic needs to be adjusted is in a private method (writeToFileListing), so a relatively large portion of the class would need to be cloned.
To minimize the amount of code duplication required for such a custom implementation, we propose adding two new protected methods to the CopyListing class, that can be used to change the actual keys and/or values written to the copy listing sequence file:
protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus); protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus);
The SimpleCopyListing class would then be modified to consume these methods as follows,
fileListWriter.append( getFileListingKey(sourcePathRoot, fileStatus), getFileListingValue(fileStatus));
The default implementations would simply preserve the present behavior of the SimpleCopyListing class, and could reside in either CopyListing or SimpleCopyListing, whichever is preferable.
protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus) { return new Text(DistCpUtils.getRelativePath(sourcePathRoot, fileStatus.getPath())); } protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus) { return fileStatus; }
Please let me know if this proposal seems to be on the right track. If so I can provide a patch.
Attachments
Attachments
- HADOOP-16147-001.patch
- 3 kB
- Andrew Olson
- HADOOP-16147-002.patch
- 3 kB
- Andrew Olson
Issue Links
- links to
Activity
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 17s | Docker mode activated. |
Prechecks | |||
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
trunk Compile Tests | |||
+1 | mvninstall | 20m 59s | trunk passed |
+1 | compile | 0m 26s | trunk passed |
+1 | checkstyle | 0m 19s | trunk passed |
+1 | mvnsite | 0m 28s | trunk passed |
+1 | shadedclient | 10m 59s | branch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 37s | trunk passed |
+1 | javadoc | 0m 22s | trunk passed |
Patch Compile Tests | |||
+1 | mvninstall | 0m 25s | the patch passed |
+1 | compile | 0m 23s | the patch passed |
+1 | javac | 0m 23s | the patch passed |
-0 | checkstyle | 0m 16s | hadoop-tools/hadoop-distcp: The patch generated 6 new + 42 unchanged - 0 fixed = 48 total (was 42) |
+1 | mvnsite | 0m 25s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedclient | 11m 54s | patch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 39s | the patch passed |
+1 | javadoc | 0m 18s | the patch passed |
Other Tests | |||
+1 | unit | 11m 56s | hadoop-distcp in the patch passed. |
+1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
61m 39s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960083/HADOOP-16147-001.patch |
Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
uname | Linux 599d61265a1a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 9de34d2 |
maven | version: Apache Maven 3.3.9 |
Default Java | 1.8.0_191 |
findbugs | v3.1.0-RC1 |
checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/15973/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/15973/testReport/ |
Max. process+thread count | 412 (vs. ulimit of 10000) |
modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15973/console |
Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 1m 26s | Docker mode activated. |
Prechecks | |||
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
trunk Compile Tests | |||
+1 | mvninstall | 21m 30s | trunk passed |
+1 | compile | 0m 32s | trunk passed |
+1 | checkstyle | 0m 29s | trunk passed |
+1 | mvnsite | 0m 33s | trunk passed |
+1 | shadedclient | 13m 52s | branch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 48s | trunk passed |
+1 | javadoc | 0m 27s | trunk passed |
Patch Compile Tests | |||
+1 | mvninstall | 0m 28s | the patch passed |
+1 | compile | 0m 25s | the patch passed |
+1 | javac | 0m 25s | the patch passed |
-0 | checkstyle | 0m 18s | hadoop-tools/hadoop-distcp: The patch generated 6 new + 42 unchanged - 0 fixed = 48 total (was 42) |
+1 | mvnsite | 0m 29s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedclient | 15m 0s | patch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 48s | the patch passed |
+1 | javadoc | 0m 21s | the patch passed |
Other Tests | |||
+1 | unit | 15m 2s | hadoop-distcp in the patch passed. |
+1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
75m 5s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-517/1/artifact/out/Dockerfile |
GITHUB PR | https://github.com/apache/hadoop/pull/517 |
JIRA Issue | |
Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
uname | Linux 01c252a2f85e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | personality/hadoop.sh |
git revision | trunk / 9de34d2 |
maven | version: Apache Maven 3.3.9 |
Default Java | 1.8.0_191 |
findbugs | v3.1.0-RC1 |
checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-517/1/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-517/1/testReport/ |
Max. process+thread count | 294 (vs. ulimit of 5500) |
modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-517/1/console |
Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
This message was automatically generated.
Responding to hadoopqa,
Please justify why no new tests are needed for this patch.
Only minor refactoring of current code was done, allowing for the more specific behavior override. No functionality was modified.
Also please list what manual steps were performed to verify this patch.
In a separate project, we created a custom CopyListing implementation with the getFileListingKey method overridden to return a different key, and then successfully ran a distributed copy producing the desired alternative target paths by setting distcp.copy.listing.class to the name of that custom class.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 24s | Docker mode activated. |
Prechecks | |||
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
trunk Compile Tests | |||
+1 | mvninstall | 16m 13s | trunk passed |
+1 | compile | 0m 27s | trunk passed |
+1 | checkstyle | 0m 21s | trunk passed |
+1 | mvnsite | 0m 31s | trunk passed |
+1 | shadedclient | 11m 57s | branch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 34s | trunk passed |
+1 | javadoc | 0m 16s | trunk passed |
Patch Compile Tests | |||
+1 | mvninstall | 0m 25s | the patch passed |
+1 | compile | 0m 22s | the patch passed |
+1 | javac | 0m 22s | the patch passed |
+1 | checkstyle | 0m 13s | the patch passed |
+1 | mvnsite | 0m 24s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedclient | 11m 54s | patch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 42s | the patch passed |
+1 | javadoc | 0m 19s | the patch passed |
Other Tests | |||
+1 | unit | 12m 22s | hadoop-distcp in the patch passed. |
+1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
59m 14s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-517/2/artifact/out/Dockerfile |
GITHUB PR | https://github.com/apache/hadoop/pull/517 |
JIRA Issue | |
Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
uname | Linux 32c6e768a325 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | personality/hadoop.sh |
git revision | trunk / 6c8c422 |
maven | version: Apache Maven 3.3.9 |
Default Java | 1.8.0_191 |
findbugs | v3.1.0-RC1 |
Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-517/2/testReport/ |
Max. process+thread count | 445 (vs. ulimit of 5500) |
modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-517/2/console |
Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 15s | Docker mode activated. |
Prechecks | |||
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
-1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
trunk Compile Tests | |||
+1 | mvninstall | 17m 39s | trunk passed |
+1 | compile | 0m 26s | trunk passed |
+1 | checkstyle | 0m 21s | trunk passed |
+1 | mvnsite | 0m 28s | trunk passed |
+1 | shadedclient | 11m 54s | branch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 36s | trunk passed |
+1 | javadoc | 0m 20s | trunk passed |
Patch Compile Tests | |||
+1 | mvninstall | 0m 25s | the patch passed |
+1 | compile | 0m 23s | the patch passed |
+1 | javac | 0m 23s | the patch passed |
+1 | checkstyle | 0m 15s | the patch passed |
+1 | mvnsite | 0m 26s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | shadedclient | 12m 44s | patch has no errors when building and testing our client artifacts. |
+1 | findbugs | 0m 40s | the patch passed |
+1 | javadoc | 0m 18s | the patch passed |
Other Tests | |||
+1 | unit | 12m 8s | hadoop-distcp in the patch passed. |
+1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
59m 58s |
Subsystem | Report/Notes |
---|---|
Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12960394/HADOOP-16147-002.patch |
Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
uname | Linux f3a4db7edc96 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 6c8c422 |
maven | version: Apache Maven 3.3.9 |
Default Java | 1.8.0_191 |
findbugs | v3.1.0-RC1 |
Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/15988/testReport/ |
Max. process+thread count | 340 (vs. ulimit of 10000) |
modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15988/console |
Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
I couldn't think of an easy way to test, and as you note, this is trivial. I
1, committed to branch-3.2
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16261 (See https://builds.apache.org/job/Hadoop-trunk-Commit/16261/)
HADOOP-16147. Allow CopyListing sequence file keys and values to be more (stevel: rev faba3591d32f2e4808c2faeb9472348d52619c8a)
- (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
- (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
Attached preliminary patch for review.
Also created pull request here: https://github.com/apache/hadoop/pull/517