Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.2.0, 2.9.2, 3.0.3, 3.1.2
- None
Description
Copying blocks in parallel (enabled when blocks per chunk > 0) is a great DistCp improvement that can greatly speed up copying large files.
However, checksum validation is skipped for files copied this way, e.g. in `RetriableFileCopyCommand.java`:
    if (!source.isSplit()) {
      compareCheckSums(sourceFS, source.getPath(), sourceChecksum,
          targetFS, targetPath);
    }
This can result in a checksum/data mismatch without notifying developers/users (e.g. HADOOP-16049).
I'd like to provide a patch to add the checksum validation.
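To illustrate the idea only (this is not the proposed patch and does not use any Hadoop API), here is a minimal sketch of deferring checksum validation until the chunks of a split file have been reassembled. The class name `ChunkedCopyChecksumSketch` and all of its methods are hypothetical, byte arrays stand in for files, and CRC32 stands in for whatever file checksum the file systems actually expose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical, self-contained sketch: byte arrays stand in for files and
// CRC32 stands in for a file-level checksum. None of this is Hadoop code.
public class ChunkedCopyChecksumSketch {

    /** Computes a CRC32 over the whole array (the stand-in "file checksum"). */
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Splits the source into chunks of at most chunkSize bytes, as a chunked copy would. */
    public static List<byte[]> split(byte[] source, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < source.length; off += chunkSize) {
            int end = Math.min(source.length, off + chunkSize);
            chunks.add(Arrays.copyOfRange(source, off, end));
        }
        return chunks;
    }

    /** Concatenates the copied chunks back into a single target array. */
    public static byte[] concat(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] target = new byte[total];
        int pos = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, target, pos, chunk.length);
            pos += chunk.length;
        }
        return target;
    }

    /** Validation that runs once, after all chunks have been reassembled. */
    public static boolean validateAfterConcat(byte[] source, byte[] target) {
        return source.length == target.length
                && checksum(source) == checksum(target);
    }

    public static void main(String[] args) {
        byte[] source = new byte[1000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) (i * 31);
        }

        List<byte[]> chunks = split(source, 256);
        System.out.println("intact copy valid: "
                + validateAfterConcat(source, concat(chunks)));

        // Simulate silent corruption of one chunk during the copy.
        chunks.get(2)[0] ^= 0x01;
        System.out.println("corrupted copy valid: "
                + validateAfterConcat(source, concat(chunks)));
    }
}
```

Running `main` prints `intact copy valid: true` followed by `corrupted copy valid: false`. The point of the sketch is that a per-chunk copy cannot be compared against a whole-file checksum, but the comparison becomes possible again once the chunks are stitched together, so the check can be deferred rather than skipped.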
Attachments
Issue Links
- is cloned by
  - HADOOP-16536 Backport HADOOP-16158 and HADOOP-15273 to branch-2 (Patch Available)
- is related to
  - HADOOP-16049 DistCp result has data and checksum mismatch when blocks per chunk > 0 (Resolved)
  - HADOOP-11794 Enable distcp to copy blocks in parallel (Resolved)
  - HADOOP-16536 Backport HADOOP-16158 and HADOOP-15273 to branch-2 (Patch Available)
- links to