Details
Improvement
Status: Open
Major
Resolution: Unresolved
3.3.4
None
None
Description
When we use distcp, the checksums of the source file (sourceFS) and the target file (targetFS) are compared for consistency after the file transfer is complete.
However, for some files produced on the source side by ClientProtocol's concat (an RPC method), individual blocks are smaller than the 128MB block size (for example, the sourceFS file is stored as 10MB + 10MB blocks, while the targetFS file is written as a single 20MB block). Because the block layout differs, the checksums of the source and target files are inconsistent even though the contents are identical, and this causes distcp to fail.
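The effect can be illustrated without a cluster. The sketch below is a simplified model, not the real HDFS checksum algorithm: it assumes a file checksum that is a digest of per-block digests, and the helper names block_digest and file_digest are made up for this illustration. It only shows that identical bytes split into different blocks give different file-level values.

```shell
#!/bin/sh
# Simplified model (assumption): file checksum = MD5 over the list of per-block MD5s.
# This is NOT the exact HDFS algorithm; it only mimics its dependence on block layout.

# Digest of a single "block" (hypothetical helper).
block_digest() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Digest of a "file" given as one argument per block (hypothetical helper).
file_digest() {
  for b in "$@"; do
    block_digest "$b"
  done | md5sum | cut -d' ' -f1
}

# Source: three small blocks, as after concat of three small files.
src=$(file_digest "aaaa" "aaaa" "aaaa")
# Target: the same 12 bytes written as one block, as after a plain copy.
dst=$(file_digest "aaaaaaaaaaaa")

echo "source: $src"
echo "target: $dst"
if [ "$src" != "$dst" ]; then
  echo "checksums differ although the bytes are identical"
fi
```

In this model the mismatch comes only from where the block boundaries fall, which matches the behaviour described above for concat-produced files.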
Case:
hadoop fs -put /etc/hosts /tmp/a.txt
hadoop fs -put /etc/hosts /tmp/b.txt
hadoop fs -put /etc/hosts /tmp/c.txt
hadoop fs -concat /tmp/a.txt /tmp/b.txt /tmp/c.txt
hdfs fsck /tmp/a.txt -files -blocks -locations | grep blk_
hadoop distcp /tmp/a.txt hdfs://kde-sts-0.com/tmp/res.txt