We would like to compress data while it is transferred from our source system to the target system. One way to do this is to write a MapReduce job that compresses the data before or after the transfer, but that looks inefficient.
Since distcp already reads and writes the data, it would be better if it could compress the data as part of the copy.
The flip side is that, with compression, the distcp -update option cannot compare file sizes before copying, because the compressed target differs in size from the source. It can only check whether the file exists.
So I propose that when the -compress option is given, the file size is not checked.
Also, when a file is copied, the appropriate extension needs to be added to the file name depending on the compression type.
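To make the proposal concrete, here is a minimal sketch of the two rules above: the existence-only check under -update when compression is on, and the extension added to the target name. It is not DistCp code. The class CompressAwareCopyPolicy, its methods, the codec names, and the extension table are all hypothetical and chosen only for illustration. An actual patch would presumably get the extension from the configured Hadoop codec instead of a hard-coded table.

```java
import java.util.Map;

/** Hypothetical sketch of the proposed behaviour; none of these names exist in DistCp. */
final class CompressAwareCopyPolicy {

    // Conventional extensions for a few common codecs (illustrative subset, assumed names).
    private static final Map<String, String> EXTENSIONS = Map.of(
            "gzip", ".gz",
            "bzip2", ".bz2",
            "snappy", ".snappy");

    /** Returns the target file name: the source name plus the codec's extension. */
    static String targetName(String sourceName, String codec) {
        String ext = EXTENSIONS.get(codec);
        if (ext == null) {
            throw new IllegalArgumentException("Unknown codec: " + codec);
        }
        return sourceName + ext;
    }

    /**
     * Decides whether an -update run should copy a file.
     * targetSize is null when the target file does not exist.
     */
    static boolean shouldCopy(long sourceSize, Long targetSize, boolean compress) {
        if (targetSize == null) {
            return true;  // target missing: always copy
        }
        if (compress) {
            return false; // sizes are not comparable, so existence is the only check
        }
        return sourceSize != targetSize; // uncompressed copy: compare sizes
    }
}
```

With this sketch, a source file part-00000 copied with the assumed "gzip" codec would land as part-00000.gz, and an existing target would be skipped regardless of its size.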
- relates to HADOOP-13114 DistCp should have option to compress data on write (Patch Available)