Details
Description
We would like to compress data while transferring it from our source system to the target system. One way to do this is to write a map/reduce job that compresses the data before or after the transfer, but this looks inefficient.
Since distcp is already reading and writing the data, it would be better if it could compress the data as part of the copy.
The flip side is that the distcp -update option can no longer compare file sizes before copying data, because a compressed target differs in size from its source. It can only check whether the file exists.
So I propose that when the -compress option is given, the file size is not checked.
Also, when a file is copied, the appropriate extension needs to be added to the file name, depending on the compression type.
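To make the proposal concrete, here is a rough sketch of the three pieces described above: compressing while copying, skipping the size check when compression is on, and appending an extension. It uses only java.util.zip rather than Hadoop's codec classes, and it is not DistCp code. The class CompressingCopy, its method names, and the hard-coded ".gz" extension are all hypothetical, chosen for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of DistCp.
final class CompressingCopy {
    // Assumption: gzip is the compression type, so ".gz" is the extension.
    static final String GZIP_EXTENSION = ".gz";

    // Target file name = source file name plus the compression extension.
    static String targetName(String sourceName) {
        return sourceName + GZIP_EXTENSION;
    }

    // Compress the data in the same pass that copies it.
    // Returns the number of uncompressed bytes read from the source.
    static long copyCompressed(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        // Closing the gzip stream writes the trailer and closes 'out'.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Update-style skip decision. With compression the lengths are not
    // comparable, so only the existence of the target is considered.
    static boolean shouldSkip(boolean targetExists, long sourceLen,
                              long targetLen, boolean compress) {
        if (!targetExists) {
            return false;
        }
        return compress || sourceLen == targetLen;
    }
}
```

In a real patch the extension would presumably come from the selected codec (Hadoop's CompressionCodec interface has getDefaultExtension()) rather than from a constant.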
Attachments
Issue Links
- relates to
- HADOOP-13114 DistCp should have option to compress data on write
- Patch Available