To concatenate two files, we currently need to read each record in these files and write it back to the destination file. The I/O cost is mostly unavoidable because HDFS lacks append functionality. However, the CPU cost could be significantly reduced by avoiding decompression and recompression of the files.
The File Format layer should provide an API that implements block-level concatenation.
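A minimal sketch of the idea, using a toy block file format rather than any real Hive file format. The `MAGIC` header, the length-prefixed block layout, and all function names here are hypothetical and only for illustration. It shows that when each block is compressed independently, files can be merged by copying the compressed blocks as raw bytes, so no record is decompressed or recompressed during the merge.

```python
import io
import struct
import zlib

MAGIC = b"TOY1"  # hypothetical file header, not a real Hive marker


def write_file(record_blocks):
    """Serialize blocks of records; each block is compressed on its own."""
    out = io.BytesIO()
    out.write(MAGIC)
    for records in record_blocks:
        payload = zlib.compress("\n".join(records).encode("utf-8"))
        out.write(struct.pack(">I", len(payload)))  # 4-byte block length
        out.write(payload)
    return out.getvalue()


def iter_raw_blocks(data):
    """Yield each compressed block as raw bytes, without decompressing it."""
    if data[:4] != MAGIC:
        raise ValueError("not a toy block file")
    pos = 4
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield data[pos:pos + size]
        pos += size


def concat_block_level(files):
    """Merge files by copying compressed blocks verbatim (no codec work)."""
    out = io.BytesIO()
    out.write(MAGIC)  # one header for the merged file
    for data in files:
        for block in iter_raw_blocks(data):
            out.write(struct.pack(">I", len(block)))
            out.write(block)
    return out.getvalue()


def read_records(data):
    """Decompress every block and return all records in order."""
    records = []
    for block in iter_raw_blocks(data):
        records.extend(zlib.decompress(block).decode("utf-8").split("\n"))
    return records


a = write_file([["r1", "r2"], ["r3"]])
b = write_file([["r4"]])
merged = concat_block_level([a, b])
```

In this sketch the merge still reads and writes every byte, so the I/O cost remains, but the only CPU work is parsing block lengths. Reading `merged` with `read_records` returns the records of `a` followed by those of `b`.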
|Fix Version/s||0.6.0 [ 12314524 ]|
|Component/s||Serializers/Deserializers [ 12312585 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Fix Version/s||0.8.0 [ 12316178 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Transition||Time In Source Status||Execution Times||Last Executer||Last Execution Date|
|342d 1h 3m||1||He Yongqiang||21/Apr/11 01:05|
|239d 23h 51m||1||Carl Steinbach||16/Dec/11 23:56|