Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 0.21.0
- None
- None
- Reviewed
Splitting support for BZip2 Text data
Description
HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds support for handling BZip2 compressed data such that the compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader. The benefit is that when a user provides BZip2 compressed text data, Hadoop splits it and the data is processed by multiple mappers, so BZip2 compressed data can make full use of the cluster. Currently, a BZip2 compressed text file is not split and goes to a single mapper. The enhancement in this JIRA therefore provides splitting support and a considerable performance gain.
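For readers unfamiliar with how a line-oriented reader copes with splits that start at arbitrary points, the following is a minimal sketch of the usual ownership rule: a split that does not start at offset 0 discards its first (possibly partial) line, and every split keeps reading until it has finished the last line that begins at or before its end offset. The sketch works on plain uncompressed bytes and does not use any Hadoop or BZip2 API. The class `SplitLineSketch` and the method `linesForSplit` are hypothetical names chosen for illustration, not code from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not Hadoop's LineRecordReader.
class SplitLineSketch {

    /** Returns the lines owned by the split covering bytes [start, end) of data. */
    static List<String> linesForSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;

        // A split that does not begin the file discards everything up to and
        // including the first newline; the preceding split emits that line.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }

        // Emit every line that begins at or before 'end', reading past 'end'
        // if necessary to finish the last one.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Three splits cut at arbitrary byte offsets, two of them mid-line.
        System.out.println(linesForSplit(data, 0, 8));   // [alpha, beta]
        System.out.println(linesForSplit(data, 8, 16));  // [gamma]
        System.out.println(linesForSplit(data, 16, 23)); // [delta]
    }
}
```

With this rule each line is produced by exactly one split wherever the cut falls, which is what lets several mappers work on one file independently. For compressed input the reader additionally needs the codec to resume decompression from a point inside the file, which is the part HADOOP-4012 supplies.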
Attachments
Issue Links
- is blocked by HADOOP-4012 Providing splitting support for bzip2 compressed files (Closed)