Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.13.0
Description
HIVE-5091 added padding to ORC files to keep ORC stripes from straddling HDFS blocks. The length of these padding bytes is not stored in the stripe information. OrcInputFormat.getSplits() uses stripeInformation.getLength() for split computation, and stripeInformation.getLength() is the sum of the index length, data length, and stripe footer length. It does not account for the padding bytes, which may result in wrong split boundaries.
The fix is to use the offset of the next stripe to determine the length of the current stripe, so that the padding bytes are included as well.
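The idea can be illustrated with a minimal sketch. This is not the actual patch: `StripeSpans`, its nested `Stripe` class, and `effectiveLengths` are hypothetical stand-ins for illustration, and falling back to `getLength()` for the last stripe (which has no successor) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeSpans {
    /** Hypothetical stand-in for ORC's stripe information (not the real class). */
    static final class Stripe {
        final long offset;
        final long indexLength;
        final long dataLength;
        final long footerLength;

        Stripe(long offset, long indexLength, long dataLength, long footerLength) {
            this.offset = offset;
            this.indexLength = indexLength;
            this.dataLength = dataLength;
            this.footerLength = footerLength;
        }

        /** Index + data + footer; excludes any padding written after the stripe. */
        long getLength() {
            return indexLength + dataLength + footerLength;
        }
    }

    /**
     * Returns, for each stripe, the number of bytes it spans in the file,
     * including trailing padding. For every stripe except the last this is
     * the distance to the next stripe's offset.
     */
    static List<Long> effectiveLengths(List<Stripe> stripes) {
        List<Long> lengths = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            Stripe current = stripes.get(i);
            if (i + 1 < stripes.size()) {
                // Padding (if any) sits between this stripe and the next one.
                lengths.add(stripes.get(i + 1).offset - current.offset);
            } else {
                // Assumption: the last stripe has no successor, so use its own length.
                lengths.add(current.getLength());
            }
        }
        return lengths;
    }
}
```

For example, a stripe at offset 3 with `getLength()` of 100 followed by a stripe at offset 128 spans 125 bytes, 25 of which are padding that `getLength()` alone would miss.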
Attachments
Issue Links
- relates to
- HIVE-5091 ORC files should have an option to pad stripes to the HDFS block boundaries
- Closed