Description
The PositionProvider offset is not updated correctly, and an error like the following may occur:
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
We found that this happens when ORC writes an unusual stream combination: the data stream for a row group (RG) has no values because all of its rows are null, but the length stream still contains values (zeros) for the same rows. That is technically a valid ORC file, although writing the zeros serves no purpose.
This may be fixed separately in ORC, but since these files now exist in the wild we should handle them correctly.
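The failure mode described above can be modeled with a small, self-contained sketch. It is not the Hive or ORC code: the names PositionSketch, Positions, Stream and seekLength are hypothetical, and it assumes a simplified index layout in which the row-group entry holds one position for the data stream followed by one for the length stream. Under that assumption, if the reader skips the empty data stream without consuming its position, the length stream is handed the wrong offset.

```java
class PositionSketch {
    /** Hypothetical cursor over the positions stored for one row group. */
    static final class Positions {
        private final long[] values;
        private int next = 0;

        Positions(long... values) { this.values = values; }

        long getNext() { return values[next++]; }

        /** Advance past positions that belong to a stream we are not reading. */
        void skip(int count) { next += count; }
    }

    /** Hypothetical stream that rejects seeks beyond its own data. */
    static final class Stream {
        final String name;
        final long length;
        long position;

        Stream(String name, long length) {
            this.name = name;
            this.length = length;
        }

        void seek(Positions index) {
            long target = index.getNext();
            if (target > length) {
                throw new IllegalArgumentException(
                    "Seek in " + name + " to " + target + " is outside of the data");
            }
            position = target;
        }
    }

    /**
     * Seeks the length stream for one row group. A null data stream models
     * "no values in this RG". When consumeForEmptyData is false, the data
     * stream's position is left unconsumed (the assumed bug).
     */
    static long seekLength(Positions index, Stream data, Stream length,
                           boolean consumeForEmptyData) {
        if (data != null) {
            data.seek(index);
        } else if (consumeForEmptyData) {
            index.skip(1); // keep the cursor aligned with the length stream
        }
        length.seek(index);
        return length.position;
    }

    public static void main(String[] args) {
        // Assumed layout: {data position, length position}.
        Stream length = new Stream("LENGTH", 100);
        System.out.println(seekLength(new Positions(541, 12), null, length, true));
        try {
            seekLength(new Positions(541, 12), null, length, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the call with consumeForEmptyData set to true lands the length stream on its own position, while the call with false reproduces the shape of the exception above. The real fix may differ in where and how the offset is advanced.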