HIVE-19479: Encoded stream seek is incorrect for 0-length RGs in LLAP IO

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 3.0.0
    • Component/s: None
    • Labels: None

    Description

      The PositionProvider offset is not updated correctly and an error like this may happen:

      Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
      	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
      	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
      	at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
      	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
      	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
      

      This happens when ORC writes an unusual stream combination: the data stream for a row group (RG) has no values (all of its rows are null), but the length stream still contains values (zeros) for the same rows. That is technically a valid ORC file, although writing the zeros is completely useless.
      This may be fixed separately in ORC, but since such files already exist in the wild, LLAP IO should handle them correctly.
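The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the attached patch: `SeekSketch`, `Positions`, `Stream`, `skipSeek` and `reposition` are hypothetical names, and the index entries (541, 4, 0) and stream lengths are made up for illustration. It assumes that each stream consumes a fixed number of index entries per seek, and that skipping an empty stream without consuming its entries makes the next stream read the wrong offset.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the bug; none of these types are real Hive or ORC classes. */
final class SeekSketch {

  /** Hands out the recorded index entries for one row group, in stream order. */
  static final class Positions {
    private final long[] values;
    private int next = 0;

    Positions(long... values) { this.values = values; }

    long getNext() { return values[next++]; }

    int remaining() { return values.length - next; }
  }

  /** A stream with a byte length and a fixed number of index entries per seek. */
  static final class Stream {
    final String name;
    final long length;
    final int entriesPerSeek;

    Stream(String name, long length, int entriesPerSeek) {
      this.name = name;
      this.length = length;
      this.entriesPerSeek = entriesPerSeek;
    }

    /** Consumes this stream's entries and validates the byte offset. */
    void seek(Positions p) {
      long offset = p.getNext();
      for (int i = 1; i < entriesPerSeek; i++) {
        p.getNext(); // extra entries, e.g. a count of values to skip
      }
      if (offset > length) {
        throw new IllegalArgumentException(
            "Seek in " + name + " to " + offset + " is outside of the data");
      }
    }

    /** Consumes this stream's entries without seeking anywhere. */
    void skipSeek(Positions p) {
      for (int i = 0; i < entriesPerSeek; i++) {
        p.getNext();
      }
    }
  }

  /** Repositions all streams; consumeWhenEmpty=false models the buggy behavior. */
  static void reposition(List<Stream> streams, Positions p, boolean consumeWhenEmpty) {
    for (Stream s : streams) {
      if (s.length > 0) {
        s.seek(p);
      } else if (consumeWhenEmpty) {
        s.skipSeek(p);
      }
      // Otherwise the empty stream's entries stay queued for the next stream.
    }
  }

  /** Runs the scenario: empty DATA stream, non-empty LENGTH stream. */
  static String demo(boolean consumeWhenEmpty) {
    List<Stream> streams = Arrays.asList(
        new Stream("DATA", 0, 1),     // no values: all rows in the RG are null
        new Stream("LENGTH", 10, 2)); // still holds the zero lengths
    Positions p = new Positions(541, 4, 0); // DATA offset, LENGTH offset, LENGTH skip count
    try {
      reposition(streams, p, consumeWhenEmpty);
      return "ok, unread entries: " + p.remaining();
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo(false));
    System.out.println(demo(true));
  }
}
```

In this model, `demo(false)` returns a message of the same shape as the one in the stack trace, because the LENGTH stream picks up the entry that belonged to the empty DATA stream. `demo(true)` consumes that entry first, so the LENGTH seek succeeds.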

      Attachments

        1. HIVE-19479.patch
          12 kB
          Sergey Shelukhin
        2. HIVE-19479.01.patch
          12 kB
          Sergey Shelukhin

            People

              sershe Sergey Shelukhin