HIVE-6320 (project: Hive)

Row-based ORC reader with PPD turned on dies on BufferUnderflowException/IndexOutOfBoundsException


Details

    Description

      The ORC data reader crashes with a BufferUnderflowException when reading data row by row with predicate push-down (PPD) enabled on current trunk.

      Stack trace:

      Caused by: java.nio.BufferUnderflowException
      	at java.nio.Buffer.nextGetIndex(Buffer.java:472)
      	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:117)
      	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:207)
      	at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
      	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
      	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
      	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
      	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
      	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
      	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
      	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:125)
      	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:101)
      

      Alternatively, it can fail with:

      Caused by: java.lang.IndexOutOfBoundsException
              at java.nio.ByteBuffer.wrap(ByteBuffer.java:352)
              at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:180)
              at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:197)
              at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
              at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:252)
              at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:59)
              at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:300)
              at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:475)
              at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1159)
              at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2198)
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:108)
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:57)
              at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
              ... 15 more
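
      Both traces fail inside InStream$CompressedStream while consuming a compressed chunk. The failure mode can be illustrated with a minimal sketch, assuming ORC's 3-byte little-endian chunk header that stores (chunkLength << 1) | isOriginal. The class and method names (ChunkReadSketch, readChunkLength, sliceChunk) are hypothetical, and this is not Hive's InStream code. It shows that running out of fetched bytes during ByteBuffer.get() throws BufferUnderflowException, while wrapping a chunk body that extends past the fetched range with ByteBuffer.wrap() throws IndexOutOfBoundsException.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch, not Hive's InStream: slices one compressed chunk out of
// the byte range that was fetched from disk for a row group.
final class ChunkReadSketch {
  // ORC compression chunk headers are 3 bytes long.
  static final int HEADER_SIZE = 3;

  // Decodes the little-endian header ((chunkLength << 1) | isOriginal) and
  // returns chunkLength. Each get() throws BufferUnderflowException if the
  // buffer has no bytes left.
  static int readChunkLength(ByteBuffer buf) {
    int b0 = buf.get() & 0xff;
    int b1 = buf.get() & 0xff;
    int b2 = buf.get() & 0xff;
    int header = b0 | (b1 << 8) | (b2 << 16);
    return header >>> 1; // drop the isOriginal flag stored in the low bit
  }

  // Returns a view over the chunk body that starts at pos. ByteBuffer.wrap
  // throws IndexOutOfBoundsException if the body runs past the end of fetched.
  static ByteBuffer sliceChunk(byte[] fetched, int pos) {
    ByteBuffer header = ByteBuffer.wrap(fetched, pos, fetched.length - pos);
    int chunkLength = readChunkLength(header);
    return ByteBuffer.wrap(fetched, pos + HEADER_SIZE, chunkLength);
  }

  public static void main(String[] args) {
    // The header announces a 100-byte chunk, but only 10 body bytes were fetched.
    byte[] truncated = new byte[HEADER_SIZE + 10];
    truncated[0] = (byte) (100 * 2 + 1); // (length << 1) | isOriginal
    try {
      sliceChunk(truncated, 0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("chunk body runs past fetched range: " + e);
    }
    // The fetched range ends in the middle of a chunk header.
    try {
      sliceChunk(new byte[2], 0);
    } catch (BufferUnderflowException e) {
      System.out.println("header runs past fetched range: " + e);
    }
  }
}
```

      In both cases the reader is not corrupting data; it simply asks for bytes that were never fetched into the buffer.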
      

      The query used to reproduce the issue:

      set hive.vectorized.execution.enabled=false;
      set hive.optimize.index.filter=true;
      
      insert overwrite directory '/tmp/foo' select * from lineitem where l_orderkey is not null;
      

      Reason:
      The issue is in how the disk range boundaries are generated. If two adjacent row groups have the same compressed block offset, the worst-case slop added to the end offset covers only the current compression block. In some cases, the values near the end of that compression block stretch beyond the boundary, so the reader fetches past the end of the buffered range and fails with a BufferUnderflowException or an IndexOutOfBoundsException.
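
      The boundary arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class RowGroupRangeSketch, the method estimateEndOffset, its blocksOfSlop parameter, the uncompressed-slop constant and the block sizes in main are all hypothetical. Padding by two compression blocks (header plus buffer size, twice) is shown as one plausible way to make the range also cover the block that follows; treat it as an assumption.

```java
// Hypothetical sketch, not Hive's RecordReaderImpl: estimates where the byte
// range fetched for one row group of a stream should end.
final class RowGroupRangeSketch {
  // ORC compression chunk headers are 3 bytes long.
  static final int HEADER_SIZE = 3;
  // Assumed worst-case overrun for an uncompressed stream (illustrative value).
  static final long WORST_UNCOMPRESSED_SLOP = 2 + 8 * 512;

  // End offset = start of the next row group plus a worst-case "slop".
  // blocksOfSlop = 1 pads by a single compression block (the failing case in
  // the report); blocksOfSlop = 2 also covers the following block.
  static long estimateEndOffset(boolean isCompressed, boolean isLastGroup,
      long nextGroupOffset, long streamLength, int bufferSize, int blocksOfSlop) {
    if (isLastGroup) {
      return streamLength; // the last group reads to the end of the stream
    }
    long slop = isCompressed
        ? (long) blocksOfSlop * (HEADER_SIZE + bufferSize)
        : WORST_UNCOMPRESSED_SLOP;
    return Math.min(streamLength, nextGroupOffset + slop);
  }

  public static void main(String[] args) {
    int bufferSize = 256 * 1024;     // assumed compression buffer size
    long streamLength = 10_000_000L; // assumed stream length
    // Row groups g and g+1 both start in the compressed block at offset 0.
    long nextGroupOffset = 0L;
    // Assumed layout: block 0 holds 200,000 compressed bytes, block 1 holds 250,000.
    long block1End = HEADER_SIZE + 200_000 + HEADER_SIZE + 250_000;

    long oneBlock = estimateEndOffset(true, false, nextGroupOffset, streamLength, bufferSize, 1);
    long twoBlock = estimateEndOffset(true, false, nextGroupOffset, streamLength, bufferSize, 2);
    System.out.println("one-block slop ends at " + oneBlock + ", covers block 1: " + (oneBlock >= block1End));
    System.out.println("two-block slop ends at " + twoBlock + ", covers block 1: " + (twoBlock >= block1End));
  }
}
```

      With the assumed layout, the one-block range ends at 262147, before block 1 ends at 450006, so a value that stretches into block 1 would be read past the fetched range. The two-block range ends at 524294 and covers block 1 entirely.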

      Attachments

        1. HIVE-6320.1.patch (2 kB, Prasanth Jayachandran)
        2. HIVE-6320.2.patch (2 kB, Prasanth Jayachandran)
        3. HIVE-6320.2.patch (2 kB, Prasanth Jayachandran)
        4. HIVE-6320.3.patch (3 kB, Prasanth Jayachandran)


          People

            prasanth_j Prasanth Jayachandran
            gopalv Gopal Vijayaraghavan
            Votes: 0
            Watchers: 4
