[HIVE-6320] Row-based ORC reader with PPD turned on dies on BufferUnderFlowException/IndexOutOfBoundsException - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.13.0
Component/s: Serializers/Deserializers
Labels: orcfile

Description

The ORC data reader crashes with a BufferUnderflowException when reading data row by row with predicate push-down (PPD) enabled on current trunk.

Stack trace:

Caused by: java.nio.BufferUnderflowException
	at java.nio.Buffer.nextGetIndex(Buffer.java:472)
	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:117)
	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:207)
	at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
	at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:125)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:101)

Alternatively, the failure can surface as:

Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:352)
        at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:180)
        at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:197)
        at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:252)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:59)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:300)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:475)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1159)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2198)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:108)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:57)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
        ... 15 more

The query that triggers the failure:

set hive.vectorized.execution.enabled=false;
set hive.optimize.index.filter=true;

insert overwrite directory '/tmp/foo' select * from lineitem where l_orderkey is not null;

Reason:
The issue is in how the disk range boundaries are generated. If two adjacent row groups have the same compressed block offset, the worst-case slop added to the end offset covers only the current compression block. In some cases the values towards the end of this compression block extend past that boundary, and reading them causes a BufferUnderflowException or an IndexOutOfBoundsException.
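
The boundary problem described above can be illustrated with a small, self-contained sketch. This is not the code from the attached patches: the class DiskRangeSketch, the methods rangeEnd and fits, and all offsets are hypothetical and chosen only for illustration. It assumes the planned range ends at the next row group's compressed block offset plus a slop of one block header (3 bytes in the ORC format) and one full compression block.

```java
class DiskRangeSketch {
    // Size of the header in front of each compressed block (3 bytes in ORC).
    static final int HEADER_SIZE = 3;

    // Hypothetical planner: the range ends at the block offset of the next
    // row group plus slop for exactly one worst-case compressed block.
    static long rangeEnd(long nextGroupBlockOffset, int blockSize) {
        return nextGroupBlockOffset + HEADER_SIZE + blockSize;
    }

    // True if the bytes [readStart, readEnd) lie inside [rangeStart, rangeEnd).
    static boolean fits(long rangeStart, long rangeEnd, long readStart, long readEnd) {
        return readStart >= rangeStart && readEnd <= rangeEnd;
    }

    public static void main(String[] args) {
        int blockSize = 256 * 1024;     // assumed compression block size
        long currentGroupBlock = 1000;  // block offset of the current row group
        long nextGroupBlock = 1000;     // next row group starts in the SAME block

        long end = rangeEnd(nextGroupBlock, blockSize);

        // Values at the tail of this block continue into the following block,
        // so the reader also needs that block's header and some of its bytes.
        long followingBlockStart = currentGroupBlock + HEADER_SIZE + blockSize;
        long readEnd = followingBlockStart + HEADER_SIZE + 16;

        System.out.println("planned range end: " + end);
        System.out.println("bytes needed up to: " + readEnd);
        System.out.println("read fits in range: "
            + fits(currentGroupBlock, end, currentGroupBlock, readEnd));
    }
}
```

In this sketch the planned range stops exactly where the following compression block begins, so any value that spills into that block is read past the end of the buffer. That is consistent with the two exceptions in the stack traces above.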

Attachments

- HIVE-6320.1.patch (30/Jan/14 21:10, 2 kB, Prasanth Jayachandran)
- HIVE-6320.2.patch (04/Feb/14 02:13, 2 kB, Prasanth Jayachandran)
- HIVE-6320.2.patch (03/Feb/14 23:15, 2 kB, Prasanth Jayachandran)
- HIVE-6320.3.patch (04/Feb/14 05:42, 3 kB, Prasanth Jayachandran)

Issue Links

links to

Reproduction test-case (54Mb file)

Review Board

People

Assignee: Prasanth Jayachandran

Reporter: Gopal Vijayaraghavan

Votes: 0

Watchers: 4

Dates

Created: 28/Jan/14 00:53

Updated: 01/Aug/14 23:59

Resolved: 06/Feb/14 00:49