[HIVE-26147] OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0-alpha-2
Fix Version/s: 4.0.0-alpha-2
Component/s: ORC, Transactions
Labels:
- pull-request-available

Description

When hive.acid.key.index is missing for an ACID ORC file, OrcRawRecordMerger throws the following exception:

Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        ... 24 more

For this to happen, the ORC file must have more than one stripe, and the read must either start beyond the first stripe (so firstStripe != 0) or stop before the last one (so isTail is false). These are the only two cases in which the code indexes into the key index:

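    // keyIndex is parsed from hive.acid.key.index and is null when that metadata is missing
    if (firstStripe != 0) {
      minKey = keyIndex[firstStripe - 1];                // NPE here if keyIndex is null
    }
    if (!isTail) {
      maxKey = keyIndex[firstStripe + stripeCount - 1];  // NPE here if keyIndex is null
    }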
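To make the triggering condition concrete, here is a minimal Java sketch that derives firstStripe, stripeCount and isTail for a split [offset, maxOffset) from the stripe start offsets, and reports whether either branch of the fragment above would index into keyIndex. The stripe-walking loop is an assumption modeled on how these variables are used in the fragment, not a copy of the Hive source; the class and method names (KeyIndexAccessCheck, needsKeyIndex) and the stripe offsets are hypothetical and purely illustrative.

```java
/** Hypothetical helper: decides whether a split would dereference the key index. */
public class KeyIndexAccessCheck {

    /**
     * Returns true when reading the split [offset, maxOffset) would execute
     * keyIndex[firstStripe - 1] or keyIndex[firstStripe + stripeCount - 1].
     * stripeOffsets holds the start offset of each stripe, in file order.
     * The loop below is an assumed reconstruction, not Hive's actual code.
     */
    static boolean needsKeyIndex(long[] stripeOffsets, long offset, long maxOffset) {
        int firstStripe = 0;   // stripes that start before the split
        int stripeCount = 0;   // stripes that start inside the split
        boolean isTail = true; // stays true if the split runs to the last stripe
        for (long stripeStart : stripeOffsets) {
            if (offset > stripeStart) {
                firstStripe++;
            } else if (maxOffset > stripeStart) {
                stripeCount++;
            } else {
                isTail = false; // a stripe starts after the split ends
                break;
            }
        }
        // Same two conditions as the branches in the fragment above.
        return firstStripe != 0 || !isTail;
    }

    public static void main(String[] args) {
        long[] oneStripe = {3};                   // illustrative offsets
        long[] threeStripes = {3, 1000, 2000};    // illustrative offsets
        // Whole-file split over a single stripe: key index never touched.
        System.out.println(needsKeyIndex(oneStripe, 0, Long.MAX_VALUE));       // false
        // Whole-file split over three stripes: still not touched.
        System.out.println(needsKeyIndex(threeStripes, 0, Long.MAX_VALUE));    // false
        // Split covering only the first stripe: isTail becomes false.
        System.out.println(needsKeyIndex(threeStripes, 0, 1000));              // true
        // Split starting at the second stripe: firstStripe becomes 1.
        System.out.println(needsKeyIndex(threeStripes, 1000, Long.MAX_VALUE)); // true
    }
}
```

Under these assumptions, a single-stripe file read as one split never reaches either branch, which matches the observation that single-stripe files never failed.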

However, when the issue was originally detected, the NPE was triggered even by a simple "select *" over a table whose ORC files lacked the hive.acid.key.index metadata, although it never occurred for ORC files with a single stripe. The affected file was produced by a major compaction of ACID and non-ACID data.

If "select *" does not trigger the NPE, add a filter like the one below, using either the values of the row returned by "select * from $table limit 1" or other values, until the read reaches the problematic case:

select * from $table where c = $value

OrcRawRecordMerger should simply leave the min and max keys as null when the hive.acid.key.index metadata is missing.
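A minimal sketch of the behaviour proposed above, assuming the parsed key index is an array that is null when hive.acid.key.index is absent. String stands in for Hive's RecordIdentifier, and the NullSafeKeyBounds class, the KeyBounds holder and this discoverKeyBounds signature are hypothetical simplifications; this is not the patch merged in the linked pull request.

```java
/** Hypothetical sketch: key bounds that stay null when the key index is missing. */
public class NullSafeKeyBounds {

    /** Min/max keys for a split; null means no bound on that side. */
    static final class KeyBounds {
        final String minKey;
        final String maxKey;

        KeyBounds(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /**
     * keyIndex stands in for the parsed hive.acid.key.index (one entry per
     * stripe), or null when the metadata is missing from the file.
     */
    static KeyBounds discoverKeyBounds(String[] keyIndex, int firstStripe,
                                       int stripeCount, boolean isTail) {
        if (keyIndex == null) {
            // Missing metadata: leave both bounds null instead of throwing an NPE.
            return new KeyBounds(null, null);
        }
        String minKey = null;
        String maxKey = null;
        if (firstStripe != 0) {
            minKey = keyIndex[firstStripe - 1];
        }
        if (!isTail) {
            maxKey = keyIndex[firstStripe + stripeCount - 1];
        }
        return new KeyBounds(minKey, maxKey);
    }

    public static void main(String[] args) {
        String[] index = {"k1", "k2", "k3"};
        // Split that starts at the second stripe and ends before the third.
        KeyBounds withIndex = discoverKeyBounds(index, 1, 1, false);
        System.out.println(withIndex.minKey + " " + withIndex.maxKey);       // k1 k2
        // Same split, but the file has no hive.acid.key.index.
        KeyBounds withoutIndex = discoverKeyBounds(null, 1, 1, false);
        System.out.println(withoutIndex.minKey + " " + withoutIndex.maxKey); // null null
    }
}
```

The design choice in this sketch is to degrade to "no bounds" for the split rather than fail the read.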

Issue Links

is related to
- HIVE-18817 ArrayIndexOutOfBounds exception during read of ACID table. (Closed)
- HIVE-26146 Handle missing hive.acid.key.index in the fixacidkeyindex utility (Closed)

Testing discovered
- HIVE-26150 OrcRawRecordMerger reads each row twice (Open)

links to
- GitHub Pull Request #3219

People

Assignee: Alessandro Solimando

Reporter: Alessandro Solimando

Votes: 0

Watchers: 2

Dates

Created: 16/Apr/22 13:55

Updated: 16/Nov/22 13:50

Resolved: 19/Apr/22 15:13

Time Tracking

Estimated: Not Specified

Logged: 0.5h