Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
4.0.0-alpha-2
Description
When hive.acid.key.index is missing from an ACID ORC file, OrcRawRecordMerger throws the following exception:
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    ... 24 more
For the NPE to occur, the ORC file must have more than one stripe, and the offset being sought must fall either in a stripe after the first one (but before the last one), or in the first stripe when it is not also the last one, as the code shows:
    if (firstStripe != 0) {
      minKey = keyIndex[firstStripe - 1];
    }
    if (!isTail) {
      maxKey = keyIndex[firstStripe + stripeCount - 1];
    }
However, when the issue was originally detected, the NPE was triggered even by a simple "select *" over a table whose ORC files lacked the hive.acid.key.index metadata, although it never failed for ORC files with a single stripe. The file was generated by a major compaction of ACID and non-ACID data.
If "select *" does not trigger the NPE, either filter on the values of the row returned by "select * from $table limit 1", or try filtering on different values until the read lands in the situation described above, with a query like this:
select * from $table where c = $value
OrcRawRecordMerger should simply leave the min and max keys as null when the hive.acid.key.index metadata is missing.
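The proposed behavior can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hive patch: the class KeyBoundsSketch, the Bounds holder and the use of String keys are hypothetical stand-ins, and only the two conditions on firstStripe and isTail are taken from the snippet quoted above.

```java
/** Hypothetical stand-in for the key-bound discovery; not Hive's actual code. */
final class KeyBoundsSketch {

    /** Hypothetical holder for the two bounds; a null bound means "no bound known". */
    static final class Bounds {
        final String minKey;
        final String maxKey;

        Bounds(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /**
     * Computes the min and max keys for a range of stripes.
     * keyIndex is assumed to hold one key per stripe, or to be null when the
     * hive.acid.key.index metadata is missing from the file.
     */
    static Bounds discoverKeyBounds(String[] keyIndex, int firstStripe,
                                    int stripeCount, boolean isTail) {
        String minKey = null;
        String maxKey = null;
        // Guard: without a key index there is nothing to look up, so both
        // bounds stay null instead of dereferencing a null array.
        if (keyIndex != null) {
            if (firstStripe != 0) {
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new Bounds(minKey, maxKey);
    }
}
```

With this guard, a call such as KeyBoundsSketch.discoverKeyBounds(null, 1, 1, false) returns a Bounds whose two fields are null rather than throwing. A single-stripe file never reaches either lookup (firstStripe is 0 and the range is the tail), which is consistent with the observation that such files never failed.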
Issue Links
- is related to
  HIVE-18817 ArrayIndexOutOfBounds exception during read of ACID table. (Closed)
  HIVE-26146 Handle missing hive.acid.key.index in the fixacidkeyindex utility (Closed)
- Testing discovered
  HIVE-26150 OrcRawRecordMerger reads each row twice (Open)