HIVE-26147

OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file

Details

    Description

      When hive.acid.key.index is missing for an acid ORC file, OrcRawRecordMerger throws a NullPointerException, as follows:

      Caused by: java.lang.NullPointerException
              at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              ... 24 more
      

      For the NPE to occur, the ORC file must have more than one stripe, and the offset being sought must fall either beyond the first stripe (but before the last one), or in the first stripe when that stripe is not also the last one, as the code shows:

          if (firstStripe != 0) {
            minKey = keyIndex[firstStripe - 1];
          }
          if (!isTail) {
            maxKey = keyIndex[firstStripe + stripeCount - 1];
          }
      

      However, when the issue was originally detected, the NPE was triggered even by a simple "select *" over a table whose ORC files lacked the hive.acid.key.index metadata, although it never occurred for ORC files with a single stripe. The affected file had been generated by a major compaction of acid and non-acid data.
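
      The two guards in the snippet above decide whether keyIndex is read at all. The following is a minimal sketch of that decision, not the Hive code: the class and method names are hypothetical, and the sketch models only the conditions `firstStripe != 0` and `!isTail` to illustrate why a split that covers a whole single-stripe file never dereferences the index.

```java
final class KeyIndexAccess {
    /**
     * Hypothetical helper: returns true when the bounds lookup quoted above
     * would read from keyIndex for a split that starts at stripe
     * {@code firstStripe} and either reaches the end of the file
     * ({@code isTail}) or does not.
     */
    static boolean readsKeyIndex(int firstStripe, boolean isTail) {
        boolean needsMinKey = firstStripe != 0; // split starts after the first stripe
        boolean needsMaxKey = !isTail;          // split ends before the last stripe
        return needsMinKey || needsMaxKey;
    }

    public static void main(String[] args) {
        // Whole single-stripe file: neither guard fires.
        System.out.println(readsKeyIndex(0, true));
        // First stripe of a multi-stripe file: the max key is looked up.
        System.out.println(readsKeyIndex(0, false));
        // Last stripe of a multi-stripe file: the min key is looked up.
        System.out.println(readsKeyIndex(1, true));
    }
}
```

      Under this model, any split of a multi-stripe file that does not span the entire file reaches at least one of the two lookups, which is consistent with the behaviour described above.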

      If "select *" does not trigger the NPE, either filter on the values of the row returned by "select * from $table limit 1", or try filtering on different values until the query hits the situation described above, using a filter like this:

      select * from $table where c = $value
      

      OrcRawRecordMerger should simply leave the min and max keys as null when the hive.acid.key.index metadata is missing.
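
      The proposed behaviour can be sketched as a null guard around the two lookups. This is an illustration under stated assumptions, not the actual patch: `KeyBoundsSketch`, `KeyBounds` and `discoverBounds` are hypothetical names, keys are modelled as plain strings rather than Hive's record identifiers, and a null `keyIndex` stands in for a file without the hive.acid.key.index metadata.

```java
final class KeyBoundsSketch {
    /** Discovered bounds; in this sketch null means "no bound on that side". */
    static final class KeyBounds {
        final String minKey;
        final String maxKey;

        KeyBounds(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /**
     * Hypothetical stand-in for the bounds discovery. keyIndex is assumed to
     * hold one entry per stripe, or to be null when the metadata is missing.
     */
    static KeyBounds discoverBounds(String[] keyIndex, int firstStripe,
                                    int stripeCount, boolean isTail) {
        String minKey = null;
        String maxKey = null;
        // Guard: with no key index, leave both bounds null instead of dereferencing it.
        if (keyIndex != null) {
            if (firstStripe != 0) {
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new KeyBounds(minKey, maxKey);
    }

    public static void main(String[] args) {
        // Missing metadata: both bounds stay null and no exception is thrown.
        KeyBounds missing = discoverBounds(null, 1, 1, false);
        System.out.println(missing.minKey + " / " + missing.maxKey);

        // Metadata present: bounds are taken from the index as before.
        KeyBounds present = discoverBounds(new String[] {"k0", "k1", "k2"}, 1, 1, false);
        System.out.println(present.minKey + " / " + present.maxKey);
    }
}
```

      When the index is present the sketch behaves exactly like the original snippet, so only files without the metadata take the new path.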

            People

              asolimando Alessandro Solimando
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h