Hive / HIVE-14660

ArrayIndexOutOfBoundsException on delete


Details

    Description

      Hi,

      DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException.
      The bug occurs in the reduce phase when there are fewer reducers than table buckets.

      To reproduce, create a simple ACID table:

      CREATE TABLE test (`cle` bigint,`valeur` string)
       PARTITIONED BY (`annee` string)
       CLUSTERED BY (cle) INTO 5 BUCKETS
       STORED AS ORC
       TBLPROPERTIES ('transactional'='true');
      

      Populate it with rows spread across all buckets, with random values and a few partitions.
      Then force the number of reducers to be lower than the number of buckets:

      set mapred.reduce.tasks=1;
      

      Then run a DELETE that removes many rows from every bucket:

      DELETE FROM test WHERE valeur<'some_value';
      

      The query then fails with an ArrayIndexOutOfBoundsException:

      2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
              at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
              at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
              at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
              at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
              at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
              at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
              at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
              at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
              at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
              at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
              ... 17 more
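      The key of the failing record is a ROW__ID struct (transactionid, bucketid, rowid). The Java sketch below (Java 16+ for the record type) illustrates, under stated assumptions, how a single reducer can see the same bucket id more than once: it assumes the reducer input is sorted on that struct and that the table was populated by two separate transactions (the ids 119 and 120 are illustrative). RowIdOrder, RowId and bucketSwitches are hypothetical names, not Hive classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RowIdOrder {
    // Hypothetical stand-in for the ROW__ID struct shown in the reduce key.
    record RowId(long transactionId, int bucketId, long rowId) {}

    // Sorts keys by (transactionid, bucketid, rowid), as assumed for the reducer
    // input, and returns the bucket id each time it changes.
    static List<Integer> bucketSwitches(List<RowId> keys) {
        List<RowId> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingLong(RowId::transactionId)
                .thenComparingInt(RowId::bucketId)
                .thenComparingLong(RowId::rowId));
        List<Integer> switches = new ArrayList<>();
        Integer last = null;
        for (RowId k : sorted) {
            if (last == null || k.bucketId() != last) {
                switches.add(k.bucketId());
                last = k.bucketId();
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        int numBuckets = 5;
        List<RowId> keys = new ArrayList<>();
        // Assumption: two transactions, each having written one row per bucket.
        for (long txn : new long[] {119, 120}) {
            for (int b = 0; b < numBuckets; b++) {
                keys.add(new RowId(txn, b, 0));
            }
        }
        System.out.println(bucketSwitches(keys)); // prints [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    }
}
```

      With a single reducer, the sort spans every transaction, so the bucket id restarts at 0 as soon as the transaction id changes. This is one plausible way to produce the 0, 1, 2, 3, 4, 0 sequence seen in the FileSinkOperator logs, not a confirmed root cause.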
      

      Adding logging to FileSinkOperator shows that the operator processes buckets 0, 1, 2, 3, 4, then 0 again, and fails at line 769. Each time the bucket changes, the operator moves one position forward in an array of 5 elements (the number of buckets). So when bucket 0 shows up for the second time, the index runs past the end of the array.
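      The mechanism described above can be illustrated with a simplified model: writer slots live in an array sized to the number of buckets, and the slot index advances on every bucket change. This is a hypothetical sketch (BucketWriterModel, buggySlotsUsed and keyedSlotsUsed are illustrative names), not FileSinkOperator's actual code. The keyed variant is just one way to avoid the overflow, not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.Map;

public class BucketWriterModel {
    // Array-based model: the slot index moves forward on every bucket change,
    // regardless of which bucket comes next.
    static int buggySlotsUsed(int numBuckets, int[] bucketSequence) {
        String[] writers = new String[numBuckets]; // one slot per bucket
        int index = -1;
        int lastBucket = -1;
        for (int bucket : bucketSequence) {
            if (bucket != lastBucket) {
                index++; // advance on every switch
                // Throws ArrayIndexOutOfBoundsException once index == numBuckets.
                writers[index] = "writer-for-bucket-" + bucket;
                lastBucket = bucket;
            }
        }
        return index + 1;
    }

    // Hypothetical alternative: look the writer up by bucket id, so a bucket
    // that reappears reuses its existing slot instead of consuming a new one.
    static int keyedSlotsUsed(int[] bucketSequence) {
        Map<Integer, String> writers = new HashMap<>();
        for (int bucket : bucketSequence) {
            writers.computeIfAbsent(bucket, b -> "writer-for-bucket-" + b);
        }
        return writers.size();
    }

    public static void main(String[] args) {
        int[] sequence = {0, 1, 2, 3, 4, 0};
        System.out.println(keyedSlotsUsed(sequence)); // prints 5
        try {
            buggySlotsUsed(5, sequence);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("array-based model overflowed: " + e.getMessage());
        }
    }
}
```

      Feeding the sequence 0, 1, 2, 3, 4, 0 into the array-based version fails at index 5, the same index reported in the trace, while the keyed version reuses the existing slot for bucket 0.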

      Attachments

        1. HIVE-14660.1-banch-1.2.patch
          2 kB
          Benjamin BONNET

        Activity

          People

            Assignee: Benjamin BONNET (bbonnet)
            Reporter: Benjamin BONNET (bbonnet)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m