Hive / HIVE-18284

NPE when inserting data with 'distribute by' clause with dynpart sort optimization

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
    • Fix Version/s: None
    • Component/s: Query Processor

      Description

      A NullPointerException occurs when inserting data with a 'distribute by' clause while the dynamic partition sort optimization is enabled.
      The following queries reproduce the issue (non-vectorized, non-LLAP mode):

      create table table1 (col1 string, datekey int);
      insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
      create table table2 (col1 string) partitioned by (datekey int);
      
      set hive.vectorized.execution.enabled=false;
      set hive.optimize.sort.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;
      insert into table table2
      PARTITION(datekey)
      select col1,
      datekey
      from table1
      distribute by datekey ;
      

      The insert query runs without the error if I remove the Distribute By clause or use a Cluster By clause instead.
      The issue seems to occur because Distribute By, unlike Cluster By, does not guarantee that rows are clustered or sorted on the distribution keys.
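
      For comparison, the Cluster By variant mentioned above can be written as follows. This is only a sketch that reuses the tables and settings from the reproduction. It illustrates the workaround and is not a fix for the underlying bug.

      ```sql
      set hive.vectorized.execution.enabled=false;
      set hive.optimize.sort.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;

      -- CLUSTER BY datekey is shorthand for DISTRIBUTE BY datekey SORT BY datekey,
      -- so each reducer receives its rows ordered by the partition key.
      insert into table table2
      partition (datekey)
      select col1,
             datekey
      from table1
      cluster by datekey;
      ```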

      FileSinkOperator removes the previous fsp, which might still need to be re-used when Distribute By is used:
      https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972

      The following stack trace is logged.

      Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1513111717879_0056_1_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
      	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
      	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
      	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
      	... 14 more
      Caused by: java.lang.NullPointerException
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
      	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
      	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
      	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
      	... 17 more
      

              People

              • Assignee: Syed Shameerur Rahman (srahman)
              • Reporter: Aki Tanaka (tanakahda)
              • Votes: 2
              • Watchers: 7

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 2h