Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.3.1, 2.3.2, 3.0.0, 3.1.1, 3.1.2, 4.0.0
Description
A NullPointerException occurs when inserting data with a 'distribute by' clause. The following query reproduces the issue:
(non-vectorized, non-LLAP mode)
create table table1 (col1 string, datekey int);
insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
create table table2 (col1 string) partitioned by (datekey int);
set hive.vectorized.execution.enabled=false;
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table table2 PARTITION(datekey) select col1, datekey from table1 distribute by datekey;
The insert query runs without the error if I remove the Distribute By clause or use a Cluster By clause instead.
It seems that the issue happens because Distribute By does not guarantee clustering or sorting properties on the distributed keys.
FileSinkOperator removes the previous fsp, which might be re-used when we use Distribute By.
https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
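To illustrate the suspected failure mode, here is a minimal sketch in plain Java. It is not the FileSinkOperator code: the class SortedSinkSketch, the PartitionPaths holder, and the failsWithNpe helper are hypothetical names, and the sketch only assumes that the sink closes the previous partition's writer whenever the partition key changes while keeping the closed entry in its lookup map. Under that assumption, the key order 1, 2, 1 from the reproduction hits a closed writer on the third row, while sorted input does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedSinkSketch {
    /** Hypothetical stand-in for a per-partition output handle. */
    static final class PartitionPaths {
        // A null writer models a writer that has already been closed.
        List<String> writer = new ArrayList<>();
    }

    private final Map<Integer, PartitionPaths> pathsByKey = new HashMap<>();
    private PartitionPaths previous;

    /** Writes one row, assuming (as a sorted sink would) that keys arrive grouped. */
    void process(int partitionKey, String row) {
        PartitionPaths current = pathsByKey.get(partitionKey);
        if (current == null) {
            current = new PartitionPaths();
            pathsByKey.put(partitionKey, current);
        }
        if (previous != null && previous != current) {
            // Key changed: close the previous writer but keep its map entry.
            previous.writer = null;
        }
        previous = current;
        // Throws NullPointerException if this partition was closed earlier.
        current.writer.add(row);
    }

    /** Returns true if feeding the keys in this order ends in a NullPointerException. */
    static boolean failsWithNpe(int[] keys) {
        SortedSinkSketch sink = new SortedSinkSketch();
        try {
            for (int i = 0; i < keys.length; i++) {
                sink.process(keys[i], "ROW" + (i + 1));
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Key order from the reproduction (distributed but not sorted).
        System.out.println(failsWithNpe(new int[] {1, 2, 1}));
        // Same keys, grouped as Cluster By would deliver them.
        System.out.println(failsWithNpe(new int[] {1, 1, 2}));
    }
}
```

In this sketch the failing row is the third one, which is consistent with the ROW3 record reported in the stack trace, though the real operator may differ in its details.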
The following stack trace is logged.
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1513111717879_0056_1_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    ... 14 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
    ... 17 more
Issue Links
- is caused by: HIVE-13260 ReduceSinkDeDuplication throws exception when pRS key is empty (Closed)
- is contained by: HIVE-26751 Bug Fixes and Improvements for 3.2.0 release (Open)