Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.1.0
-
None
Description
In some hive versions, when
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = true;
Some join queries fail with ClassCastException:
The stack:
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableStringObjectInspector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.genVectorExpressionWritable(VectorExpressionWriterFactory.java:419) at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory.processVectorInspector(VectorExpressionWriterFactory.java:1102) at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126) ... 22 more
It can not be reproduced in hive 2.0 and 1.3 because of different code path.
Reproduce:
CREATE TABLE test1 ( id string) PARTITIONED BY ( cr_year bigint, cr_month bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' TBLPROPERTIES ( 'serialization.null.format'='' ); CREATE TABLE test2( id string ) PARTITIONED BY ( cr_year bigint, cr_month bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat' TBLPROPERTIES ( 'serialization.null.format'='' ); set hive.auto.convert.join=false; set hive.vectorized.execution.enabled = true; SELECT cr.id1 , cr.id2 FROM (SELECT t1.id id1, t2.id id2 from (select * from test1 ) t1 left outer join test2 t2 on t1.id=t2.id) cr;