Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
2.1.0
-
None
Description
Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.
2016-07-26 21:49:24,390 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) ... 9 more Caused by: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.hadoop.io.Text.set(Text.java:225) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201) at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377) ... 13 more
The exception is because the serialization and deserialization doesn't match.
The serialization by LazyBinarySerDe from previous MapReduce job used different order of columns. When the current MapReduce job deserialized the intermediate sequence file generated by previous MapReduce job, it will get corrupted data from the deserialization using wrong order of columns by LazyBinaryStruct. The unmatched columns between serialization and deserialization is caused by SelectOperator's Column Pruning ColumnPrunerSelectProc.