Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
0.13.0
-
None
-
MapR Hive 0.13
Description
We experience IndexOutOfBoundsException in a SMBJoin in the case on the tables used for the JOIN is uninitialized. Everything works if both are uninitialized or initialized.
2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549) at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51) [...]
Simplest way to reproduce:
SET hive.enforce.sorting=true; SET hive.enforce.bucketing=true; SET hive.exec.dynamic.partition=true; SET mapreduce.reduce.import.limit=-1; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SET hive.auto.convert.join=true; SET hive.auto.convert.sortmerge.join=true; SET hive.auto.convert.sortmerge.join.noconditionaltask=true; CREATE DATABASE IF NOT EXISTS tmp; USE tmp; CREATE TABLE `test1` ( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS stored as orc; CREATE TABLE `test2`( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS STORED AS ORC; -- Initialize ONE table of the two tables with any data. INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100; SELECT t1.foo, t2.foo FROM test1 t1 INNER JOIN test2 t2 ON (t1.foo = t2.foo);
I took a look at the Procedure fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in AbstractBucketJoinProc.java and it does not seem to have changed from our MapR Hive 0.13 to current snapshot, so this should be also an error in the current Version.