Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
-
None
Description
When we have > 2 tables joined together, if we put the join predicates in the WHERE clause, the query may get a join sequence where two tables joined together do not have any join condition. This will lead to error in Merge Join OP, which assume join always have a join condition.
For example, the following query:
SELECT S.S_ACCTBAL, S.S_NAME
FROM
( SELECT _MAP['P_PARTKEY'] as P_PARTKEY,
_MAP['P_MFGR'] as P_MFGR
FROM "/Users/jni//work/tpc-h-parquet/part") P,
( SELECT _MAP['S_SUPPKEY'] AS S_SUPPKEY,
_MAP['S_NATIONKEY'] AS S_NATIONKEY,
_MAP['S_ACCTBAL'] AS S_ACCTBAL,
_MAP['S_NAME'] AS S_NAME,
_MAP['S_ADDRESS'] AS S_ADDRESS,
_MAP['S_PHONE'] AS S_PHONE,
_MAP['S_COMMENT'] AS S_COMMENT
FROM "/Users/jni//work/tpc-h-parquet/supplier") S,
(SELECT _MAP['PS_PARTKEY'] AS PS_PARTKEY,
_MAP['PS_SUPPKEY'] AS PS_SUPPKEY
FROM "/Users/jni//work/tpc-h-parquet/partsupp") PS
WHERE P.P_PARTKEY = PS.PS_PARTKEY and
S.S_SUPPKEY = PS.PS_SUPPKEY
LIMIT 100;
The join sequence in logical and physical plan is : P -> S -> PS. However, since there is no direct predicate between P and S, the Merge Join between P and S will have no join conditions. This lead to the following error in MergeJoinBatch.java:
"Failure while setting up Foreman. < ArrayIndexOutOfBoundsException:[ 0 ]
Since almost all TPC-H queries have multiple tables joins, it's important that we get this issue resolved, in order to run TPC-H queries.