Description
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:
STEPS TO REPRODUCE:
Step 1: Create a table ct1 create table ct1 (a1 string,b1 string); Step 2: Create a table ct2 create table ct2 (a2 string); Step 3 : Insert following data into table ct1 insert into table ct1 (a1) values ('1'); Step 4 : Insert following data into table ct2 insert into table ct2 (a2) values ('1'); Step 5 : Execute the following query select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
ACTUAL RESULT:
The query returns nothing;
EXPECTED RESULT:
1 NULL 1
The issue seems to be because of the incorrect query plan. In the plan we can see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is any column mentioned in the COALESCE has null value.
Please find the query plan below:
Plan optimized by CBO. Vertex dependency in root stage Map 1 <- Map 2 (BROADCAST_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Map 1 File Output Operator [FS_10] Map Join Operator [MAPJOIN_15] (rows=1 width=4) Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] <-Map 2 [BROADCAST_EDGE] BROADCAST [RS_7] PartitionCols:_col0 Select Operator [SEL_5] (rows=1 width=1) Output:["_col0"] Filter Operator [FIL_14] (rows=1 width=1) predicate:a2 is not null TableScan [TS_3] (rows=1 width=1) default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] <-Select Operator [SEL_2] (rows=1 width=4) Output:["_col0","_col1"] Filter Operator [FIL_13] (rows=1 width=4) predicate:(a1 is not null and b1 is not null) TableScan [TS_0] (rows=1 width=4) default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
This happens only if join is inner type, otherwise HiveJoinAddNotRule which creates this problem is skipped.