Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.9.0
-
None
Description
Two tables, parquet files (attached in the bug):
0: jdbc:drill:schema=dfs> select * from t1; +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | null | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | null | hhhhh | 2015-01-08 | | 9 | iiiii | null | | 10 | jjjjj | 2015-01-10 | +------------+------------+------------+ 10 rows selected (0.119 seconds) 0: jdbc:drill:schema=dfs> select * from t2; +------------+------------+------------+ | a2 | b2 | c2 | +------------+------------+------------+ | 0 | zzz | 2014-12-31 | | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | ddddd | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | 7 | ggggg | 2015-01-07 | | 8 | hhhhh | 2015-01-08 | | 9 | iiiii | 2015-01-09 | +------------+------------+------------+ 13 rows selected (0.116 seconds)
Disable hash join and set slice_target = 1:
alter session set `planner.enable_hashjoin` = false;
alter session set `planner.slice_target` = 1;
Correct result:
0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2); +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ | 10 | jjjjj | 2015-01-10 | +------------+------------+------------+ 1 row selected (0.625 seconds)
Swap tables and you get an error:
0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1); +------------+------------+------------+ | a1 | b1 | c1 | +------------+------------+------------+ Query failed: SYSTEM ERROR: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error Fragment 1:0 [1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010] java.lang.RuntimeException: java.sql.SQLException: Failure while executing query. at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514) at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148) at sqlline.SqlLine.print(SqlLine.java:1809) at sqlline.SqlLine$Commands.execute(SqlLine.java:3766) at sqlline.SqlLine$Commands.sql(SqlLine.java:3663) at sqlline.SqlLine.dispatch(SqlLine.java:889) at sqlline.SqlLine.begin(SqlLine.java:763) at sqlline.SqlLine.start(SqlLine.java:498) at sqlline.SqlLine.main(SqlLine.java:460)
Explain plan for the query with an error:
0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1); +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 Project(*=[$0]) 00-02 UnionExchange 01-01 Project(T27¦¦*=[$0]) 01-02 SelectionVectorRemover 01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))]) 01-04 MergeJoin(condition=[=($4, $5)], joinType=[left]) 01-06 SelectionVectorRemover 01-08 Sort(sort0=[$4], dir0=[ASC]) 01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4]) 01-12 HashToRandomExchange(dist0=[[$4]]) 02-01 UnorderedMuxExchange 04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))]) 04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1]) 04-03 NestedLoopJoin(condition=[true], joinType=[inner]) 04-05 Project(T27¦¦*=[$0], b2=[$1]) 04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]]) 04-04 BroadcastExchange 06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)]) 06-02 UnionExchange 07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) 07-02 Project(b1=[$0], $f1=[true]) 07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 01-05 Project(b1=[$0], $f10=[$1]) 01-07 SelectionVectorRemover 01-09 Sort(sort0=[$0], dir0=[ASC]) 01-11 HashAgg(group=[{0}], agg#0=[MIN($1)]) 01-13 Project(b1=[$0], $f1=[$1]) 01-14 HashToRandomExchange(dist0=[[$0]]) 03-01 UnorderedMuxExchange 05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) 05-02 HashAgg(group=[{0}], agg#0=[MIN($1)]) 05-03 Project(b1=[$0], $f1=[true]) 05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
Correct result and correct plan with hash join distributed plan ( planner.slice_target = 1)
alter session set `planner.enable_hashjoin` = true;
alter session set `planner.slice_target` = 1;
0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1); +------------+------------+------------+ | a2 | b2 | c2 | +------------+------------+------------+ +------------+------------+------------+ No rows selected (0.458 seconds) 0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1); +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 Project(*=[$0]) 00-02 Project(T25¦¦*=[$0]) 00-03 SelectionVectorRemover 00-04 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))]) 00-05 HashJoin(condition=[=($4, $5)], joinType=[left]) 00-07 Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1]) 00-09 NestedLoopJoin(condition=[true], joinType=[inner]) 00-11 Project(T25¦¦*=[$0], b2=[$1]) 00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]]) 00-10 BroadcastExchange 01-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)]) 01-02 UnionExchange 03-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) 03-02 Project(b1=[$0], $f1=[true]) 03-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 00-06 Project(b1=[$0], $f10=[$1]) 00-08 BroadcastExchange 02-01 HashAgg(group=[{0}], agg#0=[MIN($1)]) 02-02 Project(b1=[$0], $f1=[$1]) 02-03 HashToRandomExchange(dist0=[[$0]]) 04-01 UnorderedMuxExchange 05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) 05-02 HashAgg(group=[{0}], agg#0=[MIN($1)]) 05-03 Project(b1=[$0], $f1=[true]) 05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
Same error with the columns of date, time and timestamp types.