Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
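For TPC-H Q13, Drill may discard qualified rows. The planner pulls a local filter on the right-hand side (RHS) of a left outer join out of the ON clause and places it on top of the join, so the RHS filter is treated as a post-join condition. In a left outer join, an ON-clause condition only decides which RHS rows match; a left-side row with no matching RHS row is still returned, padded with NULLs. Applied after the join, the same condition evaluates to NULL (not TRUE) on those padded rows and discards them. In Q13 this incorrectly removes customers whose orders all match '%special%requests%', as well as customers with no orders at all.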
select
  c_count,
  count(*) as custdist
from
  (
    select
      c.c_custkey,
      count(o.o_orderkey)
    from
      cp.`tpch/customer.parquet` c
      left outer join cp.`tpch/orders.parquet` o
        on c.c_custkey = o.o_custkey
        and o.o_comment not like '%special%requests%'
    group by
      c.c_custkey
  ) as orders (c_custkey, c_count)
group by
  c_count
order by
  custdist desc,
  c_count desc;
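The effect of moving the RHS predicate out of the ON clause can be reproduced outside Drill. The sketch below runs Q13 twice with Python's built-in sqlite3 module on hypothetical toy data (three customers, three orders; not TPC-H data): once with the predicate in the ON clause, and once with it in a WHERE clause, which approximates the pulled-up filter. The names `SETUP`, `PREDICATE` and `q13` are illustrative only, and the inner columns are aliased directly because SQLite does not accept a column list after a subquery alias.

```python
import sqlite3

# Hypothetical toy data (not TPC-H): customer 1 has one plain and one
# "special requests" order, customer 2 has only a "special requests" order,
# and customer 3 has no orders at all.
SETUP = """
CREATE TABLE customer (c_custkey INTEGER);
CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_comment TEXT);
INSERT INTO customer VALUES (1), (2), (3);
INSERT INTO orders VALUES
    (10, 1, 'regular order'),
    (11, 1, 'special requests'),
    (12, 2, 'special requests');
"""

PREDICATE = "o.o_comment NOT LIKE '%special%requests%'"


def q13(rhs_filter_in_on_clause: bool) -> list[tuple[int, int]]:
    """Run Q13 on the toy data and return (c_count, custdist) rows.

    True keeps the RHS predicate inside the ON clause (the query's meaning);
    False moves it to a WHERE clause above the join, approximating the
    pulled-up Filter in the plan.
    """
    if rhs_filter_in_on_clause:
        join_clause = f"ON c.c_custkey = o.o_custkey AND {PREDICATE}"
    else:
        join_clause = f"ON c.c_custkey = o.o_custkey WHERE {PREDICATE}"
    sql = f"""
        SELECT c_count, COUNT(*) AS custdist
        FROM (
            SELECT c.c_custkey AS c_custkey, COUNT(o.o_orderkey) AS c_count
            FROM customer c
            LEFT OUTER JOIN orders o {join_clause}
            GROUP BY c.c_custkey
        ) AS c_orders
        GROUP BY c_count
        ORDER BY custdist DESC, c_count DESC
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(q13(rhs_filter_in_on_clause=True))   # [(0, 2), (1, 1)]
print(q13(rhs_filter_in_on_clause=False))  # [(1, 1)]
```

With the predicate in the ON clause, customers 2 and 3 each get a count of 0, so the c_count = 0 group has custdist 2. With the predicate applied after the join, their NULL-padded rows are filtered out and the whole c_count = 0 group disappears.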
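Drill physical plan (excerpt). The NOT LIKE predicate is computed as $f4 by the Project over the orders scan (04-01), but it is applied by the Filter above the HashJoin (02-06, condition [$3]) instead of inside the join condition: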
.......................
02-06    Filter(condition=[$3]): rowcount = 3750.0, cumulative cost = , id = 2649
02-07      HashJoin(condition=[=($0, $1)], joinType=[left]): rowcount = 15000.0, cumulative cost = , id = 2648
02-09        HashToRandomExchange(dist0=[[$0]]): rowcount = 1500.0, cumulative cost = , id = 2644
03-01          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]], selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_custkey`]]]]): rowcount = 1500.0, cumulative cost = , id = 2643
02-08        HashToRandomExchange(dist0=[[$0]]): rowcount = 15000.0, cumulative cost = , id = 2647
04-01          Project(o_custkey=[$1], o_orderkey=[$0], $f4=[NOT(LIKE($2, '%special%requests%'))]): rowcount = 15000.0, cumulative cost = , id = 2646
04-02            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/orders.parquet]], selectionRoot=/tpch/orders.parquet, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`], SchemaPath [`o_comment`]]]]): rowcount = 15000.0, cumulative cost = , id = 2645
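The operator sequence in this plan can be emulated in plain Python to show exactly where rows are lost. This is a sketch, not Drill code: the toy rows are hypothetical, and `project_orders`, `left_hash_join`, `count_per_customer` and `not_like_special_requests` are illustrative helpers. Column positions follow the plan: the join output carries c_custkey as $0, and the Project's o_custkey, o_orderkey and $f4 as $1 to $3; Python `None` stands in for SQL NULL.

```python
from collections import defaultdict

# Hypothetical toy rows, not TPC-H data.
customers = [1, 2, 3]                    # c_custkey, becomes $0 after the join
orders = [(10, 1, "regular order"),      # (o_orderkey, o_custkey, o_comment)
          (11, 1, "special requests"),
          (12, 2, "special requests")]


def not_like_special_requests(comment):
    # Stand-in for NOT(LIKE(o_comment, '%special%requests%')):
    # true unless "special" is followed (anywhere later) by "requests".
    i = comment.find("special")
    return not (i >= 0 and comment.find("requests", i + len("special")) >= 0)


def project_orders():
    # Mirrors 04-01 Project: (o_custkey, o_orderkey, $f4).
    return [(custkey, orderkey, not_like_special_requests(comment))
            for orderkey, custkey, comment in orders]


def left_hash_join(probe_keys, build_rows, extra_condition):
    """Left outer hash join on probe key = build_row[0] (o_custkey).

    extra_condition(build_row) is evaluated as part of the join condition;
    probe keys with no qualifying match are padded with None (NULL).
    """
    table = defaultdict(list)
    for row in build_rows:
        table[row[0]].append(row)
    out = []
    for key in probe_keys:
        matches = [row for row in table.get(key, []) if extra_condition(row)]
        if matches:
            out.extend((key, *row) for row in matches)
        else:
            out.append((key, None, None, None))
    return out


def count_per_customer(rows):
    # Inner GROUP BY c_custkey with COUNT(o_orderkey), which ignores NULLs.
    counts = {}
    for c_custkey, _, o_orderkey, _ in rows:
        counts.setdefault(c_custkey, 0)
        if o_orderkey is not None:
            counts[c_custkey] += 1
    return sorted(counts.items())


build = project_orders()

# As planned: join on the key only, then Filter(condition=[$3]).
# NULL-padded rows have $3 = None, which is not TRUE, so they are dropped.
pulled_up = [r for r in left_hash_join(customers, build, lambda row: True)
             if r[3] is True]

# Intended semantics: $f4 is part of the join condition, so a customer whose
# orders all fail it still yields one NULL-padded row.
in_join = left_hash_join(customers, build, lambda row: row[2])

print(count_per_customer(pulled_up))  # [(1, 1)]
print(count_per_customer(in_join))    # [(1, 1), (2, 0), (3, 0)]
```

Evaluating $f4 inside the join only restricts which RHS rows match, so every customer survives with a count of 0 or more; evaluating it in a Filter above the join drops customers 2 and 3 entirely.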