Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
The following repo was based on a testcase provided by Arjun Rajan(arajan@mapr.com).
Drill returns incorrect query result, when the query has a correlated subquery and querying against a view defined with select *, or querying a subquery with select *.
Case 1: Querying view with select * + correlated subquery
create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
//Q1 : return 25 rows. The correct answer is 0 row.
SELECT n_nationkey, n_name FROM dfs.tmp.nation_view a WHERE NOT EXISTS (SELECT 1 FROM cp.`tpch/region.parquet` b WHERE b.r_regionkey = a.n_regionkey ) +--------------+-----------------+ | n_nationkey | n_name | +--------------+-----------------+ | 0 | ALGERIA | | 1 | ARGENTINA | | 2 | BRAZIL | ... | 24 | UNITED STATES | +--------------+-----------------+ 25 rows selected (0.614 seconds)
// Q2: return 0 row. The correct answer is 25 rows.
SELECT n_nationkey, n_name FROM dfs.tmp.nation_view a WHERE EXISTS (SELECT 1 FROM cp.`tpch/region.parquet` b WHERE b.r_regionkey = a.n_regionkey ) +--------------+---------+ | n_nationkey | n_name | +--------------+---------+ +--------------+---------+ No rows selected (0.4 seconds)
Case 2: Querying a table expression with select *
// Q3: return 25 rows. The correct result is 0 row
SELECT n_nationkey, n_name FROM ( SELECT * FROM cp.`tpch/nation.parquet` ) a WHERE NOT EXISTS (SELECT 1 FROM cp.`tpch/region.parquet` b WHERE b.r_regionkey = a.n_regionkey ) +--------------+-----------------+ | n_nationkey | n_name | +--------------+-----------------+ | 0 | ALGERIA | | 1 | ARGENTINA | ... | 24 | UNITED STATES | +--------------+-----------------+ 25 rows selected (0.451 seconds)
Q4: return 0 row. The correct result is 25 rows.
SELECT n_nationkey, n_name FROM ( SELECT * FROM cp.`tpch/nation.parquet` ) a WHERE EXISTS (SELECT 1 FROM cp.`tpch/region.parquet` b WHERE b.r_regionkey = a.n_regionkey ) +--------------+---------+ | n_nationkey | n_name | +--------------+---------+ +--------------+---------+ No rows selected (0.515 seconds)
All cases can be reproduced without view usage, using sub-select with star is enough.
For example:
SELECT n_nationkey, n_name FROM (select * from cp.`tpch/nation.parquet`) a WHERE NOT EXISTS (SELECT 1 FROM cp.`tpch/region.parquet` b WHERE b.r_regionkey = a.n_regionkey )