Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-5681

Incorrect query result when query uses star and correlated subquery

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      The following repo was based on a testcase provided by Arjun Rajan(arajan@mapr.com).

      Drill returns incorrect query result, when the query has a correlated subquery and querying against a view defined with select *, or querying a subquery with select *.

      Case 1: Querying view with select * + correlated subquery

      create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
      

      //Q1 : return 25 rows. The correct answer is 0 row.

      SELECT n_nationkey, n_name
      FROM  dfs.tmp.nation_view a
      WHERE NOT EXISTS (SELECT 1
      FROM cp.`tpch/region.parquet` b
      WHERE b.r_regionkey =  a.n_regionkey
      )
      
      +--------------+-----------------+
      | n_nationkey  |     n_name      |
      +--------------+-----------------+
      | 0            | ALGERIA         |
      | 1            | ARGENTINA       |
      | 2            | BRAZIL          |
      ...
      | 24           | UNITED STATES   |
      +--------------+-----------------+
      25 rows selected (0.614 seconds)
      

      // Q2: return 0 row. The correct answer is 25 rows.

      SELECT n_nationkey, n_name
      FROM  dfs.tmp.nation_view a
      WHERE EXISTS (SELECT 1
      FROM cp.`tpch/region.parquet` b
      WHERE b.r_regionkey =  a.n_regionkey
      )
      +--------------+---------+
      | n_nationkey  | n_name  |
      +--------------+---------+
      +--------------+---------+
      No rows selected (0.4 seconds)
      

      Case 2: Querying a table expression with select *
      // Q3: return 25 rows. The correct result is 0 row

      SELECT n_nationkey, n_name
      FROM  (
        SELECT * FROM cp.`tpch/nation.parquet`
      ) a
      WHERE NOT EXISTS (SELECT 1
      FROM cp.`tpch/region.parquet` b
      WHERE b.r_regionkey =  a.n_regionkey
      )
      +--------------+-----------------+
      | n_nationkey  |     n_name      |
      +--------------+-----------------+
      | 0            | ALGERIA         |
      | 1            | ARGENTINA       |
      ...
      | 24           | UNITED STATES   |
      +--------------+-----------------+
      25 rows selected (0.451 seconds)
      

      Q4: return 0 row. The correct result is 25 rows.

      SELECT n_nationkey, n_name
      FROM  (
        SELECT * FROM cp.`tpch/nation.parquet`
      ) a
      WHERE EXISTS (SELECT 1
      FROM cp.`tpch/region.parquet` b
      WHERE b.r_regionkey =  a.n_regionkey
      )
      +--------------+---------+
      | n_nationkey  | n_name  |
      +--------------+---------+
      +--------------+---------+
      No rows selected (0.515 seconds)
      

      All cases can be reproduced without view usage, using sub-select with star is enough.
      For example:

      SELECT n_nationkey, n_name
      FROM  (select * from cp.`tpch/nation.parquet`) a
      WHERE NOT EXISTS (SELECT 1
      FROM cp.`tpch/region.parquet` b
      WHERE b.r_regionkey =  a.n_regionkey
      )
      

      Attachments

        Activity

          People

            vitalii Vitalii Diravka
            jni Jinfeng Ni
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: