Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-2988

Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified

    XMLWordPrintableJSON

Details

    Description

      Two parquet tables:

      0: jdbc:drill:schema=dfs> select * from t1;
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | null       | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | null       | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | null       |
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      10 rows selected (0.109 seconds)
      
      0: jdbc:drill:schema=dfs> select * from t3;
      +------------+------------+------------+
      |     a3     |     b3     |     c3     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      +------------+------------+------------+
      3 rows selected (0.106 seconds)
      

      When column name is not qualified with the table name in the join condition, we get wrong result:

      0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where a1 = a3);
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | null       | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | null       | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | null       |
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      10 rows selected (0.177 seconds)
      

      Works correctly when column names are fully qualified ...

      0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where t1.a1 = t3.a3);
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      +------------+------------+------------+
      3 rows selected (0.353 seconds)
      

      Plan for the query with the wrong result:

      0: jdbc:drill:schema=dfs> explain plan for select * from t1 where exists (select * from t3 where a1 = a3);
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      Project(*=[$0])
      00-02        Project(T124¦¦*=[$0])
      00-03          NestedLoopJoin(condition=[true], joinType=[inner])
      00-05            Project(T124¦¦*=[$0])
      00-07              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`*`]]])
      00-04            SelectionVectorRemover
      00-06              Filter(condition=[IS NOT NULL($0)])
      00-08                StreamAgg(group=[{}], agg#0=[MIN($0)])
      00-09                  Project($f0=[true])
      00-10                    SelectionVectorRemover
      00-11                      Filter(condition=[=($0, $1)])
      00-12                        Project(a1=[$1], a3=[$0])
      00-13                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t3]], selectionRoot=/drill/testdata/aggregation/t3, numFiles=1, columns=[`a1`, `a3`]]])
      

      Attachments

        Issue Links

          Activity

            People

              jni Jinfeng Ni
              vicky Victoria Markman
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: