Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6141

JOIN query that uses USING clause returns incorrect results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 1.12.0
    • None
    • None

    Description

      Join query that uses USING clause returns incorrect results.

      Postgres 9.2.23 returns only one occurrence of the "id" column

      postgres=# create table t1(id int, name varchar(30));
      CREATE TABLE
      postgres=# create table t2(id int, name varchar(30));
      CREATE TABLE
      
      postgres=# select * from t1;
       id | name
      ----+-------
       10 | John
       13 | Kevin
       15 | Susan
      (3 rows)
      
      postgres=# select * from t2;
       id | name
      ----+-------
       19 | Kyle
       13 | Kevin
       1 | Bob
       17 | Kumar
      (4 rows)
      
      postgres=# select * from t1 JOIN t2 USING(id);
       id | name | name
      ----+-------+-------
       13 | Kevin | Kevin
      (1 row)
      
      

      results from Drill 1.12.0-mapr commit : 2de42491be795721bcb4059bd46e27fc33272309

      
      0: jdbc:drill:schema=dfs.tmp> create table t1 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t1.csv`;
      +-----------+----------------------------+
      | Fragment | Number of records written |
      +-----------+----------------------------+
      | 0_0 | 3 |
      +-----------+----------------------------+
      1 row selected (0.213 seconds)
      0: jdbc:drill:schema=dfs.tmp> create table t2 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t2.csv`;
      +-----------+----------------------------+
      | Fragment | Number of records written |
      +-----------+----------------------------+
      | 0_0 | 4 |
      +-----------+----------------------------+
      1 row selected (0.168 seconds)
      
      0: jdbc:drill:schema=dfs.tmp> select * from t1;
      +-----+--------+
      | c1 | c2 |
      +-----+--------+
      | 10 | John |
      | 13 | Kevin |
      | 15 | Susan |
      +-----+--------+
      3 rows selected (0.15 seconds)
      0: jdbc:drill:schema=dfs.tmp> select * from t2;
      +-----+--------+
      | c1 | c2 |
      +-----+--------+
      | 19 | Kyle |
      | 13 | Kevin |
      | 1 | Bob |
      | 17 | Kumar |
      +-----+--------+
      4 rows selected (0.171 seconds)
      
      ## Note that Drill returns an extra column, unlike Postgres, for the same query over same data
      
      0: jdbc:drill:schema=dfs.tmp> select * from t1 JOIN t2 USING(c1);
      +-----+--------+------+--------+
      | c1 | c2 | c10 | c20 |
      +-----+--------+------+--------+
      | 13 | Kevin | 13 | Kevin |
      +-----+--------+------+--------+
      1 row selected (0.256 seconds)
      
      ## explain plan for above query
      
      0: jdbc:drill:schema=dfs.tmp> explain plan for select * from t1 JOIN t2 USING(c1);
      +------+------+
      | text | json |
      +------+------+
      | 00-00 Screen
      00-01 ProjectAllowDup(*=[$0], *0=[$1])
      00-02 Project(T49¦¦*=[$0], T48¦¦*=[$2])
      00-03 Project(T49¦¦*=[$2], c10=[$3], T48¦¦*=[$0], c1=[$1])
      00-04 HashJoin(condition=[=($3, $1)], joinType=[inner])
      00-06 Project(T48¦¦*=[$0], c1=[$1])
      00-08 Scan(table=[[dfs, tmp, t2]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t2]], selectionRoot=maprfs:/tmp/t2, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])
      00-05 Project(T49¦¦*=[$0], c10=[$1])
      00-07 Project(T49¦¦*=[$0], c1=[$1])
      00-09 Scan(table=[[dfs, tmp, t1]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t1]], selectionRoot=maprfs:/tmp/t1, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])
      
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            khfaraaz Khurram Faraaz
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: