Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6893

Invalid output for star and self-join queries for RDBMS Storage Plugin

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.14.0
    • 1.16.0
    • Storage - JDBC
    • None
    • mysql-5.7.23-0ubuntu0.18.04.1
      mysql-connector-java-5.1.39-bin.jar

    Description

      Invalid output for star and self-join queries for RDBMS Storage Plugin:

      0: jdbc:drill:zk=local> SELECT * FROM (SELECT * FROM mysql.`testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t` INNER JOIN (SELECT * FROM mysql.`testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t0` ON `t`.`PersonID` = `t0`.`PersonID` ;
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      | PersonID  | OrderID  | ItemID  | GroupID  | PersonID0  | OrderID0  | ItemID0  | GroupID0  |
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      | 10        | 10       | 10      | 10       | null       | null      | null     | null      |
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      1 row selected (1.402 seconds)
      0: jdbc:drill:zk=local> select * from sys.version;
      +------------------+-------------------------------------------+------------------------------------------------+----------------------------+----------------------------+----------------------------+
      |     version      |                 commit_id                 |                 commit_message                 |        commit_time         |        build_email         |         build_time         |
      +------------------+-------------------------------------------+------------------------------------------------+----------------------------+----------------------------+----------------------------+
      | 1.15.0-SNAPSHOT  | 100a68b314230d4cf327477f7a10f9c650720513  | DRILL-540: Allow querying hive views in Drill  | 30.11.2018 @ 10:50:46 EET  | vitalii.diravka@gmail.com  | 10.12.2018 @ 15:46:54 EET  |
      +------------------+-------------------------------------------+------------------------------------------------+----------------------------+----------------------------+----------------------------+
      1 row selected (0.302 seconds)
      

      The same result in older 1.11.0 Drill version:

      0: jdbc:drill:zk=local> SELECT * FROM (SELECT * FROM mysql.`testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t` INNER JOIN (SELECT * FROM mysql.`testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t0` ON `t`.`PersonID` = `t0`.`PersonID`;
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      | PersonID  | OrderID  | ItemID  | GroupID  | PersonID0  | OrderID0  | ItemID0  | GroupID0  |
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      | 10        | 10       | 10      | 10       | null       | null      | null     | null      |
      +-----------+----------+---------+----------+------------+-----------+----------+-----------+
      1 row selected (1.344 seconds)
      0: jdbc:drill:zk=local> select * from sys.version;
      +----------+-------------------------------------------+------------------------------------------------------+-----------------------------+----------------------------+----------------------------+
      | version  |                 commit_id                 |                    commit_message                    |         commit_time         |        build_email         |         build_time         |
      +----------+-------------------------------------------+------------------------------------------------------+-----------------------------+----------------------------+----------------------------+
      | 1.11.0   | 4220fb2fffbc81883df3e5fea575fa0a584852b3  | [maven-release-plugin] prepare release drill-1.11.0  | 24.07.2017 @ 16:47:07 EEST  | vitalii.diravka@gmail.com  | 06.12.2018 @ 14:36:39 EET  |
      +----------+-------------------------------------------+------------------------------------------------------+-----------------------------+----------------------------+----------------------------+
      1 row selected (0.271 seconds)
      

      But the same query in MySQL:

      mysql> select * from `mscIdentities3` t1 join `mscIdentities3` t2 on t1.`PersonId` = t2.`PersonID` where t1.`PersonID` = 10;
      +----------+---------+--------+---------+----------+---------+--------+---------+
      | PersonID | OrderID | ItemID | GroupID | PersonID | OrderID | ItemID | GroupID |
      +----------+---------+--------+---------+----------+---------+--------+---------+
      |       10 |      10 |     10 |      10 |       10 |      10 |     10 |      10 |
      +----------+---------+--------+---------+----------+---------+--------+---------+
      1 row in set (1.27 sec)
      

      Drill physical plan for above query is:

      00-00    Screen : rowType = RecordType(INTEGER PersonID, INTEGER OrderID, INTEGER ItemID, INTEGER GroupID, INTEGER PersonID0, INTEGER OrderID0, INTEGER ItemID0, INTEGER GroupID0): rowcount = 15.0, cumulative cost = {31.5 rows, 136.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 735
      00-01      Project(PersonID=[$0], OrderID=[$1], ItemID=[$2], GroupID=[$3], PersonID0=[$4], OrderID0=[$5], ItemID0=[$6], GroupID0=[$7]) : rowType = RecordType(INTEGER PersonID, INTEGER OrderID, INTEGER ItemID, INTEGER GroupID, INTEGER PersonID0, INTEGER OrderID0, INTEGER ItemID0, INTEGER GroupID0): rowcount = 15.0, cumulative cost = {30.0 rows, 135.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 734
      00-02        Jdbc(sql=[SELECT * FROM (SELECT * FROM `testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t` INNER JOIN (SELECT * FROM `testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t0` ON `t`.`PersonID` = `t0`.`PersonID` ]) : rowType = RecordType(INTEGER PersonID, INTEGER OrderID, INTEGER ItemID, INTEGER GroupID, INTEGER PersonID0, INTEGER OrderID0, INTEGER ItemID0, INTEGER GroupID0): rowcount = 15.0, cumulative cost = {15.0 rows, 15.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 695
      

      It is clear from above plan that Drill submits modified query to MySQL DB. But even that query works fine in MySQL:

      mysql> SELECT * FROM (SELECT * FROM `testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t` INNER JOIN (SELECT * FROM `testdb`.`mscIdentities3` WHERE `PersonID` = 10) AS `t0` ON `t`.`PersonID` = `t0`.`PersonID` ;
      +----------+---------+--------+---------+----------+---------+--------+---------+
      | PersonID | OrderID | ItemID | GroupID | PersonID | OrderID | ItemID | GroupID |
      +----------+---------+--------+---------+----------+---------+--------+---------+
      |       10 |      10 |     10 |      10 |       10 |      10 |     10 |      10 |
      +----------+---------+--------+---------+----------+---------+--------+---------+
      1 row in set (1.26 sec)
      

      Most likely Drill interprets the obtained Result Set incorrectly in the Project operator.

      Attachments

        Issue Links

          Activity

            People

              volodymyr Vova Vysotskyi
              vitalii Vitalii Diravka
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: