Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3070

Inner join between MySQL tables produce wrong results if join order changed

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.19.0
    • Fix Version/s: None
    • Component/s: core, jdbc-adapter
    • Labels:
      None

      Description

      Summary of issue:

      • We are trying to inner join 2 MySQL tables.  
        • persons table: a very small table 
        • foodmart table: a large table
      • The query produce different results if we change the join order in the "from" clause: "from A join B" vs "from B join A". One query produces correct results while the other query produces empty results. 
      • The physical plan generated for the produce-empty query was not optimal. it tried to scan be big table. 

       

      Working query and physical plan:

      select "t2"."product_name", "t2"."customer_id" from 
      "foodmart-mysql"."foodmart" as "t2" join "persons"."persons" as "t1" 
        on"t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
      
      EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1287.971895621705 rows, 3719.5 cpu, 0.0 io}, id = 671
        EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {912.971895621705 rows, 1094.5 cpu, 0.0 io}, id = 667
          JdbcToEnumerableConverter: rowcount = 25.0, cumulative cost = {147.5 rows, 283.5 cpu, 0.0 io}, id = 660
            JdbcProject(product_name=[$5], customer_id=[$22], $f85=[>=($78, 1997-02-12 00:00:00)], $f86=[<($78, 1997-02-12 00:01:00)]): rowcount = 25.0, cumulative cost = {145.0 rows, 281.0 cpu, 0.0 io}, id = 658
              JdbcFilter(condition=[AND(>=($78, 1997-02-12 00:00:00), <($78, 1997-02-12 00:01:00))]): rowcount = 25.0, cumulative cost = {125.0 rows, 201.0 cpu, 0.0 io}, id = 656
                JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 340
          EnumerableCalc(expr#0..5=[\{inputs}], person_id_int=[$t5]): rowcount = 100.0, cumulative cost = {210.0 rows, 811.0 cpu, 0.0 io}, id = 673
            JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 663
              JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 341
      

       

      Produce-empty query and physical plan:

      select "t2"."product_name", "t2"."customer_id" from 
      "persons"."persons" as "t1" join "foodmart-mysql"."foodmart" as "t2" 
        on "t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
      
      EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1255.471895621705 rows, 12427.0 cpu, 0.0 io}, id = 334
        EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {880.471895621705 rows, 9802.0 cpu, 0.0 io}, id = 328
          EnumerableCalc(expr#0..84=[\{inputs}], expr#85=[1997-02-12 00:00:00], expr#86=[>=($t78, $t85)], expr#87=[1997-02-12 00:01:00], expr#88=[<($t78, $t87)], expr#89=[AND($t86, $t88)], product_name=[$t5], customer_id=[$t22], $f85=[$t86], $f86=[$t88], $condition=[$t89]): rowcount = 25.0, cumulative cost = {135.0 rows, 9611.0 cpu, 0.0 io}, id = 338
            JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 317
              JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
          JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {190.0 rows, 191.0 cpu, 0.0 io}, id = 326
            JdbcProject(person_id_int=[$5]): rowcount = 100.0, cumulative cost = {180.0 rows, 181.0 cpu, 0.0 io}, id = 324
              JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
      

       

        Attachments

        1. log.txt
          286 kB
          Zhibin Zhou

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              zhouzhb Zhibin Zhou
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: