Apache Drill / DRILL-886

Wrong results for a query with Right Outer Join on the second (and subsequent) executions


Details

    Description

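      The following query with a right outer join returns correct results on its first execution in a session but wrong results on the second and subsequent executions: every r_regionkey value comes back as null. The two Explain plans point to a potential cause. In the good run the scan of the nation table projects columns=[SchemaPath [`n_regionkey`]], whereas in the bad run it shows columns=null. The HashJoin condition also differs: =($3, $1) in the good run versus =($2, $1) in the bad run.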

      0: jdbc:drill:zk=local> select n.n_regionkey, r.r_regionkey from cp.`tpch/region.parquet` r right join cp.`tpch/nation.parquet` n on n.n_regionkey = r.r_regionkey;

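      +-------------+-------------+
      | n_regionkey | r_regionkey |
      +-------------+-------------+
      | 0           | 0           |
      | 0           | 0           |
      | 0           | 0           |
      | 0           | 0           |
      | 0           | 0           |
      | 1           | 1           |
      | 1           | 1           |
      | 1           | 1           |
      | 1           | 1           |
      | 1           | 1           |
      | 2           | 2           |
      | 2           | 2           |
      | 2           | 2           |
      | 2           | 2           |
      | 2           | 2           |
      | 3           | 3           |
      | 3           | 3           |
      | 3           | 3           |
      | 3           | 3           |
      | 3           | 3           |
      | 4           | 4           |
      | 4           | 4           |
      | 4           | 4           |
      | 4           | 4           |
      | 4           | 4           |
      +-------------+-------------+
      25 rows selected (2.207 seconds)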

      0: jdbc:drill:zk=local> select n.n_regionkey, r.r_regionkey from cp.`tpch/region.parquet` r right join cp.`tpch/nation.parquet` n on n.n_regionkey = r.r_regionkey;
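      +-------------+-------------+
      | n_regionkey | r_regionkey |
      +-------------+-------------+
      | 0           | null        |
      | 1           | null        |
      | 1           | null        |
      | 1           | null        |
      | 4           | null        |
      | 0           | null        |
      | 3           | null        |
      | 3           | null        |
      | 2           | null        |
      | 2           | null        |
      | 4           | null        |
      | 4           | null        |
      | 2           | null        |
      | 4           | null        |
      | 0           | null        |
      | 0           | null        |
      | 0           | null        |
      | 1           | null        |
      | 2           | null        |
      | 3           | null        |
      | 4           | null        |
      | 2           | null        |
      | 3           | null        |
      | 3           | null        |
      | 1           | null        |
      +-------------+-------------+
      25 rows selected (0.514 seconds)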
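
      For reference, the result the second run should have produced can be sketched in plain Python. This is only an illustration of right outer join semantics, not Drill code: the function name `right_outer_join` is hypothetical, and the key values are assumed from the output of the first (correct) run, namely five nation rows for each region key 0 through 4 and one region row per key.

```python
from collections import Counter


def right_outer_join(left_keys, right_keys):
    """Return (right_key, left_key_or_None) pairs, keeping every right row.

    Hypothetical helper for illustration only; it is not part of Drill.
    """
    left_counts = Counter(left_keys)
    rows = []
    for rk in right_keys:
        matches = left_counts.get(rk, 0)
        if matches:
            # One output row per matching left row.
            rows.extend((rk, rk) for _ in range(matches))
        else:
            # No match on the left side: pad with None (SQL null).
            rows.append((rk, None))
    return rows


# Assumed from the first run's output: region keys 0..4,
# and five nation rows per region key.
region_keys = [0, 1, 2, 3, 4]
nation_region_keys = [k for k in range(5) for _ in range(5)]

expected = right_outer_join(region_keys, nation_region_keys)
null_rows = [row for row in expected if row[1] is None]
print(len(expected), len(null_rows))
```

      Under these assumptions every nation row finds a matching region row, so a correct result contains 25 rows and no null in r_regionkey, which matches the first run and contradicts the second.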

      Explain plan for the good run:

      00-00    Screen
      00-01      Project(n_regionkey=[$0], r_regionkey=[$1])
      00-02        Project(n_regionkey=[$3], r_regionkey=[$1])
      00-03          HashJoin(condition=[=($3, $1)], joinType=[right])
      00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
      00-04            Project(*0=[$0], n_regionkey=[$1])
      00-06              BroadcastExchange
      01-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_regionkey`]]]])

      Explain plan for the bad run:

      00-00    Screen
      00-01      Project(n_regionkey=[$0], r_regionkey=[$1])
      00-02        Project(n_regionkey=[$3], r_regionkey=[$1])
      00-03          HashJoin(condition=[=($2, $1)], joinType=[right])
      00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
      00-04            Project(*0=[$0], n_regionkey=[$1])
      00-06              BroadcastExchange
      01-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=null]])
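
      The operators that differ between the two plans can be picked out mechanically. The following is a minimal sketch that compares the plan text line by line; the plan strings are copied from this report, and the helper name `differing_operators` is hypothetical.

```python
GOOD_PLAN = """\
00-00 Screen
00-01 Project(n_regionkey=[$0], r_regionkey=[$1])
00-02 Project(n_regionkey=[$3], r_regionkey=[$1])
00-03 HashJoin(condition=[=($3, $1)], joinType=[right])
00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
00-04 Project(*0=[$0], n_regionkey=[$1])
00-06 BroadcastExchange
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_regionkey`]]]])
"""

BAD_PLAN = """\
00-00 Screen
00-01 Project(n_regionkey=[$0], r_regionkey=[$1])
00-02 Project(n_regionkey=[$3], r_regionkey=[$1])
00-03 HashJoin(condition=[=($2, $1)], joinType=[right])
00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, columns=[SchemaPath [`r_regionkey`]]]])
00-04 Project(*0=[$0], n_regionkey=[$1])
00-06 BroadcastExchange
01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=null]])
"""


def differing_operators(plan_a, plan_b):
    """Return (operator_id, line_a, line_b) for each line that differs.

    Hypothetical helper; assumes both plans list the same operators
    in the same order, as the two plans in this report do.
    """
    diffs = []
    for a, b in zip(plan_a.splitlines(), plan_b.splitlines()):
        if a != b:
            operator_id = a.split()[0]  # e.g. "00-03"
            diffs.append((operator_id, a, b))
    return diffs


diffs = differing_operators(GOOD_PLAN, BAD_PLAN)
for operator_id, good_line, bad_line in diffs:
    print(operator_id)
    print("  good:", good_line)
    print("  bad: ", bad_line)
```

      On the plans above this reports two operators: the HashJoin at 00-03, whose condition changes from =($3, $1) to =($2, $1), and the nation scan at 01-01, whose column list changes from `n_regionkey` to null.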

          People

            cchang@maprtech.com Chun Chang
            amansinha100 Aman Sinha