Apache Drill
  1. Apache Drill
  2. DRILL-485

Support non-equijoins as long as there is at least 1 equijoin condition between the same 2 tables

    Details

      Description

      Currently, if the query has a non-equijoin between tables t1, t2 we throw an error that it is unsupported. However, if there is at least one equijoin between the tables t1, t2 in addition to the non-equijoin, then we should perform the join and then do a filter on top using the non-equijoin condition.
      example queries where this could be applicable:
      SELECT a1 FROM t1, t2 WHERE b1 = b2 AND c1 < c2;
      SELECT a1 FROM t1, t2 WHERE b1 = b2 AND (c1 = c2 OR d1 = d2);

        Activity

        Hide
        Jacques Nadeau added a comment -

        fixed in recent build

        Show
        Jacques Nadeau added a comment - fixed in recent build
        Hide
        Aman Sinha added a comment -

        As mentioned in my previous comment, this has been fixed. I tested it again on latest build (commit level 27a9c98) with the following queries .. both Explain plans and correctness of results w.r.t Postgres.

        Query 1: (Note the Filter above the HashJoin. The Filter has the inequality '<' condition)

        0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
        ----------------------+

        text json

        ----------------------+

        00-00 Screen
        00-01 UnionExchange
        01-01 Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
        01-02 SelectionVectorRemover
        01-03 Filter(condition=[<($1, $3)])
        01-04 HashJoin(condition=[=($0, $2)], joinType=[inner])
        01-06 Project(o_custkey=[$1], o_orderkey=[$0])
        01-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`]]]])
        01-05 BroadcastExchange
        02-01 Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
        02-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath [`c_nationkey`], SchemaPath [`c_name`]]]])

        0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
        ------------

        EXPR$0

        ------------

        Customer#000000370
        Customer#000000781
        Customer#000001369
        Customer#000000445
        Customer#000000557
        Customer#000000392

        ------------
        6 rows selected

        Query 2: (Note the Filter above the HashJoin. The Filter has the OR condition).

        0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey);
        ----------------------+

        text json

        ----------------------+

        00-00 Screen
        00-01 UnionExchange
        01-01 Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
        01-02 SelectionVectorRemover
        01-03 Filter(condition=[OR(=($1, $3), =($0, $3))])
        01-04 HashJoin(condition=[=($0, $2)], joinType=[inner])
        01-06 Project(o_custkey=[$1], o_orderkey=[$0])
        01-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`]]]])
        01-05 BroadcastExchange
        02-01 Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
        02-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath [`c_nationkey`], SchemaPath [`c_name`]]]])

        0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey);
        ------------

        EXPR$0

        ------------

        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004
        Customer#000000004

        ------------
        31 rows selected

        Show
        Aman Sinha added a comment - As mentioned in my previous comment, this has been fixed. I tested it again on latest build (commit level 27a9c98) with the following queries .. both Explain plans and correctness of results w.r.t Postgres. Query 1: (Note the Filter above the HashJoin. The Filter has the inequality '<' condition) 0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey; ----------- -----------+ text json ----------- -----------+ 00-00 Screen 00-01 UnionExchange 01-01 Project(EXPR$0= [CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"] ) 01-02 SelectionVectorRemover 01-03 Filter(condition= [<($1, $3)] ) 01-04 HashJoin(condition= [=($0, $2)] , joinType= [inner] ) 01-06 Project(o_custkey= [$1] , o_orderkey= [$0] ) 01-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders] ], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`] , SchemaPath [`o_orderkey`] ]]]) 01-05 BroadcastExchange 02-01 Project(c_custkey= [$2] , c_nationkey= [$1] , c_name= [$0] ) 02-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer] ], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`] , SchemaPath [`c_nationkey`] , SchemaPath [`c_name`] ]]]) 0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey; ------------ EXPR$0 ------------ Customer#000000370 Customer#000000781 Customer#000001369 Customer#000000445 Customer#000000557 Customer#000000392 ------------ 6 rows selected Query 2: (Note the Filter above the HashJoin. The Filter has the OR condition). 0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey); ----------- -----------+ text json ----------- -----------+ 00-00 Screen 00-01 UnionExchange 01-01 Project(EXPR$0= [CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"] ) 01-02 SelectionVectorRemover 01-03 Filter(condition= [OR(=($1, $3), =($0, $3))] ) 01-04 HashJoin(condition= [=($0, $2)] , joinType= [inner] ) 01-06 Project(o_custkey= [$1] , o_orderkey= [$0] ) 01-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders] ], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`] , SchemaPath [`o_orderkey`] ]]]) 01-05 BroadcastExchange 02-01 Project(c_custkey= [$2] , c_nationkey= [$1] , c_name= [$0] ) 02-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer] ], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`] , SchemaPath [`c_nationkey`] , SchemaPath [`c_name`] ]]]) 0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey); ------------ EXPR$0 ------------ Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 Customer#000000004 ------------ 31 rows selected
        Hide
        Aman Sinha added a comment -

        I had checked in a fix for this in the TPCH branch, commit level 2f0da33. It should get merged into master branch sometime soon.

        Show
        Aman Sinha added a comment - I had checked in a fix for this in the TPCH branch, commit level 2f0da33. It should get merged into master branch sometime soon.

          People

          • Assignee:
            DrillCommitter
            Reporter:
            Aman Sinha
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development