Uploaded image for project: 'Tajo (Retired)'
  1. Tajo (Retired)
  2. TAJO-1597

Problem of ignoring theta join condition

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.11.0
    • Planner/Optimizer
    • None

    Description

      Tajo currently does not support theta join, non-equi theta join conditions must be evaluated as a filter after join execution. However, when non-equi theta join conditions are included at on clauses, those conditions are disappeared after filter push down optimization.

      This is because we assume that filters are pushed down from top during FPD phase, but theta join conditions from the on clause are not.

      For example, let me suppose a query that contains a projection after a join as follows.

      projection
             |
           join (contains a theta join condition)
          /     \
      scan scan
      

      During FPD optimization, the theta join condition is come up to the projection node (FilterPushDownRule.visitProjection()). This condition is converted based on information of transformedMap, which is created by findCanPushdownAndTransform() before visiting the join node. Obviously, any information for the theta join condition are not contained in transformedMap, it is ignored.

      You can reproduce this bug as follows.

      default> select n_nationkey, n_name, n_regionkey, t.cnt from nation n join (   select r_regionkey, count(*) as cnt   from nation n   join region r on (n.n_regionkey = r.r_regionkey)   group by r_regionkey ) t  on  (n.n_regionkey = t.r_regionkey) and n.n_nationkey > t.cnt  order by n_nationkey;
      ...
      -----------------------------
      Query Block Graph
      -----------------------------
      |-#ROOT
         |-#QB_0
      -----------------------------
      Optimization Log:
      [LogicalPlan]
      	> ProjectionNode is eliminated.
      	> ProjectionNode is eliminated.
      [#QB_0]
      	> Non-optimized join order: (default.nation ⋈θ default.region) (cost: 86513.6)
      	> Optimized join order    : (default.nation ⋈θ default.region) (cost: 86513.6)
      [#ROOT]
      	> Non-optimized join order: (default.nation ⋈θ default.t) (cost: 1.924062464E7)
      	> Optimized join order    : (default.nation ⋈θ default.t) (cost: 1.924062464E7)
      -----------------------------
      
      SORT(8)
        => Sort Keys: default.n.n_nationkey (INT4) (asc)
         JOIN(12)(INNER)
           => Join Cond: default.n.n_regionkey (INT4) = default.t.r_regionkey (INT4)
           => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)
           => out schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)}
           => in schema: {(5) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8), default.t.r_regionkey (INT4)}
            TABLE_SUBQUERY(6) as default.t
              => Targets: default.t.cnt (INT8), default.t.r_regionkey (INT4)
              => out schema: {(2) default.t.cnt (INT8), default.t.r_regionkey (INT4)}
              => in  schema: {(2) default.t.r_regionkey (INT4), default.t.cnt (INT8)}
               GROUP_BY(4)(r_regionkey)
                 => exprs: (count())
                 => target list: default.r.r_regionkey (INT4), cnt (INT8)
                 => out schema:{(2) default.r.r_regionkey (INT4), cnt (INT8)}
                 => in schema:{(4) default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
                  JOIN(11)(INNER)
                    => Join Cond: default.n.n_regionkey (INT4) = default.r.r_regionkey (INT4)
                    => target list: default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)
                    => out schema: {(4) default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
                    => in schema: {(4) default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.r.r_regionkey (INT4)}
                     SCAN(2) on default.region as r
                       => target list: default.r.r_regionkey (INT4)
                       => out schema: {(1) default.r.r_regionkey (INT4)}
                       => in schema: {(3) default.r.r_regionkey (INT4), default.r.r_name (TEXT), default.r.r_comment (TEXT)}
                     SCAN(1) on default.nation as n
                       => target list: default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)
                       => out schema: {(3) default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
                       => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
            SCAN(0) on default.nation as n
              => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4)
              => out schema: {(3) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4)}
              => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
      ...
      

      Attachments

        Activity

          People

            jihoonson Jihoon Son
            jihoonson Jihoon Son
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: