IMPALA-7952

Planner creates non-normalized binary predicates

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      The FE has a "normalize binary predicates" rule that puts slots on the left-hand side:

      1 = id --> id = 1
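
      As a rough illustration of what such a rule does, here is a minimal sketch over a toy expression model. The classes Node, Slot, Lit and BinPred and the methods converse() and normalize() are hypothetical names made up for this example; they are not Impala's actual Expr hierarchy or rewrite-rule API. The sketch assumes only comparison operators appear, and it mirrors the operator when the operands are swapped so that the predicate keeps its meaning.

```java
final class NormalizeSketch {
    /** Toy expression node; hypothetical, not Impala's Expr class. */
    static abstract class Node {}

    /** A bare column reference ("slot"). */
    static final class Slot extends Node {
        final String name;
        Slot(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** An integer literal. */
    static final class Lit extends Node {
        final long value;
        Lit(long value) { this.value = value; }
        @Override public String toString() { return Long.toString(value); }
    }

    /** A binary comparison such as "1 = id". */
    static final class BinPred extends Node {
        final String op;
        final Node left;
        final Node right;
        BinPred(String op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        @Override public String toString() { return left + " " + op + " " + right; }
    }

    /** Operator to use once the operands have been swapped. */
    static String converse(String op) {
        switch (op) {
            case "<":  return ">";
            case "<=": return ">=";
            case ">":  return "<";
            case ">=": return "<=";
            default:   return op; // assumes a symmetric operator such as "=" or "!="
        }
    }

    /** Swap operands when only the right-hand side is a bare slot. */
    static BinPred normalize(BinPred p) {
        if (!(p.left instanceof Slot) && p.right instanceof Slot) {
            return new BinPred(converse(p.op), p.right, p.left);
        }
        return p; // already normalized, or no bare slot to move
    }

    public static void main(String[] args) {
        System.out.println(normalize(new BinPred("=", new Lit(1), new Slot("id")))); // id = 1
        System.out.println(normalize(new BinPred("<", new Lit(5), new Slot("id")))); // id > 5
    }
}
```

      In this sketch a predicate with slots on both sides, or with no bare slot at all, is left unchanged.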
      

      Presumably this is useful. As the planner proceeds, it creates additional binary predicates, but tends to create them in the non-normalized form.

      Examples:

      • Expr.trySubstitute()
      • StmtRewriter.createJoinConjunct()
      • SingleNodePlanner.getNormalizedEqPred()
      • StmtRewriter.rewriteWhereClauseSubqueries()
      • HashJoinNode.init()

      Once rewrite rules are integrated into analysis, we end up with a conflict: should expressions created internally be exempt from some or all of the rewrite rules? Even from mandatory rules, such as this one?

      The solution is to allow such expressions to be rewritten to normalized form as part of the new integrated analyze-and-rewrite logic.
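
      One way to picture this is a single creation path that every internally built predicate goes through, so that planner-created predicates see the same mandatory rule as user-written ones. The sketch below is only an illustration under that assumption: Operand, EqPred, createEq() and normalize() are hypothetical names, operands are modeled as plain strings with an "is a bare slot" flag, and none of this reflects the real Impala classes or the eventual fix.

```java
final class InternalPredicateSketch {
    /** Hypothetical operand: its SQL text plus whether it is a bare slot. */
    static final class Operand {
        final String text;
        final boolean isSlot;
        Operand(String text, boolean isSlot) {
            this.text = text;
            this.isSlot = isSlot;
        }
    }

    static Operand slot(String name) { return new Operand(name, true); }
    static Operand expr(String sql)  { return new Operand(sql, false); }

    /** Hypothetical equality predicate. */
    static final class EqPred {
        final Operand left;
        final Operand right;
        EqPred(Operand left, Operand right) {
            this.left = left;
            this.right = right;
        }
        @Override public String toString() { return left.text + " = " + right.text; }
    }

    /** Mandatory rule: if only the right side is a bare slot, move it left. */
    static EqPred normalize(EqPred p) {
        if (!p.left.isSlot && p.right.isSlot) {
            return new EqPred(p.right, p.left); // "=" is symmetric, so no operator change
        }
        return p;
    }

    /** Single entry point for planner-created predicates (an assumption of this sketch). */
    static EqPred createEq(Operand a, Operand b) {
        return normalize(new EqPred(a, b));
    }

    public static void main(String[] args) {
        // A predicate built internally in <expr> = <slot> order comes out normalized.
        System.out.println(createEq(expr("coalesce(t1.id, t2.id)"), slot("t3.id")));
        // Slot-to-slot predicates are left in the order they were created.
        System.out.println(createEq(slot("t1.id"), slot("t2.id")));
    }
}
```

      Because normalize() returns an already normalized predicate unchanged, applying it again during a later rewrite pass has no further effect in this sketch.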

      Note that the trySubstitute() case needs more attention. Presumably the expressions put into the "smap" are analyzed, hence rewritten. If not, then there are probably other subtle bugs lurking in that code.

      Fixing this bug caused plans to change in PlannerTest.testJoins(). These changes suggest that one part of the analyzer works to create the "<slot> <op> <expr>" pattern, while other parts strive for the opposite, creating instability. Requires more research.

      # test that on-clause predicates referring to multiple tuple ids
      # get registered as eq join conjuncts
      select t1.*
      from (select * from functional.alltypestiny) t1
        join (select * from functional.alltypestiny) t2 on (t1.id = t2.id)
        join functional.alltypestiny t3 on (coalesce(t1.id, t2.id) = t3.id)
      

      Plan before the fix:

      PLAN-ROOT SINK
      |
      04:HASH JOIN [INNER JOIN]
      |  hash predicates: coalesce(functional.alltypestiny.id, functional.alltypestiny.id) = t3.id
      |  runtime filters: RF000 <- t3.id
      |
      |--02:SCAN HDFS [functional.alltypestiny t3]
      |     partitions=4/4 files=4 size=460B
      |
      03:HASH JOIN [INNER JOIN]
      |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
      |  runtime filters: RF002 <- functional.alltypestiny.id
      |
      |--01:SCAN HDFS [functional.alltypestiny]
      |     partitions=4/4 files=4 size=460B
      |     runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
      |
      00:SCAN HDFS [functional.alltypestiny]
         partitions=4/4 files=4 size=460B
         runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id), RF002 -> functional.alltypestiny.id
      

      Plan after the fix, with the filter pushed further down the plan:

      PLAN-ROOT SINK
      |
      04:HASH JOIN [INNER JOIN]
      |  hash predicates: t3.id = coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
      |
      |--02:SCAN HDFS [functional.alltypestiny t3]
      |     partitions=4/4 files=4 size=460B
      |
      03:HASH JOIN [INNER JOIN]
      |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
      |  runtime filters: RF002 <- functional.alltypestiny.id
      |
      |--01:SCAN HDFS [functional.alltypestiny]
      |     partitions=4/4 files=4 size=460B
      |
      00:SCAN HDFS [functional.alltypestiny]
         partitions=4/4 files=4 size=460B
         runtime filters: RF002 -> functional.alltypestiny.id
      

          People

            Assignee: Unassigned
            Reporter: Paul Rogers