IMPALA-4680

Join disjuncts are not simplified and pushed to the scan nodes when possible

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:

      Description

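      While looking at the plan for primitive_conjunct_ordering_1, I noticed that the join condition can be simplified and pushed down to the scan nodes, but it is not. Because the query also requires p_partkey = l_partkey, the disjunct (p_partkey = 0 OR l_partkey = 0) can only be true when both keys are 0, so it is equivalent to p_partkey = 0 AND l_partkey = 0. Each of those is a single-table predicate that could be evaluated at the corresponding scan; instead, the plan below keeps the disjunct under "other predicates" on the hash join.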

      https://github.com/apache/incubator-impala/blob/master/testdata/workloads/targeted-perf/queries/primitive_conjunct_ordering.test#L31

      Query

      SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
      FROM lineitem, part
      WHERE p_partkey = l_partkey
        AND (p_partkey = 0 OR l_partkey = 0)
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=777.54MB VCores=2 |
      |                                                           |
      | PLAN-ROOT SINK                                            |
      | |                                                         |
      | 06:AGGREGATE [FINALIZE]                                   |
      | |  output: sum:merge(l_extendedprice * (1 - l_discount))  |
      | |                                                         |
      | 05:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 03:AGGREGATE                                              |
      | |  output: sum(l_extendedprice * (1 - l_discount))        |
      | |                                                         |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                      |
      | |  hash predicates: l_partkey = p_partkey                 |
      | |  other predicates: (p_partkey = 0 OR l_partkey = 0)     |
      | |  runtime filters: RF000 <- p_partkey                    |
      | |                                                         |
      | |--04:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  01:SCAN HDFS [tpch_300_parquet.part]                   |
      | |     partitions=1/1 files=14 size=1.88GB                 |
      | |                                                         |
      | 00:SCAN HDFS [tpch_300_parquet.lineitem]                  |
      |    partitions=1/1 files=259 size=63.71GB                  |
      |    runtime filters: RF000 -> l_partkey                    |
      +-----------------------------------------------------------+
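
The simplification this report asks for can be illustrated outside Impala. The sketch below is not Impala code and does not describe how the planner would implement the rewrite; it uses Python's built-in sqlite3 module with a few made-up rows (including NULL keys) to check that the original query and a hand-written rewrite return the same result. The REWRITTEN query text, the toy data, and the run_both helper are assumptions introduced for illustration only.

```python
import sqlite3

# Query shape taken from the report.
ORIGINAL = """
SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem, part
WHERE p_partkey = l_partkey
  AND (p_partkey = 0 OR l_partkey = 0)
"""

# Hand-written rewrite (an assumption, not planner output): given
# p_partkey = l_partkey, the disjunct holds only when both keys are 0,
# so each table gets a single-table predicate a scan could evaluate.
REWRITTEN = """
SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM lineitem, part
WHERE p_partkey = l_partkey
  AND p_partkey = 0
  AND l_partkey = 0
"""


def run_both():
    """Run both queries on toy data and return (original, rewritten)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE part (p_partkey INTEGER);
        CREATE TABLE lineitem (
            l_partkey INTEGER,
            l_extendedprice REAL,
            l_discount REAL
        );
        -- Made-up rows; NULL keys check that the rewrite is NULL-safe.
        INSERT INTO part VALUES (0), (1), (2), (NULL);
        INSERT INTO lineitem VALUES
            (0, 100.0, 0.5),
            (0, 200.0, 0.25),
            (1, 50.0, 0.0),
            (2, 75.0, 0.5),
            (NULL, 10.0, 0.0);
    """)
    original = conn.execute(ORIGINAL).fetchone()[0]
    rewritten = conn.execute(REWRITTEN).fetchone()[0]
    conn.close()
    return original, rewritten


if __name__ == "__main__":
    original, rewritten = run_both()
    print("original:", original, "rewritten:", rewritten)
```

In the rewritten form, l_partkey = 0 references only lineitem and p_partkey = 0 references only part, which is what would make them candidates for evaluation at scan nodes 00 and 01 rather than at the join.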
      

              People

              • Assignee: Alexander Behm (alex.behm)
              • Reporter: Mostafa Mokhtar (mmokhtar)
              • Votes: 0
              • Watchers: 2
