Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4680

Join disjuncts are not simplified and pushed to the scan nodes when possible

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • Impala 2.8.0
    • None
    • Frontend

    Description

      While looking at the plan for primitive_conjunct_ordering_1 noticed that the join condition can be simplified and pushed to the scan but it is not.

      https://github.com/apache/incubator-impala/blob/master/testdata/workloads/targeted-perf/queries/primitive_conjunct_ordering.test#L31

      Query

      SELECT sum(l_extendedprice * (1 - l_discount)) AS revenue
      FROM lineitem, part
      WHERE p_partkey = l_partkey
        AND (p_partkey = 0 OR l_partkey = 0)
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=777.54MB VCores=2 |
      |                                                           |
      | PLAN-ROOT SINK                                            |
      | |                                                         |
      | 06:AGGREGATE [FINALIZE]                                   |
      | |  output: sum:merge(l_extendedprice * (1 - l_discount))  |
      | |                                                         |
      | 05:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 03:AGGREGATE                                              |
      | |  output: sum(l_extendedprice * (1 - l_discount))        |
      | |                                                         |
      | 02:HASH JOIN [INNER JOIN, BROADCAST]                      |
      | |  hash predicates: l_partkey = p_partkey                 |
      | |  other predicates: (p_partkey = 0 OR l_partkey = 0)     |
      | |  runtime filters: RF000 <- p_partkey                    |
      | |                                                         |
      | |--04:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  01:SCAN HDFS [tpch_300_parquet.part]                   |
      | |     partitions=1/1 files=14 size=1.88GB                 |
      | |                                                         |
      | 00:SCAN HDFS [tpch_300_parquet.lineitem]                  |
      |    partitions=1/1 files=259 size=63.71GB                  |
      |    runtime filters: RF000 -> l_partkey                    |
      +-----------------------------------------------------------+
      

      Attachments

        Issue Links

          Activity

            People

              alex.behm Alexander Behm
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: