Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3828

Planner : Add transformation rule that traverses the single node plan to swap the build and probe sides if necessary

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.5.0
    • Impala 2.7.0
    • Frontend

    Description

      Today the planner creates left deep trees where joins are ordered based on selectivity.
      The proposal is to add a rule in the planner that traverses the plan after createSingleNodePlan and flip the build and probe sides if the cardinality suggests so, which produces a bushy plan.

      This optimization should improve queries against normalized schemas with selective joins and multiple fact tables, more "efficient" runtime filters will be created as a result of the plan change

      The query below generates a left deep plan, where are a bush plan should be created

      query

      select count(*) from 
      orders,supplier, lineitem 
      where 
      l_orderkey = o_orderkey and 
      s_suppkey = l_suppkey and 
      s_name="Supplier#000000001";
      

      Plan

      10:AGGREGATE [FINALIZE]
      |  output: count:merge(*)
      |  hosts=15 per-host-mem=unavailable
      |  tuple-ids=3 row-size=8B cardinality=1
      |
      09:EXCHANGE [UNPARTITIONED]
      |  hosts=15 per-host-mem=unavailable
      |  tuple-ids=3 row-size=8B cardinality=1
      |
      05:AGGREGATE
      |  output: count(*)
      |  hosts=15 per-host-mem=10.00MB
      |  tuple-ids=3 row-size=8B cardinality=1
      |
      04:HASH JOIN [INNER JOIN, PARTITIONED]
      |  hash predicates: l_orderkey = o_orderkey
      |  runtime filters: RF000 <- o_orderkey
      |  hosts=15 per-host-mem=251.77MB
      |  tuple-ids=2,1,0 row-size=66B cardinality=600
      |
      |--08:EXCHANGE [HASH(o_orderkey)]
      |  |  hosts=15 per-host-mem=0B
      |  |  tuple-ids=0 row-size=8B cardinality=450000000
      |  |
      |  00:SCAN HDFS [tpch_300_parquet.orders, RANDOM]
      |     partitions=1/1 files=81 size=18.82GB
      |     table stats: 450000000 rows total
      |     column stats: all
      |     hosts=15 per-host-mem=88.00MB
      |     tuple-ids=0 row-size=8B cardinality=450000000
      |
      07:EXCHANGE [HASH(l_orderkey)]
      |  hosts=15 per-host-mem=0B
      |  tuple-ids=2,1 row-size=58B cardinality=600
      |
      03:HASH JOIN [INNER JOIN, BROADCAST]
      |  hash predicates: l_suppkey = s_suppkey
      |  runtime filters: RF001 <- s_suppkey
      |  hosts=15 per-host-mem=47B
      |  tuple-ids=2,1 row-size=58B cardinality=600
      |
      |--06:EXCHANGE [BROADCAST]
      |  |  hosts=9 per-host-mem=0B
      |  |  tuple-ids=1 row-size=42B cardinality=1
      |  |
      |  01:SCAN HDFS [tpch_300_parquet.supplier, RANDOM]
      |     partitions=1/1 files=9 size=233.61MB
      |     predicates: s_name = 'Supplier#000000001'
      |     table stats: 3000000 rows total
      |     column stats: all
      |     hosts=9 per-host-mem=80.00MB
      |     tuple-ids=1 row-size=42B cardinality=1
      |
      02:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM]
         partitions=1/1 files=264 size=64.36GB
         runtime filters: RF000 -> l_orderkey, RF001 -> l_suppkey
         table stats: 1799989091 rows total
         column stats: all
         hosts=15 per-host-mem=176.00MB
         tuple-ids=2 row-size=16B cardinality=1799989091
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            mmokhtar Mostafa Mokhtar
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: