Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-11424

Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Frontend
    • None

    Description

      In order to reduce the amount of data read and transmitted, the non-equivalent condition of Join can be pushed to SCAN_NODE.
      For example:

      // code placeholder
      select count(1) from ( select ss.ss_ticket_numberFROM store_sales  ss LEFT OUTER JOIN store_returns sr ON (sr.sr_item_sk = ss.ss_item_sk  AND sr.sr_ticket_number >= ss.ss_ticket_number) where ss.ss_sold_date_sk = 2450816) t where t.ss_ticket_number = 79577; 

      Current plan:

      // code placeholder
      PLAN-ROOT SINK
      |
      07:AGGREGATE [FINALIZE]
      |  output: count:merge(*)
      |  row-size=8B cardinality=1
      |
      06:EXCHANGE [UNPARTITIONED]
      |
      03:AGGREGATE
      |  output: count(*)
      |  row-size=8B cardinality=1
      |
      02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
      |  hash predicates: sr.sr_item_sk = ss.ss_item_sk
      |  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
      |  runtime filters: RF000 <- ss.ss_item_sk
      |  row-size=32B cardinality=16
      |
      |--05:EXCHANGE [HASH(ss.ss_item_sk)]
      |  |
      |  00:SCAN HDFS [tpcds_parquet.store_sales ss]
      |     partition predicates: ss.ss_sold_date_sk = 2450816
      |     partitions=1/1824 files=1 size=70.77KB
      |     predicates: ss.ss_ticket_number = 79577
      |     row-size=16B cardinality=1
      |
      04:EXCHANGE [HASH(sr.sr_item_sk)]
      |
      01:SCAN HDFS [tpcds_parquet.store_returns sr]
         partitions=1/1 files=1 size=15.42MB
         runtime filters: RF000 -> sr.sr_item_sk
         row-size=16B cardinality=287.51K

      After Pushdown:

      // code placeholder
      PLAN-ROOT SINK
      |
      07:AGGREGATE [FINALIZE]
      |  output: count:merge(*)
      |  row-size=8B cardinality=1
      |
      06:EXCHANGE [UNPARTITIONED]
      |
      03:AGGREGATE
      |  output: count(*)
      |  row-size=8B cardinality=1
      |
      02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
      |  hash predicates: sr.sr_item_sk = ss.ss_item_sk
      |  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
      |  runtime filters: RF000 <- ss.ss_item_sk
      |  row-size=32B cardinality=16
      |
      |--05:EXCHANGE [HASH(ss.ss_item_sk)]
      |  |
      |  00:SCAN HDFS [tpcds_parquet.store_sales ss]
      |     partition predicates: ss.ss_sold_date_sk = 2450816
      |     partitions=1/1824 files=1 size=70.77KB
      |     predicates: ss.ss_ticket_number = 79577
      |     row-size=16B cardinality=1
      |
      04:EXCHANGE [HASH(sr.sr_item_sk)]
      |
      01:SCAN HDFS [tpcds_parquet.store_returns sr]
         partitions=1/1 files=1 size=15.42MB
         predicates: sr.sr_ticket_number >= 79577
         runtime filters: RF000 -> sr.sr_item_sk
         row-size=16B cardinality=28.75K 

       
      For pushdown of Join non-equi conjuncts, the current qualifications:
      1. Only support LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, INNER_JOIN;
      2. Only valid for non-equi predicates containing literalExpr,
      for example: slot >= Literal, slot in Literal list;

      Attachments

        Activity

          People

            xiabaike Baike Xia
            xiabaike Baike Xia
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: