Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4805

Avoid hash exchanges before analytic functions in more situations.

    XMLWordPrintableJSON

Details

    Description

      This case works as expected. There is no no hash exchange before sort+analytic:

      explain select /* +straight_join */ count(*) over (partition by t1.id)
      from functional.alltypes t1
      inner join /* +shuffle */ functional.alltypes t2
        on t1.id = t2.id
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=168.01MB VCores=2 |
      |                                                           |
      | PLAN-ROOT SINK                                            |
      | |                                                         |
      | 07:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 04:ANALYTIC                                               |
      | |  functions: count(*)                                    |
      | |  partition by: t1.id                                    |
      | |                                                         |
      | 03:SORT                                                   |
      | |  order by: id ASC NULLS FIRST                           |
      | |                                                         |
      | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: t1.id = t2.id                         |
      | |  runtime filters: RF000 <- t2.id                        |
      | |                                                         |
      | |--06:EXCHANGE [HASH(t2.id)]                              |
      | |  |                                                      |
      | |  01:SCAN HDFS [functional.alltypes t2]                  |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 05:EXCHANGE [HASH(t1.id)]                                 |
      | |                                                         |
      | 00:SCAN HDFS [functional.alltypes t1]                     |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF000 -> t1.id                        |
      +-----------------------------------------------------------+
      

      This equivalent case has an unnecessary hash exchange:

      explain select /* +straight_join */ count(*) over (partition by t2.id)
      from functional.alltypes t1
      inner join /* +shuffle */ functional.alltypes t2
        on t1.id = t2.id
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=168.01MB VCores=3 |
      |                                                           |
      | PLAN-ROOT SINK                                            |
      | |                                                         |
      | 08:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 04:ANALYTIC                                               |
      | |  functions: count(*)                                    |
      | |  partition by: t2.id                                    |
      | |                                                         |
      | 03:SORT                                                   |
      | |  order by: id ASC NULLS FIRST                           |
      | |                                                         |
      | 07:EXCHANGE [HASH(t2.id)]                                 |
      | |                                                         |
      | 02:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: t1.id = t2.id                         |
      | |  runtime filters: RF000 <- t2.id                        |
      | |                                                         |
      | |--06:EXCHANGE [HASH(t2.id)]                              |
      | |  |                                                      |
      | |  01:SCAN HDFS [functional.alltypes t2]                  |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 05:EXCHANGE [HASH(t1.id)]                                 |
      | |                                                         |
      | 00:SCAN HDFS [functional.alltypes t1]                     |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF000 -> t1.id                        |
      +-----------------------------------------------------------+
      

      Attachments

        Activity

          People

            amansinha Aman Sinha
            alex.behm Alexander Behm
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: