Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8807

OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.3.0
    • Component/s: Docs
    • Labels:

      Description

      This came up here https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1&aid=1

      Our docs say

      This optimization does not apply if the queries contain any WHERE, GROUP BY, or HAVING clause. The relevant queries should only compute the minimum, maximum, or number of distinct values for the partition key columns across the whole table.

      This is false. Here's query illustrating it working with all three things:

      [localhost:21000] default> set optimize_partition_key_scans=true; explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 1000;
      OPTIMIZE_PARTITION_KEY_SCANS set to true
      Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 1000
      +------------------------------------------------------------+
      | Explain String                                             |
      +------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
      | Per-Host Resource Estimates: Memory=10MB                   |
      | Codegen disabled by planner                                |
      |                                                            |
      | PLAN-ROOT SINK                                             |
      | |                                                          |
      | 01:AGGREGATE [FINALIZE]                                    |
      | |  output: max(ss_sold_date_sk)                            |
      | |  group by: ss_sold_date_sk                               |
      | |  having: max(ss_sold_date_sk) > 1000                     |
      | |  row-size=8B cardinality=182                             |
      | |                                                          |
      | 00:UNION                                                   |
      |    constant-operands=182                                   |
      |    row-size=4B cardinality=182                             |
      +------------------------------------------------------------+
      Fetched 15 row(s) in 0.11s
      

      We should reword this to be correct.

        Attachments

          Activity

            People

            • Assignee:
              tarmstrong Tim Armstrong
              Reporter:
              tarmstrong Tim Armstrong
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: