Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4792

NDV estimates for case expressions with limited number of output values could be improved

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
    • Impala 2.9.0
    • Frontend

    Description

      In the below example, the case expression only has two possible values but the planner estimates 7300 distinct values. This is a fairly common pattern in real-world queries and we could get a better estimate by analysing the case expression.

      [localhost:21000] > set explain_level=2;
      EXPLAIN_LEVEL set to 2
      [localhost:21000] > explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes;
      Query: explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes
      +---------------------------------------------------------------+
      | Explain String                                                |
      +---------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=180.00MB VCores=2     |
      |                                                               |
      | PLAN-ROOT SINK                                                |
      | |                                                             |
      | 04:EXCHANGE [UNPARTITIONED]                                   |
      | |  hosts=3 per-host-mem=unavailable                           |
      | |  tuple-ids=1 row-size=16B cardinality=7300                  |
      | |                                                             |
      | 03:AGGREGATE [FINALIZE]                                       |
      | |  group by: CASE WHEN id = 1 THEN 'yes' ELSE 'no' END        |
      | |  hosts=3 per-host-mem=10.00MB                               |
      | |  tuple-ids=1 row-size=16B cardinality=7300                  |
      | |                                                             |
      | 02:EXCHANGE [HASH(CASE WHEN id = 1 THEN 'yes' ELSE 'no' END)] |
      | |  hosts=3 per-host-mem=0B                                    |
      | |  tuple-ids=1 row-size=16B cardinality=7300                  |
      | |                                                             |
      | 01:AGGREGATE [STREAMING]                                      |
      | |  group by: CASE WHEN id = 1 THEN 'yes' ELSE 'no' END        |
      | |  hosts=3 per-host-mem=10.00MB                               |
      | |  tuple-ids=1 row-size=16B cardinality=7300                  |
      | |                                                             |
      | 00:SCAN HDFS [functional.alltypes, RANDOM]                    |
      |    partitions=24/24 files=24 size=478.45KB                    |
      |    table stats: 7300 rows total                               |
      |    column stats: all                                          |
      |    hosts=3 per-host-mem=160.00MB                              |
      |    tuple-ids=0 row-size=4B cardinality=7300                   |
      +---------------------------------------------------------------+
      Fetched 28 row(s) in 0.01s
      [localhost:21000] > 
      

      Attachments

        Activity

          People

            joemcdonnell Joe McDonnell
            tarmstrong Tim Armstrong
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: