IMPALA-10615

Cardinality estimates for some scalar functions could be improved


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:
      None

      Description

      Predicates involving most scalar functions receive the default 10% selectivity, which can significantly underestimate cardinality. Consider the following estimate for a predicate on UPPER() against tpch.nation, which has 25 rows:

      [localhost:21050] tpch> explain select * from nation where upper(n_name) is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: upper(n_name) IS NOT NULL                   |
      |    row-size=109B cardinality=3                             |
      +------------------------------------------------------------+
      

      Since n_name contains no NULLs, the actual cardinality is 25, which is what the planner estimates when the predicate is applied to the column directly:

      [localhost:21050] tpch> explain select * from nation where n_name is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: n_name IS NOT NULL                          |
      |    row-size=109B cardinality=25                            |
      +------------------------------------------------------------+
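
      The two estimates above are consistent with the 10% default being applied to the table's 25 rows. A minimal sketch of that arithmetic follows. The function name `estimate_cardinality` and the round-half-up rule are assumptions made for illustration and are not taken from Impala's code:

```python
import math

DEFAULT_SELECTIVITY = 0.10  # the 10% default cited in this issue


def estimate_cardinality(input_rows: int, selectivity: float) -> int:
    """Scale a row count by a selectivity, rounding half up (assumed rule)."""
    return math.floor(input_rows * selectivity + 0.5)


nation_rows = 25
# upper(n_name) IS NOT NULL falls back to the default selectivity.
print(estimate_cardinality(nation_rows, DEFAULT_SELECTIVITY))  # 3
# n_name IS NOT NULL on a column with no NULLs keeps every row.
print(estimate_cardinality(nation_rows, 1.0))  # 25
```

      Under these assumptions, wrapping the column in UPPER() alone shrinks the estimate from 25 rows to 3.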
      

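      In general, if a scalar function cannot change the nullability of its input, a nullness predicate on the function's result should get the same selectivity as the same predicate on the input itself.
      Note that for an explicit CAST, the planner already does the right thing: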

      [localhost:21050] tpch> explain select * from nation where cast(n_name as varchar(10)) is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: CAST(n_name AS VARCHAR(10)) IS NOT NULL     |
      |    row-size=109B cardinality=25                            |
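
      The rule proposed in this description could be sketched roughly as follows. This is an illustration only, not Impala's planner code: the `Column` and `Call` classes, the `NULL_PRESERVING` allow-list, and `is_not_null_selectivity` are hypothetical names, and treating `upper` as returning NULL exactly when its argument is NULL is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Union

DEFAULT_SELECTIVITY = 0.10  # the 10% fallback described in this issue

# Hypothetical allow-list: functions assumed to return NULL exactly when
# their argument is NULL, so they cannot change nullability.
NULL_PRESERVING = {"upper"}


@dataclass
class Column:
    name: str
    num_rows: int
    num_nulls: int


@dataclass
class Call:
    fn_name: str
    arg: "Expr"


Expr = Union[Column, Call]


def is_not_null_selectivity(expr: Expr) -> float:
    """Selectivity of `expr IS NOT NULL` under the proposed rule."""
    if isinstance(expr, Column):
        if expr.num_rows == 0:
            # No statistics to reason from: fall back to the default.
            return DEFAULT_SELECTIVITY
        return 1.0 - expr.num_nulls / expr.num_rows
    if expr.fn_name in NULL_PRESERVING:
        # Nullability is unchanged, so reuse the argument's selectivity.
        return is_not_null_selectivity(expr.arg)
    # Function not known to preserve nullability: keep the default.
    return DEFAULT_SELECTIVITY


n_name = Column("n_name", num_rows=25, num_nulls=0)
print(is_not_null_selectivity(Call("upper", n_name)))     # 1.0
print(is_not_null_selectivity(Call("some_udf", n_name)))  # 0.1
```

      Recursing through the argument means nested calls such as upper(upper(n_name)) would also inherit the column's selectivity, while any function outside the allow-list still gets the conservative default.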
      

      People

      • Assignee: Unassigned
      • Reporter: Aman Sinha (amansinha)