IMPALA-10615

Cardinality estimates for some scalar functions could be improved


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      The default 10% selectivity used to estimate the cardinality of predicates involving most scalar functions can produce a significant under-estimate. Consider the following estimate for a predicate on UPPER():

      [localhost:21050] tpch> explain select * from nation where upper(n_name) is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: upper(n_name) IS NOT NULL                   |
      |    row-size=109B cardinality=3                             |
      +------------------------------------------------------------+
      

      Since n_name contains no NULLs, the actual cardinality is 25, which the planner estimates correctly when the predicate is applied to the column directly:

      [localhost:21050] tpch> explain select * from nation where n_name is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: n_name IS NOT NULL                          |
      |    row-size=109B cardinality=25                            |
      +------------------------------------------------------------+
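      The two estimates above follow from simple arithmetic: 10% of the 25 rows in nation is 2.5, which rounds to 3, while an IS NOT NULL predicate on a column with zero NULLs keeps all 25 rows. The Python sketch below reproduces both numbers. It assumes round-half-up (as Java's Math.round does for positive values) and that the column-level selectivity is 1 - nulls/rows from table statistics; the function names are hypothetical and are not Impala's planner code.

```python
import math

# Hypothetical helpers reproducing the arithmetic behind the two EXPLAIN
# estimates; names and the rounding rule are assumptions, not Impala code.
DEFAULT_SELECTIVITY = 0.1  # the 10% default described in this issue


def round_half_up(x: float) -> int:
    """Round like Java's Math.round for non-negative inputs (assumed rule)."""
    return math.floor(x + 0.5)


def estimate_cardinality(num_rows: int, selectivity: float) -> int:
    """Estimated output rows after applying a predicate's selectivity."""
    return round_half_up(num_rows * selectivity)


def is_not_null_selectivity(num_rows: int, num_nulls: int) -> float:
    """Assumed selectivity of `col IS NOT NULL` from column stats."""
    return 1.0 - num_nulls / num_rows


NATION_ROWS = 25   # tpch.nation has 25 rows
N_NAME_NULLS = 0   # n_name contains no NULLs

# upper(n_name) IS NOT NULL: no stats for the function, so 10% is used.
print(estimate_cardinality(NATION_ROWS, DEFAULT_SELECTIVITY))  # 3
# n_name IS NOT NULL: selectivity derived from the column's null count.
print(estimate_cardinality(
    NATION_ROWS, is_not_null_selectivity(NATION_ROWS, N_NAME_NULLS)))  # 25
```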
      

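      In general, if a scalar function cannot change the nullability of its input (it returns NULL exactly when its input is NULL), we should compute the same selectivity for the predicate on the function as for the same predicate on the input itself.
      Note that we already do the right thing for an explicit CAST: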

      [localhost:21050] tpch> explain select * from nation where cast(n_name as varchar(10)) is not null;
      
      | 00:SCAN HDFS [tpch.nation]                                 |
      |    HDFS partitions=1/1 files=1 size=2.15KB                 |
      |    predicates: CAST(n_name AS VARCHAR(10)) IS NOT NULL     |
      |    row-size=109B cardinality=25                            |
      +------------------------------------------------------------+
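      One way to give UPPER() the same selectivity as its argument is to look through functions that cannot change nullability before consulting column statistics. The Python sketch below illustrates that idea under stated assumptions: the Column/Call classes, the NULL_PRESERVING set, the hypothetical function name some_udf, and the 1 - nulls/rows formula are all illustrative stand-ins, not Impala's frontend API. Any function that can return NULL for a non-NULL input (for example, a conversion that yields NULL for input it cannot parse) must not be listed as null-preserving.

```python
from dataclasses import dataclass
from typing import Optional, Union

DEFAULT_SELECTIVITY = 0.1  # fallback when nothing better is known

# Hypothetical set of unary functions that return NULL if and only if their
# argument is NULL; a real planner would need a per-function property.
NULL_PRESERVING = {"upper", "lower", "trim"}


@dataclass
class Column:
    name: str
    num_rows: int
    num_nulls: Optional[int]  # None means no column statistics


@dataclass
class Call:
    fn: str
    arg: "Expr"


Expr = Union[Column, Call]


def strip_null_preserving(e: Expr) -> Expr:
    """Peel off function calls that cannot change nullability."""
    while isinstance(e, Call) and e.fn in NULL_PRESERVING:
        e = e.arg
    return e


def is_not_null_selectivity(e: Expr) -> float:
    """Selectivity of `e IS NOT NULL`, using stats of the underlying column."""
    base = strip_null_preserving(e)
    if isinstance(base, Column) and base.num_nulls is not None and base.num_rows > 0:
        return 1.0 - base.num_nulls / base.num_rows
    return DEFAULT_SELECTIVITY


n_name = Column("n_name", num_rows=25, num_nulls=0)
print(n_name.num_rows * is_not_null_selectivity(Call("upper", n_name)))  # 25.0
print(is_not_null_selectivity(Call("some_udf", n_name)))  # 0.1
```

      With this rule, upper(n_name) IS NOT NULL would inherit a selectivity of 1.0 (all 25 rows), while an unrecognized function still falls back to the 10% default.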
      


            People

              Assignee: Unassigned
              Reporter: Aman Sinha (amansinha)
