IMPALA-10116 (sub-task of IMPALA-10514 Frontend changes to support external FE)

Builtin cast function's selectivity is different from that of explicit cast


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Frontend
    • Labels: None

    Description

      Query 1 below uses 'casttobigint()' in its IS NOT NULL predicates. Their selectivity is computed as the default 10% of the input rows, resulting in cardinality=7.30K. The equivalent predicates in Query 2, written with an explicit 'CAST' expr, yield the correct cardinality of 73.05K.

      Query 1:

      Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and casttobigint(d2.d_week_seq) is not null
      | 00:SCAN HDFS [tpcds.date_dim d1]                            |
      |    HDFS partitions=1/1 files=1 size=9.84MB                  |
      |    predicates: casttobigint(d1.d_week_seq) IS NOT NULL      |
      |    runtime filters: RF000 -> d1.d_week_seq                  |
      |    row-size=255B cardinality=7.30K                          |
      +-------------------------------------------------------------+
      

      Query 2:

      Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and cast(d2.d_week_seq as bigint) is not null 
      
      | 00:SCAN HDFS [tpcds.date_dim d1]                            |
      |    HDFS partitions=1/1 files=1 size=9.84MB                  |
      |    predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL    |
      |    runtime filters: RF000 -> d1.d_week_seq                  |
      |    row-size=255B cardinality=73.05K                         |
      +-------------------------------------------------------------+
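      The gap between the two estimates is consistent with a simple calculation, sketched below. This is not Impala's planner code: the class and method names are hypothetical, the row count of 73,049 for tpcds.date_dim and the null count of 0 for d_week_seq are assumptions, and only the 10% default comes from the description above.

```java
public final class NotNullCardinalitySketch {
    // Default selectivity used when no stats-based estimate is available
    // (10%, as stated in the issue description).
    public static final double DEFAULT_SELECTIVITY = 0.1;

    // Selectivity of "col IS NOT NULL" derived from column stats:
    // the fraction of rows whose value is not NULL.
    // Falls back to the default when the stats are unusable.
    public static double notNullSelectivity(long numRows, long numNulls) {
        if (numRows <= 0 || numNulls < 0 || numNulls > numRows) {
            return DEFAULT_SELECTIVITY;
        }
        return 1.0 - (double) numNulls / numRows;
    }

    // Estimated output rows of a scan after applying one predicate.
    public static long estimate(long inputRows, double selectivity) {
        return Math.round(inputRows * selectivity);
    }

    public static void main(String[] args) {
        long rows = 73_049L;  // assumed row count of tpcds.date_dim
        long nulls = 0L;      // assumed: d_week_seq has no NULLs

        // Query 1: predicate on casttobigint(...) falls back to the default.
        System.out.println("default estimate: "
            + estimate(rows, DEFAULT_SELECTIVITY));

        // Query 2: predicate on CAST(...) can use the column's null count.
        System.out.println("stats-based estimate: "
            + estimate(rows, notNullSelectivity(rows, nulls)));
    }
}
```

      Under these assumptions the default path gives roughly a tenth of the input rows and the stats-based path gives all of them, matching the 7.30K and 73.05K shown in the two plans.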
      

      Query 1 should ideally produce the same cardinality as Query 2. Note that to run Query 1 I had to comment out the following lines in FunctionCallExpr.java, which are there because a user query is not supposed to call the builtin cast function directly. However, for an external frontend module that calls functions in impala-frontend.jar, such direct calls are supported, so we should make the behavior consistent.

      +//    if (isBuiltinCastFunction()) {
      +//      throw new AnalysisException(toSql() +
      +//          " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
      +//    }
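      One way to picture "consistent behavior" is to map a builtin cast function call to the same form as an explicit cast before estimating selectivity. The sketch below is purely illustrative and is not the fix that was committed: it works on strings rather than Impala's Expr classes, the class and method names are hypothetical, and the "castto" prefix convention is assumed from the single name 'casttobigint' seen in Query 1.

```java
import java.util.Locale;
import java.util.Optional;

public final class BuiltinCastNameSketch {
    // Assumed naming convention for builtin cast functions, inferred
    // from "casttobigint" in the issue description.
    private static final String PREFIX = "castto";

    // Returns the target type encoded in a builtin cast function name,
    // e.g. "casttobigint" -> "BIGINT"; empty for any other function.
    public static Optional<String> targetType(String fnName) {
        String lower = fnName.toLowerCase(Locale.ROOT);
        if (!lower.startsWith(PREFIX) || lower.length() == PREFIX.length()) {
            return Optional.empty();
        }
        return Optional.of(
            lower.substring(PREFIX.length()).toUpperCase(Locale.ROOT));
    }

    // Renders a call as an explicit cast when the function is a builtin
    // cast; otherwise renders the call unchanged.
    public static String toExplicitCast(String fnName, String argSql) {
        return targetType(fnName)
            .map(type -> "CAST(" + argSql + " AS " + type + ")")
            .orElse(fnName + "(" + argSql + ")");
    }

    public static void main(String[] args) {
        System.out.println(toExplicitCast("casttobigint", "d1.d_week_seq"));
        System.out.println(toExplicitCast("upper", "d1.d_day_name"));
    }
}
```

      With both forms reduced to the same shape, the IS NOT NULL predicates in Query 1 and Query 2 could go through the same selectivity path.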
      

          People

            Assignee: Aman Sinha (amansinha)
            Reporter: Aman Sinha (amansinha)