IMPALA-10349

Revisit constant folding on non-ASCII strings


Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: Frontend

    Description

      Constant folding may produce non-ASCII strings. In such cases, we currently give up on folding and keep the original expression. See the commit message of IMPALA-1788 or the code here: https://github.com/apache/impala/blob/9672d945963e1ca3c8699340f92d7d6ce1d91c9f/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java#L274-L282
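
The bail-out rule described above can be modeled roughly as follows. This is a hypothetical sketch, not the code in LiteralExpr.java: the class AsciiFoldingGuard and its methods are illustrative names, and it assumes the backend hands the evaluated string back to the frontend as raw bytes. It declines to fold as soon as any byte falls outside 7-bit ASCII.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical model of the current ASCII-only guard; NOT Impala's actual code.
 * Assumption: the evaluated constant arrives as raw bytes from the backend.
 */
class AsciiFoldingGuard {
  /** Returns true if every byte is in the 7-bit ASCII range (0x00-0x7F). */
  static boolean isAllAscii(byte[] bytes) {
    for (byte b : bytes) {
      // Java bytes are signed, so any byte >= 0x80 shows up as negative.
      if (b < 0) return false;
    }
    return true;
  }

  /** Returns the folded literal, or null to signal "keep the original expression". */
  static String foldOrNull(byte[] evaluated) {
    if (!isAllAscii(evaluated)) return null;
    return new String(evaluated, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] ascii = "1".getBytes(StandardCharsets.UTF_8);
    // UTF-8 encoding of U+5F15, a non-ASCII character (3 bytes: E5 BC 95).
    byte[] cjk = "\u5f15".getBytes(StandardCharsets.UTF_8);
    System.out.println(foldOrNull(ascii)); // prints 1
    System.out.println(foldOrNull(cjk));   // prints null
  }
}
```

Under this model, the first example query in this issue folds while the second one does not, even though the bytes in the second case are perfectly valid UTF-8.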

      I think we should allow folding non-ASCII strings as long as the folded result is a valid UTF-8 string.
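
One way to express this proposal is to replace the current ASCII-only check with a strict UTF-8 decode and fold only when decoding succeeds. A minimal sketch, assuming the same raw-bytes input: Utf8FoldingGuard and decodeUtf8OrNull are hypothetical names, while CharsetDecoder and CodingErrorAction.REPORT are the standard java.nio.charset API. It covers only the validity check; how a folded non-ASCII literal is then serialized back to the backend is not addressed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical guard: fold only if the evaluated bytes are well-formed UTF-8. */
class Utf8FoldingGuard {
  /** Returns the decoded string if bytes are valid UTF-8, otherwise null. */
  static String decodeUtf8OrNull(byte[] bytes) {
    // A fresh decoder per call: CharsetDecoder is stateful and not thread-safe.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      // decode(ByteBuffer) treats the buffer as the complete input, so a
      // sequence cut off at the end is reported as malformed.
      return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    } catch (CharacterCodingException e) {
      return null; // invalid UTF-8: keep the original expression unfolded
    }
  }

  public static void main(String[] args) {
    byte[] whole = {(byte) 0xE5, (byte) 0xBC, (byte) 0x95}; // UTF-8 for U+5F15
    byte[] truncated = {(byte) 0xE5, (byte) 0xBC};          // first 2 bytes only
    System.out.println(decodeUtf8OrNull(whole) != null);     // true
    System.out.println(decodeUtf8OrNull(truncated) != null); // false
  }
}
```

Reporting (rather than replacing) malformed input matters here: the default lossy conversion of `new String(bytes, UTF_8)` would silently substitute U+FFFD and fold a value that differs from what the backend would compute at runtime.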

      Example where constant folding works:

      Query: explain select * from functional.alltypes where string_col = substr('123', 1, 1)
      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
      | Per-Host Resource Estimates: Memory=160MB                   |
      | Codegen disabled by planner                                 |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 01:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 00:SCAN HDFS [functional.alltypes]                          |
      |    HDFS partitions=24/24 files=24 size=478.45KB             |
      |    predicates: string_col = '1'                             |
      |    row-size=89B cardinality=730                             |
      +-------------------------------------------------------------+
      

      Example where constant folding does not work, because the folded result would contain non-ASCII characters:

      Query: explain select * from functional.alltypes where string_col = substr('引擎', 1, 3)
      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
      | Per-Host Resource Estimates: Memory=160MB                   |
      | Codegen disabled by planner                                 |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 01:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 00:SCAN HDFS [functional.alltypes]                          |
      |    HDFS partitions=24/24 files=24 size=478.45KB             |
      |    predicates: string_col = substr('引擎', 1, 3)            |
      |    row-size=89B cardinality=730                             |
      +-------------------------------------------------------------+
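
Why the second query stays unfolded: assuming substr() counts bytes rather than characters (which is what the arguments 1, 3 appear to rely on), the first three bytes of '引擎' are exactly the UTF-8 encoding of '引', so the folded value would be valid UTF-8 but not ASCII. Taking only two bytes would instead cut the character in half, which is the kind of result that should still stay unfolded under the proposal. The byteSubstr helper below is a hypothetical model of a 1-based, byte-oriented substr for non-negative positions only, not Impala's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of byte-oriented substr on a UTF-8 string. */
class ByteSubstrDemo {
  /** 1-based substr over bytes; clamps out-of-range arguments. Negative pos not modeled. */
  static byte[] byteSubstr(byte[] s, int pos, int len) {
    int start = Math.min(Math.max(pos - 1, 0), s.length);
    int end = Math.min(start + Math.max(len, 0), s.length);
    return Arrays.copyOfRange(s, start, end);
  }

  public static void main(String[] args) {
    // U+5F15 U+64CE, 3 UTF-8 bytes each, 6 bytes in total.
    byte[] engine = "\u5f15\u64ce".getBytes(StandardCharsets.UTF_8);
    byte[] three = byteSubstr(engine, 1, 3);
    byte[] two = byteSubstr(engine, 1, 2);
    // The first 3 bytes decode to exactly one complete character.
    System.out.println(new String(three, StandardCharsets.UTF_8).equals("\u5f15")); // true
    // The first 2 bytes are an incomplete UTF-8 sequence.
    System.out.println(two.length); // 2
  }
}
```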
      

    People

    • Assignee: Unassigned
    • Reporter: Quanlong Huang (stigahuang)
    • Votes: 0
    • Watchers: 4