Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1788

Consider replacing constant exprs with literal equivalents

    Details

      Description

      Evaluating expressions that are the same for every row can be very expensive (note these numbers are taken from the debug build, so maybe this is not an issue in release?):

      (15:33:02@desktop) ~/src/cloudera/impala (cdh5-trunk) $ impala-shell.sh -q "SELECT 1.2345678 FROM tpch.lineitem" -B 2>&1 > /dev/null
      ...
      Fetched 6001215 row(s) in 61.33s
      
      (15:34:06@desktop) ~/src/cloudera/impala (cdh5-trunk) $ impala-shell.sh -q "SELECT CAST('1.2345678' AS DOUBLE) FROM tpch.lineitem" -B 2>&1 > /dev/null
      ...
      Fetched 6001215 row(s) in 74.86s
      

        Issue Links

          Activity

          Hide
          henryr Henry Robinson added a comment -

          Actually, this one might be a bit big to bite off as a first task.

          Show
          henryr Henry Robinson added a comment - Actually, this one might be a bit big to bite off as a first task.
          Hide
          tarmstrong Tim Armstrong added a comment -

          It's not clear that there's a good way to implement this in the frontend as-is. Assigning to Alex to triage accordingly.

          Show
          tarmstrong Tim Armstrong added a comment - It's not clear that there's a good way to implement this in the frontend as-is. Assigning to Alex to triage accordingly.
          Hide
          caseyc casey added a comment -

          Kudu would like something similar - if the comparison operator can be known to be false due to type limitations, replace the expr with false. Ex, "cast(int_col as tinyint) > 1000000" is false because 1000000 is greater than the max tinyint value. Alex suggested this might be possible to do as part of this issue. I can file a new issue if that is preferred.

          Show
          caseyc casey added a comment - Kudu would like something similar - if the comparison operator can be known to be false due to type limitations, replace the expr with false. Ex, "cast(int_col as tinyint) > 1000000" is false because 1000000 is greater than the max tinyint value. Alex suggested this might be possible to do as part of this issue. I can file a new issue if that is preferred.
          Hide
          mmokhtar Mostafa Mokhtar added a comment -

          Another use case is timestamp comparisons.

          This query finished in 6 seconds

          select count(*) from lineitem_time_stamp where l_shipdate < cast('2015-04-14 00:00:00' as timestamp);
          

          Finished in 220 seconds

          select count(*) from lineitem_time_stamp where to_date(l_shipdate) < to_date(date_sub(to_date('2015-05-29'), 45))
          

          Query below finished in 75 seconds

          select count(*) from lineitem_time_stamp where l_shipdate < date_sub(to_date('2015-05-29'), 45)
          

          Converting to timestamp is very expensive, so constant folding helps massively for this particular use case.

          name type comment
          l_orderkey bigint  
          l_partkey bigint  
          l_suppkey bigint  
          l_linenumber bigint  
          l_quantity decimal(12,2)  
          l_extendedprice decimal(12,2)  
          l_discount decimal(12,2)  
          l_tax decimal(12,2)  
          l_returnflag timestamp  
          l_linestatus string  
          l_shipdate timestamp  
          l_commitdate timestamp  
          l_receiptdate timestamp  
          l_shipinstruct string  
          l_shipmode string  
          l_comment string  
          Show
          mmokhtar Mostafa Mokhtar added a comment - Another use case is timestamp comparisons. This query finished in 6 seconds select count(*) from lineitem_time_stamp where l_shipdate < cast ('2015-04-14 00:00:00' as timestamp); Finished in 220 seconds select count(*) from lineitem_time_stamp where to_date(l_shipdate) < to_date(date_sub(to_date('2015-05-29'), 45)) Query below finished in 75 seconds select count(*) from lineitem_time_stamp where l_shipdate < date_sub(to_date('2015-05-29'), 45) Converting to timestamp is very expensive, so constant folding helps massively for this particular use case. name type comment l_orderkey bigint   l_partkey bigint   l_suppkey bigint   l_linenumber bigint   l_quantity decimal(12,2)   l_extendedprice decimal(12,2)   l_discount decimal(12,2)   l_tax decimal(12,2)   l_returnflag timestamp   l_linestatus string   l_shipdate timestamp   l_commitdate timestamp   l_receiptdate timestamp   l_shipinstruct string   l_shipmode string   l_comment string  
          Hide
          alex.behm Alexander Behm added a comment -

          commit bbf5255d0e829ab553b57052ff9383aa365c7d59
          Author: Alex Behm <alex.behm@cloudera.com>
          Date: Thu Nov 3 20:53:58 2016 -0700

          IMPALA-1788: Fold constant expressions.

          Adds a new ExprRewriteRule for replacing constant expressions
          with their literal equivalent via BE evaluation. Applies the
          new rule together with the existing ones on the parse tree,
          after analysis.

          Limitations

          • Constant folding is applied on the unresolved expressions.
            As a result, it only works for expressions that are constant
            within a single query block, as opposed to expressions that
            may become constant after fully substituting inline-view exprs.
          • Exprs are not normalized, so some opportunities for constant
            folding are missed for certain expr-tree shapes.

          This patch includes the following interesting changes:

          • Introduces a timestamp literal that can only be produced
            by constant folding (not expressible directly via SQL).
          • To make sure that rewrites have no user-visible effect,
            the original result types and column labels of the top-level
            statement are restored after the rewrites are performed.
          • Does not fold exprs if their evaluation resulted in a
            warning or error, or if the resulting value is not
            representable by corresponding FE LiteralExpr.
          • Fixes an existing issue with converting strings between
            the FE/BE. String produced in the BE that have characters
            with a value > 127 are not correctly deserialized into a
            Java String via thrift. We detect this case during constant
            folding and abandon folding of such exprs.
          • Fixes several issues with detecting/reporting errors in
            NativeEvalConstExprs().
          • Cleans up ExprContext::GetValue() into
            ExprContext::GetConstantValue() which clarifies its only use
            of evaluating exprs from the FE.

          Testing:

          • Modifies expr-test.cc to run all tests through the constant
            folding path.
          • Adds basic planner and rewrite rule tests.
          • Exhaustive test run passed

          Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9
          Reviewed-on: http://gerrit.cloudera.org:8080/5109
          Reviewed-by: Alex Behm <alex.behm@cloudera.com>
          Tested-by: Internal Jenkins

          Show
          alex.behm Alexander Behm added a comment - commit bbf5255d0e829ab553b57052ff9383aa365c7d59 Author: Alex Behm <alex.behm@cloudera.com> Date: Thu Nov 3 20:53:58 2016 -0700 IMPALA-1788 : Fold constant expressions. Adds a new ExprRewriteRule for replacing constant expressions with their literal equivalent via BE evaluation. Applies the new rule together with the existing ones on the parse tree, after analysis. Limitations Constant folding is applied on the unresolved expressions. As a result, it only works for expressions that are constant within a single query block, as opposed to expressions that may become constant after fully substituting inline-view exprs. Exprs are not normalized, so some opportunities for constant folding are missed for certain expr-tree shapes. This patch includes the following interesting changes: Introduces a timestamp literal that can only be produced by constant folding (not expressible directly via SQL). To make sure that rewrites have no user-visible effect, the original result types and column labels of the top-level statement are restored after the rewrites are performed. Does not fold exprs if their evaluation resulted in a warning or error, or if the resulting value is not representable by corresponding FE LiteralExpr. Fixes an existing issue with converting strings between the FE/BE. String produced in the BE that have characters with a value > 127 are not correctly deserialized into a Java String via thrift. We detect this case during constant folding and abandon folding of such exprs. Fixes several issues with detecting/reporting errors in NativeEvalConstExprs(). Cleans up ExprContext::GetValue() into ExprContext::GetConstantValue() which clarifies its only use of evaluating exprs from the FE. Testing: Modifies expr-test.cc to run all tests through the constant folding path. Adds basic planner and rewrite rule tests. Exhaustive test run passed Change-Id: If672b703db1ba0bfc26e5b9130161798b40a69e9 Reviewed-on: http://gerrit.cloudera.org:8080/5109 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              henryr Henry Robinson
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development