Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-7753

Rewrite engine ignores top-level expressions in ORDER BY clause

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • Impala 3.0
    • None
    • Frontend
    • None
    • ghx-label-2

    Description

      The select statement represents the ORDER BY clause in two distinct ways. First, there is a list of the "original" ordering expressions, orderByElements_. Second, there is an analyzed list in sortInfo_. The explanation is:

            // create copies, we don't want to modify the original parse node, in case
            // we need to print it
      

      Later, we apply rewrite rules to the ORDER BY expression, but we do so using the original version, not the copy:

            for (OrderByElement orderByElem: orderByElements_) {
              orderByElem.setExpr(rewriteCheckOrdinalResult(rewriter, orderByElem.getExpr()));
            }
      

      The result is that we apply rewrite rules to expressions which have not been analyzed, triggering the assertion mentioned above. This assertion is telling us something: we skipped a step. Here, it turns out we are rewriting the wrong set of expressions. Modifying the code to rewrite those in sortInfo_ solves the problem. The current behavior is a bug as the rewrites currently do nothing, and the expressions we thought we were rewriting are never touched.

      The correct code would rewrite the expressions which are actually used when analyzing the query:

          if (orderByElements_ != null) {
            List<Expr> sortExprs = sortInfo_.getSortExprs();
            for (int i = 0; i < sortExprs.size(); i++) {
              sortExprs.set(i, rewriteCheckOrdinalResult(rewriter, sortExprs.get(i)));
            }
      

      We can, in addition, ask a more basic question: do we even need to do rewrites for ORDER BY expressions? The only valid expressions are column references, aren't they? Or, does Impala allow expressions in the ORDER BY clause?

      Here is the result of a PlannerTest run that shows how the bug affects the conditional function rewrite:

      07:AGGREGATE [FINALIZE]
      |  output: avg(sum(t1.id)), sum(avg(g)), count(id)
      |  group by: if(TupleIsNull(), NULL, CASE WHEN int_col IS NOT NULL THEN int_col ELSE 20 END)
      |
      06:ANALYTIC
      |  functions: avg(if(TupleIsNull(), NULL, CASE WHEN id + bigint_col IS NOT NULL THEN id + bigint_col ELSE 40 END))
      |  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
      |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |
      05:SORT
      |  order by: if(TupleIsNull(), NULL, CASE WHEN bigint_col IS NOT NULL THEN bigint_col ELSE 30 END) ASC
      

      Note that the top-level ifs are not rewritten, but the nested coalesce’s are rewritten to CASE.

      Because this code rewrites the wrong list, the rewrite rules contain if-statements to check for un-analyzed nodes. In actuality, no node should be un-analyzed when passing through the rewrite engine.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              Paul.Rogers Paul Rogers
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: