Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1618

SortProjectTransposeRule should check for monotonicity preserving CAST

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.11.0
    • Fix Version/s: 1.12.0
    • Component/s: core
    • Labels:
      None

      Description

      Due to this, SortProjectTransposeRule ignores CAST in the Project operator.

      For instance:

      ...
      HiveSortLimit(sort0=$4,sort1=$2,dir0=ASC-nulls-first,dir1=DESC-nulls-last,fetch=10)
        HiveProject(robot=$0,_o__c1=$2,m=$3,s=$4,(tok_function tok_int (tok_table_or_col robot))=CAST($0):INTEGER))
      ...
      

      will be transformed into:

      ...
      HiveProject(robot=$0,_o__c1=$2,m=$3,s=$4,(tok_function tok_int (tok_table_or_col robot))=CAST($0):INTEGER))
        HiveSortLimit(sort0=$0,sort1=$2,dir0=ASC-nulls-first,dir1=DESC-nulls-last,fetch=10)
      ...
      

      which is incorrect.

      The problem seems to be in the permutation method in RelOptUtil, which is called in L87. The method actually considers a CAST on a reference as a valid column permutation of the column referenced; probably it should not.

      permutation is only called by this rule and UnionPullUpConstantsRule, thus it seems it is safe to fix the semantics of the method.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.12.0 (2017-03-24).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.12.0 (2017-03-24).
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -
        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Pushed in http://git-wip-us.apache.org/repos/asf/calcite/commit/2c9fddf . Thanks for the feedback Julian Hyde .
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Julian Hyde, thanks for the feedback. I was a bit confused at first, but then I got it. The new PR has test cases and covers those that you mention.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , thanks for the feedback. I was a bit confused at first, but then I got it. The new PR has test cases and covers those that you mention.
        Hide
        julianhyde Julian Hyde added a comment -

        I don't know what the type of the "robot" column is in your test case. If it's a character, then I agree, it's not OK to push the Sort. But it's, say, a float, then the cast is monotonic, and it is OK to push the Sort.

        Monotonicity here is a property of the function, not its input. I think we can use a simple rule that numeric → numeric casts are monotonic, other casts are not.

        Show
        julianhyde Julian Hyde added a comment - I don't know what the type of the "robot" column is in your test case. If it's a character, then I agree, it's not OK to push the Sort. But it's, say, a float, then the cast is monotonic, and it is OK to push the Sort. Monotonicity here is a property of the function, not its input. I think we can use a simple rule that numeric → numeric casts are monotonic, other casts are not.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Julian Hyde, thanks for the feedback. I have updated the PR accordingly. Could you take a look?

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , thanks for the feedback. I have updated the PR accordingly. Could you take a look?
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

        Julian Hyde, the change does the opposite (or maybe I misunderstood your second comment): not recognizing cast($i) as a simple reference - observe the code is removed by the patch, not added.

        I understand that in some occasions it might be possible to still push the Sort through the Project. I have seen that there are already utility methods to recognize whether a cast from x to y is monotonic. Let me extend the patch so we can still push the cast in these occasions.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Julian Hyde , the change does the opposite (or maybe I misunderstood your second comment): not recognizing cast($i) as a simple reference - observe the code is removed by the patch, not added. I understand that in some occasions it might be possible to still push the Sort through the Project. I have seen that there are already utility methods to recognize whether a cast from x to y is monotonic. Let me extend the patch so we can still push the cast in these occasions.
        Hide
        julianhyde Julian Hyde added a comment -

        What your change seems to do is to recognize cast($i) as a simple expression - as simple as $i - so that it the sort can occur before the cast is evaluated. In which case, you should be careful to distinguish between monotonic casts and non-monotonic casts.

        Show
        julianhyde Julian Hyde added a comment - What your change seems to do is to recognize cast($i) as a simple expression - as simple as $i - so that it the sort can occur before the cast is evaluated. In which case, you should be careful to distinguish between monotonic casts and non-monotonic casts.
        Hide
        julianhyde Julian Hyde added a comment -

        Many casts are monotonic: cast double to integer is monotonic; cast not-null integer to nullable integer is monotonic; cast character to integer is not monotonic. Furthermore, there are many implicit casts in queries, because casts are added automatically (e.g. if you join an integer column to a bigint column). If we fail to recognize when casts are monotonic I fear we will lose the ability to do range scans on indexes, etc. Or indeed to apply a limit before you do a project. Might this change cause that problem?

        Show
        julianhyde Julian Hyde added a comment - Many casts are monotonic: cast double to integer is monotonic; cast not-null integer to nullable integer is monotonic; cast character to integer is not monotonic. Furthermore, there are many implicit casts in queries, because casts are added automatically (e.g. if you join an integer column to a bigint column). If we fail to recognize when casts are monotonic I fear we will lose the ability to do range scans on indexes, etc. Or indeed to apply a limit before you do a project. Might this change cause that problem?
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Julian Hyde, I have created a PR in https://github.com/apache/calcite/pull/369 . Could you take a look? Thanks

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have created a PR in https://github.com/apache/calcite/pull/369 . Could you take a look? Thanks

          People

          • Assignee:
            jcamachorodriguez Jesus Camacho Rodriguez
            Reporter:
            jcamachorodriguez Jesus Camacho Rodriguez
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development