Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1417

In RelBuilder, simplify "CAST(literal TO type)" to a literal when possible

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: None
    • Labels:
      None

      Description

      In CALCITE-1357 we added some logic (in DruidDateTimeUtils.leafToRanges) to recognize various combinations of literals, casts of literals and comparison operators, and I feel that logic would be simpler if we moved the logic into RexUtil and flattened the casts of literals to just plain literals. Jesus Camacho Rodriguez, what do you think?

        Issue Links

          Activity

          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          This makes a lot of sense Julian Hyde; the more utility methods we have, the less logic we end up coding repeatedly. I can try to take a look into detail at DruidDateTimeUtils to identify what we can factor out and generalize, so we can include it in RexUtil. If you already have a proposal, I can check that too.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - This makes a lot of sense Julian Hyde ; the more utility methods we have, the less logic we end up coding repeatedly. I can try to take a look into detail at DruidDateTimeUtils to identify what we can factor out and generalize, so we can include it in RexUtil. If you already have a proposal, I can check that too.
          Hide
          jnadeau Jacques Nadeau added a comment -

          Quick question here: doesn't constant reduction/reduce expression rule already manage these problems?

          Show
          jnadeau Jacques Nadeau added a comment - Quick question here: doesn't constant reduction/reduce expression rule already manage these problems?
          Hide
          julianhyde Julian Hyde added a comment -

          It's worth checking whether constant reduction works on these patterns. However, it's not always convenient to use constant reduction:

          • Constant reduction is a bit too heavy-weight to invoke continuously during optimization. Most people do constant reduction just once or twice (at the beginning & the end). So rules that require cast-literal to be flattened to literal won't fire on intermediate results.
          • Constant reduction probably makes use of literals (to make sure that the result has exactly the right type) so may not be able to eliminate all casts.
          • There's the issue of invalid casts, e.g. CAST('abc' AS INTEGER).
          Show
          julianhyde Julian Hyde added a comment - It's worth checking whether constant reduction works on these patterns. However, it's not always convenient to use constant reduction: Constant reduction is a bit too heavy-weight to invoke continuously during optimization. Most people do constant reduction just once or twice (at the beginning & the end). So rules that require cast-literal to be flattened to literal won't fire on intermediate results. Constant reduction probably makes use of literals (to make sure that the result has exactly the right type) so may not be able to eliminate all casts. There's the issue of invalid casts, e.g. CAST('abc' AS INTEGER) .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Jacques Nadeau, if I understand correctly, the issue is about representing expressions more compactly, e.g., 'cast on literals' as 'literals', and add some methods to RexUtil to handle them. For instance, that would avoid executing tree traversals repeatedly to check whether an expression is a literal or not. Julian Hyde, is that what you had in mind?

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Jacques Nadeau , if I understand correctly, the issue is about representing expressions more compactly, e.g., 'cast on literals' as 'literals', and add some methods to RexUtil to handle them. For instance, that would avoid executing tree traversals repeatedly to check whether an expression is a literal or not. Julian Hyde , is that what you had in mind?
          Hide
          julianhyde Julian Hyde added a comment -

          Jesus Camacho Rodriguez, Yes, my intend is to make the RexNode tree compact, and avoid rules having to deal with too many patterns. In my previous remark I made the point that constant reduction may be able to achieve that but I suspect that it is too expensive to invoke all the time, and there may be some cases it has difficultly handling.

          Show
          julianhyde Julian Hyde added a comment - Jesus Camacho Rodriguez , Yes, my intend is to make the RexNode tree compact, and avoid rules having to deal with too many patterns. In my previous remark I made the point that constant reduction may be able to achieve that but I suspect that it is too expensive to invoke all the time, and there may be some cases it has difficultly handling.
          Hide
          julianhyde Julian Hyde added a comment -

          By the way, Jesus Camacho Rodriguez, I am working on this; initial work is in https://github.com/julianhyde/calcite/tree/1417-literal-casts.

          Show
          julianhyde Julian Hyde added a comment - By the way, Jesus Camacho Rodriguez , I am working on this; initial work is in https://github.com/julianhyde/calcite/tree/1417-literal-casts .
          Hide
          jnadeau Jacques Nadeau added a comment -

          To me, a cast is just like any other function call. It just happens to have a different syntax. In any situation that the data must be consulted (e.g. invalid values), I think that the interpretation of how it should be interpreted should be left to an implementation of RelOptPlanner.Executor. Otherwise, you risk having an Executor that implements a different set of semantics than what you do in the code proposed here. While constant reduction generally may be overkill, leveraging a default implementation of Executor seems like the right approach.

          Show
          jnadeau Jacques Nadeau added a comment - To me, a cast is just like any other function call. It just happens to have a different syntax. In any situation that the data must be consulted (e.g. invalid values), I think that the interpretation of how it should be interpreted should be left to an implementation of RelOptPlanner.Executor. Otherwise, you risk having an Executor that implements a different set of semantics than what you do in the code proposed here. While constant reduction generally may be overkill, leveraging a default implementation of Executor seems like the right approach.
          Hide
          julianhyde Julian Hyde added a comment -

          I've split out the issues relating to error-handling as CALCITE-1439.

          Show
          julianhyde Julian Hyde added a comment - I've split out the issues relating to error-handling as CALCITE-1439 .
          Hide
          julianhyde Julian Hyde added a comment -

          Jesus Camacho Rodriguez, Do you remember why, when you were fixing CALCITE-1220, you changed RexUtil.simplify to not call simplifyIs for IS_NULL and IS_NOT_NULL? Your code is less powerful. Specifically, it does not try to simplify the argument to IS_NULL or IS_NOT_NULL.

          Show
          julianhyde Julian Hyde added a comment - Jesus Camacho Rodriguez , Do you remember why, when you were fixing CALCITE-1220 , you changed RexUtil.simplify to not call simplifyIs for IS_NULL and IS_NOT_NULL? Your code is less powerful. Specifically, it does not try to simplify the argument to IS_NULL or IS_NOT_NULL.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, I have been checking and I think the answer is simple. I guess the Hive code was based on a version that did not have simplifyIs, which was introduced in Calcite 1.6. Then, when I brought the code to Calcite, I did not realize and thus left it as it was. Sorry about that.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have been checking and I think the answer is simple. I guess the Hive code was based on a version that did not have simplifyIs , which was introduced in Calcite 1.6. Then, when I brought the code to Calcite, I did not realize and thus left it as it was. Sorry about that.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/54556b82 .
          Hide
          jnadeau Jacques Nadeau added a comment -

          Adding to my comments above, this is a really unfortunate change:

          https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L111

          We just had to spend a bunch time debugging until we realized that this was happening. Please, please, please don't create this kind of technical debt. It is very hard to understand why something is failing.

          Show
          jnadeau Jacques Nadeau added a comment - Adding to my comments above, this is a really unfortunate change: https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L111 We just had to spend a bunch time debugging until we realized that this was happening. Please, please, please don't create this kind of technical debt. It is very hard to understand why something is failing.
          Hide
          julianhyde Julian Hyde added a comment -

          Sorry you got stung.

          That change was actually inspired by your comments: using an executor for constant reduction, just not the executor, seems like a good start. I don't agree that using the "full" executor is the solution. The full executor uses code generation and janino, so is way too expensive to invoke every time someone calls RelBuilder.filter.

          Also, it is not accessible at this stage of query preparation (the sql-to-rel stage, before a planner has been created). CALCITE-1483 is an example where we need to deduce whether an expression can be null at sql-to-rel time.

          Also, there are forms of expression simplification which do not use constant reduction; they use logical rewrites, such as de Morgan's laws. Since executor cannot solve the whole problem, so it's not clear how much we want to invest in making it solve part of the problem.

          Show
          julianhyde Julian Hyde added a comment - Sorry you got stung. That change was actually inspired by your comments: using an executor for constant reduction, just not the executor, seems like a good start. I don't agree that using the "full" executor is the solution. The full executor uses code generation and janino, so is way too expensive to invoke every time someone calls RelBuilder.filter. Also, it is not accessible at this stage of query preparation (the sql-to-rel stage, before a planner has been created). CALCITE-1483 is an example where we need to deduce whether an expression can be null at sql-to-rel time. Also, there are forms of expression simplification which do not use constant reduction; they use logical rewrites, such as de Morgan's laws. Since executor cannot solve the whole problem, so it's not clear how much we want to invest in making it solve part of the problem.
          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.11.0 (2017-01-11).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.11.0 (2017-01-11).

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              julianhyde Julian Hyde
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development