Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1738

Support CAST of literal values in filters pushed to Druid

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: druid
    • Labels:
      None

      Description

      Because HIVE-16027 forced an implicit CAST on queries like WHERE <timestampcolumn> IN ('<literal', '<literal>'), IN, BETEWEEN and other filters are no longer pushed down to Druid. In this call stack:

      org.apache.calcite.adapter.druid.DruidDateTimeUtils.literalValue(DruidDateTimeUtils.java:246)
      	at org.apache.calcite.adapter.druid.DruidDateTimeUtils.leafToRanges(DruidDateTimeUtils.java:227)
      	at org.apache.calcite.adapter.druid.DruidDateTimeUtils.extractRanges(DruidDateTimeUtils.java:120)
      	at org.apache.calcite.adapter.druid.DruidDateTimeUtils.createInterval(DruidDateTimeUtils.java:65)
      	at org.apache.calcite.adapter.druid.DruidRules$DruidFilterRule.onMatch(DruidRules.java:186)
      	at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
      	at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:506)
      	at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:385)
      	at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:251)
      	at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
      	at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:210)
      	at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:197)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1790)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1518)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1265)
      	at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
      	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1043)
      	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
      	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1073)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1089)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:368)
      	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11119)
      	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:290)
      	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
      

      the literalValue only knows how to handle RexLiteral case. Because of the CAST, the node is a RexCall.

        Activity

        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.13.0 (2017-06-26).

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.13.0 (2017-06-26).
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/495fd1dc . Thanks for the PR, Remus Rusanu !
        Hide
        julianhyde Julian Hyde added a comment -

        Reviewing now.

        Show
        julianhyde Julian Hyde added a comment - Reviewing now.
        Hide
        rusanu Remus Rusanu added a comment -

        I made the condition more tight, to cover exactly the case observed: nullability cast of TIMESTAMP NOT NULL to TIMESTAMP

        Show
        rusanu Remus Rusanu added a comment - I made the condition more tight, to cover exactly the case observed: nullability cast of TIMESTAMP NOT NULL to TIMESTAMP
        Hide
        julianhyde Julian Hyde added a comment -

        In the ideal case, all casts of literals would be flattened to literals, so this code would not be needed. So clearly we are not dealing with the ideal case. So let's figure out the exact circumstances where a cast of a literal would remain.

        One possibility is a cast of a date string (e.g. '2010-01-01') to a timestamp. That's not a "pure" cast because it needs a timezone from the executor.

        Another possibility is a cast of a non-nullable timestamp literal to a nullable timestamp.

        Based on what you've said, the latter seems more likely. I think you should narrow your code down to the exact kind of cast (e.g. one that converts a timestamp to a timestamp and changes nullability but does not change data type). Otherwise there's too much risk of unintended consequences.

        Show
        julianhyde Julian Hyde added a comment - In the ideal case, all casts of literals would be flattened to literals, so this code would not be needed. So clearly we are not dealing with the ideal case. So let's figure out the exact circumstances where a cast of a literal would remain. One possibility is a cast of a date string (e.g. '2010-01-01') to a timestamp. That's not a "pure" cast because it needs a timezone from the executor. Another possibility is a cast of a non-nullable timestamp literal to a nullable timestamp. Based on what you've said, the latter seems more likely. I think you should narrow your code down to the exact kind of cast (e.g. one that converts a timestamp to a timestamp and changes nullability but does not change data type). Otherwise there's too much risk of unintended consequences.
        Hide
        rusanu Remus Rusanu added a comment - - edited

        In the real case the code runs after constant folding, so one would not expect the CAST to be there. However, what we found is that the whole system behaves somehow unexpected. Say we start with a SQL text like ... WHERE dt BETWEEN '2010-01-01' AND '2011-01-01'. Since HIVE-16027 we're adding a CAST, so Calcite will see ... WHERE dt BETWEEN CAST('2010-01-01' AS TIMESTAMP) AND ... . During simplification the Hive executor gets invoked and ask to evaluate the expression CAST('2010-01-01' AS TIMESTAMP and we return a RexLiteral with the value 2010-01-01 of the appropriate TIMESTAMP type. Now when this value gets put back into the tree it does not match the nullability of the expression being replaced (ie. the CAST). As a result a new CAST is inserted on top of it, the relevant logic is [RexBuilder.java:553](https://github.com/apache/calcite/blob/branch-1.12/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L553). The new CAST will cast the 2010-01-01 timestamp literal to TIMESTAMP.

        When looking at this fix, I pondered whether to fix the generic case and have the DruidDateTimeUtils handle all cases. But I think that what we're dealign with here is a very specific issue and other cases (eg. CAST from a string literal) will be converted beforehand by constant folding into the actual literal. Making the code in DruidDateTimeUtils handle the generic case requires having the correct executor, much like the changes you did for CALCITE-1695, and in effect would run constant folding again in this place. So my instinct was to go with this simplified patch.

        Show
        rusanu Remus Rusanu added a comment - - edited In the real case the code runs after constant folding, so one would not expect the CAST to be there. However, what we found is that the whole system behaves somehow unexpected. Say we start with a SQL text like ... WHERE dt BETWEEN '2010-01-01' AND '2011-01-01'. Since HIVE-16027 we're adding a CAST, so Calcite will see ... WHERE dt BETWEEN CAST('2010-01-01' AS TIMESTAMP) AND ... . During simplification the Hive executor gets invoked and ask to evaluate the expression CAST('2010-01-01' AS TIMESTAMP and we return a RexLiteral with the value 2010-01-01 of the appropriate TIMESTAMP type. Now when this value gets put back into the tree it does not match the nullability of the expression being replaced (ie. the CAST). As a result a new CAST is inserted on top of it, the relevant logic is [RexBuilder.java:553] ( https://github.com/apache/calcite/blob/branch-1.12/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L553 ). The new CAST will cast the 2010-01-01 timestamp literal to TIMESTAMP. When looking at this fix, I pondered whether to fix the generic case and have the DruidDateTimeUtils handle all cases. But I think that what we're dealign with here is a very specific issue and other cases (eg. CAST from a string literal) will be converted beforehand by constant folding into the actual literal. Making the code in DruidDateTimeUtils handle the generic case requires having the correct executor, much like the changes you did for CALCITE-1695 , and in effect would run constant folding again in this place. So my instinct was to go with this simplified patch.
        Hide
        julianhyde Julian Hyde added a comment -

        If a cast of a literal can be flattened to a literal, why has it not been flattened in this case?

        It looks as if your code just removes the cast; thus CAST('foo' AS TIMESTAMP) becomes a CHAR literal 'foo'. Is the code that calls literalValue prepared to handle that?

        Please rename testFilterWithCAST to testFilterWithCast.

        Show
        julianhyde Julian Hyde added a comment - If a cast of a literal can be flattened to a literal, why has it not been flattened in this case? It looks as if your code just removes the cast; thus CAST('foo' AS TIMESTAMP) becomes a CHAR literal 'foo'. Is the code that calls literalValue prepared to handle that? Please rename testFilterWithCAST to testFilterWithCast .
        Show
        rusanu Remus Rusanu added a comment - https://github.com/apache/calcite/pull/430

          People

          • Assignee:
            rusanu Remus Rusanu
            Reporter:
            rusanu Remus Rusanu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development