Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0-rc1
    • Fix Version/s: None
    • Labels:
      None

      Description

      Proposal for implementing a CAST mechanism in Drill.
      ===============================================

      The casting mechanism would come in two types:

      • Implicit type casting
      • Explicit type casting

      Details:

      • Implicit type casting would automatically cast lower datatype holders to higher datatype holders.

        E.g. IntHolder would be cast to Float4Holder/Float8Holder directly.

      • Explicit type casting would let the user call a CAST() function to cast a value to another datatype by specifying the target type. The function would be exposed with syntax similar to standard SQL.

        E.g. SELECT CAST(somevalue AS INT) FROM sometable;
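The implicit case can be pictured with a small Java sketch. Everything here is a hypothetical stand-in: the NumericType ordering, the IntValue and Float8Value classes, and the method names are assumptions made for illustration and are not Drill's actual holder classes or API.

```java
// Hypothetical sketch: simplified stand-ins, not Drill's actual holder classes.
final class WideningSketch {
    // Assumed precedence order, lowest to highest.
    enum NumericType { INT, BIGINT, FLOAT4, FLOAT8 }

    // Minimal value holders, loosely modelled on the idea of IntHolder / Float8Holder.
    static final class IntValue { int value; }
    static final class Float8Value { double value; }

    // An implicit cast is allowed only from a lower type to a higher one.
    static boolean canImplicitlyCast(NumericType from, NumericType to) {
        return from.ordinal() < to.ordinal();
    }

    // Every 32-bit int is exactly representable as a double, so this step loses nothing.
    static Float8Value widen(IntValue in) {
        Float8Value out = new Float8Value();
        out.value = in.value;
        return out;
    }

    public static void main(String[] args) {
        IntValue i = new IntValue();
        i.value = 42;
        System.out.println(canImplicitlyCast(NumericType.INT, NumericType.FLOAT8));
        System.out.println(widen(i).value);
    }
}
```

The reverse direction (for example FLOAT8 to INT) is rejected by canImplicitlyCast in this sketch, so it would have to go through an explicit CAST().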

      Type conversion rules:
      Conversion rules have to be similar to SQL standards.

      Implicit type conversion:

      • For arithmetic and comparison operators (+, -, *, /, <, >, =, etc.):
        • If the operand types differ, strings would be converted to Double, and then both operands would be converted to the same type by choosing the type with the higher precision.
      • For values passed to a function/UDF:
        • The values would be converted to the parameter types accepted by the function.
        • If multiple overloaded functions are present, the function requiring the fewest conversions would be selected.
        • If several functions tie for the fewest conversions, an ambiguous-function error would be returned to the user.
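The rules above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the Type ordering, the Signature class, and the method names are hypothetical and are not taken from Drill's code base, and only numeric widening is treated as an implicit conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are Drill classes.
final class OverloadResolutionSketch {
    // Assumed precedence order for the numeric types, lowest to highest.
    enum Type { INT, BIGINT, FLOAT4, FLOAT8, VARCHAR }

    // Operator rule: if the types differ, strings become FLOAT8 (double),
    // then the operand type with the higher precedence is chosen.
    static Type commonOperandType(Type left, Type right) {
        if (left == right) return left;
        Type l = (left == Type.VARCHAR) ? Type.FLOAT8 : left;
        Type r = (right == Type.VARCHAR) ? Type.FLOAT8 : right;
        return l.ordinal() >= r.ordinal() ? l : r;
    }

    // A candidate function signature, e.g. add(FLOAT4, FLOAT4).
    static final class Signature {
        final String name;
        final List<Type> params;
        Signature(String name, Type... params) {
            this.name = name;
            this.params = Arrays.asList(params);
        }
        @Override public String toString() { return name + params; }
    }

    // Number of implicit casts needed to call sig with args, or -1 if it cannot be called.
    static int conversionsNeeded(Signature sig, List<Type> args) {
        if (sig.params.size() != args.size()) return -1;
        int casts = 0;
        for (int i = 0; i < args.size(); i++) {
            Type arg = args.get(i);
            Type param = sig.params.get(i);
            if (arg == param) continue;
            // Assumption: only numeric widening is allowed implicitly for function arguments.
            boolean widening = arg != Type.VARCHAR && param != Type.VARCHAR
                    && arg.ordinal() < param.ordinal();
            if (!widening) return -1;
            casts++;
        }
        return casts;
    }

    // Picks the overload with the fewest conversions; a tie is reported as ambiguous.
    static Signature resolve(List<Signature> overloads, List<Type> args) {
        List<Signature> best = new ArrayList<>();
        int bestCost = Integer.MAX_VALUE;
        for (Signature sig : overloads) {
            int cost = conversionsNeeded(sig, args);
            if (cost < 0) continue;
            if (cost < bestCost) { best.clear(); bestCost = cost; }
            if (cost == bestCost) best.add(sig);
        }
        if (best.isEmpty()) throw new IllegalArgumentException("No matching function for " + args);
        if (best.size() > 1) throw new IllegalArgumentException("Ambiguous function: " + best);
        return best.get(0);
    }

    public static void main(String[] args) {
        List<Signature> overloads = Arrays.asList(
                new Signature("add", Type.FLOAT4, Type.FLOAT4),
                new Signature("add", Type.FLOAT8, Type.FLOAT8));
        // add(INT, FLOAT4): one cast for the first overload, two for the second.
        System.out.println(resolve(overloads, Arrays.asList(Type.INT, Type.FLOAT4)));
    }
}
```

In this sketch, calling resolve with overloads f(BIGINT, INT) and f(INT, BIGINT) and arguments (INT, INT) costs one cast either way, so it raises the ambiguity error instead of picking one arbitrarily.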

      Explicit type conversion:

      • The user would call the CAST function to convert a value to another specified type.
      • For nonconvertible types, the user would get an error back.
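A minimal Java sketch of the explicit path, including the error for nonconvertible types, might look like this. The Target enum, the cast helper, and the choice to pass null through unchanged are assumptions for illustration only and do not describe Drill's CAST implementation.

```java
// Hypothetical sketch: an explicit cast helper, not Drill's CAST implementation.
final class ExplicitCastSketch {
    enum Target { INT, FLOAT8, VARCHAR }

    // Converts value to the requested target type, or fails with an error for the user.
    static Object cast(Object value, Target target) {
        if (value == null) return null; // assumption: NULL stays NULL
        switch (target) {
            case VARCHAR:
                return String.valueOf(value);
            case INT:
                if (value instanceof Number) return ((Number) value).intValue();
                // Integer.parseInt throws NumberFormatException for malformed input.
                if (value instanceof String) return Integer.parseInt(((String) value).trim());
                break;
            case FLOAT8:
                if (value instanceof Number) return ((Number) value).doubleValue();
                if (value instanceof String) return Double.parseDouble(((String) value).trim());
                break;
        }
        // Reached only when the source type has no conversion to the target.
        throw new IllegalArgumentException(
                "Cannot cast " + value.getClass().getSimpleName() + " to " + target);
    }

    public static void main(String[] args) {
        System.out.println(cast("42", Target.INT));
        System.out.println(cast(7, Target.FLOAT8));
        try {
            cast(Boolean.TRUE, Target.INT);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because NumberFormatException is a subclass of IllegalArgumentException, a caller of this sketch can handle both an unparseable string and an unsupported source type with a single catch.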
      1. DRILL_259.yash.patch.txt
        40 kB
        Jinfeng Ni
      2. DRILL_259.jinfeng.patch.3.txt
        110 kB
        Jinfeng Ni
      3. DRILL_259.combined.patch.3.txt
        89 kB
        Jinfeng Ni

        Activity

        Jacques Nadeau added a comment -

        Committed in 3694542

        Jason Altekruse added a comment -

        Thanks Yash

        Yash Sharma added a comment -

        The patch is dependent on the Explicit Cast Patch:
        https://issues.apache.org/jira/browse/DRILL-316

        Jason Altekruse added a comment -

        I had some trouble applying the patches, what commit are they based on?

        Jinfeng Ni added a comment -

        Revised the patches based on review comments.

        Jinfeng Ni added a comment -

        The review board is updated with the code change to process NullExpression.INSTANCE as input to a function or operator.

        Jinfeng Ni added a comment -

        Combined patch, with new code to handle NullExpression.INSTANCE.

        Jinfeng Ni added a comment -

        New code to handle NullExpression.INSTANCE.

        Jinfeng Ni added a comment -

        Review board link:

        https://reviews.apache.org/r/16490/

        The review board request is generated using the combined patch.

        Jinfeng Ni added a comment -

        This patch combines the code changes made by Yash and Jinfeng.

        Jinfeng Ni added a comment -

        This patch consists of the changes made by Jinfeng, on top of Yash's code changes.

        Jinfeng Ni added a comment -

        1. This patch consists of the code changes made by Yash.
        https://github.com/yashs360/incubator-drill-casting

        Jinfeng Ni added a comment -

        1. Implicit cast vs. explicit cast.

        Basically, implicit cast would need to leverage the explicit cast implementation. For instance, given a logical expression such as 1 + 3.0, if the function resolver finds that argument 1 should be implicitly cast to float4, then Drill should transform the logical expression into cast(1 as float4) + 3.0, so that the "+" operator calls the add(float4, float4) implementation.

        2. Add implicit cast function call in the logical expression tree.

        Currently, drill code uses FunctionImplementationRegistry.getFunction(FunctionCall) to get DrillFuncHolder in EvalVisitor.

        Your FunctionResolver is used to find the best match in the call of getFunction(). However, if the best match says implicit cast is required, it's kind of difficult to let getFunction() do 1) insert the cast function to the logical expression tree, and 2) get cast's DrillFuncHolder, and 3) generate the code for the cast function.

        We probably should separate the process of adding implicit cast from the logic of FunctionImplementationRegistry.getFunction() and code generation.

        To do that:
        Introduce a new Visitor class, ImplicitCastBuilder.
        ImplicitCastBuilder will look similar to EvalVisitor: ImplicitCastBuilder extends AbstractExprVisitor<LogicalExpression, FunctionImplementationRegistry, RuntimeException>.
        ImplicitCastBuilder should build cast function calls bottom-up.
        ImplicitCastBuilder will modify the logical expression tree and inject a cast function on top of an argument if your FunctionResolver.getBestMatch() shows that an implicit cast is required.
        CodeGenerator.addExpr will call ImplicitCastBuilder to insert the implicit cast() into the logical expression tree.

        public HoldingContainer addExpr(LogicalExpression ex, boolean rotate) {
          // logger.debug("Adding next write {}", ex);
          if (rotate) rotateBlock();
          ex = implicitCastBuilder.accept(ex, this);
          return evaluationVisitor.addExpr(ex, this);
        }

        EvalVisitor will then do the match() and code generation as before. (There is no need to call your FunctionResolver.getBestMatch() at this stage, since all the required implicit casts have already been inserted into the logical expression tree.)

        For example, let's say we have logical expression tree f1 ( a1, f2( a2, a3))

            f1()
            /  \
           /    \
          a1    f2()
                /  \
               /    \
              a2    a3

        We may end up with the following logical expression tree, after ImplicitCastBuilder visit.

            f1()
            /  \
           /    \
          a1    cast1()
                   \
                    \
                    f2()
                    /  \
                   /    \
                  a2    cast2()
                           \
                           a3

        Note that cast2 would be inserted first, followed by cast1, during the visit, since we need to work bottom-up. (When we call getBestMatch() for f1 and insert cast1, we need to know the output type of cast2 in order to determine the output type of f2(), which is an argument to f1().)
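The bottom-up insertion can be sketched on a toy expression tree in Java. This is not Drill code: Expr, Literal, Cast, Call, and insertCasts are hypothetical names, and the "widest argument type wins" rule merely stands in for whatever FunctionResolver.getBestMatch() would decide. Unlike the diagram, the toy rule casts arguments only and never wraps a function call in a cast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a toy expression tree, not Drill's LogicalExpression classes.
final class ImplicitCastBuilderSketch {
    enum Type { INT, FLOAT4 }

    interface Expr { Type type(); }

    static final class Literal implements Expr {
        final String text;
        final Type type;
        Literal(String text, Type type) { this.text = text; this.type = type; }
        public Type type() { return type; }
        @Override public String toString() { return text; }
    }

    static final class Cast implements Expr {
        final Expr input;
        final Type target;
        Cast(Expr input, Type target) { this.input = input; this.target = target; }
        public Type type() { return target; }
        @Override public String toString() { return "cast(" + input + " as " + target + ")"; }
    }

    static final class Call implements Expr {
        final String name;
        final List<Expr> args;
        Call(String name, List<Expr> args) { this.name = name; this.args = args; }
        // Stand-in for function resolution: the output type is the widest argument type.
        public Type type() {
            Type widest = Type.INT;
            for (Expr a : args) if (a.type().ordinal() > widest.ordinal()) widest = a.type();
            return widest;
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Rewrites the tree bottom-up, wrapping any argument whose type is too narrow.
    static Expr insertCasts(Expr e) {
        if (!(e instanceof Call)) return e;
        Call call = (Call) e;
        // Children first, so that their output types are known before this call is resolved.
        List<Expr> rewritten = new ArrayList<>();
        for (Expr arg : call.args) rewritten.add(insertCasts(arg));
        Type target = new Call(call.name, rewritten).type();
        List<Expr> casted = new ArrayList<>();
        for (Expr arg : rewritten) {
            casted.add(arg.type() == target ? arg : new Cast(arg, target));
        }
        return new Call(call.name, casted);
    }

    public static void main(String[] args) {
        Expr inner = new Call("add", Arrays.<Expr>asList(
                new Literal("2", Type.INT), new Literal("3.0", Type.FLOAT4)));
        Expr outer = new Call("add", Arrays.<Expr>asList(new Literal("1", Type.INT), inner));
        System.out.println(insertCasts(outer));
    }
}
```

The inner call is rewritten before the outer one, which mirrors the ordering argument above: the outer call can only be resolved once the output type of its rewritten argument is known.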

        Yash Sharma added a comment -

        Document for Cast proposal:
        https://docs.google.com/document/d/1HzcWg4uQ42gnz_IlgxEzpE2PjPpYvgPjkHyfA04vCac/edit#
        Review & Comment

        Jacques Nadeau added a comment -

        For implicit casting, can you go into more detail about how you think it should work within the existing set of classes? Thanks!


          People

          • Assignee:
            Jinfeng Ni
          • Reporter:
            Yash Sharma
          • Votes:
            0
          • Watchers:
            6
