Pig
  1. Pig
  2. PIG-3374

CASE and IN fail when expression includes dereferencing operator

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: parser
    • Labels:
      None

      Description

      This is another bug that I discovered after deploying CASE/IN expressions internally.

      The current implementation of CASE/IN expression assumes that the 1st operand is a single expression. But this is not true, for example, if it contains a dereferencing operator. The following example demonstrates the problem:

      A = LOAD 'foo' AS (k1:chararray, k2:chararray, v:int);
      B = GROUP A BY (k1, k2);
      C = FILTER B BY group.k1 IN ('a', 'b');
      DUMP C;
      

      This fails with the following error:

      Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
          at java.util.ArrayList.RangeCheck(ArrayList.java:547)
          at java.util.ArrayList.get(ArrayList.java:322)
          at org.apache.pig.parser.LogicalPlanGenerator.in_eval(LogicalPlanGenerator.java:8624)
          at org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:8405)
          at org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:7564)
          at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1403)
          at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:821)
          at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:539)
          at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:414)
          at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
      

      Here is the relavant code that causes trouble:

      QueryParser.g
      if(tree.getType() == IN) {
        Tree lhs = tree.getChild(0); // lhs is not a single node!
        for(int i = 2; i < tree.getChildCount(); i = i + 2) {
          tree.insertChild(i, deepCopy(lhs));
        }
      }
      
      1. PIG-3374-4.patch
        12 kB
        Cheolsoo Park
      2. PIG-3374-3.patch
        11 kB
        Cheolsoo Park
      3. PIG-3374-2.patch
        11 kB
        Cheolsoo Park
      4. PIG-3374.patch
        11 kB
        Cheolsoo Park

        Activity

        Hide
        Cheolsoo Park added a comment -
        Show
        Cheolsoo Park added a comment - ReviewBoard: https://reviews.apache.org/r/12290/
        Hide
        Cheolsoo Park added a comment -

        Corrected few typos in a new patch.

        Show
        Cheolsoo Park added a comment - Corrected few typos in a new patch.
        Hide
        Cheolsoo Park added a comment -

        Rebased the patch to trunk. I also verified that all unit tests pass.

        In terms of the fix, here is the summary of what I did.

        When constructing antlr trees from case/in syntax, I used to collapse expressions into a flat list and assume that the first element is the lhs.

        i.e. expr IN ( expr, expr* ) -> ^( IN expr+ )

        But now I construct a list of sub-trees instead.

        i.e. lhs IN ( rhs, rhs* ) -> ^( IN ^( LHS lhs ) ^( RHS rhs )+ )

        This way it is clear where the boundaries between expressions are, so I no longer need to assume anything.

        Show
        Cheolsoo Park added a comment - Rebased the patch to trunk. I also verified that all unit tests pass. In terms of the fix, here is the summary of what I did. When constructing antlr trees from case/in syntax, I used to collapse expressions into a flat list and assume that the first element is the lhs. i.e. expr IN ( expr, expr* ) -> ^( IN expr+ ) But now I construct a list of sub-trees instead. i.e. lhs IN ( rhs, rhs* ) -> ^( IN ^( LHS lhs ) ^( RHS rhs )+ ) This way it is clear where the boundaries between expressions are, so I no longer need to assume anything.
        Hide
        Cheolsoo Park added a comment -

        I forgot to update AliasMasker.g in the previous patch, so I included the update in a new patch.

        I also discovered that PIG-3342 didn't update AliasMasker.g and included the change in the new patch.

        Show
        Cheolsoo Park added a comment - I forgot to update AliasMasker.g in the previous patch, so I included the update in a new patch. I also discovered that PIG-3342 didn't update AliasMasker.g and included the change in the new patch.
        Hide
        Cheolsoo Park added a comment -
        Show
        Cheolsoo Park added a comment - ReviewBoard: https://reviews.apache.org/r/13210/
        Hide
        Cheolsoo Park added a comment -

        Got +1 from Daniel in RB. Committed to trunk.

        Thank you Daniel for the review!

        Show
        Cheolsoo Park added a comment - Got +1 from Daniel in RB. Committed to trunk. Thank you Daniel for the review!

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development