Pig / PIG-2159

New logical plan uses incorrect class for SUM causing ClassCastException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Below is my script:

      A = load 'input1' using PigStorage(',')  as (f1:int,f2:int,f3:int,f4:long,f5:double);
      B = load 'input2' using PigStorage(',')  as (f1:int,f2:int,f3:int,f4:long,f5:double);
      C = load 'input_Main' using PigStorage(',')  as (f1:int,f2:int,f3:int);
      U = UNION ONSCHEMA A,B;
      J = join C by (f1,f2,f3) LEFT OUTER, U by (f1,f2,f3);
      Porj = foreach J generate C::f1 as f1 ,C::f2 as f2,C::f3 as f3,U::f4 as f4,U::f5 as f5;
      G = GROUP Porj by (f1,f2,f3,f5);
      Final = foreach G generate SUM(Porj.f4) as total;
      dump Final;
      

      The script fails while computing the sum, with a ClassCastException:
      Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
      at org.apache.pig.builtin.DoubleSum$Initial.exec(DoubleSum.java:82)
      ... 19 more

      This is clearly a bug in the logical plan created by 0.9. The sum should have been computed with org.apache.pig.builtin.LongSum, but the 0.9 logical plan uses org.apache.pig.builtin.DoubleSum, which is meant for summing doubles; hence the ClassCastException.

      The same script works fine with Pig 0.8.
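To make the failure mode concrete, here is a minimal Java sketch, assuming a simplified model in which the SUM variant is picked from the declared type of its input column. The class SumDispatchSketch and its methods are hypothetical and are not Pig's actual LongSum or DoubleSum code; they only mirror the reported behavior, where a column whose runtime values are java.lang.Long objects is handed to a function that casts each value to java.lang.Double.

```java
import java.util.List;

// Illustrative sketch only: these methods are hypothetical stand-ins, not
// Pig's real LongSum/DoubleSum classes or its function-resolution code.
public class SumDispatchSketch {

    // Simplified mapping from the declared field type to a SUM variant,
    // limited to the two types involved in this bug.
    static String chooseSum(String declaredType) {
        switch (declaredType) {
            case "long":
                return "LongSum";
            case "double":
                return "DoubleSum";
            default:
                throw new IllegalArgumentException("type not covered by this sketch: " + declaredType);
        }
    }

    // Stand-in for a long sum: every value is expected to be a java.lang.Long.
    static long sumAsLong(List<Object> values) {
        long total = 0L;
        for (Object v : values) {
            total += (Long) v;
        }
        return total;
    }

    // Stand-in for a double sum: every value is cast to java.lang.Double,
    // so a Long value raises ClassCastException, as in the reported trace.
    static double sumAsDouble(List<Object> values) {
        double total = 0.0;
        for (Object v : values) {
            total += (Double) v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical f4 values; the runtime objects are Longs because f4 is declared long.
        List<Object> f4 = List.of(103L, 113L);

        // Correct resolution: the declared type is long, so the long sum is used.
        System.out.println(chooseSum("long") + " -> " + sumAsLong(f4));

        // Wrong resolution: the plan believes the column is double.
        try {
            System.out.println(chooseSum("double") + " -> " + sumAsDouble(f4));
        } catch (ClassCastException e) {
            System.out.println(chooseSum("double") + " fails: " + e.getMessage());
        }
    }
}
```

Running main prints the LongSum total and then shows the DoubleSum path failing with a ClassCastException, the same exception type as in the stack trace above.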

      Attachments
      1. PIG-2159-2.patch (6 kB, Daniel Dai)
      2. PIG-2159-1.patch (5 kB, Daniel Dai)

        Activity

        Vivek Padmanabhan added a comment -

        The issue can be reproduced with the script above. Sample input is provided below:

        input1
        100,101,102,103,104,105
        110,111,112,113,114,115
        
        input2
        200,201,202,203,204,205
        210,211,212,213,214,215
        
        input0
        100,101,102,103,104,105
        200,201,202,203,204,205
        

        The logical plan from explain in 0.9:
        #-----------------------------------------------
        # New Logical Plan:
        #-----------------------------------------------
        Final: (Name: LOStore Schema: total#95:double)
        ---Final: (Name: LOForEach Schema: total#95:double)
            (Name: LOGenerate[false] Schema: total#95:double)ColumnPrune:InputUids=[91]ColumnPrune:OutputUids=[95]
                (Name: UserFunc(org.apache.pig.builtin.DoubleSum) Type: double Uid: 95)

        The logical plan from explain in 0.8:
        #-----------------------------------------------
        # New Logical Plan:
        #-----------------------------------------------
        fake: (Name: LOStore Schema: total#68:long)
        ---Final: (Name: LOForEach Schema: total#68:long)
            (Name: LOGenerate[false] Schema: total#68:long)ColumnPrune:InputUids=[66]ColumnPrune:OutputUids=[68]
                (Name: UserFunc(org.apache.pig.builtin.LongSum) Type: long Uid: 68)
                    ---(Name: Dereference Type: bag Uid: 67 Column:[3])
                    ---Porj:(Name: Project Type: bag Uid: 66 Input: 0 Column: )
            ---Porj: (Name: LOInnerLoad[1] Schema: f1#43:int,f2#44:int,f3#45:int,f4#59:long,f5#60:double)
        Daniel Dai added a comment -

        The error is caused by the schema generated by unionOnSchema, whose fields all have empty uids. This would affect many queries that use UNION ONSCHEMA.
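The comment does not spell out how an empty uid ends up selecting DoubleSum, so the following is only a simplified Java model of the general problem, not Pig's LOUnion or LogicalSchema code. It assumes that downstream operators resolve a column's type by its uid, and that the fix (the review request below touches LOUnion.java) amounts to giving each merged output field its own uid; both are assumptions. The class UnionOnSchemaSketch, the NO_UID marker, and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a simplified model of schema uids, not Pig's
// LOUnion or LogicalSchema classes.
public class UnionOnSchemaSketch {

    // Hypothetical marker for a field whose uid was never assigned.
    static final long NO_UID = -1L;

    private static long nextUid = 100L;

    public static final class Field {
        public final String name;
        public final String type;
        public final long uid;

        public Field(String name, String type, long uid) {
            this.name = name;
            this.type = type;
            this.uid = uid;
        }
    }

    // Merge two input schemas by column name, in the spirit of UNION ONSCHEMA.
    // For simplicity this sketch requires identical types for shared columns.
    // With assignUids=false the output mimics the reported bug: every uid is empty.
    static List<Field> unionOnSchema(List<Field> left, List<Field> right, boolean assignUids) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Field f : left) {
            merged.put(f.name, f.type);
        }
        for (Field f : right) {
            String previous = merged.putIfAbsent(f.name, f.type);
            if (previous != null && !previous.equals(f.type)) {
                throw new IllegalArgumentException("type mismatch for column " + f.name);
            }
        }
        List<Field> out = new ArrayList<>();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            long uid = assignUids ? nextUid++ : NO_UID;
            out.add(new Field(e.getKey(), e.getValue(), uid));
        }
        return out;
    }

    // In this model, downstream operators resolve a column's type by uid;
    // an empty uid cannot be matched to any field.
    static String typeOfUid(List<Field> schema, long uid) {
        if (uid == NO_UID) {
            return "unknown";
        }
        for (Field f : schema) {
            if (f.uid == uid) {
                return f.type;
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<Field> a = List.of(new Field("f1", "int", 1), new Field("f4", "long", 2));
        List<Field> b = List.of(new Field("f1", "int", 3), new Field("f4", "long", 4));

        List<Field> withUids = unionOnSchema(a, b, true);
        List<Field> emptyUids = unionOnSchema(a, b, false);

        System.out.println("with uids: f4 -> " + typeOfUid(withUids, withUids.get(1).uid));
        System.out.println("empty uids: f4 -> " + typeOfUid(emptyUids, emptyUids.get(1).uid));
    }
}
```

In this model the merged f4 column keeps its long type only when it carries a uid; with empty uids the lookup cannot tell which field is meant, which is the kind of ambiguity that can lead a planner to pick the wrong function.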

        Dmitriy V. Ryaboy added a comment -

        Sounds like a blocker for the 0.9 release; changing the priority accordingly.
        Nice catch.

        Daniel Dai added a comment -

        Fix test failure on TestUnionOnSchemaSetter.

        Alan Gates added a comment -

        Dmitriy, I don't see this as a blocker for 0.9. It does not produce wrong results, and users can rewrite their scripts to work around it. I agree it should go on the 0.9 branch and be part of the anticipated 0.9.1 release.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1138/
        -----------------------------------------------------------

        Review request for pig and thejas.

        Summary
        -------

        See PIG-2159

        This addresses bug PIG-2159.
        https://issues.apache.org/jira/browse/PIG-2159

        Diffs


        trunk/src/org/apache/pig/newplan/logical/relational/LOUnion.java 1146183
        trunk/test/org/apache/pig/parser/TestUnionOnSchemaSetter.java 1146183
        trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1146183

        Diff: https://reviews.apache.org/r/1138/diff

        Testing
        -------

        Unit-test:
        all pass

        Test-patch:
        all pass

        Thanks,

        Daniel

        Thejas M Nair added a comment -

        +1

        Daniel Dai added a comment -

        Patch committed to both the 0.9 branch and trunk.


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Vivek Padmanabhan
          • Votes:
            0
            Watchers:
            2
