PIG-2004: Incorrect input types passed on to eval function

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None

      Description

      The script below fails with a ClassCastException thrown from the MAX UDF. The UDF expects the values in the supplied bag to be DataByteArray, but at run time it receives the actual type, i.e. Double in this case. Script execution then fails with the exception:

      Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.pig.data.DataByteArray

      The same script runs properly with Pig 0.8.

      A = LOAD 'myinput' as (f1,f2,f3);
      B = foreach A generate f1,f2+f3/1000.0 as doub;
      C = group B by f1;
      D = foreach C generate (long)(MAX(B.doub)) as f4;
      dump D;
      

      myinput
      -------
      a 1000 12345
      b 2000 23456
      c 3000 34567
      a 1500 54321
      b 2500 65432
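
The failure mode described above can be illustrated outside Pig. The following is a minimal sketch, assuming the MAX variant is chosen from the declared schema type before execution; `ByteArray`, `max_bytearray`, `max_double`, `pick_max` and `run` are hypothetical stand-ins and not Pig APIs. When the declared type is still bytearray but the run-time values are already doubles, the bytearray variant rejects its input.

```python
# Hypothetical stand-ins for illustration only; none of these names are Pig APIs.
class ByteArray:
    """Stand-in for an untyped field value (Pig's DataByteArray)."""
    def __init__(self, raw: bytes):
        self.raw = raw


def max_bytearray(bag):
    # This variant assumes every value in the bag is a ByteArray.
    for v in bag:
        if not isinstance(v, ByteArray):
            raise TypeError(f"{type(v).__name__} cannot be cast to ByteArray")
    return max(float(v.raw.decode()) for v in bag)


def max_double(bag):
    # This variant assumes every value in the bag is already numeric.
    return max(float(v) for v in bag)


IMPLS = {"bytearray": max_bytearray, "double": max_double}


def pick_max(declared_type):
    """Choose an implementation from the declared schema type, before execution."""
    return IMPLS[declared_type]


# Group 'a' from myinput: f2 + f3/1000.0 is already a double at run time.
doub_a = [1000 + 12345 / 1000.0, 1500 + 54321 / 1000.0]


def run(declared_type):
    """Run the chosen variant and report a type mismatch as a string."""
    try:
        return pick_max(declared_type)(doub_a)
    except TypeError as e:
        return f"error: {e}"
```

Calling `run("bytearray")` reports a type mismatch, while `run("double")` returns the maximum of the two values, which mirrors the difference between the failing and the expected behaviour of the script.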

      1. PIG-2004-0.patch
        4 kB
        Daniel Dai
      2. PIG-2004.1.patch
        8 kB
        Thejas M Nair

        Activity

        Olga Natkovich made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Thejas M Nair made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Thejas M Nair added a comment -

        Patch committed to trunk and 0.9 branch.

        Daniel Dai added a comment -

        +1

        Thejas M Nair made changes -
        Attachment PIG-2004.1.patch [ 12477348 ]
        Thejas M Nair added a comment -

        PIG-2004.1.patch

        • Reset fieldschema of all expressions from TypeCheckingExpVisitor constructor, instead of doing it in each visit function.
        • Reset target fieldschema in CastExpression, copied LHS fieldschema in BinCondExpression so that uid of inner schema is not re-used.
        • Fixed an NPE in LogicalSchema that was seen in test cases after this issue was fixed.
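
The first point in the list above (resetting cached field schemas before type checking) can be pictured with a loose sketch. It assumes that each expression memoizes its result type and that the type checker later changes the operand types; the `Expr` class, `insert_cast` and the type names are hypothetical and do not reproduce Pig's actual classes.

```python
# Hypothetical model of a memoized expression type; not Pig's real classes.
class Expr:
    def __init__(self, declared=None, children=()):
        self.declared = declared        # type taken from the schema, if any
        self.children = list(children)
        self._cached = None             # memoized result type

    def field_type(self):
        """Return the result type, computing and caching it on first use."""
        if self._cached is None:
            if self.children:
                types = {c.field_type() for c in self.children}
                self._cached = "double" if "double" in types else "bytearray"
            else:
                self._cached = self.declared
        return self._cached

    def reset(self):
        """Drop the cached type of this expression and all sub-expressions."""
        self._cached = None
        for c in self.children:
            c.reset()


def insert_cast(leaf, to_type):
    """Stand-in for the type checker casting an untyped operand."""
    leaf.declared = to_type


f2, f3 = Expr("bytearray"), Expr("bytearray")
add = Expr(children=[f2, f3])

add.field_type()              # caches "bytearray" before type checking
insert_cast(f2, "double")
insert_cast(f3, "double")
stale = add.field_type()      # cache is reused, so the old type is returned

add.reset()                   # reset once, up front, for the whole expression
fresh = add.field_type()      # recomputed from the updated operands
```

In this model `stale` still holds the pre-cast type and `fresh` holds the updated one, which is the kind of mismatch a single up-front reset is meant to avoid.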
        Olga Natkovich added a comment -

        Thejas - can you treat this as high priority, since it can cause failures in pretty basic Pig scripts?

        Daniel Dai made changes -
        Assignee Xuefu Zhang [ xuefuz ] Thejas M Nair [ thejas ]
        Daniel Dai made changes -
        Attachment PIG-2004-0.patch [ 12476939 ]
        Daniel Dai added a comment -

        TypeCheckVisitor does not update the schema before processing LOGenerate. Attaching a patch for demonstration. Assigning to Thejas for further investigation.

        Olga Natkovich made changes -
        Assignee Xuefu Zhang [ xuefuz ]
        Daniel Dai added a comment -

        It seems the logical plan picks the wrong MAX implementation (it should pick DoubleMax).

        Daniel Dai made changes -
        Description: in the script, the line "D = foreach D generate (long)(MAX(B.doub)) as f4;" was corrected to "D = foreach C generate (long)(MAX(B.doub)) as f4;". The rest of the description is unchanged.

        Vivek Padmanabhan created issue -

          People

          • Assignee:
            Thejas M Nair
            Reporter:
            Vivek Padmanabhan
          • Votes:
            0
            Watchers:
            0
