Pig
  1. Pig
  2. PIG-3679

e2e StreamingPythonUDFs_10 fails in trunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The e2e test StreamingPythonUDFs_10 fails in trunk with NPE-

      Caused by: java.lang.NullPointerException
              at org.apache.pig.builtin.DoubleRound.exec(DoubleRound.java:45)
      

      The test query is as follows-

      a = load '/user/pig/tests/data/singlefile/allscalar10k' using PigStorage() as (name:chararray, age:int, gpa:double, instate:chararray);
      b = foreach a generate name, ((double)ROUND((instate=='true'?gpa:gpa+1)*10000)) / 10000.0;
      store b into '/user/pig/out/cheolsoop-1390330024-nightly.conf-StreamingPythonUDFs/StreamingPythonUDFs_10_benchmark.out';
      
      1. PIG-3679-1.patch
        1 kB
        Cheolsoo Park
      2. PIG-3679-2.patch
        3 kB
        Cheolsoo Park
      3. PIG-3679-3.patch
        0.9 kB
        Daniel Dai
      4. PIG-3679-4.patch
        2 kB
        Cheolsoo Park
      5. PIG-3679-5.patch
        3 kB
        Cheolsoo Park
      6. PIG-3679-6.patch
        4 kB
        Cheolsoo Park
      7. PIG-3679-7.patch
        5 kB
        Cheolsoo Park
      8. PIG-3679-backward-compatibility.patch
        1 kB
        Aniket Mokashi

        Issue Links

          Activity

          Cheolsoo Park created issue -
          Hide
          Cheolsoo Park added a comment -

          The problem is that random-generated allscalar10k may contain null values for instate and gpa columns. For eg,

          (fred hernandez,44,0.28,)
          (calvin brown,22,1.73,)
          (priscilla carson,71,0.0,)
          

          That causes "instate=='true'?gpa:gpa+1" to become null.

          Attaching a fix that adds a filter-by that filters out null values on those two columns. Confirmed the test passes with the fix.

          Show
          Cheolsoo Park added a comment - The problem is that random-generated allscalar10k may contain null values for instate and gpa columns. For eg, (fred hernandez,44,0.28,) (calvin brown,22,1.73,) (priscilla carson,71,0.0,) That causes "instate=='true'?gpa:gpa+1" to become null. Attaching a fix that adds a filter-by that filters out null values on those two columns. Confirmed the test passes with the fix.
          Cheolsoo Park made changes -
          Field Original Value New Value
          Attachment PIG-3679-1.patch [ 12624182 ]
          Cheolsoo Park made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Daniel Dai added a comment -

          A little suprised why we not see it before. We should always has some null in instate and gpa.

          Show
          Daniel Dai added a comment - A little suprised why we not see it before. We should always has some null in instate and gpa.
          Hide
          Cheolsoo Park added a comment -

          Good question. I don't know.

          Show
          Cheolsoo Park added a comment - Good question. I don't know.
          Hide
          Rohini Palaniswamy added a comment -

          In 0.11, null or non-numbers are ignored.

          ROUND.java

          try{
                      Double d =  DataType.toDouble(input.get(0));
          		    return Math.round(d);
                  } catch (NumberFormatException nfe){
                      System.err.println("Failed to process input; error - " + nfe.getMessage());
                      return null;
                  } catch (Exception e){
                      throw new IOException("Caught exception processing input row ", e);
                  }
          
          Show
          Rohini Palaniswamy added a comment - In 0.11, null or non-numbers are ignored. ROUND.java try { Double d = DataType.toDouble(input.get(0)); return Math .round(d); } catch (NumberFormatException nfe){ System .err.println( "Failed to process input; error - " + nfe.getMessage()); return null ; } catch (Exception e){ throw new IOException( "Caught exception processing input row " , e); }
          Hide
          Cheolsoo Park added a comment -

          But ROUND.java hasn't changed in trunk, has it?

          Show
          Cheolsoo Park added a comment - But ROUND.java hasn't changed in trunk, has it?
          Hide
          Cheolsoo Park added a comment -

          OK, I found this is a regression of PIG-3568 (Define the semantics of POStatus.STATUS_NULL). If I drop PIG-3568 in trunk, the test passes. In shorts, the exec() in DoubleRound was never called with nulls in 0.12, whereas it is in trunk resulting in NPE.

          In terms of the fix, I think we should explicitly handle NullPointerException in addition to NumberFormatException in ROUND, DoubleRound, and FloatRound. Currently, if input.get(0) returns null, Math.round() will throw NPE.

          I am attaching a new patch that adds a null check before calling Math.round(). I also copied the handler for NumberFormatException from ROUND to DoubleRound and FloadRound since it was missing in those functions.

          Show
          Cheolsoo Park added a comment - OK, I found this is a regression of PIG-3568 (Define the semantics of POStatus.STATUS_NULL). If I drop PIG-3568 in trunk, the test passes. In shorts, the exec() in DoubleRound was never called with nulls in 0.12, whereas it is in trunk resulting in NPE. In terms of the fix, I think we should explicitly handle NullPointerException in addition to NumberFormatException in ROUND, DoubleRound, and FloatRound. Currently, if input.get(0) returns null, Math.round() will throw NPE. I am attaching a new patch that adds a null check before calling Math.round(). I also copied the handler for NumberFormatException from ROUND to DoubleRound and FloadRound since it was missing in those functions.
          Cheolsoo Park made changes -
          Attachment PIG-3679-2.patch [ 12626609 ]
          Hide
          Daniel Dai added a comment -

          Thanks for pursuing this.

          Did a quick check, seems I cannot find another UDF like this. +1 for the patch.

          Show
          Daniel Dai added a comment - Thanks for pursuing this. Did a quick check, seems I cannot find another UDF like this. +1 for the patch.
          Hide
          Rohini Palaniswamy added a comment -

          If the previous behavior was to not execute in case of null and if we are executing now, isn't that a big behavior change? There might be other user written UDFs which might encounter issues even if there are no builtin ones.

          Show
          Rohini Palaniswamy added a comment - If the previous behavior was to not execute in case of null and if we are executing now, isn't that a big behavior change? There might be other user written UDFs which might encounter issues even if there are no builtin ones.
          Hide
          Daniel Dai added a comment -

          Yes, agree this might break some UDF. We should bring the old behavior.

          Show
          Daniel Dai added a comment - Yes, agree this might break some UDF. We should bring the old behavior.
          Hide
          Daniel Dai added a comment -

          Put up a patch to restore old behavior.

          Show
          Daniel Dai added a comment - Put up a patch to restore old behavior.
          Daniel Dai made changes -
          Attachment PIG-3679-3.patch [ 12626708 ]
          Hide
          Rohini Palaniswamy added a comment -

          Wouldn't it be better for us to check for POStatus.STATUS_NULL instead of temp.result == null ?

          Show
          Rohini Palaniswamy added a comment - Wouldn't it be better for us to check for POStatus.STATUS_NULL instead of temp.result == null ?
          Hide
          Daniel Dai added a comment -

          With Mark's change, POStatus.STATUS_NULL is changed into POStatus.STATUS_OK with result.result == null. This is actually more natural, however, we need to bring backward compatibility for POUserFunc.

          Show
          Daniel Dai added a comment - With Mark's change, POStatus.STATUS_NULL is changed into POStatus.STATUS_OK with result.result == null. This is actually more natural, however, we need to bring backward compatibility for POUserFunc.
          Hide
          Cheolsoo Park added a comment -

          Daniel Dai, unfortunately, you patch introduces another issue. The e2e test still fails but for a different reason. Here is the stack trace-

          org.apache.pig.backend.executionengine.ExecException: ERROR 0: Failed adding input to inputQueue
              at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:328)
              at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:150)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:328)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:394)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
              at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
              at org.apache.hadoop.mapred.Child.main(Child.java:249)
          Caused by: java.lang.NullPointerException
              at java.util.concurrent.ArrayBlockingQueue.checkNotNull(ArrayBlockingQueue.java:145)
              at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:319)
              at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:326)
              ... 17 more
          

          To be clear, I don't think POUserFunc used to filter out nulls before PIG-3568 because it didn't modify POUserFunc. I believe(?) it's one of expression operators that filtered out nulls in the following plan-

              |   POUserFunc(org.apache.pig.builtin.DoubleRound)[long] - scope-25
              |   |   
              |   |---Multiply[double] - scope-24
              |       |   
              |       |---POBinCond[double] - scope-21
              |       |   |   
              |       |   |---Equal To[boolean] - scope-15
              |       |   |   |   
              |       |   |   |---Project[chararray][2] - scope-13
              |       |   |   |   
              |       |   |   |---Constant(true) - scope-14
              |       |   |   
              |       |   |---Project[double][1] - scope-16
              |       |   |   
              |       |   |---Add[double] - scope-20
              |       |       |   
              |       |       |---Project[double][1] - scope-17
              |       |       |   
              |       |       |---Cast[double] - scope-19
              |       |           |   
              |       |           |---Constant(1) - scope-18
              |       |   
              |       |---Cast[double] - scope-23
              |           |   
              |           |---Constant(10000) - scope-22
          

          I am not entirely sure whether we can bring the old behavior in every case where PIG-3568 breaks backward compatibility. We discovered this particular one, but we don't know how many cases like this exist.

          Show
          Cheolsoo Park added a comment - Daniel Dai , unfortunately, you patch introduces another issue. The e2e test still fails but for a different reason. Here is the stack trace- org.apache.pig.backend.executionengine.ExecException: ERROR 0: Failed adding input to inputQueue at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:328) at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:150) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:328) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:394) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.NullPointerException at java.util.concurrent.ArrayBlockingQueue.checkNotNull(ArrayBlockingQueue.java:145) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:319) at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:326) ... 17 more To be clear, I don't think POUserFunc used to filter out nulls before PIG-3568 because it didn't modify POUserFunc. I believe(?) it's one of expression operators that filtered out nulls in the following plan- | POUserFunc(org.apache.pig.builtin.DoubleRound)[ long ] - scope-25 | | | |---Multiply[ double ] - scope-24 | | | |---POBinCond[ double ] - scope-21 | | | | | |---Equal To[ boolean ] - scope-15 | | | | | | | |---Project[chararray][2] - scope-13 | | | | | | | |---Constant( true ) - scope-14 | | | | | |---Project[ double ][1] - scope-16 | | | | | |---Add[ double ] - scope-20 | | | | | |---Project[ double ][1] - scope-17 | | | | | |---Cast[ double ] - scope-19 | | | | | |---Constant(1) - scope-18 | | | |---Cast[ double ] - scope-23 | | | |---Constant(10000) - scope-22 I am not entirely sure whether we can bring the old behavior in every case where PIG-3568 breaks backward compatibility. We discovered this particular one, but we don't know how many cases like this exist.
          Hide
          Cheolsoo Park added a comment -

          I am uploading yet another patch that fixes the e2e test. This time, I took Daniel's patch and added a null check to the exec() of StreamingUDF.

          On a second thought about backward incompatibility of PIG-3568, looks like we're safe with changing POUserFunc to filter out nulls for the following reasons-

          1. The changes in all the other PO operators except POUserFunc should be transparent to users because in the end, the result remains unchanged. The only difference is whether nulls travel the full or part of pipeline.
          2. Since POUserFunc now filters out nulls, some UDFs that used to throw NPE will no longer throw NPE (e.g. DoubleRound). But this is more of an improvement than an backward incompatible change.

          So I will commit PIG-3679-4.patch if there is no objection. Thanks!

          Show
          Cheolsoo Park added a comment - I am uploading yet another patch that fixes the e2e test. This time, I took Daniel's patch and added a null check to the exec() of StreamingUDF. On a second thought about backward incompatibility of PIG-3568 , looks like we're safe with changing POUserFunc to filter out nulls for the following reasons- The changes in all the other PO operators except POUserFunc should be transparent to users because in the end, the result remains unchanged. The only difference is whether nulls travel the full or part of pipeline. Since POUserFunc now filters out nulls, some UDFs that used to throw NPE will no longer throw NPE (e.g. DoubleRound). But this is more of an improvement than an backward incompatible change. So I will commit PIG-3679 -4.patch if there is no objection. Thanks!
          Cheolsoo Park made changes -
          Attachment PIG-3679-4.patch [ 12627282 ]
          Hide
          Daniel Dai added a comment -

          +1

          Show
          Daniel Dai added a comment - +1
          Hide
          Cheolsoo Park added a comment -

          Actually, the latest patch is still no good. I see a dozen of e2e test cases fail with NPE. To name a few, EvalFunc_5, Types_3, 4, and more. Apparently, short-circuiting nulls in POUserFunc has a side effect.

          I will need to debug them more carefully once I get back to work. Hard to debug them in a hotel room. Canceling patch for now.

          Show
          Cheolsoo Park added a comment - Actually, the latest patch is still no good. I see a dozen of e2e test cases fail with NPE. To name a few, EvalFunc_5, Types_3, 4, and more. Apparently, short-circuiting nulls in POUserFunc has a side effect. I will need to debug them more carefully once I get back to work. Hard to debug them in a hotel room. Canceling patch for now.
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Cheolsoo Park added a comment -

          Here is yet another attempt to fix the issue. I confirmed that all the affected test cases pass now.

          This time I made the logic more explicitly such that the invocation of UDF will be skipped only if all the fields in the input tuple are null. I am running full e2e test suites.

          Show
          Cheolsoo Park added a comment - Here is yet another attempt to fix the issue. I confirmed that all the affected test cases pass now. This time I made the logic more explicitly such that the invocation of UDF will be skipped only if all the fields in the input tuple are null. I am running full e2e test suites.
          Cheolsoo Park made changes -
          Attachment PIG-3679-5.patch [ 12630627 ]
          Hide
          Cheolsoo Park added a comment -

          I fixed one more minor issue in the latest patch (PIG-3679-6.patch).

          If the output schema of UDF is tuple, POUserFunc should return a tuple whose fields are null instead of a null. I ran into this issue with "Scripting_5", and I fixed it in the new patch.

          All e2e test cases pass except a known failure (Warning_4). Ready for review.

          Show
          Cheolsoo Park added a comment - I fixed one more minor issue in the latest patch ( PIG-3679 -6.patch). If the output schema of UDF is tuple, POUserFunc should return a tuple whose fields are null instead of a null. I ran into this issue with "Scripting_5", and I fixed it in the new patch. All e2e test cases pass except a known failure (Warning_4). Ready for review.
          Cheolsoo Park made changes -
          Attachment PIG-3679-6.patch [ 12631047 ]
          Cheolsoo Park made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Cheolsoo Park made changes -
          Link This issue is broken by PIG-3568 [ PIG-3568 ]
          Hide
          Cheolsoo Park added a comment -
          Show
          Cheolsoo Park added a comment - RB link- https://reviews.apache.org/r/18525/
          Hide
          Rohini Palaniswamy added a comment -

          +1

          Show
          Rohini Palaniswamy added a comment - +1
          Cheolsoo Park made changes -
          Attachment PIG-3679-7.patch [ 12631464 ]
          Hide
          Cheolsoo Park added a comment -

          Committed to trunk. Thank you Rohini for the review!

          Show
          Cheolsoo Park added a comment - Committed to trunk. Thank you Rohini for the review!
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Link This issue breaks PIG-3739 [ PIG-3739 ]
          Hide
          Aniket Mokashi added a comment -

          During investigation of PIG-3739, we identified that this still does not bring back the full backwards compatibility.

          Show
          Aniket Mokashi added a comment - During investigation of PIG-3739 , we identified that this still does not bring back the full backwards compatibility.
          Aniket Mokashi made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Aniket Mokashi made changes -
          Attachment PIG-3679-backward-compatibility.patch [ 12647434 ]
          Hide
          Aniket Mokashi added a comment -

          I tried fix as per Daniel's suggestion (PIG-3739), I have attached PIG-3679-backward-compatibility.patch with fix. Daniel Dai can you check e2e after this fix?

          Results of manual testing-

          Old counters (with pig11)

          Group Name Map Reduce Total
          org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989
          org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22

          New counters (with the fix)
          New counters

          Group Name Map Reduce Total
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10989 0 10989
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 22 0 22
          Show
          Aniket Mokashi added a comment - I tried fix as per Daniel's suggestion ( PIG-3739 ), I have attached PIG-3679 -backward-compatibility.patch with fix. Daniel Dai can you check e2e after this fix? Results of manual testing- Old counters (with pig11) Group Name Map Reduce Total org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989 org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22 New counters (with the fix) New counters Group Name Map Reduce Total org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10989 0 10989 org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 22 0 22
          Hide
          Cheolsoo Park added a comment -

          +1 to PIG-3679-backward-compatibility.patch. Thank you Aniket for fixing it.

          Show
          Cheolsoo Park added a comment - +1 to PIG-3679 -backward-compatibility.patch . Thank you Aniket for fixing it.
          Hide
          Cheolsoo Park added a comment -

          I just ran StreamingPythonUDFs_10. It passes since the issue was with a tuple with a single null field.

          Show
          Cheolsoo Park added a comment - I just ran StreamingPythonUDFs_10. It passes since the issue was with a tuple with a single null field.
          Hide
          Daniel Dai added a comment -

          +1

          Show
          Daniel Dai added a comment - +1
          Hide
          Cheolsoo Park added a comment -

          Committed the addendum patch to both trunk and branch-0.13.

          Show
          Cheolsoo Park added a comment - Committed the addendum patch to both trunk and branch-0.13.
          Cheolsoo Park made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          16d 1h 39m 1 Cheolsoo Park 06/Feb/14 21:18
          Open Open Patch Available Patch Available
          19d 50m 2 Cheolsoo Park 25/Feb/14 21:35
          Patch Available Patch Available Resolved Resolved
          1d 9h 5m 1 Cheolsoo Park 27/Feb/14 06:41
          Resolved Resolved Reopened Reopened
          91d 12h 1m 1 Aniket Mokashi 29/May/14 19:43
          Reopened Reopened Resolved Resolved
          6h 5m 1 Cheolsoo Park 30/May/14 01:48
          Resolved Resolved Closed Closed
          38d 17h 19m 1 Daniel Dai 07/Jul/14 19:08

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development