Pig
  1. Pig
  2. PIG-3679

e2e StreamingPythonUDFs_10 fails in trunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      The e2e test StreamingPythonUDFs_10 fails in trunk with NPE-

      Caused by: java.lang.NullPointerException
              at org.apache.pig.builtin.DoubleRound.exec(DoubleRound.java:45)
      

      The test query is as follows-

      a = load '/user/pig/tests/data/singlefile/allscalar10k' using PigStorage() as (name:chararray, age:int, gpa:double, instate:chararray);
      b = foreach a generate name, ((double)ROUND((instate=='true'?gpa:gpa+1)*10000)) / 10000.0;
      store b into '/user/pig/out/cheolsoop-1390330024-nightly.conf-StreamingPythonUDFs/StreamingPythonUDFs_10_benchmark.out';
      
      1. PIG-3679-backward-compatibility.patch
        1 kB
        Aniket Mokashi
      2. PIG-3679-7.patch
        5 kB
        Cheolsoo Park
      3. PIG-3679-6.patch
        4 kB
        Cheolsoo Park
      4. PIG-3679-5.patch
        3 kB
        Cheolsoo Park
      5. PIG-3679-4.patch
        2 kB
        Cheolsoo Park
      6. PIG-3679-3.patch
        0.9 kB
        Daniel Dai
      7. PIG-3679-2.patch
        3 kB
        Cheolsoo Park
      8. PIG-3679-1.patch
        1 kB
        Cheolsoo Park

        Issue Links

          Activity

          Hide
          Cheolsoo Park added a comment -

          The problem is that random-generated allscalar10k may contain null values for instate and gpa columns. For eg,

          (fred hernandez,44,0.28,)
          (calvin brown,22,1.73,)
          (priscilla carson,71,0.0,)
          

          That causes "instate=='true'?gpa:gpa+1" to become null.

          Attaching a fix that adds a filter-by that filters out null values on those two columns. Confirmed the test passes with the fix.

          Show
          Cheolsoo Park added a comment - The problem is that random-generated allscalar10k may contain null values for instate and gpa columns. For eg, (fred hernandez,44,0.28,) (calvin brown,22,1.73,) (priscilla carson,71,0.0,) That causes "instate=='true'?gpa:gpa+1" to become null. Attaching a fix that adds a filter-by that filters out null values on those two columns. Confirmed the test passes with the fix.
          Hide
          Daniel Dai added a comment -

          A little suprised why we not see it before. We should always has some null in instate and gpa.

          Show
          Daniel Dai added a comment - A little suprised why we not see it before. We should always has some null in instate and gpa.
          Hide
          Cheolsoo Park added a comment -

          Good question. I don't know.

          Show
          Cheolsoo Park added a comment - Good question. I don't know.
          Hide
          Rohini Palaniswamy added a comment -

          In 0.11, null or non-numbers are ignored.

          ROUND.java

          try{
                      Double d =  DataType.toDouble(input.get(0));
          		    return Math.round(d);
                  } catch (NumberFormatException nfe){
                      System.err.println("Failed to process input; error - " + nfe.getMessage());
                      return null;
                  } catch (Exception e){
                      throw new IOException("Caught exception processing input row ", e);
                  }
          
          Show
          Rohini Palaniswamy added a comment - In 0.11, null or non-numbers are ignored. ROUND.java try { Double d = DataType.toDouble(input.get(0)); return Math .round(d); } catch (NumberFormatException nfe){ System .err.println( "Failed to process input; error - " + nfe.getMessage()); return null ; } catch (Exception e){ throw new IOException( "Caught exception processing input row " , e); }
          Hide
          Cheolsoo Park added a comment -

          But ROUND.java hasn't changed in trunk, has it?

          Show
          Cheolsoo Park added a comment - But ROUND.java hasn't changed in trunk, has it?
          Hide
          Cheolsoo Park added a comment -

          OK, I found this is a regression of PIG-3568 (Define the semantics of POStatus.STATUS_NULL). If I drop PIG-3568 in trunk, the test passes. In shorts, the exec() in DoubleRound was never called with nulls in 0.12, whereas it is in trunk resulting in NPE.

          In terms of the fix, I think we should explicitly handle NullPointerException in addition to NumberFormatException in ROUND, DoubleRound, and FloatRound. Currently, if input.get(0) returns null, Math.round() will throw NPE.

          I am attaching a new patch that adds a null check before calling Math.round(). I also copied the handler for NumberFormatException from ROUND to DoubleRound and FloadRound since it was missing in those functions.

          Show
          Cheolsoo Park added a comment - OK, I found this is a regression of PIG-3568 (Define the semantics of POStatus.STATUS_NULL). If I drop PIG-3568 in trunk, the test passes. In shorts, the exec() in DoubleRound was never called with nulls in 0.12, whereas it is in trunk resulting in NPE. In terms of the fix, I think we should explicitly handle NullPointerException in addition to NumberFormatException in ROUND, DoubleRound, and FloatRound. Currently, if input.get(0) returns null, Math.round() will throw NPE. I am attaching a new patch that adds a null check before calling Math.round(). I also copied the handler for NumberFormatException from ROUND to DoubleRound and FloadRound since it was missing in those functions.
          Hide
          Daniel Dai added a comment -

          Thanks for pursuing this.

          Did a quick check, seems I cannot find another UDF like this. +1 for the patch.

          Show
          Daniel Dai added a comment - Thanks for pursuing this. Did a quick check, seems I cannot find another UDF like this. +1 for the patch.
          Hide
          Rohini Palaniswamy added a comment -

          If the previous behavior was to not execute in case of null and if we are executing now, isn't that a big behavior change? There might be other user written UDFs which might encounter issues even if there are no builtin ones.

          Show
          Rohini Palaniswamy added a comment - If the previous behavior was to not execute in case of null and if we are executing now, isn't that a big behavior change? There might be other user written UDFs which might encounter issues even if there are no builtin ones.
          Hide
          Daniel Dai added a comment -

          Yes, agree this might break some UDF. We should bring the old behavior.

          Show
          Daniel Dai added a comment - Yes, agree this might break some UDF. We should bring the old behavior.
          Hide
          Daniel Dai added a comment -

          Put up a patch to restore old behavior.

          Show
          Daniel Dai added a comment - Put up a patch to restore old behavior.
          Hide
          Rohini Palaniswamy added a comment -

          Wouldn't it be better for us to check for POStatus.STATUS_NULL instead of temp.result == null ?

          Show
          Rohini Palaniswamy added a comment - Wouldn't it be better for us to check for POStatus.STATUS_NULL instead of temp.result == null ?
          Hide
          Daniel Dai added a comment -

          With Mark's change, POStatus.STATUS_NULL is changed into POStatus.STATUS_OK with result.result == null. This is actually more natural, however, we need to bring backward compatibility for POUserFunc.

          Show
          Daniel Dai added a comment - With Mark's change, POStatus.STATUS_NULL is changed into POStatus.STATUS_OK with result.result == null. This is actually more natural, however, we need to bring backward compatibility for POUserFunc.
          Hide
          Cheolsoo Park added a comment -

          Daniel Dai, unfortunately, you patch introduces another issue. The e2e test still fails but for a different reason. Here is the stack trace-

          org.apache.pig.backend.executionengine.ExecException: ERROR 0: Failed adding input to inputQueue
              at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:328)
              at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:150)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:328)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:394)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
              at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
              at org.apache.hadoop.mapred.Child.main(Child.java:249)
          Caused by: java.lang.NullPointerException
              at java.util.concurrent.ArrayBlockingQueue.checkNotNull(ArrayBlockingQueue.java:145)
              at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:319)
              at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:326)
              ... 17 more
          

          To be clear, I don't think POUserFunc used to filter out nulls before PIG-3568 because it didn't modify POUserFunc. I believe(?) it's one of expression operators that filtered out nulls in the following plan-

              |   POUserFunc(org.apache.pig.builtin.DoubleRound)[long] - scope-25
              |   |   
              |   |---Multiply[double] - scope-24
              |       |   
              |       |---POBinCond[double] - scope-21
              |       |   |   
              |       |   |---Equal To[boolean] - scope-15
              |       |   |   |   
              |       |   |   |---Project[chararray][2] - scope-13
              |       |   |   |   
              |       |   |   |---Constant(true) - scope-14
              |       |   |   
              |       |   |---Project[double][1] - scope-16
              |       |   |   
              |       |   |---Add[double] - scope-20
              |       |       |   
              |       |       |---Project[double][1] - scope-17
              |       |       |   
              |       |       |---Cast[double] - scope-19
              |       |           |   
              |       |           |---Constant(1) - scope-18
              |       |   
              |       |---Cast[double] - scope-23
              |           |   
              |           |---Constant(10000) - scope-22
          

          I am not entirely sure whether we can bring the old behavior in every case where PIG-3568 breaks backward compatibility. We discovered this particular one, but we don't know how many cases like this exist.

          Show
          Cheolsoo Park added a comment - Daniel Dai , unfortunately, you patch introduces another issue. The e2e test still fails but for a different reason. Here is the stack trace- org.apache.pig.backend.executionengine.ExecException: ERROR 0: Failed adding input to inputQueue at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:328) at org.apache.pig.impl.builtin.StreamingUDF.exec(StreamingUDF.java:150) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:328) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDouble(POUserFunc.java:394) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.NullPointerException at java.util.concurrent.ArrayBlockingQueue.checkNotNull(ArrayBlockingQueue.java:145) at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:319) at org.apache.pig.impl.builtin.StreamingUDF.getOutput(StreamingUDF.java:326) ... 17 more To be clear, I don't think POUserFunc used to filter out nulls before PIG-3568 because it didn't modify POUserFunc. I believe(?) it's one of expression operators that filtered out nulls in the following plan- | POUserFunc(org.apache.pig.builtin.DoubleRound)[ long ] - scope-25 | | | |---Multiply[ double ] - scope-24 | | | |---POBinCond[ double ] - scope-21 | | | | | |---Equal To[ boolean ] - scope-15 | | | | | | | |---Project[chararray][2] - scope-13 | | | | | | | |---Constant( true ) - scope-14 | | | | | |---Project[ double ][1] - scope-16 | | | | | |---Add[ double ] - scope-20 | | | | | |---Project[ double ][1] - scope-17 | | | | | |---Cast[ double ] - scope-19 | | | | | |---Constant(1) - scope-18 | | | |---Cast[ double ] - scope-23 | | | |---Constant(10000) - scope-22 I am not entirely sure whether we can bring the old behavior in every case where PIG-3568 breaks backward compatibility. We discovered this particular one, but we don't know how many cases like this exist.
          Hide
          Cheolsoo Park added a comment -

          I am uploading yet another patch that fixes the e2e test. This time, I took Daniel's patch and added a null check to the exec() of StreamingUDF.

          On a second thought about backward incompatibility of PIG-3568, looks like we're safe with changing POUserFunc to filter out nulls for the following reasons-

          1. The changes in all the other PO operators except POUserFunc should be transparent to users because in the end, the result remains unchanged. The only difference is whether nulls travel the full or part of pipeline.
          2. Since POUserFunc now filters out nulls, some UDFs that used to throw NPE will no longer throw NPE (e.g. DoubleRound). But this is more of an improvement than an backward incompatible change.

          So I will commit PIG-3679-4.patch if there is no objection. Thanks!

          Show
          Cheolsoo Park added a comment - I am uploading yet another patch that fixes the e2e test. This time, I took Daniel's patch and added a null check to the exec() of StreamingUDF. On a second thought about backward incompatibility of PIG-3568 , looks like we're safe with changing POUserFunc to filter out nulls for the following reasons- The changes in all the other PO operators except POUserFunc should be transparent to users because in the end, the result remains unchanged. The only difference is whether nulls travel the full or part of pipeline. Since POUserFunc now filters out nulls, some UDFs that used to throw NPE will no longer throw NPE (e.g. DoubleRound). But this is more of an improvement than an backward incompatible change. So I will commit PIG-3679 -4.patch if there is no objection. Thanks!
          Hide
          Daniel Dai added a comment -

          +1

          Show
          Daniel Dai added a comment - +1
          Hide
          Cheolsoo Park added a comment -

          Actually, the latest patch is still no good. I see a dozen of e2e test cases fail with NPE. To name a few, EvalFunc_5, Types_3, 4, and more. Apparently, short-circuiting nulls in POUserFunc has a side effect.

          I will need to debug them more carefully once I get back to work. Hard to debug them in a hotel room. Canceling patch for now.

          Show
          Cheolsoo Park added a comment - Actually, the latest patch is still no good. I see a dozen of e2e test cases fail with NPE. To name a few, EvalFunc_5, Types_3, 4, and more. Apparently, short-circuiting nulls in POUserFunc has a side effect. I will need to debug them more carefully once I get back to work. Hard to debug them in a hotel room. Canceling patch for now.
          Hide
          Cheolsoo Park added a comment -

          Here is yet another attempt to fix the issue. I confirmed that all the affected test cases pass now.

          This time I made the logic more explicitly such that the invocation of UDF will be skipped only if all the fields in the input tuple are null. I am running full e2e test suites.

          Show
          Cheolsoo Park added a comment - Here is yet another attempt to fix the issue. I confirmed that all the affected test cases pass now. This time I made the logic more explicitly such that the invocation of UDF will be skipped only if all the fields in the input tuple are null. I am running full e2e test suites.
          Hide
          Cheolsoo Park added a comment -

          I fixed one more minor issue in the latest patch (PIG-3679-6.patch).

          If the output schema of UDF is tuple, POUserFunc should return a tuple whose fields are null instead of a null. I ran into this issue with "Scripting_5", and I fixed it in the new patch.

          All e2e test cases pass except a known failure (Warning_4). Ready for review.

          Show
          Cheolsoo Park added a comment - I fixed one more minor issue in the latest patch ( PIG-3679 -6.patch). If the output schema of UDF is tuple, POUserFunc should return a tuple whose fields are null instead of a null. I ran into this issue with "Scripting_5", and I fixed it in the new patch. All e2e test cases pass except a known failure (Warning_4). Ready for review.
          Hide
          Cheolsoo Park added a comment -
          Show
          Cheolsoo Park added a comment - RB link- https://reviews.apache.org/r/18525/
          Hide
          Rohini Palaniswamy added a comment -

          +1

          Show
          Rohini Palaniswamy added a comment - +1
          Hide
          Cheolsoo Park added a comment -

          Committed to trunk. Thank you Rohini for the review!

          Show
          Cheolsoo Park added a comment - Committed to trunk. Thank you Rohini for the review!
          Hide
          Aniket Mokashi added a comment -

          During investigation of PIG-3739, we identified that this still does not bring back the full backwards compatibility.

          Show
          Aniket Mokashi added a comment - During investigation of PIG-3739 , we identified that this still does not bring back the full backwards compatibility.
          Hide
          Aniket Mokashi added a comment -

          I tried fix as per Daniel's suggestion (PIG-3739), I have attached PIG-3679-backward-compatibility.patch with fix. Daniel Dai can you check e2e after this fix?

          Results of manual testing-

          Old counters (with pig11)

          Group Name Map Reduce Total
          org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989
          org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22

          New counters (with the fix)
          New counters

          Group Name Map Reduce Total
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10989 0 10989
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 22 0 22
          Show
          Aniket Mokashi added a comment - I tried fix as per Daniel's suggestion ( PIG-3739 ), I have attached PIG-3679 -backward-compatibility.patch with fix. Daniel Dai can you check e2e after this fix? Results of manual testing- Old counters (with pig11) Group Name Map Reduce Total org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989 org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22 New counters (with the fix) New counters Group Name Map Reduce Total org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10989 0 10989 org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 22 0 22
          Hide
          Cheolsoo Park added a comment -

          +1 to PIG-3679-backward-compatibility.patch. Thank you Aniket for fixing it.

          Show
          Cheolsoo Park added a comment - +1 to PIG-3679 -backward-compatibility.patch . Thank you Aniket for fixing it.
          Hide
          Cheolsoo Park added a comment -

          I just ran StreamingPythonUDFs_10. It passes since the issue was with a tuple with a single null field.

          Show
          Cheolsoo Park added a comment - I just ran StreamingPythonUDFs_10. It passes since the issue was with a tuple with a single null field.
          Hide
          Daniel Dai added a comment -

          +1

          Show
          Daniel Dai added a comment - +1
          Hide
          Cheolsoo Park added a comment -

          Committed the addendum patch to both trunk and branch-0.13.

          Show
          Cheolsoo Park added a comment - Committed the addendum patch to both trunk and branch-0.13.

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development