  1. DataFu
  2. DATAFU-68

SampleByKey can throw NullPointerException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Labels:
      None

      Description

      I've noticed that SampleByKey can throw a NullPointerException:

      Caused by: java.lang.NullPointerException
      	at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
      	at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
      	at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
      	at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
      	at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
      	at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
      	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
      	at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
      	at org.apache.pig.PigServer.storeEx(PigServer.java:978)
      	at org.apache.pig.PigServer.store(PigServer.java:942)
      	at org.apache.pig.Pig
      

      I've reproduced the behaviour on the old 1.1.0 release, but the UDF in question has not changed much since then, so I'm assuming that trunk is affected in the same way. The script that reproduces the issue is simple:

      grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5'); 
      grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
      grunt> out = FILTER data BY SampleByKey(A_id); 
      grunt> DUMP out;
      

      The problem seems to be that setUDFContextSignature can be called with a null argument, which our code does not handle. The documentation for this method does not say whether null is allowed. I've looked at other UDFs in Pig, and they appear to handle a null signature, so I've decided to fix SampleByKey as well.
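
      To illustrate the kind of guard described above, here is a minimal sketch. It is not the attached patch: the class NullSafeSignatureSketch, its seed field, and the DEFAULT_SEED fallback are hypothetical names chosen for illustration, and the class deliberately does not extend Pig's EvalFunc so that it compiles without Pig on the classpath. Only the method name setUDFContextSignature comes from the stack trace.

```java
// Hypothetical stand-in for a UDF; NOT the real SampleByKey and NOT the attached patch.
// It only mirrors the shape of setUDFContextSignature(String) seen in the stack trace.
class NullSafeSignatureSketch {
    // Assumed fallback used when no signature is supplied (illustrative only).
    private static final int DEFAULT_SEED = 0;

    private String signature;          // stays null if the caller passes null
    private int seed = DEFAULT_SEED;   // hypothetical state derived from the signature

    // Accepts a possibly-null signature instead of dereferencing it blindly.
    public void setUDFContextSignature(String signature) {
        this.signature = signature;
        if (signature == null) {
            // Nothing to derive from; keep the current state rather than throw.
            return;
        }
        this.seed = signature.hashCode();
    }

    public String getSignature() {
        return signature;
    }

    public int getSeed() {
        return seed;
    }

    public static void main(String[] args) {
        NullSafeSignatureSketch udf = new NullSafeSignatureSketch();

        // A null signature no longer raises NullPointerException.
        udf.setUDFContextSignature(null);
        System.out.println("seed after null signature: " + udf.getSeed());

        // A non-null signature is processed as before.
        udf.setUDFContextSignature("abc");
        System.out.println("seed after non-null signature: " + udf.getSeed());
    }
}
```

      Returning early keeps the non-null path unchanged. Whether a default value is appropriate when the signature is null depends on how the real UDF uses it, which this sketch does not model.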

        Attachments

        1. DATAFU-68.patch
          3 kB
          Jarek Jarcec Cecho
        2. DATAFU-68.patch
          3 kB
          Jarek Jarcec Cecho

              People

              • Assignee: Jarek Jarcec Cecho (jarcec)
              • Reporter: Jarek Jarcec Cecho (jarcec)