DataFu / DATAFU-68

SampleByKey can throw NullPointerException


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.3.0

    Description

      I've noticed that SampleByKey can throw NullPointerException:

      Caused by: java.lang.NullPointerException
      	at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
      	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
      	at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
      	at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
      	at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
      	at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
      	at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
      	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
      	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
      	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
      	at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
      	at org.apache.pig.PigServer.storeEx(PigServer.java:978)
      	at org.apache.pig.PigServer.store(PigServer.java:942)
      	at org.apache.pig.Pig
      

      I reproduced the behaviour on the old 1.1.0 release, but the UDF in question has not changed much since then, so I assume trunk is affected the same way. The script that reproduces the issue is simple:

      grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5'); 
      grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
      grunt> out = FILTER data BY SampleByKey(A_id); 
      grunt> DUMP out;
      

      The problem seems to be that setUDFContextSignature can be called with a null argument, which our code does not handle. The documentation for this method does not say whether null is allowed. I looked at other UDFs in Pig, and they appear to handle a null signature, so I decided to fix SampleByKey as well.
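
      The null handling described above could look roughly like the sketch below. It is not the attached patch: the class NullSafeSamplerSketch, the salt field, and DEFAULT_SALT are hypothetical names, and the class deliberately does not extend org.apache.pig.EvalFunc so that it compiles without Pig on the classpath. Only the method name setUDFContextSignature and its String parameter are taken from the stack trace.

```java
// Hypothetical stand-in for a Pig UDF. A real UDF would extend
// org.apache.pig.EvalFunc and override setUDFContextSignature there.
class NullSafeSamplerSketch {
    // Assumed fallback value used when no signature has been supplied.
    static final String DEFAULT_SALT = "default";

    // Illustrative state derived from the signature (an assumption,
    // not necessarily what SampleByKey stores).
    private String salt = DEFAULT_SALT;

    // Same shape as EvalFunc.setUDFContextSignature(String).
    public void setUDFContextSignature(String signature) {
        if (signature == null) {
            // The report shows this method being reached with null while
            // the plan is compiled, so keep the current salt instead of
            // dereferencing the argument.
            return;
        }
        salt = signature;
    }

    public String getSalt() {
        return salt;
    }

    public static void main(String[] args) {
        NullSafeSamplerSketch udf = new NullSafeSamplerSketch();

        // A null signature no longer throws; the fallback salt stays in place.
        udf.setUDFContextSignature(null);
        System.out.println(udf.getSalt());

        // A later call with a real signature replaces the fallback.
        udf.setUDFContextSignature("SampleByKey_1");
        System.out.println(udf.getSalt());
    }
}
```

      Returning early on null is only one option; whether a fallback value or deferred initialisation fits SampleByKey better depends on how the signature is used later in the UDF.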

      Attachments

        1. DATAFU-68.patch
          3 kB
          Jarek Jarcec Cecho
        2. DATAFU-68.patch
          3 kB
          Jarek Jarcec Cecho

            People

              Assignee: Jarek Jarcec Cecho
              Reporter: Jarek Jarcec Cecho
              Votes: 1
              Watchers: 2
