Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
I've noticed that SampleByKey can throw NullPointerException:
Caused by: java.lang.NullPointerException
    at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
    at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
    at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
    at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
    at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
    at org.apache.pig.PigServer.storeEx(PigServer.java:978)
    at org.apache.pig.PigServer.store(PigServer.java:942)
    at org.apache.pig.Pig
I've reproduced the behaviour on the old 1.1.0 version, but the UDF in question has not changed much since then, so I assume trunk is affected in the same way. The script that reproduces the issue is simple:
grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5');
grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
grunt> out = FILTER data BY SampleByKey(A_id);
grunt> DUMP out;
The problem seems to be that the method setUDFContextSignature can be called with a null argument, which breaks our code. The documentation for this method does not specify whether null is allowed. I've looked at other UDFs in Pig, and they appear to handle the case where the signature is null, so I've decided to fix SampleByKey as well.
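To illustrate the kind of guard described above, here is a minimal sketch of a null-tolerant setUDFContextSignature. It is not the actual SampleByKey source or the committed patch: the class name NullSafeSignatureSketch, the DEFAULT_SIGNATURE fallback and the saltSeed() helper are all hypothetical, and the class deliberately does not depend on Pig so that it can be run on its own. Only the method name and its single String parameter mirror Pig's EvalFunc.

```java
// Hypothetical stand-in for a UDF; NOT the actual DataFu SampleByKey code.
class NullSafeSignatureSketch {
    // Assumed fallback used when the framework passes no signature.
    private static final String DEFAULT_SIGNATURE = "default";

    private String signature = DEFAULT_SIGNATURE;

    // Same name and parameter as Pig's EvalFunc.setUDFContextSignature(String).
    public void setUDFContextSignature(String signature) {
        // Guard: the caller may pass null, so never dereference the argument directly.
        this.signature = (signature != null) ? signature : DEFAULT_SIGNATURE;
    }

    // Illustrative value derived from the signature; safe because the field is never null.
    public int saltSeed() {
        return signature.hashCode();
    }

    public static void main(String[] args) {
        NullSafeSignatureSketch udf = new NullSafeSignatureSketch();
        udf.setUDFContextSignature(null); // would throw without the guard above
        System.out.println(udf.saltSeed() == DEFAULT_SIGNATURE.hashCode());
    }
}
```

Whether falling back to a default value or simply skipping signature-dependent setup is the better choice depends on how the UDF uses the signature later; the sketch only shows that the null case is handled before any dereference.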