Pig / PIG-255

Calling non default constructor of Final class from Main class in UDF

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Pig supports using define to call a non-default constructor, but with the current code this does not work across Algebraic functions. Once a function is defined with a non-default constructor that takes the names of the variables, there is no way to transmit that information from the main class to the Final class.

      We tried passing the func spec through the call to getFinal(). That is, the main class stores whatever names it receives, and when getFinal() is called, instead of returning just the name of the Final class, it appends the string args received by the main class to that name to build a func spec. For example, given define COV = Covariance('Population', 'Height'); the main class stores 'Population' and 'Height', and a call to getFinal() returns Covariance$Final("Population", "Height") instead of just Covariance$Final.

      I believe this is the right approach. However, Pig has a problem with it: the resolveClassName method treats its argument as a plain class name rather than a func spec. So in createJar, resolving the func spec Covariance$Final("Population", "Height") fails. I think this is an issue in Pig, and we should fix it by clipping the args before calling resolveClassName.
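The approach described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: CovarianceSketch and FinalStage are hypothetical stand-ins that do not extend any Pig class, and the single-quoted argument format is an assumption.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-in for an algebraic UDF; it does not extend any Pig class.
final class CovarianceSketch {
    private final String[] names;

    // Non-default constructor: receives the names given in the define statement.
    CovarianceSketch(String... names) {
        this.names = names.clone();
    }

    // Returns a func spec (class name plus quoted args) rather than a bare class name,
    // so the stored names travel with the reference to the final stage.
    String getFinal() {
        String args = Arrays.stream(names)
                .map(n -> "'" + n + "'")
                .collect(Collectors.joining(", "));
        return FinalStage.class.getName() + "(" + args + ")";
    }

    // Hypothetical final stage that would be constructed with the same names.
    static final class FinalStage {
        final String[] names;

        FinalStage(String... names) {
            this.names = names.clone();
        }
    }
}
```

The point of the sketch is that whatever consumes the string returned by getFinal() has to treat it as a spec with arguments, not as a class name.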

      1. cons.patch
        0.8 kB
        Ajay Garg
      2. new.patch
        0.5 kB
        Ajay Garg
      3. test.patch
        6 kB
        Ajay Garg

        Activity

        Olga Natkovich added a comment -

        Hi Ajay,

        I am trying to understand the issue that you are raising better. If I understand correctly, you want to pass arguments to your UDF via define statement and you want the same arguments to be used for all stages of algebraic function computation. Is this correct?

        I looked at the code and it was not immediately obvious to me why it would not work right now. We use the arguments in the constructor of the FuncEvalSpec so this data should be available to all functions. What am I missing?

        Also, a simple unit test that demonstrates the problem would be very helpful.

        Pi Song added a comment -

        Are you on the pig-dev mailing list? I think we have discussed something similar to this with Mathieu before. If not, you can search for the topic "[Pig Wiki] Update of PigMetaData by AlanGates".

        I may come up with a design pretty soon.

        Ajay Garg added a comment -

        Hi,
        I am attaching two patches to explain and resolve this issue. The first, test.patch, creates TestUDF.java and modifies build.xml to run this test case; run "ant udftest" to execute it. In this test case the getFinal() method returns Final.class.getName() + "(" + schema name + ")". Calling the Final class with arguments should be the right way to do this, but it gives the following error:
        java.lang.ClassNotFoundException: Could not resolve org.apache.pig.test.TestUDF$Test$Final('a') using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]

        The reason is that the resolveClassName method treats its argument as a plain name rather than a func spec. So in createJar, resolving the func spec TestUDF$Test$Final('a') fails.

        The second patch, cons.patch, resolves this problem. It finds the class name in resolveClassName by searching for the index of "(", which I think should be the correct way of resolving this.
        Please give your feedback.
        Thanks
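The failure mode reported here can be reproduced outside Pig with plain class loading. The sketch below is an assumption-laden illustration: ResolveFailureSketch is a hypothetical helper, and it uses java.lang.String rather than the TestUDF classes from test.patch. It shows that a class name with an argument list appended is not a loadable name.

```java
// Hypothetical helper, independent of Pig: checks whether a string loads as a class name.
final class ResolveFailureSketch {
    // Returns true if the JVM can load a class with exactly this name.
    static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            // A func spec such as "Foo('a')" ends up here: no class has that literal name.
            return false;
        }
    }
}
```

With this helper, resolves("java.lang.String") succeeds while resolves("java.lang.String('a')") does not, which mirrors the ClassNotFoundException quoted above.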

        Olga Natkovich added a comment -

        I am not sure this is the correct solution. My understanding is that the arguments were supposed to be stripped before the name is passed to resolveClassName.

        I applied your patch and ran the test but unfortunately it does not show the complete stack. Do you know the sequence of commands that causes resolveClassName to be called?

        Pi Song added a comment -

        The information you mentioned can be derived at runtime, after the parsing stage and right before execution. We can build a small framework to allow passing parameters between plan operators.

        Shravan Matthur Narayanamurthy added a comment -

        It was our mistake. We had not intended to change resolveClassName. We will be uploading a new patch.

        public Class getClassForAlias(String alias) throws IOException {
            String className, funcSpec = null;
            // Look up the func spec registered for this alias via define, if any.
            if (definedFunctions != null) {
                funcSpec = definedFunctions.get(alias);
            }
            if (funcSpec != null) {
                className = getClassNameFromSpec(funcSpec);
            } else {
                // No definition found: the alias is used as the class name unchanged.
                className = alias;
            }
            return resolveClassName(className);
        }

        The fix we are proposing is in this method. When funcSpec == null, we set className = alias. But in our case, when the Final function uses the version with args, the alias is not just a class name but a func spec. So the if block should be:

        if (funcSpec != null) {
            className = getClassNameFromSpec(funcSpec);
        } else {
            className = getClassNameFromSpec(alias);
        }

        We will be submitting a new patch with this change.
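To make the effect of the proposed change concrete, here is a minimal standalone sketch. It is not the PigContext code: AliasResolutionSketch is hypothetical, and classNameFromSpec only assumes that getClassNameFromSpec returns everything before the first "(", which this thread does not show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the alias lookup; not the actual PigContext implementation.
final class AliasResolutionSketch {
    private final Map<String, String> definedFunctions = new HashMap<>();

    // Records an alias the way a define statement would.
    void define(String alias, String funcSpec) {
        definedFunctions.put(alias, funcSpec);
    }

    // Assumed behavior: the class name is everything before the first '('.
    static String classNameFromSpec(String funcSpec) {
        int paren = funcSpec.indexOf('(');
        return paren < 0 ? funcSpec : funcSpec.substring(0, paren).trim();
    }

    // Mirrors the proposed fix: strip args whether or not the alias was defined,
    // because an undefined "alias" may itself be a func spec with args.
    String classNameForAlias(String alias) {
        String funcSpec = definedFunctions.get(alias);
        return classNameFromSpec(funcSpec != null ? funcSpec : alias);
    }
}
```

Under these assumptions, an undefined alias such as Covariance$Final('Population', 'Height') yields Covariance$Final, and a bare class name passes through unchanged, so existing callers would see no difference.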

        Ajay Garg added a comment -

        Attaching the patch (new.patch) for the modification explained by Shravan. By the way, the following is the stack trace of the error without the patch.

        java.io.IOException: Could not resolve org.apache.pig.builtin.COR$Final('a','b','c') using imports: [, org.apache.pig.builtin., org.apache.pig.builtin.Math., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:428)
        at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:513)
        at org.apache.pig.impl.util.JarManager.createJar(JarManager.java:109)
        at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher.launchPig(MapReduceLauncher.java:159)
        at org.apache.pig.backend.hadoop.executionengine.POMapreduce.open(POMapreduce.java:185)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:275)
        at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:413)
        at org.apache.pig.PigServer.openIterator(PigServer.java:332)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:265)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:73)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:273)
        Caused by: java.lang.ClassNotFoundException: Could not resolve org.apache.pig.builtin.COR$Final('a','b','c') using imports: [, org.apache.pig.builtin., org.apache.pig.builtin.Math., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:427)

        Olga Natkovich added a comment -

        patch committed


          People

          • Assignee:
            Ajay Garg
            Reporter:
            Ajay Garg
          • Votes:
            0
            Watchers:
            0