Pig / PIG-1479

Embed Pig in scripting languages

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It should be possible to embed Pig calls in a scripting language and make functions defined in the same script available as UDFs.
      This is a spin-off of https://issues.apache.org/jira/browse/PIG-928, which lets users define UDFs in scripting languages.

      1. PIG-1479_2.patch
        23 kB
        Richard Ding
      2. PIG-1479_3.patch
        58 kB
        Richard Ding
      3. PIG-1479_4.patch
        145 kB
        Richard Ding
      4. PIG-1479_5.patch
        150 kB
        Richard Ding
      5. PIG-1479_6.patch
        153 kB
        Richard Ding
      6. PIG-1479.patch
        16 kB
        Richard Ding
      7. pig-greek.tgz
        5.62 MB
        Julien Le Dem
      8. pig-greek-test.tar
        10 kB
        Richard Ding
      9. pig-greek-test.tar
        10 kB
        Richard Ding

          Activity

          Julien Le Dem added a comment -

          See: https://issues.apache.org/jira/browse/PIG-928

          To run the example (assuming javac, jar and java are in your PATH):

          • tar xzvf pig-greek.tgz
          • add pig-0.6.0-core.jar to the lib folder
          • ./makejar.sh
          • ./runme.sh

          This contains a generic base class and a Python implementation.

          To implement other scripting languages, extend org.apache.pig.greek.ScriptEngine

          Richard Ding added a comment -

          Thanks Julien. I rebased the patch with the latest trunk and added an option (-greek) in the Main class.

          Now one can run a "PIG-Greek" script with the following command:

          java -cp pig.jar:<jython jar>:<hadoop config dir> org.apache.pig.Main -g <pig-greek script>
          

          or in local mode:

          java -cp pig.jar:<jython jar> org.apache.pig.Main -x local -g <pig-greek script>
          
          Julien Le Dem added a comment -

          Thanks Richard!

          Richard Ding added a comment -

          In the previous patch, the executeScript method on ScriptPigServer returns a list of ExecJobs (one for each store statement in the script). Unfortunately, the order of ExecJobs in the list is indeterminate.

          This patch fixes this problem by making the executeScript method return a PigStats object. One can then retrieve each output by the alias of the corresponding store statement.

          Here is an example:

          P = pig.executeScript("""
                  A = load '${input}';
                  ... ...
                  store G into '${output}'; """)
          
          output = P.result("G")  # an OutputStats object
          iter = output.iterator()
          if iter.hasNext():
              pass  # do something
          else:
              pass  # do something else
          
          Richard Ding added a comment -

          Attaching the updated test program from Julien.

          To run the example:

          • tar -xvf pig-greek-test.tar
          • java -cp pig.jar:<jython jar> org.apache.pig.Main -x local -g script/tc.py
          Julien Le Dem added a comment -

          The -g parameter on the command line should take two parameters, the scripting implementation instance name and the script itself.
          That way we can have several scripting implementations.

          java -cp pig.jar:<jython jar> org.apache.pig.Main -x local -g jython script/tc.py
          
                  case GREEK: {       
                      ScriptEngine scriptEngine = ScriptEngine.getInstance(instanceName);
                      scriptEngine.run(new PigServer(pigContext), file);
                      return ReturnCode.SUCCESS;
                  }
          
          Julien Le Dem added a comment -

          The end-of-loop condition in the script can just test whether to_join_n is empty. It was testing both outputs because it did not know which one was to_join_n.

          if (not P.result("to_join_n").iterator().hasNext()):
          
          Richard Ding added a comment -

          Attaching the test script, modified based on Julien's comment. As for the command-line option -g, it can also take one parameter (the script file name) and let Pig determine the script engine from the file extension.

          Julien Le Dem added a comment -

          Using the file extension requires a registration mechanism (or a hard-coded list), so if it is supported it would be nice to be able to provide the class name of the scripting implementation as well.
          I would like to use my own implementation of the scripting engine (let's say JavaScript) by specifying the class name on the command line.
          This would be similar to the mechanism for UDF inclusion:
          http://wiki.apache.org/pig/UDFsUsingScriptingLanguages

          Register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs;
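
The lookup discussed above (a keyword, a file extension, or an explicit class name) can be sketched as follows. This is a minimal illustration in plain Python, not the code in the patch: the registry dictionaries, the function name resolve_engine, and the class name com.example.MyJsEngine are all hypothetical. Only the Jython engine class name is taken from the thread.

```python
import os

# Hypothetical registry. Only the Jython class name appears in the thread.
ENGINES_BY_KEYWORD = {
    "jython": "org.apache.pig.scripting.jython.JythonScriptEngine",
}
KEYWORD_BY_EXTENSION = {
    ".py": "jython",
}


def resolve_engine(script_path, engine=None):
    """Return the engine class name to use for script_path.

    engine may be a registered keyword, a fully qualified class name,
    or None, in which case the file extension decides.
    """
    if engine is not None:
        # A registered keyword wins; anything else is taken as a class name.
        return ENGINES_BY_KEYWORD.get(engine, engine)
    extension = os.path.splitext(script_path)[1]
    keyword = KEYWORD_BY_EXTENSION.get(extension)
    if keyword is None:
        raise ValueError("no script engine registered for %r" % script_path)
    return ENGINES_BY_KEYWORD[keyword]
```

With this shape, an unregistered extension is an error unless the caller names the engine explicitly, which is the escape hatch requested above for user-supplied implementations.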

          Richard Ding added a comment -

          Alan has posted a proposal on the Pig wiki that includes embedding Pig in scripting languages: http://wiki.apache.org/pig/TuringCompletePig. The proposal is based on the implementation here, via a JDBC-like compile, bind, run model.

          Richard Ding added a comment -

          Attaching the initial patch that aims to implement the embedding part of the above proposal.

          Notes about the patch:

          • Pig executes the top-level Jython statements in the script, no need to write a main() function.
          • You can invoke a Jython script from the command line the same way as you invoke a standard Pig script as long as the first line of the script is #! /usr/bin/python.

          Example:

          java -cp jython.jar:pig.jar org.apache.pig.Main myscript.py
          
          • The run method on ScriptEngine returns a Map<String, PigStats>, with one entry for each runtime Pig pipeline. For a named pipeline, the map key is the given pipeline name.
          • The proposed API is implemented in two classes: ScriptPigServer and PigPipeline.
          • The compile method is currently a no-op; it will be implemented later.
          Julien Le Dem added a comment -

          Hi Richard,
          Some comments about PIG-1479_3.patch:

          • The ScriptEngine implementations that can be used are still hardwired. As a user I would want to add a parameter to the command line to use my own (adding it to the classpath and providing the class name). For example, I'm working on a JavaScript implementation for Pig-Greek. Currently I have no way of using it without modifying Pig's code.
          • I like not having to define a main() function for the top-level code; however, using regular expressions to separate functions from the main code (in JythonScriptEngine.getFunctions(InputStream)) seems at high risk of failing in many cases. It would be better to trust an actual Python parser or to leave it as is, requiring a main() function.
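
To illustrate the "actual Python parser" option mentioned above, here is a minimal sketch that uses the standard ast module to list top-level function definitions. It runs under CPython and is not the approach taken in JythonScriptEngine; the sample script and the helper name top_level_functions are made up for the example.

```python
import ast

# Made-up sample script mixing UDF definitions with top-level code.
SCRIPT = '''
def udf1(x):
    return x + 1

def udf2(s):
    return s.upper()

threshold = 10
'''


def top_level_functions(source):
    """Return the names of functions defined at module level, in order."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


print(top_level_functions(SCRIPT))  # prints ['udf1', 'udf2']
```

Because the parser builds a real syntax tree, nested functions, strings that contain "def", and unusual formatting do not confuse it the way a regular expression might.
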
          Richard Ding added a comment -

          Thanks Julien.

          As for the second comment, there is a third option: separating the frontend (control flow code) from the backend (scripting UDFs) by putting them in different files and requiring the control flow writer to explicitly register UDFs in the script. For example, in the control flow file script.py:

          pig.registerUDF("myudfs.py", "mynamespace")
          
          # control flow and PIG pipelines that use UDFs defined in myudfs.py
          

          The advantage of this is that only the UDF files are shipped to the backend, while the control flow file (and its dependencies) remains on the frontend. Obviously, the disadvantage is that you can't put everything in one file.

          Richard Ding added a comment -

          Attaching a patch that addresses the above comments:

          • One can use the --embedded option to specify a script engine by class name or keyword. For example:
          java -cp pig.jar:jython.jar org.apache.pig.Main --embedded jython myscript.py
          
          • Implemented the proposed approach of separating the frontend control flow script from the backend UDFs written in a scripting language. One needs to explicitly register UDFs in Pig Latin or embedded Pig.
          • Both the compile() and bind() methods return objects, so one can write code in a Jython script like this:
          results = pig.compile("<Pig Latin>").bind({param:value, ...}).run()
          
          • One can also run embedded scripts using PigRunner.
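
The chaining pattern above (each step returns an object that exposes the next step) can be sketched in plain Python. This is only an illustration under stated assumptions: SketchPipeline, SketchBoundPipeline and compile_script are stand-in names, parameter substitution is done with string.Template rather than by Pig, and run() just returns the substituted script text instead of executing anything.

```python
from string import Template


class SketchBoundPipeline(object):
    """Stand-in for a pipeline whose parameters have been bound."""

    def __init__(self, script):
        self.script = script

    def run(self):
        # A real implementation would execute the script on Pig.
        # This sketch only returns the final script text.
        return self.script


class SketchPipeline(object):
    """Stand-in for a compiled but not yet bound pipeline."""

    def __init__(self, pig_latin):
        self.template = Template(pig_latin)

    def bind(self, params):
        # substitute() raises KeyError if a parameter is missing.
        return SketchBoundPipeline(self.template.substitute(params))


def compile_script(pig_latin):
    return SketchPipeline(pig_latin)


result = compile_script(
    "A = load '${input}'; store A into '$output';"
).bind({"input": "in.txt", "output": "out"}).run()
print(result)
```

Note that string.Template rejects positional references such as $0, so a script using them would need them escaped as $$0 in this sketch.
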
          Alan Gates added a comment -

          Comments and questions:

          This patch makes changes to the public interface PigProgressNotificationListener. It's ok, since it's marked evolving. Do we know how many people are using this and what we'll need to do to mitigate the changes for them?

          PigPipeline needs better javadoc comments at the class level. The current javadocs confuse it with the defined Pig class.

          Rather than the Pig class detailed in the design doc this patch has ScriptPigServer, which has a slightly different interface. Does this represent a change to the design or is there a yet to be built Pig class?

          Do we need two classes BoundPipeline and MultiBoundPipeline? Could we instead have just BoundPipeline, and then for each run method there would be:

          public List<PigStats> run()
          public PigStats runSingle() {
              if (multijob) throw ...
              return run().get(0);
          }
          

          Then run is a valid call whether this is a single or multi-job situation, which means users don't have to write their code differently in situations where they are using both single and multi-job binds. In simple cases where users know they only have one thing bound they can use the simpler runSingle call. Calling runSingle when multiple things are bound would be an error.
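
A minimal Python sketch of the run/runSingle semantics described above. The class name, the job representation (zero-argument callables) and the error type are assumptions made for illustration and do not reflect the actual patch.

```python
class SketchBoundPipeline(object):
    """Holds one job per bound set of parameters (hypothetical shape)."""

    def __init__(self, jobs):
        # jobs: iterable of zero-argument callables, each returning a result.
        self._jobs = list(jobs)

    def run(self):
        # Valid for both the single and the multi-job case.
        return [job() for job in self._jobs]

    def run_single(self):
        # Convenience for the common single-bind case; an error otherwise.
        if len(self._jobs) != 1:
            raise RuntimeError(
                "run_single() requires exactly one bound job, got %d"
                % len(self._jobs))
        return self.run()[0]
```

Callers that mix single and multiple binds can always call run() and iterate over the list, while simple scripts can use run_single() and get a bare result.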

          We need to mark the availability and stability of the ScriptEngine interface. I suspect it is Public Evolving.

          Richard Ding added a comment -

          Thanks Alan,

          This patch makes changes to the public interface PigProgressNotificationListener. It's ok, since it's marked evolving. Do we know how many people are using this and what we'll need to do to mitigate the changes for them?

          This interface is available only in Pig 0.8, which is about to be released, so not many people are using it. On the other hand, it's too late to get this change into 0.8. The reason for the change is that the embedded script could contain multiple Pig scripts, and the Pig runtime needs to tell users which script a notification comes from.

          PigPipeline needs better javadoc comments at the class level. The current javadocs confuse it with the defined Pig class.

          Will do.

          Rather than the Pig class detailed in the design doc this patch has ScriptPigServer, which has a slightly different interface. Does this represent a change to the design or is there a yet to be built Pig class?

          The patch breaks the Pig class interface into several classes. ScriptPigServer registers or defines in global scope and compiles a Pig Latin script into a PigPipeline object. PigPipeline binds a set of variables and generates a BoundPipeline object, which then runs the bound pipeline. Embedded script writers will have access to a ScriptPigServer object called "pig" in the script.

          Do we need two classes BoundPipeline and MultiBoundPipeline? Could we instead have just BoundPipeline, and then for each run method there would be: ...

          I went back and forth between these two approaches. I'm fine with a single BoundPipeline class with two different methods, run and runSingle.

          We need to mark the availability and stability of the ScriptEngine interface. I suspect it is Public Evolving.

          Will do.

          Julien Le Dem added a comment -

          Hi Richard,
          Thank you for the updated patch.
          My comments follow, all related to usability:

          • Pig script invocation
            The main invocation mechanism is as follows:
            results = pig.compile("<Pig Latin>").bind({param:value, ...}).run()
            

            I was proposing to also bind variables automatically to local variables in the current scope.

            results = pig.compile("<Pig Latin>").bindToLocal().run()
            

            or more simply

            results = pig.run("<Pig Latin>")
            

            (as implemented in the original submission)
            I understand that not all languages may allow that, but all scripting languages I can think of do. Only compiled languages strip variable names. This could be optional for the implementation.
            Although the bind() step is useful in some situations and is more generic, it is not the most frequent use case.
            Implicit binding to local variables is an important feature. As the Pig script is embedded in a particular context, in most use cases the parameters will have the same names as the local variables used to populate them.
            The goal is to embed Pig and make the integration seamless. Most cases won't need the indirection of parameter names that differ from the local variables, so requiring it is a burden for the developer.

          • Ability to have the main program and the UDFs in the same script
            This was the main reason I started this work. The goal was to have everything in one script. The fact that the UDFs are run on the slaves should not force the user to put them in a separate file. The main goal is to have the entire algorithm in the same place without arbitrary separations like this one.
            When weighing having a main() function against not being able to have UDFs in the same file, I will definitely choose to have a main() function.
            Just embedding Pig without having UDFs in the same file is not very different from running the Pig command line from a script.
          Julien Le Dem added a comment -

          Another possibility would be to have scripts written in the following way:

          def udf1():
             ...
          
          def udf2():
             ...
          
          def main():
             ...
          
          if __name__ == "__main__":
              main()
          

          See: http://docs.python.org/library/__main__.html
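
To show why this layout helps, the sketch below loads the same source twice with different values of __name__. It is a plain-Python illustration, not how Pig loads scripts: the sample script, the load helper and the module name "udfs" are invented for the example.

```python
# Made-up script following the layout above.
SCRIPT = '''
calls = []

def udf1(x):
    return x * 2

def main():
    calls.append("control flow ran")

if __name__ == "__main__":
    main()
'''


def load(source, module_name):
    """Execute source in a fresh namespace under the given module name."""
    namespace = {"__name__": module_name}
    exec(source, namespace)
    return namespace


# Loaded as a library: functions are defined, control flow is skipped.
as_udfs = load(SCRIPT, "udfs")
# Loaded as the main program: control flow runs as well.
as_main = load(SCRIPT, "__main__")

print(as_udfs["calls"], as_main["calls"])
```

The first load makes udf1 available without triggering main(), which is the property needed when the same file must also supply UDFs to the backend.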

          Richard Ding added a comment -

          Thanks Julien. How about the following proposal?

          Pig script invocation:

          Pig will use the bind() method to implicitly bind variables to local variables in the current scope. It'll do an implicit mapping of variables in the host language to parameters in Pig Latin:

          results = pig.compile("<Pig Latin>").bind().run()
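
One way such implicit binding could work in a Python host is sketched below. It is an assumption-laden illustration, not the patch's implementation: bind_to_locals is a hypothetical helper, it relies on sys._getframe (a CPython implementation detail), and substitution uses string.Template instead of Pig's own parameter handling.

```python
import sys
from string import Template


def bind_to_locals(pig_latin):
    """Fill $name parameters from the caller's variables (hypothetical helper)."""
    caller = sys._getframe(1)  # frame of the function that called us
    variables = dict(caller.f_globals)
    variables.update(caller.f_locals)  # locals shadow globals
    # safe_substitute leaves unknown parameters untouched instead of failing.
    return Template(pig_latin).safe_substitute(variables)


def demo():
    input_path = "data/in.txt"
    output_path = "data/out"
    return bind_to_locals(
        "A = load '$input_path'; store A into '${output_path}';")


print(demo())
```

The caller never builds a parameter dictionary; the names used in the Pig Latin text are simply looked up in the enclosing scope.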
          

          Ability to have the control flow program and the UDFs in the same script:

          I agree that it's good to have everything in one script. Since I can't think of a way to execute only the functions in Python, I'll go back to using a simple parser to separate functions from the control flow program so that UDFs can be registered before the control flow program runs.

          A related issue is Python import statements. Users will be responsible for shipping the imported modules to the backend servers. Pig won't automatically resolve the module paths and ship the files to the backend.

          Alan Gates added a comment -

          +1 to using a fuzzy parser. I agree that being able to have the Python UDFs in the same file is important, and in user reviews others have voiced the same opinion. But forcing Python users to have a main function is going to seem very unnatural to them. So I think the fuzzy parsing is the best compromise.

          Richard Ding added a comment -

          Based on the feedback, the new patch contains the following changes:

          • Support the main program and the UDFs in the same script. However, when mixing Jython functions with top-level control flow code, the script must use the idiomatic "conditional script" stanza:
          def udf1():
             ...
          
          def udf2():
             ...
          
          if __name__ == '__main__':
              pass  # control flow code
          
          • Support explicit registration of scripting UDFs:
          Pig.registerUDF("udfs.py", "")
          
          # control flow code
          
          • Conform the Pig scripting API to the specification: http://wiki.apache.org/pig/TuringCompletePig. The main change is that scripts now need to explicitly import the Pig class:
          from org.apache.pig.scripting import Pig
          ... ...
          results = Pig.compile("<Pig Latin>").bind().run()
          
          Alan Gates added a comment -

          Latest patch looks good. I just have one question. Why do we need the synchronous implementation of PigProgressNotificationListener (SyncProgressNotificationAdaptor)? In what case do we expect Pig to be notifying in parallel? I am assuming that we want to allow user scripts to be multi-threaded, but do we expect multiple threads to use the same PigProgressNotificationListener?

          Richard Ding added a comment -

          It is for parallel execution of a pipeline. The user registers a listener through the PigRunner API:

          public static PigStats run(String[] args, PigProgressNotificationListener listener);
          

          It's expected that the same listener is used by all the threads (each executes an instance of the pipeline) in parallel.
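
The idea of a synchronizing adaptor around a shared listener can be sketched in Python as follows. The class and method names (CountingListener, SyncListenerAdaptor, job_finished, run_pipelines) are hypothetical and do not mirror the real PigProgressNotificationListener interface; the sketch only shows a lock serializing callbacks from several threads.

```python
import threading


class CountingListener(object):
    """A listener with unsynchronized state (hypothetical interface)."""

    def __init__(self):
        self.completed = 0

    def job_finished(self, script_id):
        self.completed += 1


class SyncListenerAdaptor(object):
    """Wraps a listener so that only one thread notifies it at a time."""

    def __init__(self, listener):
        self._listener = listener
        self._lock = threading.Lock()

    def job_finished(self, script_id):
        with self._lock:
            self._listener.job_finished(script_id)


def run_pipelines(listener, n_threads, notifications_each):
    """Simulate n_threads pipeline instances notifying one shared listener."""
    def worker(script_id):
        for _ in range(notifications_each):
            listener.job_finished(script_id)

    threads = [threading.Thread(target=worker, args=("script-%d" % i,))
               for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


inner = CountingListener()
run_pipelines(SyncListenerAdaptor(inner), 4, 1000)
print(inner.completed)  # prints 4000
```

The user-supplied listener stays free of locking code; the adaptor is the only place that knows notifications may arrive in parallel.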

          Julien Le Dem added a comment -

          I have reviewed the patch.
          The latest changes look good to me.
          Thanks Richard!

          Richard Ding added a comment -

          Minor changes to fix a couple of findbugs warnings. Reran test-patch:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     -1 release audit.  The applied patch generated 477 release audit warnings (more than the trunk's current 467 warnings).
          

          Release audit warnings are all html related.

          Unit tests passed.

          Richard Ding added a comment -

          The latest patch (PIG-1479_6) has been committed to trunk.


            People

            • Assignee:
              Richard Ding
              Reporter:
              Julien Le Dem