Pig / PIG-1478

Add progress notification listener to PigRunner API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      PIG-1333 added the PigRunner API to allow Pig users and tools to get a status/stats object back after executing a Pig script. The new API, however, is synchronous (blocking). Since a Pig script can spawn tens (even hundreds) of MR jobs and take hours to complete, it would be nice to give callers progress feedback during execution.

      The proposal is to add an optional parameter to the API:

      public abstract class PigRunner {
          public static PigStats run(String[] args, PigProgressNotificationListener listener) {...}
      }
      

      The new listener is defined as follows:

      package org.apache.pig.tools.pigstats;
      
      public interface PigProgressNotificationListener extends java.util.EventListener {
          // just before the launch of MR jobs for the script
          public void launchStartedNotification(int numJobsToLaunch);
          // number of jobs submitted in a batch
          public void jobsSubmittedNotification(int numJobsSubmitted);
          // a job has started
          public void jobStartedNotification(String assignedJobId);
          // a job has completed successfully
          public void jobFinishedNotification(JobStats jobStats);
          // a job has failed
          public void jobFailedNotification(JobStats jobStats);
          // a user output has completed successfully
          public void outputCompletedNotification(OutputStats outputStats);
          // updates the overall progress, as a percentage
          public void progressUpdatedNotification(int progress);
          // the script execution is done
          public void launchCompletedNotification(int numJobsSucceeded);
      }
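      A minimal sketch of a listener implementation, assuming the interface exactly as proposed above. To keep it self-contained, `JobStats` and `OutputStats` are replaced by hypothetical one-method stand-ins (the real Pig classes carry much more), and `LoggingListener` and `getLog()` are illustrative names, not part of Pig. It records one line per callback in the same "---- " style as the example trace later in this thread.

```java
import java.util.EventListener;

// Hypothetical stand-ins for org.apache.pig.tools.pigstats.JobStats and OutputStats,
// reduced to the one accessor this sketch uses.
interface JobStats { String getJobId(); }
interface OutputStats { String getLocation(); }

// The listener as proposed in this issue.
interface PigProgressNotificationListener extends EventListener {
    void launchStartedNotification(int numJobsToLaunch);
    void jobsSubmittedNotification(int numJobsSubmitted);
    void jobStartedNotification(String assignedJobId);
    void jobFinishedNotification(JobStats jobStats);
    void jobFailedNotification(JobStats jobStats);
    void outputCompletedNotification(OutputStats outputStats);
    void progressUpdatedNotification(int progress);
    void launchCompletedNotification(int numJobsSucceeded);
}

// Hypothetical listener that appends one log line per notification.
class LoggingListener implements PigProgressNotificationListener {
    private final StringBuilder log = new StringBuilder();

    private void emit(String line) {
        log.append("---- ").append(line).append('\n');
    }

    @Override public void launchStartedNotification(int numJobsToLaunch) {
        emit("numJobsToLaunch: " + numJobsToLaunch);
    }
    @Override public void jobsSubmittedNotification(int numJobsSubmitted) {
        emit("jobs submitted: " + numJobsSubmitted);
    }
    @Override public void jobStartedNotification(String assignedJobId) {
        emit("job started: " + assignedJobId);
    }
    @Override public void jobFinishedNotification(JobStats jobStats) {
        emit("job finished: " + jobStats.getJobId());
    }
    @Override public void jobFailedNotification(JobStats jobStats) {
        emit("job failed: " + jobStats.getJobId());
    }
    @Override public void outputCompletedNotification(OutputStats outputStats) {
        emit("output done: " + outputStats.getLocation());
    }
    @Override public void progressUpdatedNotification(int progress) {
        emit("progress: " + progress + "%");
    }
    @Override public void launchCompletedNotification(int numJobsSucceeded) {
        emit("numJobsSucceeded: " + numJobsSucceeded);
    }

    // Everything recorded so far, one line per callback.
    String getLog() { return log.toString(); }
}
```

      Under the proposal, an instance like this would be passed as the second argument to PigRunner.run(args, listener); because run() blocks, the callbacks arrive while the caller is still waiting for the PigStats result.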
      

      Any thoughts?

      Attachment: PIG-1478.patch (19 kB, Richard Ding)

        Activity

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
        against trunk revision 958666.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
        against trunk revision 959865.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
        against trunk revision 959865.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/console

        This message is automatically generated.

        Richard Ding added a comment -

        Ran the core tests manually, and they passed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
        against trunk revision 960062.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/339/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/339/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/339/console

        This message is automatically generated.

        Olga Natkovich added a comment -

        Richard, I think it would be good to provide more information about each method of the listener. For instance, would jobFinishedNotification be called for all jobs or only for successful jobs?

        Also, I think Dmitry asked for this feature during the dev meeting. Dmitry, I think it would be good if you could review the API to make sure that it fits your needs.

        Alan Gates added a comment -

        I don't understand the difference between launchStartedNotification() and jobsSubmittedNotification().

        When will outputCompletedNotification() be called? Only after the job is completely done? What, if any, guarantees are we making on the order of this relative to when PigRunner.run returns?

        It isn't clear to me that launchCompletedNotification() is useful. Once the launch has completed, the user will start getting jobStartedNotification() calls.

        Richard Ding added a comment -

        I don't understand the difference between launchStartedNotification() and jobsSubmittedNotification().

        launchStartedNotification() tells the listeners the total number of jobs ready to be submitted for the script. jobsSubmittedNotification() tells the listeners the number of jobs submitted in a batch. Because of dependencies between jobs, Pig may not be able to submit all the jobs at once. So the numJobsToLaunch passed to launchStartedNotification() should equal the sum of numJobsSubmitted over all jobsSubmittedNotification() calls.
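        The batch accounting described above can be checked on the caller's side. A minimal sketch, assuming only the two callbacks and the invariant stated in this comment; SubmissionTracker is a hypothetical helper whose method names mirror the proposed listener, but it does not implement the interface, so it stays self-contained.

```java
// Hypothetical helper that tracks the launch-time invariant: the numJobsToLaunch
// announced by launchStartedNotification() should equal the sum of numJobsSubmitted
// across all jobsSubmittedNotification() calls.
class SubmissionTracker {
    private int expected = -1; // -1 until launchStartedNotification() is seen
    private int submitted = 0; // running total over all batches
    private int batches = 0;   // number of jobsSubmittedNotification() calls

    void launchStartedNotification(int numJobsToLaunch) {
        expected = numJobsToLaunch;
        submitted = 0;
        batches = 0;
    }

    void jobsSubmittedNotification(int numJobsSubmitted) {
        submitted += numJobsSubmitted;
        batches++;
        // More jobs submitted than announced would break the stated invariant.
        if (expected >= 0 && submitted > expected) {
            throw new IllegalStateException(
                "submitted " + submitted + " jobs but only " + expected + " were announced");
        }
    }

    // Jobs announced but not yet submitted (e.g. waiting on upstream jobs).
    // Only meaningful after launchStartedNotification().
    int pending() { return expected - submitted; }

    int batches() { return batches; }

    // True once every announced job has been submitted.
    boolean allSubmitted() { return expected >= 0 && submitted == expected; }
}
```

        For the three-job example trace in this comment, each batch holds one job, so pending() would drop from 3 to 0 over three batches.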

        When will outputCompletedNotification() be called? Only after the job is completely done? What, if any, guarantees are we making on the order of this relative to when PigRunner.run returns?

        outputCompletedNotification() is called after the job that writes this output is done, and only for user outputs. Because a script can have multiple user outputs, some outputs may be written before all jobs are done.

        It isn't clear to me that launchCompletedNotification() is useful. Once the launch has completed, the user will start getting jobStartedNotification() calls.

        It is there for completeness. launchCompletedNotification() is called when all jobs are done. If a script executes successfully, numJobsSucceeded should equal the numJobsToLaunch passed to launchStartedNotification().
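        A caller could use the completion invariant above to decide whether the whole script succeeded. A minimal sketch, assuming only the two callbacks described here; LaunchOutcome and its methods are hypothetical names that mirror, but do not implement, the proposed listener.

```java
// Hypothetical summary built from two callbacks: a fully successful run reports
// numJobsSucceeded equal to the numJobsToLaunch announced at the start.
class LaunchOutcome {
    private int numJobsToLaunch = -1;        // -1 until launchStartedNotification()
    private Integer numJobsSucceeded = null; // null until launchCompletedNotification()

    void launchStartedNotification(int numJobsToLaunch) {
        this.numJobsToLaunch = numJobsToLaunch;
    }

    void launchCompletedNotification(int numJobsSucceeded) {
        this.numJobsSucceeded = numJobsSucceeded;
    }

    boolean isDone() { return numJobsSucceeded != null; }

    // True only if the launch finished and every announced job succeeded.
    boolean allSucceeded() {
        return isDone() && numJobsSucceeded == numJobsToLaunch;
    }

    // Announced jobs that were not reported as succeeded.
    int numJobsNotSucceeded() {
        if (!isDone()) throw new IllegalStateException("launch not completed yet");
        return numJobsToLaunch - numJobsSucceeded;
    }
}
```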

        An example log trace looks like this:

        ---- numJobsToLaunch: 3
        ---- jobs submitted: 1
        ---- progress: 0%
        ---- job started: job_20100702195434153_0002
        ---- progress: 16%
        ---- progress: 33%
        ---- job finished: job_20100702195434153_0002
        ---- jobs submitted: 1
        ---- job started: job_20100702195434153_0003
        ---- progress: 50%
        ---- progress: 66%
        ---- job finished: job_20100702195434153_0003
        ---- jobs submitted: 1
        ---- job started: job_20100702195434153_0004
        ---- progress: 83%
        ---- output done: hdfs://localhost.localdomain:52083/user/pig/myoutput
        ---- job finished: job_20100702195434153_0004
        ---- progress: 100%
        ---- numJobsSucceeded: 3
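        The percentages in this trace (0, 16, 33, 50, 66, 83, 100 for three jobs) fit a simple model in which each MR job contributes two equal steps, plausibly its map and reduce phases, and the overall figure is truncated to an integer. The sketch below reproduces that sequence; it is an inference from the trace, not Pig's actual progress computation, and ProgressModel is a hypothetical name.

```java
// Hypothetical model inferred from the example trace: 2 * numJobs equal steps,
// with the percentage truncated by integer division.
class ProgressModel {
    // Percentage after `stepsDone` of the 2 * numJobs steps have completed.
    static int percent(int stepsDone, int numJobs) {
        return (100 * stepsDone) / (2 * numJobs); // e.g. 100 / 6 = 16 (16.67 truncated)
    }

    // All percentages from 0% to 100%, separated by spaces.
    static String trace(int numJobs) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k <= 2 * numJobs; k++) {
            if (k > 0) sb.append(' ');
            sb.append(percent(k, numJobs)).append('%');
        }
        return sb.toString();
    }
}
```

        ProgressModel.trace(3) returns "0% 16% 33% 50% 66% 83% 100%", matching the progress lines above.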
        
        Dmitriy V. Ryaboy added a comment -

        This seems to fit the bill.


          People

          • Assignee:
            Richard Ding
            Reporter:
            Richard Ding
          • Votes:
            0
            Watchers:
            0
