Pig / PIG-1874

Make PigServer work in a multithreading environment

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      This means that PigServer should work correctly when each thread creates its own separate PigServer instance. PigServer instances themselves are not synchronized, so a single instance must not be shared across threads.

      1. PIG-1874.patch
        10 kB
        Richard Ding
      2. PIG-1874_1.patch
        15 kB
        Richard Ding

        Activity

        Santhosh Srinivasan added a comment -

        +1

        Richard Ding added a comment -

        Attaching patch for review.

        This patch removes the static variables from the PigServer and PigContext classes. It also makes the UDFContext instance thread-local.
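The thread-local idea mentioned above can be illustrated with a minimal sketch. This is not the code from the patch: `PerThreadContext` and `ThreadLocalDemo` are hypothetical stand-ins written for illustration, and only `java.lang.ThreadLocal` is a real API here. It shows that a value stored on one thread is not visible from another thread, because each thread lazily gets its own context instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-thread context; this is NOT Pig's UDFContext.
class PerThreadContext {
    // Each thread that calls get() lazily receives its own instance.
    private static final ThreadLocal<PerThreadContext> PER_THREAD =
            ThreadLocal.withInitial(PerThreadContext::new);

    private final Map<String, String> props = new HashMap<>();

    private PerThreadContext() {}

    static PerThreadContext get() {
        return PER_THREAD.get();
    }

    void put(String key, String value) {
        props.put(key, value);
    }

    String lookup(String key) {
        return props.get(key);
    }
}

class ThreadLocalDemo {
    // Stores a value on the calling thread, then reports what a second
    // thread sees for the same key in its own context.
    static String valueSeenByOtherThread(String key, String value)
            throws InterruptedException {
        PerThreadContext.get().put(key, value);
        String[] seen = new String[1];
        Thread other = new Thread(() -> seen[0] = PerThreadContext.get().lookup(key));
        other.start();
        other.join(); // join() makes the write to seen[0] visible here
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // The second thread has its own, empty context, so this prints "null".
        System.out.println(valueSeenByOtherThread("k", "v"));
    }
}
```
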

        To avoid sharing a PigContext object between threads, users should use one of the following constructors to create a PigServer instance in each thread:

        public PigServer(ExecType execType) throws ExecException;
        
        public PigServer(ExecType execType, Properties properties) throws ExecException;
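A minimal sketch of the one-instance-per-thread pattern described above. To keep it runnable without Pig on the classpath, the instance type is generic and built by a factory; `PerThreadInstances` and `runEach` are hypothetical names, not part of Pig. In real use the factory would call one of the two constructors above inside the worker thread and handle the checked `ExecException` they declare.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class PerThreadInstances {
    // Starts n threads. Each thread builds its own instance via the factory,
    // runs the job against it, and stores the result in its own slot.
    static <S, R> List<R> runEach(int n, Supplier<S> factory, Function<S, R> job)
            throws InterruptedException {
        List<R> results = new ArrayList<>(Collections.nCopies(n, (R) null));
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int slot = i;
            Thread t = new Thread(() -> {
                S instance = factory.get(); // created inside the thread, never shared
                R result = job.apply(instance);
                synchronized (results) {
                    results.set(slot, result);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // after join(), each thread's result is visible
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Object stands in for PigServer here (assumption made for illustration only).
        List<Integer> ids = PerThreadInstances.<Object, Integer>runEach(
                4, Object::new, System::identityHashCode);
        System.out.println(ids.size()); // prints 4
    }
}
```

The factory is invoked inside each thread rather than before the threads start, so no instance (and, in the Pig case, no PigContext) is ever handed from one thread to another.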
        
        Alan Gates added a comment -

        Changes look good. What kind of testing are we doing to make sure we can have PigServers running in multiple threads with no clashes?

        Richard Ding added a comment -

        Attaching a patch that adds a unit test for UDFContext. There are also existing unit tests for parallel execution of bound scripts in embedded Pig.

        Test-patch output:

             [exec] -1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     -1 release audit.  The applied patch generated 541 release audit warnings (more than the trunk's current 540 warnings).
        

        The release audit warning is HTML related.

        Richard Ding added a comment -

        Patch committed to trunk.

        Vincent BARAT added a comment -

        Thanks guys! You saved my life with this patch!

        Thomas Memenga added a comment -

        Be aware that the current implementation seems to have a memory leak if you reuse threads.

        I have executed thousands of (very small) Pig jobs in parallel using a java.util.concurrent.ExecutorService (fixed-size thread pool),
        and I ran into memory problems after 3-4 hours. (Statistics related?)

        My workaround: spawn a new thread for each PigServer and let the garbage collector do the cleanup.
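The workaround above can be sketched roughly as follows. `FreshThreadRunner` and `runInNewThread` are hypothetical names introduced for illustration. The sketch only shows that per-thread state does not carry over between jobs when each job gets a new thread; it does not establish what actually leaked in Pig, which the comment leaves open.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FreshThreadRunner {
    // Runs one job on a brand-new thread and waits for its result. Once the
    // thread exits, its thread-local values become eligible for garbage
    // collection, provided nothing else still references them.
    static <T> T runInNewThread(Callable<T> job)
            throws InterruptedException, ExecutionException {
        FutureTask<T> task = new FutureTask<>(job);
        Thread worker = new Thread(task);
        worker.start();
        worker.join();
        return task.get(); // rethrows a job failure as ExecutionException
    }

    public static void main(String[] args) throws Exception {
        // A per-thread counter: on a reused pool thread it would keep growing,
        // but every fresh thread starts again from the initial value.
        ThreadLocal<Integer> jobsSeen = ThreadLocal.withInitial(() -> 0);
        Callable<Integer> job = () -> {
            jobsSeen.set(jobsSeen.get() + 1);
            return jobsSeen.get();
        };
        System.out.println(FreshThreadRunner.<Integer>runInNewThread(job)); // prints 1
        System.out.println(FreshThreadRunner.<Integer>runInNewThread(job)); // prints 1
    }
}
```

The trade-off is thread-creation overhead per job, which is usually small next to the cost of running a Pig job.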


          People

          • Assignee: Richard Ding
          • Reporter: Richard Ding
          • Votes: 1
          • Watchers: 4
