Pig
  1. Pig
  2. PIG-3050

Fix FindBugs multithreading warnings

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      There was a race condition reported when running Pig in local mode on the user mailing list. This motivated me to fix potential multithreading bugs that can be identified by FindBugs.

      FindBugs identifies the following potential bugs:

      1. Mutable static field
      2. Inconsistent synchronization
      3. Incorrect lazy initialization of static field
      4. Incorrect lazy initialization and update of static field
      5. Unsynchronized get method, synchronized set method

      There are in total 1153 warnings that FindBugs complains, but they're outside of the scope of this jira.

      1. PIG-3050.patch
        16 kB
        Cheolsoo Park

        Issue Links

          Activity

          Hide
          Cheolsoo Park added a comment -

          Thank you for the review in the RB, Santhosh!
          (https://reviews.apache.org/r/8649/)

          I committed it to trunk.

          Show
          Cheolsoo Park added a comment - Thank you for the review in the RB, Santhosh! ( https://reviews.apache.org/r/8649/ ) I committed it to trunk.
          Hide
          Cheolsoo Park added a comment -

          Attached is a patch that fixes the following issues:

          • Mutual static field
            PhysicalOperator.java
            public static PigProgressable reporter;
            

            There was a reported race condition due to this static field (For details, see here). Since reporter should be local to thread, I converted it to ThreadLocal.

          • Inconsistent synchronization
            POStream.java
            public Result getNext(Tuple t) throws ExecException {
                ...
                if(initialized) {
                   ...
                }
                ...
            }
            ...
            public Result getNextHelper(Tuple t) throws ExecException {
                ...
                synchronized(this) {
                   ...
                   if(!initialized) {
                      ...
                   }
                   ...
                   initialized = true;
                   ...
                }
            }
            

            Synchronized access to initialized is performed inside getNextHelper(), but unsynchronized access was performed inside getNext(). I added a synchronized getter method and used that method inside getNext().

          • Incorrect lazy initialization of static field
            SpillableMemoryManager.java
                
            public static SpillableMemoryManager getInstance() {
                if (manager == null) {
                    manager = new SpillableMemoryManager();
                }
                return manager;
            }
            

            FindBugs says, "Because the compiler may reorder instructions, threads are not guaranteed to see a completely initialized object if the method can be called by multiple threads." So I declared manager as volatile.

          • Incorrect lazy initialization and update of static field
            SchemaTupleBackend.java
            public static void initialize(Configuration jConf, PigContext pigContext, boolean isLocal) throws IOException {
                if (stb != null) {
                    LOG.warn("SchemaTupleBackend has already been initialized");
                } else {
                    SchemaTupleFrontend.lazyReset(pigContext);
                    SchemaTupleFrontend.reset();
                    stb = new SchemaTupleBackend(jConf, isLocal);
                    stb.copyAndResolve();
                }
            }
            

            FindBugs says, "After the field is set, the object stored into that location is further updated. The setting of the field is visible to other threads as soon as it is set. If further accesses in the method that set the field serve to initialize the object, then you have a very serious multithreading bug." So I moved the assignment to the end of the method after all initialization is done.

          • Unsynchronized get method, synchronized set method
            PigHadoopLogger.java
            public synchronized void setReporter(PigStatusReporter rep) {
                this.reporter = rep;
            }
            public boolean getAggregate() {
                return aggregate;
            }
            

            I made getAggregate() synchronized.

          Show
          Cheolsoo Park added a comment - Attached is a patch that fixes the following issues: Mutual static field PhysicalOperator.java public static PigProgressable reporter; There was a reported race condition due to this static field (For details, see here ). Since reporter should be local to thread, I converted it to ThreadLocal. Inconsistent synchronization POStream.java public Result getNext(Tuple t) throws ExecException { ... if (initialized) { ... } ... } ... public Result getNextHelper(Tuple t) throws ExecException { ... synchronized ( this ) { ... if (!initialized) { ... } ... initialized = true ; ... } } Synchronized access to initialized is performed inside getNextHelper() , but unsynchronized access was performed inside getNext() . I added a synchronized getter method and used that method inside getNext() . Incorrect lazy initialization of static field SpillableMemoryManager.java public static SpillableMemoryManager getInstance() { if (manager == null ) { manager = new SpillableMemoryManager(); } return manager; } FindBugs says, "Because the compiler may reorder instructions, threads are not guaranteed to see a completely initialized object if the method can be called by multiple threads." So I declared manager as volatile. Incorrect lazy initialization and update of static field SchemaTupleBackend.java public static void initialize(Configuration jConf, PigContext pigContext, boolean isLocal) throws IOException { if (stb != null ) { LOG.warn( "SchemaTupleBackend has already been initialized" ); } else { SchemaTupleFrontend.lazyReset(pigContext); SchemaTupleFrontend.reset(); stb = new SchemaTupleBackend(jConf, isLocal); stb.copyAndResolve(); } } FindBugs says, "After the field is set, the object stored into that location is further updated. The setting of the field is visible to other threads as soon as it is set. If further accesses in the method that set the field serve to initialize the object, then you have a very serious multithreading bug." So I moved the assignment to the end of the method after all initialization is done. Unsynchronized get method, synchronized set method PigHadoopLogger.java public synchronized void setReporter(PigStatusReporter rep) { this .reporter = rep; } public boolean getAggregate() { return aggregate; } I made getAggregate() synchronized.
          Hide
          Cheolsoo Park added a comment -

          Hi Jonathan,

          There are two things:

          1. Single Pig job in local mode (LocalJobRunner).
          2. Multiple Pig jobs in the same VM as you mentioned.

          The direct motivation for this ticket is (1), but I am doing some experiments with (2) where I execute multiple jobs in a single VM.

          Thanks!

          Show
          Cheolsoo Park added a comment - Hi Jonathan, There are two things: Single Pig job in local mode (LocalJobRunner). Multiple Pig jobs in the same VM as you mentioned. The direct motivation for this ticket is (1), but I am doing some experiments with (2) where I execute multiple jobs in a single VM. Thanks!
          Hide
          Jonathan Coveney added a comment -

          For context, when you say multithreading do you mean running two Pig jobs in the same VM, or something else?

          Show
          Jonathan Coveney added a comment - For context, when you say multithreading do you mean running two Pig jobs in the same VM, or something else?

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development