Hadoop Map/Reduce
MAPREDUCE-1825

jobqueue_details.jsp and FairSchedulerServlet should not call finishedMaps and finishedReduces when job is not initialized

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.22.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      JobInProgress.finishedMaps() and finishedReduces() are synchronized. They are called from jobqueue_details.jsp and FairSchedulerServlet, both of which iterate through all jobs. If any job is still initializing, these pages don't come up until the initialization finishes.

      See comment for more details
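
The blocking described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: `SlowInitJob` and `BlockedGetterDemo` are hypothetical stand-ins for JobInProgress and a status page, and latches simulate a long-running initialization. It assumes only that initialization and the getter synchronize on the same object, as the description states.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for JobInProgress; not the real Hadoop class.
class SlowInitJob {
    private int finishedMaps = 0;

    // Initialization holds the object's monitor for its whole duration.
    synchronized void initTasks(CountDownLatch started, CountDownLatch release) {
        started.countDown();
        try {
            release.await(); // simulate long-running initialization work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // A synchronized getter has to wait for the same monitor.
    synchronized int finishedMaps() {
        return finishedMaps;
    }
}

class BlockedGetterDemo {
    // Returns true if a thread calling finishedMaps() was still blocked
    // after waitMillis while initTasks() was in progress.
    static boolean readerBlocksDuringInit(long waitMillis) {
        SlowInitJob job = new SlowInitJob();
        CountDownLatch initStarted = new CountDownLatch(1);
        CountDownLatch releaseInit = new CountDownLatch(1);
        CountDownLatch readerDone = new CountDownLatch(1);

        Thread init = new Thread(() -> job.initTasks(initStarted, releaseInit));
        Thread reader = new Thread(() -> {
            job.finishedMaps();
            readerDone.countDown();
        });
        try {
            init.start();
            initStarted.await(); // the init thread now owns the job's monitor
            reader.start();
            return !readerDone.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            releaseInit.countDown(); // let the simulated initialization finish
        }
    }

    public static void main(String[] args) {
        System.out.println("reader blocked during init: " + readerBlocksDuringInit(200));
    }
}
```

A page that loops over all jobs and calls such a getter on each one stalls on the first job that is still initializing.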

      1. jstacks.zip
        14 kB
        Priyo Mustafi
      2. MAPREDUCE-1825_1.txt
        3 kB
        Priyo Mustafi
      3. MAPREDUCE-1825_2.txt
        3 kB
        Scott Chen
      4. MAPREDUCE-1825_3.txt
        5 kB
        Scott Chen
      5. MAPREDUCE-1825.txt
        3 kB
        Scott Chen

        Activity

        Scott Chen added a comment -

        Amareshwari: The same thing happens in FairSchedulerServlet.java. Do you think we should file another JIRA, or can it be fixed in this one?

        Amareshwari Sriramadasu added a comment -

        I think it can be fixed here. I see that they are related.

        Scott Chen added a comment -

        FairSchedulerServlet suffers badly from this problem because it holds the JobTracker lock while looping through jobs.
        So fixing this is important for FairScheduler.

        I made both pages skip the uninitialized jobs.
        Do you think this is the right fix?
        I am still thinking about how to test it. Any suggestions for the unit test?
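
As a rough illustration of "skip the uninitialized jobs", here is a minimal sketch. `JobStub` and `InitedJobFilter` are hypothetical names, and the sketch assumes the initialization status can be read through a volatile flag without taking the job's monitor. The method name `getInitedJobs` is borrowed from the review comment further down; its body here is a guess at the idea, not the patch itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for JobInProgress; names are illustrative only.
class JobStub {
    final String id;
    private final int finishedMaps;
    private volatile boolean inited; // readable without the job's monitor

    JobStub(String id, boolean inited, int finishedMaps) {
        this.id = id;
        this.inited = inited;
        this.finishedMaps = finishedMaps;
    }

    // Unsynchronized status check (assumption: the flag is volatile).
    boolean inited() {
        return inited;
    }

    // Synchronized getter, as in the issue description.
    synchronized int finishedMaps() {
        return finishedMaps;
    }
}

class InitedJobFilter {
    // Keep only jobs whose initialization has completed, so that later
    // calls to synchronized getters never wait on a job that is still
    // inside its initialization.
    static List<JobStub> getInitedJobs(List<JobStub> jobs) {
        List<JobStub> result = new ArrayList<>();
        for (JobStub job : jobs) {
            if (job.inited()) {
                result.add(job);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<JobStub> jobs = Arrays.asList(
            new JobStub("job_1", true, 4),
            new JobStub("job_2", false, 0),
            new JobStub("job_3", true, 7));
        for (JobStub job : getInitedJobs(jobs)) {
            System.out.println(job.id + " finishedMaps=" + job.finishedMaps());
        }
    }
}
```

The page then reports counters only for the jobs returned by the filter, so a job that is mid-initialization simply does not appear until it is ready.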

        Priyo Mustafi added a comment -

        The attached patch has problems applying; it may be stale. Attaching a new patch (same code) that applies cleanly.

        Konstantin Shvachko added a comment -

        This looks good to me.
        Writing a unit test would be hard here. But manual testing, such as holding a lock in a debugger and triggering a web UI refresh, would be very useful.
        +1 modulo manual testing.
        Nit: please add an empty line before FairSchedulerServlet.getInitedJobs().

        Scott Chen added a comment -

        Thanks, Priyo, for rebasing the patch.
        Fixed the empty line pointed out by Konstantin.

        Priyo Mustafi added a comment -

        Hi Scott and Konstantin,
        I tested by putting a breakpoint in JobInProgress.initTasks().

        1) jobqueue_details.jsp is working fine, i.e. not locking up.
        2) FairSchedulerServlet locks up in the showPools method as soon as it tries to synchronize on "scheduler". It continues as soon as JobInProgress.initTasks() finishes. I am not sure how this happens, since initTasks() locks the JobInProgress monitor and showPools locks the scheduler's monitor. In any case, the patch doesn't seem to address the FairSchedulerServlet issue.

        Scott Chen added a comment -

        > 2) FairSchedulerServlet is locking up in showPools method as soon as it tries to synchronize on "scheduler". It continues again as soon as JIB.initTasks() finish. Not sure how this is happening as initTasks() lock JIB's monitor and showPools lock scheduler's monitor. Anyway, the patch doesn't seem to address the FSS issue.

        Thanks for the testing, Priyo.
        Can you take a jstack at the moment it is waiting for the scheduler lock?
        Then we can figure out the lock dependency.

        Priyo Mustafi added a comment -

        Added 4 jstacks taken a few seconds apart.

        Priyo Mustafi added a comment -

        This is what appears to be happening. JobInProgress.initTasks() holds the JobInProgress lock. FairScheduler.assignTasks() holds the JobTracker and FairScheduler locks and is waiting for the JobInProgress lock. Finally, FairSchedulerServlet.showPools() is blocked waiting for the FairScheduler lock.

        The current case may have been aggravated because I had a breakpoint on initTasks(). But on a busy cluster assignTasks() must be running extremely often, so there is a very high probability that assignTasks() will block on initTasks() and thereby cause FairSchedulerServlet to block as well.

        Your thoughts?
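
The chain of waits described in this comment can be sketched with three locks. This is an illustration under assumptions, not the Hadoop code: `LockChainDemo` is a hypothetical name, and `ReentrantLock` replaces the intrinsic monitors of JobInProgress, JobTracker, and FairScheduler only so that the "servlet" can give up after a timeout and report what happened.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockChainDemo {
    // Returns true if the "servlet" (the calling thread) could not obtain
    // the scheduler lock within waitMillis while job init was in progress.
    static boolean servletBlockedByInit(long waitMillis) {
        ReentrantLock jobLock = new ReentrantLock();       // JobInProgress monitor
        ReentrantLock trackerLock = new ReentrantLock();   // JobTracker monitor
        ReentrantLock schedulerLock = new ReentrantLock(); // FairScheduler monitor
        CountDownLatch initHoldsJob = new CountDownLatch(1);
        CountDownLatch assignHoldsScheduler = new CountDownLatch(1);
        CountDownLatch finishInit = new CountDownLatch(1);

        // Plays the role of initTasks(): holds the job lock until released.
        Thread init = new Thread(() -> {
            jobLock.lock();
            try {
                initHoldsJob.countDown();
                finishInit.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                jobLock.unlock();
            }
        });

        // Plays the role of assignTasks(): holds tracker and scheduler,
        // then waits for the job lock.
        Thread assign = new Thread(() -> {
            trackerLock.lock();
            schedulerLock.lock();
            try {
                assignHoldsScheduler.countDown();
                jobLock.lock(); // blocks until init finishes
                jobLock.unlock();
            } finally {
                schedulerLock.unlock();
                trackerLock.unlock();
            }
        });

        try {
            init.start();
            initHoldsJob.await();
            assign.start();
            assignHoldsScheduler.await();
            // Plays the role of showPools(): needs the scheduler lock.
            boolean acquired = schedulerLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                schedulerLock.unlock();
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            finishInit.countDown(); // let init, and then assign, complete
        }
    }

    public static void main(String[] args) {
        System.out.println("servlet blocked: " + servletBlockedByInit(200));
    }
}
```

The servlet never touches the job lock directly, yet it waits for as long as initialization runs, because the thread holding the scheduler lock is itself waiting on the job.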

        Scott Chen added a comment -

        Nice observation, Priyo. I think we can also fix the logic inside assignTasks() so it skips the uninitialized jobs.
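
One way the suggestion above might look is sketched below. `PendingJob`, `SkippingScheduler`, and `AssignTasksDemo` are hypothetical names, and the sketch again assumes that the initialization status is a volatile flag readable without the job's monitor. The real assignTasks() does far more; only the skip is shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for JobInProgress.
class PendingJob {
    private volatile boolean inited;
    private final int pendingMaps;

    PendingJob(boolean inited, int pendingMaps) {
        this.inited = inited;
        this.pendingMaps = pendingMaps;
    }

    // Holds the job's monitor until the test releases it.
    synchronized void initTasks(CountDownLatch started, CountDownLatch release) {
        started.countDown();
        try {
            release.await(); // simulate long-running initialization
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inited = true;
    }

    boolean inited() {
        return inited;
    }

    synchronized int pendingMaps() {
        return pendingMaps;
    }
}

// Hypothetical stand-in for the scheduler.
class SkippingScheduler {
    private final List<PendingJob> jobs;

    SkippingScheduler(List<PendingJob> jobs) {
        this.jobs = jobs;
    }

    // Holds the scheduler's monitor, but never waits on a job that is
    // still initializing. Returns the number of jobs with pending maps.
    synchronized int assignTasks() {
        int runnable = 0;
        for (PendingJob job : jobs) {
            if (!job.inited()) {
                continue; // skip: its monitor may be held by initTasks()
            }
            if (job.pendingMaps() > 0) {
                runnable++;
            }
        }
        return runnable;
    }
}

class AssignTasksDemo {
    // Runs assignTasks() while one of two jobs is mid-initialization.
    static int assignWhileOneJobInitializes() {
        PendingJob ready = new PendingJob(true, 3);
        PendingJob initializing = new PendingJob(false, 5);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread init = new Thread(() -> initializing.initTasks(started, release));
        try {
            init.start();
            started.await(); // initializing job's monitor is now held
            return new SkippingScheduler(Arrays.asList(ready, initializing)).assignTasks();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            release.countDown(); // let the simulated initialization finish
        }
    }

    public static void main(String[] args) {
        System.out.println("runnable jobs: " + assignWhileOneJobInitializes());
    }
}
```

Because the scheduler releases its own lock promptly, anything else that synchronizes on the scheduler, such as the servlet's showPools, no longer waits behind job initialization.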

        Scott Chen added a comment -

        Addressed the problem found by Priyo

        Priyo Mustafi added a comment -

        Hi Scott,
        I tested the new patch and this looks good. Looks like the patch applies fine on trunk but not on 0.22 because MAPREDUCE-1783 was added to trunk.

        Scott Chen added a comment -

        Hey Priyo,
        Thanks again for the help. I will commit MAPREDUCE-1783 to 0.22.

        Scott Chen added a comment -

        test-patch result:

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no new tests are needed for this patch.
             [exec]                         Also please list what manual steps were performed to verify this patch.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
             [exec]     +1 system test framework.  The patch passed system test framework compile.
             [exec]
             [exec]
             [exec]
             [exec]
             [exec] ======================================================================
             [exec] ======================================================================
             [exec]     Finished build.
             [exec] ======================================================================
             [exec] ======================================================================
        
        Priyo Mustafi added a comment -

        Patch was manually tested. Ant test passed. Looks good. +1

        Konstantin Shvachko added a comment -

        I just committed this. Thank you Scott and Priyo.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #605 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/605/)
        MAPREDUCE-1825. jobqueue_details.jsp and FairSchedulerServelet should not call finishedMaps and finishedReduces when job is not initialized. Contributed by Scott Chen.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #606 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/606/)
        MAPREDUCE-1825. A dot CHANGES.txt.

        Scott Chen added a comment -

        Thanks for the help, Konstantin and Priyo.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-22-branch #33 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-22-branch/33/)

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)


          People

          • Assignee:
            Scott Chen
            Reporter:
            Amareshwari Sriramadasu
          • Votes:
            0
            Watchers:
            4
