Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.1.0
    • Component/s: mrv1
    • Labels: None

      Description

      In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.

      Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.

      1. TestJobTrackerQuiescence.java (7 kB, Tom White)
      2. MAPREDUCE-4328.patch (16 kB, Arun C Murthy)
      3. MAPREDUCE-4328.patch (54 kB, Arun C Murthy)

          Activity

          Arun C Murthy added a comment -

          I'm thinking in the quiesced mode the JT:

          1. Doesn't schedule any more tasks.
          2. Doesn't mark any task as FAILED (every task is KILLED).
          3. Doesn't accept new job submissions.
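The three behaviours listed above can be sketched as a single flag that gates scheduling, failure accounting, and submission. This is only an illustration under stated assumptions: `QuiescableTracker` and all of its methods are hypothetical names, not the JobTracker API or the code in the attached patch, and throwing `IllegalStateException` on submission is one possible way to surface the rejection to a client.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the JobTracker; not the real Hadoop class. */
class QuiescableTracker {
    enum TaskState { FAILED, KILLED }

    private final AtomicBoolean safeMode = new AtomicBoolean(false);

    void setSafeMode(boolean on) { safeMode.set(on); }

    boolean isInSafeMode() { return safeMode.get(); }

    /** Behaviour 1: no new tasks are handed out while quiesced. */
    boolean maySchedule() { return !safeMode.get(); }

    /**
     * Behaviour 2: a task failure reported while quiesced is recorded as
     * KILLED rather than FAILED, so it does not count against the job.
     */
    TaskState recordFailure() {
        return safeMode.get() ? TaskState.KILLED : TaskState.FAILED;
    }

    /** Behaviour 3: new submissions are rejected while quiesced. */
    String submitJob(String jobName) {
        if (safeMode.get()) {
            throw new IllegalStateException(
                "Tracker is in safe mode; rejecting job " + jobName);
        }
        return "accepted:" + jobName;
    }
}
```

Keeping all three checks behind one flag means an administrator (or an automatic monitor) only has to flip a single switch to quiesce the service.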
          Arun C Murthy added a comment -

          Here is a preliminary patch - I figured it's simpler to call it 'safemode' for the JT, à la the NN.

          Bikas Saha added a comment -

          But how would you programmatically know that the NameNode is not operational?
          Wouldn't it help to get that information directly via an API? Do you know if one exists?
          Let me open a JIRA to add one if it does not.

          Aaron T. Myers added a comment -

          Seems like we should also implement an analogous feature in trunk/2.0, so as not to have a feature regression from branch-1.

          Kang Xiao added a comment -

          This is useful in conditions such as the NN being down. We actually found a way to achieve the first goal by updating the fair scheduler's configuration to set each pool's max share to zero.
          The second goal will protect the job from going to FAILED. But it seems so possible for a job to go to FAILED since no more tasks are scheduled.

          It may be simpler to implement the first goal by just not invoking assignTasks() in the JobTracker. That would not burden the scheduler implementation, since 'safemode' is a low-probability event.
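The suggestion above, skipping the scheduler call while in safemode instead of teaching every scheduler about it, might look roughly like the following. The names `SketchScheduler` and `HeartbeatSketch` are hypothetical, and the one-argument `assignTasks` here is a simplification for illustration; it is not the signature of the real scheduler method.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified scheduler interface used only for this sketch. */
interface SketchScheduler {
    List<String> assignTasks(String trackerName);
}

/** Hypothetical heartbeat handler that guards the scheduler call. */
class HeartbeatSketch {
    private final SketchScheduler scheduler;
    private volatile boolean safeMode;

    HeartbeatSketch(SketchScheduler scheduler) {
        this.scheduler = scheduler;
    }

    void setSafeMode(boolean on) { safeMode = on; }

    /** Returns the tasks to launch on the given tracker for this heartbeat. */
    List<String> heartbeat(String trackerName) {
        if (safeMode) {
            // The scheduler is never consulted, so no scheduler
            // implementation needs to know about safemode.
            return Collections.emptyList();
        }
        return scheduler.assignTasks(trackerName);
    }
}
```

The guard lives in one place, which keeps pluggable schedulers (fair, capacity, and so on) unchanged.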

          Tom White added a comment -

          > 3. Doesn't accept new job submissions.

          To be clear - the client would get a failure, right? The current patch doesn't do that yet as far as I can see.

          A few other pieces of feedback on the patch:

          • The -refreshNodes option in MRAdmin was deleted from the usage message.
          • Rather than putting markup in the JobTracker (in getSafeModeText()), do the formatting in the JSP or a utility class like JSPUtil (which already exists).
          • Change JobTracker's getSafeMode() method to isInSafeMode(), to mirror NameNode.
          • MRAdmin introduced a couple of unneeded imports: DistributedFileSystem, org.mortbay.log.Log
          Tom White added a comment -

          I wrote a unit test for this (attached), which might be useful.

          Arun C Murthy added a comment -

          I finally got around to wrapping this up.

          The difference between the original and the final patch is that I've added an optional thread that monitors the NN and automatically puts the JT in safemode, plus bug fixes and tests.
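A monitoring thread of the kind described above could be sketched as a periodic health probe that flips the safemode flag when the NameNode stops responding. Everything here is an assumption made for illustration: `NameNodeMonitorSketch`, the `BooleanSupplier` probe, and the polling period are hypothetical and do not come from the committed patch. The sketch also only enters safemode and leaves the exit to an administrator, which is a design guess rather than the documented behaviour.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical monitor; the probe stands in for a real NameNode check. */
class NameNodeMonitorSketch {
    private final BooleanSupplier nameNodeHealthy;
    private final AtomicBoolean safeMode;
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "nn-monitor");
            t.setDaemon(true); // do not keep the JVM alive for the monitor
            return t;
        });

    NameNodeMonitorSketch(BooleanSupplier nameNodeHealthy, AtomicBoolean safeMode) {
        this.nameNodeHealthy = nameNodeHealthy;
        this.safeMode = safeMode;
    }

    /** One probe: enter safemode if the NameNode looks unhealthy. */
    void checkOnce() {
        boolean healthy;
        try {
            healthy = nameNodeHealthy.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a failing probe is treated as "NameNode down"
        }
        if (!healthy) {
            safeMode.set(true);
        }
    }

    /** Starts periodic probing; optional, as in the description above. */
    void start(long periodMillis) {
        timer.scheduleWithFixedDelay(this::checkOnce, 0, periodMillis,
            TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

Catching the probe's exception inside `checkOnce` matters because a scheduled task that throws is not run again by the executor.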

          Arun C Murthy added a comment -

          All tests pass, ready to go.

          Arun C Murthy added a comment -

          Uh! Wrong patch, fixed now.

          Arun C Murthy added a comment -

          Forgot to grant license, fixed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541997/MAPREDUCE-4328.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2757//console

          This message is automatically generated.

          Eli Collins added a comment -

          Hey Arun,

          Per ATM's comment above, shouldn't we do the analogous feature for trunk first? Seems like this would be YARN safemode, since the AM really isn't equivalent to the JT for this feature.

          Also, please motivate this feature by outlining the primary use cases. I don't think you need to write a design doc but a basic paragraph or two would be good. From my experience admins would like to quiesce the JT so they can prevent new jobs from being launched while draining the queue of current jobs to facilitate a cluster upgrade.

          Thanks,
          Eli

          Arun C Murthy added a comment -

          Ah, I thought I responded to ATM, my bad.

          As I've described in the description of the jira, the primary use-case is to allow the JobTracker to be resilient to NN failures (hardware or software).

          I did think long and hard about doing this in YARN, but with HDFS-HA this use-case is pretty much non-existent. Furthermore, since YARN isn't tied to HDFS as MR1 is, and since it's distributed across several AMs, there is no single point of control like the JT in MR1. Thus, I think there isn't enough value in porting it as-is, conceptually (not code-wise).

          In many ways this is similar to MAPREDUCE-3837, i.e. no straight-backport.

          Having said that, I plan to make sure we pay attention to this when we get around to fixing RM Restart. This is something I definitely plan to do later this year, at which point we'll ensure there is no 'feature regression'.

          Makes sense?


          Eli's point about draining queues is a good one, I've opened MAPREDUCE-4575 and YARN-38 to track that. That feature is something we can do a straight-mapping conceptually across MR1 and YARN.

          Eli Collins added a comment -

          Thanks Arun, and thanks for working on this.

          Aaron T. Myers added a comment -

          Thanks a lot for the explanation, Arun. Makes sense.

          Vinod Kumar Vavilapalli added a comment -

          Had a brief review of the patch. +1 for commit, but only with a minor fix:

          + throw new AccessControlException(user +
          + " is not authorized to refresh nodes.");

          The message should say get/set safemode rather than refresh nodes.

          Arun C Murthy added a comment -

          I just committed this after fixing the copy-paste error in the exception message. Thanks for the review Vinod!

          Arun C Murthy added a comment -

          Matt - if you don't mind, I'd like to merge this into branch-1.1 since it's been well baked-in. Thoughts?

          Arun C Murthy added a comment -

          I merged this to branch-1.1 after talking to Matt.

          Matt Foley added a comment -

          Accepted.

          Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.


            People

            • Assignee:
              Arun C Murthy
              Reporter:
              Arun C Murthy
            • Votes:
              0
              Watchers:
              28
