Hadoop Common / HADOOP-5913

Allow administrators to be able to start and stop queues

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      New mradmin command -refreshQueues reads new configuration of ACLs and queue states from mapred-queues.xml. If the new queue state is not "running," jobs in progress will continue, but no other jobs from that queue will be started.

      Description

      This feature would provide functionality to stop and start queues in Hadoop at runtime.

      1. C5913-15y20s.patch
        44 kB
        Chris Douglas
      2. C5913-14y20s.patch
        50 kB
        Chris Douglas
      3. HADOOP-5913-14.patch
        60 kB
        Hemanth Yamijala
      4. HADOOP-5913-13.patch
        60 kB
        Hemanth Yamijala
      5. HADOOP-5913-12.patch
        52 kB
        Hemanth Yamijala
      6. HADOOP-5913-11.patch
        46 kB
        Hemanth Yamijala
      7. hadoop-5913-10.patch
        51 kB
        rahul k singh
      8. hadoop-5913-9.patch
        50 kB
        rahul k singh
      9. hadoop-5913-8.patch
        50 kB
        rahul k singh
      10. hadoop-5913-7.patch
        47 kB
        rahul k singh
      11. hadoop-5913-6.patch
        46 kB
        rahul k singh
      12. hadoop-5913-5.patch
        41 kB
        rahul k singh
      13. hadoop-5913-4.patch
        34 kB
        rahul k singh
      14. hadoop-5913-3.patch
        33 kB
        rahul k singh
      15. hadoop-5913-2.patch
        22 kB
        rahul k singh
      16. hadoop-5913-1.patch
        19 kB
        rahul k singh

        Activity

        rahul k singh added a comment -

        All running jobs in the queue will be completed, and no new jobs will be accepted.

        rahul k singh added a comment -

        Summary:
        This feature would provide functionality to stop and start queues in Hadoop at runtime. When a queue is stopped, all running jobs in the queue will be completed, and no new jobs will be accepted. This state would be persisted in the configuration file.

        Requirements:

        • Administrators should be able to stop (stop accepting jobs) and start (start accepting jobs) queues at runtime.
        • If a queue is stopped at runtime, it should complete all existing running jobs and stop accepting new jobs.
        • Administrators should be able to change the queue configuration to start and stop queues.
        • Once the configuration is updated, administrators run the "hadoop refreshQueues" command to refresh the existing queues. This configuration would persist across JobTracker restarts.
        • All queue information (ACLs and state, stopped or running) moves into a common XML file. This reduces the number of commands required to change queue settings: we can simply change this XML file and execute "hadoop refreshQueues" to update the queue information.
        • To do the above, we would rename the mapred-queue-acls.xml file to mapred-queues.xml.

        Design

        Assumptions:

        • Administrators would have permission to execute the "hadoop refreshQueues" command.
        • Only settings in mapred-queues.xml would be changed. Queue settings that are not part of mapred-queues.xml (for example, settings in the scheduler's configuration) will not be affected.

        Summary:

        • Rename "mapred-queue-acls.xml" to "mapred-queues.xml".
        • Move all queue-related data from mapred-site.xml to mapred-queues.xml. The following properties would be moved:
          • mapred.queue.names
          • mapred.acls.enabled
          • mapred.queue.<queueName>.acl-submit-job (set per queue)
          • mapred.queue.<queueName>.acl-administer-job (set per queue)
        • Refactor the existing code to encapsulate the new mapred-queues.xml file.
        • Introduce the new command "hadoop refreshQueues", which reads from the existing mapred-queues.xml.
        • Introduce a new API in QueueManager.java to check the state of a queue.
        • Introduce a new per-queue property in mapred-queues.xml: "mapred.queue.<queueName>.state", with values "stopped" | "running".
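
        To illustrate the proposed property layout, here is a minimal sketch of how the per-queue state might be resolved from the properties listed above. The class QueueConfigSketch, the sample queue names, and the rule that a queue without an explicit state counts as running are assumptions made for illustration; this is not the code from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the QueueManager code from the patch.
class QueueConfigSketch {
    enum QueueState { RUNNING, STOPPED }

    // Property name proposed in the design, e.g. mapred.queue.default.state
    static String stateKey(String queueName) {
        return "mapred.queue." + queueName + ".state";
    }

    // Resolve a state for every queue listed in mapred.queue.names.
    // Assumption: any value other than "stopped", including a missing
    // entry, is treated as running.
    static Map<String, QueueState> readStates(Map<String, String> conf) {
        Map<String, QueueState> states = new LinkedHashMap<>();
        String names = conf.getOrDefault("mapred.queue.names", "default");
        for (String name : names.split(",")) {
            String queue = name.trim();
            if (queue.isEmpty()) {
                continue;
            }
            String raw = conf.getOrDefault(stateKey(queue), "running").trim();
            states.put(queue, "stopped".equals(raw) ? QueueState.STOPPED : QueueState.RUNNING);
        }
        return states;
    }

    public static void main(String[] args) {
        // A plain map stands in for the parsed contents of mapred-queues.xml.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.queue.names", "default,research");
        conf.put("mapred.queue.research.state", "stopped");
        System.out.println(readStates(conf));
    }
}
```

        Keeping the state next to the ACL properties means a single file edit covers every queue setting the refresh command needs to re-read.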
        rahul k singh added a comment -

        The state would be part of the queue configuration. The earlier statement that the state is persisted in the configuration is not correct.

        The queue "state" would be a configuration entry. An administrator can change this entry and execute the "hadoop mradmin -refreshQueues" command to update its value.

        rahul k singh added a comment -

        Submitting the first cut.
        I will resubmit the patch with state-related test cases.

        rahul k singh added a comment -

        An administrator can change the property values at runtime by:
        1. changing the property values in mapred-queues.xml
        2. running "hadoop mradmin -refreshQueues"
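
        The intended effect of these two steps can be sketched as follows: a refresh swaps in the new queue states, jobs already in progress are left alone, and submissions to a queue that is not running are turned away. The class RefreshSketch and its methods are hypothetical stand-ins, not the JobTracker or QueueManager code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the refresh semantics; not Hadoop code.
class RefreshSketch {
    private final Map<String, Boolean> running = new HashMap<>();
    private final Map<String, Set<String>> jobsInProgress = new HashMap<>();

    // Stand-in for re-reading mapred-queues.xml after
    // "hadoop mradmin -refreshQueues". Only queue states are replaced;
    // jobs that are already in progress are deliberately untouched.
    synchronized void refreshQueues(Map<String, String> newStates) {
        running.clear();
        for (Map.Entry<String, String> e : newStates.entrySet()) {
            running.put(e.getKey(), "running".equals(e.getValue()));
        }
    }

    // Accept a job only if its queue is currently running.
    synchronized boolean submit(String queue, String jobId) {
        if (!running.getOrDefault(queue, false)) {
            return false;
        }
        jobsInProgress.computeIfAbsent(queue, q -> new HashSet<>()).add(jobId);
        return true;
    }

    synchronized int inProgress(String queue) {
        Set<String> jobs = jobsInProgress.get(queue);
        return jobs == null ? 0 : jobs.size();
    }

    public static void main(String[] args) {
        RefreshSketch tracker = new RefreshSketch();
        tracker.refreshQueues(Collections.singletonMap("default", "running"));
        System.out.println("job_1 accepted: " + tracker.submit("default", "job_1"));
        tracker.refreshQueues(Collections.singletonMap("default", "stopped"));
        System.out.println("job_2 accepted: " + tracker.submit("default", "job_2"));
        System.out.println("still in progress: " + tracker.inProgress("default"));
    }
}
```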

        rahul k singh added a comment -

        Added a test case for queue state.

        rahul k singh added a comment -

        Attaching a new patch with documentation changes.

        rahul k singh added a comment -

        Implemented Sreekanth's offline comments.

        Sreekanth Ramakrishnan added a comment -
        • Please recheck the mapred-queues.xml.template file. It seems to have been duplicated from the last patch into this one.
        • Queue.java, which is a new file, seems to be missing from the diff.

        Can you please correct this and upload a new patch?

        rahul k singh added a comment -

        Reformatted Queue.java and QueueManager.java.

        Sreekanth Ramakrishnan added a comment -

        +1 to the latest patch.

        rahul k singh added a comment -

        runtest

        [exec]
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 11 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec] -1 release audit. The applied patch generated 494 release audit warnings (more than the trunk's current 493 warnings).

        The -1 for release audit is due to the addition of a new XML file without the Apache license text at the top.

        rahul k singh added a comment -

        run test patch

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 11 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec] -1 release audit. The applied patch generated 494 release audit warnings (more than the trunk's current 493 warnings).
        [exec]
        [exec] ======================================================================
        [exec] ======================================================================
        [exec] Finished build.
        [exec] ======================================================================
        [exec] ======================================================================

        The -1 for release audit is due to the addition of a new XML file without the Apache license text at the top.

        Hemanth Yamijala added a comment -

        I tried to see if this is ready for a commit. But I had a few comments.

        • In the commands manual, the documentation of the refreshQueues command seems a little too terse. We should expand on it a bit more.
        • In cluster_setup documentation, it appears that mapred.queue.names property is described after the ACLs. This order should be reversed.
        • QueueManager.isRunning() should be synchronized.
        • In checkDeprecation, the check for mapred.queue.names uses getStrings() with a default value, so the result will never be null. Hence, the check doesn't serve its purpose.
        • Methods in Queue don't seem to need to be public.
        • Should they also be synchronized?
        • The QueueState enum has a method called 'equalTo'. Why doesn't the idiomatic 'equals' apply here?
        • QueueManager.getQueueAcls() should be delegated in the same way as hasAccess(), no?
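
        The checkDeprecation point can be shown with a small sketch. The two getStrings helpers below are stand-ins that only mimic the relevant behavior of a lookup with and without a default value; the class DeprecationCheckSketch and the method names isSetFlawed and isSetFixed are hypothetical and are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pitfall; not the checkDeprecation code itself.
class DeprecationCheckSketch {
    // Stand-in lookup without a default: null when the property is absent.
    static String[] getStrings(Map<String, String> conf, String name) {
        String value = conf.get(name);
        return value == null ? null : value.split(",");
    }

    // Stand-in lookup with a default: never null when a default is supplied.
    static String[] getStrings(Map<String, String> conf, String name, String... defaultValue) {
        String[] value = getStrings(conf, name);
        return value == null ? defaultValue : value;
    }

    // Flawed check: the default masks a missing property, so this is always true.
    static boolean isSetFlawed(Map<String, String> conf) {
        return getStrings(conf, "mapred.queue.names", "default") != null;
    }

    // Check that can actually detect whether the property was configured.
    static boolean isSetFixed(Map<String, String> conf) {
        return getStrings(conf, "mapred.queue.names") != null;
    }

    public static void main(String[] args) {
        Map<String, String> empty = new HashMap<>();
        System.out.println("flawed check on empty config: " + isSetFlawed(empty));
        System.out.println("fixed check on empty config: " + isSetFixed(empty));
    }
}
```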
        Hemanth Yamijala added a comment -

        Modified patch that incorporates the comments I raised. I also changed the refactoring slightly. Now, the Queue class is a simple data class with accessors. All the logic still rests with the QueueManager. I moved it to this model because it seemed like the logic was split between Queue and QueueManager and was making it more confusing that way.

        Queue related tests pass. Running test-patch and tests.

        Hemanth Yamijala added a comment -

        test-patch results:

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 11 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
        [exec]

        Hemanth Yamijala added a comment -

        A new patch that cleans up the code a bit, fixes some test failures in the HADOOP-5913-11.patch I uploaded, and improves the javadocs. ant test-patch continues to pass with this patch. I also ran all tests and they pass locally.

        I will run a few manual tests for sanity, and if things work fine, I will commit this patch.

        Hemanth Yamijala added a comment -

        Sigh! The manual tests worked, but unfortunately I realized the UI does not reflect the state of the queues, which I think is necessary for this patch to be complete.

        Hemanth Yamijala added a comment -

        Added queue state information to the UI. Rahul, can you please take a look at the changes?

        Hemanth Yamijala added a comment -

        I missed incrementing the version number of JobSubmissionProtocol. This patch takes care of that.

        rahul k singh added a comment -

        A small comment:
        - In Queue.java, schedulingInfo needs to be set to "N/A", as is done in JobQueueInfo.

        Hemanth Yamijala added a comment -

        Rahul, the schedulingInfo member in Queue is an Object and not a String. Therefore, it cannot be set to a default value as in JobQueueInfo. I verified that a null object will not cause any damage elsewhere in the code. This seems to be a valid assumption. For instance, all calls to Queue.getSchedulingInfo (except in test cases) check for null before using the object. Does this make sense to you?
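
        A minimal sketch of the null-check pattern described above, assuming a caller that wants a display string. The class SchedulingInfoSketch and the helper describe are hypothetical, and converting with toString() is an assumption about how a caller might render the object.

```java
// Hypothetical sketch; not code from Queue.java or JobQueueInfo.
class SchedulingInfoSketch {
    // The Object-typed field may legitimately be null, so the caller
    // checks before use and falls back to the "N/A" display string.
    static String describe(Object schedulingInfo) {
        return schedulingInfo == null ? "N/A" : schedulingInfo.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        System.out.println(describe("2 running jobs"));
    }
}
```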

        rahul k singh added a comment -

        +1

        rahul k singh added a comment -

        +1

        Hemanth Yamijala added a comment -

        Running tests manually, but just in case hudson can pick it up...

        Hemanth Yamijala added a comment -

        test-patch output:

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 18 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

        I ran all tests. Some tests timed out on my local machine. But they timed out on trunk as well, or were clearly not related to the patch. These are HADOOP-6061, HADOOP-6062 and HADOOP-6064. Based on this, I am planning to commit this patch soon.

        Hemanth Yamijala added a comment -

        I just committed this to trunk. Thanks, Rahul!

        Hudson added a comment -

        Integrated in Hadoop-trunk #870 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/870/).
        Provide ability to an administrator to stop and start job queues. Contributed by Rahul Kumar Singh and Hemanth Yamijala.

        Robert Chansler added a comment -

        Editorial pass over all release notes prior to publication of 0.21.

        Chris Douglas added a comment -

        Backport to 0.20

        Chris Douglas added a comment -

        Avoid incompatible change to queue configuration.


          People

          • Assignee: rahul k singh
          • Reporter: rahul k singh
          • Votes: 0
          • Watchers: 3