Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-893

Provide an ability to refresh queue configuration without restart.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Extended the framework's refresh-queue mechanism to support refresh of scheduler specific queue properties and implemented this refresh operation for some of the capacity scheduler properties. With this feature, one can refresh some of the capacity-scheduler's queue related properties - queue capacities, user-limits per queue, max map/reduce capacity and max-jobs per user to initialize while the system is running and without restarting JT. Even after this, some features like changing enable/disable priorities, adding/removing queues are not supported in capacity-scheduler.
      Show
      Extended the framework's refresh-queue mechanism to support refresh of scheduler specific queue properties and implemented this refresh operation for some of the capacity scheduler properties. With this feature, one can refresh some of the capacity-scheduler's queue related properties - queue capacities, user-limits per queue, max map/reduce capacity and max-jobs per user to initialize while the system is running and without restarting JT. Even after this, some features like changing enable/disable priorities, adding/removing queues are not supported in capacity-scheduler.

      Description

      While administering a cluster using multiple queues, administrators feel a need to refresh queue properties on the fly without needing to restart the JobTracker. This is partially supported for some properties such as queue ACLs (HADOOP-5396) and state (HADOOP-5913). The idea is to extend the facility to refresh other queue properties as well, including scheduler properties.

      1. MAPREDUCE-893-7.patch
        159 kB
        rahul k singh
      2. MAPREDUCE-893-20090918.over.849.patch
        168 kB
        Vinod Kumar Vavilapalli
      3. MAPREDUCE-893-20090918.2.txt
        171 kB
        Vinod Kumar Vavilapalli
      4. MAPREDUCE-893-20090918.txt
        171 kB
        Vinod Kumar Vavilapalli
      5. MAPREDUCE-893-20090917.4.txt
        170 kB
        Vinod Kumar Vavilapalli
      6. MAPREDUCE-893-20090917.2.txt
        172 kB
        Vinod Kumar Vavilapalli
      7. MAPREDUCE-893-20090915.1.txt
        128 kB
        Vinod Kumar Vavilapalli

        Issue Links

          Activity

          Hide
          Hemanth Yamijala added a comment -

          As explained above, we have this feature for the properties that the framework handles w.r.to queues - ACLs and state.

          We can draw the scope of this JIRA from what exists already.

          • The existing framework for refresh of ACLs and state relies on a command hadoop mradmin -refreshQueues.
          • This command causes QueueManager to reload the configuration from the mapred-queues.xml file.
          • If there's a syntactic or semantic error in reload, the refresh command fails with an exception that is sent back to the hadoop mradmin command.
          • Importantly, the existing configuration is untouched and the system is left in a consistent state
          • The UGI of the administrator who raised the refresh is logged to the JT log for audit purposes.

          I believe all of these generic requirements will apply for the current JIRA as well.

          The extension of scope is in the following manner:

          • First the framework processes a reload of the configuration for properties it manages.
          • If it passes, the framework will call the scheduler to refresh queue properties that are managed by the scheduler. However, it will not commit its changes until this call succeeds.
          • In MAPREDUCE-861, we suggested that schedulers which could define their properties as key-value pairs can do so in the format we suggest here. (Look at the properties tag under queue). The framework can pass the list of properties per queue to the scheduler, maybe as a Map of queue-name and properties.
          • If the scheduler cannot process this information, it will throw an error (via an exception or a return value) and discard the changes itself. The framework will likewise discard it own changes and return error to the client with an appropriate message.
          • It is possible that some of the properties may not be refresh-able. For e.g. we are not going to handle new queues getting added or deleted. I think we should give a return value back to the refreshQueues to indicate this.

          Does this work ?

          Show
          Hemanth Yamijala added a comment - As explained above, we have this feature for the properties that the framework handles w.r.to queues - ACLs and state. We can draw the scope of this JIRA from what exists already. The existing framework for refresh of ACLs and state relies on a command hadoop mradmin -refreshQueues . This command causes QueueManager to reload the configuration from the mapred-queues.xml file. If there's a syntactic or semantic error in reload, the refresh command fails with an exception that is sent back to the hadoop mradmin command. Importantly, the existing configuration is untouched and the system is left in a consistent state The UGI of the administrator who raised the refresh is logged to the JT log for audit purposes. I believe all of these generic requirements will apply for the current JIRA as well. The extension of scope is in the following manner: First the framework processes a reload of the configuration for properties it manages. If it passes, the framework will call the scheduler to refresh queue properties that are managed by the scheduler. However, it will not commit its changes until this call succeeds. In MAPREDUCE-861 , we suggested that schedulers which could define their properties as key-value pairs can do so in the format we suggest here . (Look at the properties tag under queue). The framework can pass the list of properties per queue to the scheduler, maybe as a Map of queue-name and properties. If the scheduler cannot process this information, it will throw an error (via an exception or a return value) and discard the changes itself. The framework will likewise discard it own changes and return error to the client with an appropriate message. It is possible that some of the properties may not be refresh-able. For e.g. we are not going to handle new queues getting added or deleted. I think we should give a return value back to the refreshQueues to indicate this. Does this work ?
          Hide
          Nigel Daley added a comment -

          However, it will not commit its changes until this call succeeds.

          What is 'it"? Commit to where?

          Show
          Nigel Daley added a comment - However, it will not commit its changes until this call succeeds. What is 'it"? Commit to where?
          Hide
          Hemanth Yamijala added a comment -

          What is 'it"? Commit to where?

          The framework will not commit changes to queue properties it has validated until the scheduler validates properties it owns. Commit to it's internal data structures which are the source of truth for answering queries on ACLs, state, etc.

          Show
          Hemanth Yamijala added a comment - What is 'it"? Commit to where? The framework will not commit changes to queue properties it has validated until the scheduler validates properties it owns. Commit to it's internal data structures which are the source of truth for answering queries on ACLs, state, etc.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Uploading first patch for review.

          • This defines TaskScheduler.QueueRefresher which scheduler should override so as to support refresh of the queues.
          • QueueManager's refresh code is modified so as to also includes refresh of queues by the schedulers.
          • Implemented TaskScheduler.QueueRefresher in CapacityTaskScheduler so as to support queue-refresh in this scheduler.
          • Added some basic tests w.r.t verifying capacities and user-limits' refresh.

          This needs some more tests to handle failures during refresh. Uploading patch so as to get the first refresh started.

          Show
          Vinod Kumar Vavilapalli added a comment - Uploading first patch for review. This defines TaskScheduler.QueueRefresher which scheduler should override so as to support refresh of the queues. QueueManager's refresh code is modified so as to also includes refresh of queues by the schedulers. Implemented TaskScheduler.QueueRefresher in CapacityTaskScheduler so as to support queue-refresh in this scheduler. Added some basic tests w.r.t verifying capacities and user-limits' refresh. This needs some more tests to handle failures during refresh. Uploading patch so as to get the first refresh started.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Forgot to mention that this patch has to be applied over the latest patch at MAPREDUCE-861.

          Show
          Vinod Kumar Vavilapalli added a comment - Forgot to mention that this patch has to be applied over the latest patch at MAPREDUCE-861 .
          Hide
          rahul k singh added a comment -

          Some comments:
          QueueManager
          1. TO DO : Comment for getQueueConfigurationParser is not required.
          Dont think there is any need of introducing parser.parse method. As for , there will always be 1 parser that is QueueConfigurationParser
          and existing DeprecatedConfigurationParser is just for deprecation and would go away.

          2.The parameter name in "synchronized void refreshQueues(Configuration conf, QueueRefresher scheduler)"
          ) should be refresher instead of scheduler.

          QueueRefresher
          1.QueueRefresher should be an abstract class with refreshQueues being abstract.
          Can it be something like this:
          -----------------------------
          abstract class QueueRefresher

          { abstract void refreshQueues(List<JobQueueInfo> newRootQueues) throws Throwable; }

          QueueRefresher getQueueRefresher()

          { return null; }

          }

          Queue
          1.In QueueState no need for enumMap. QueueState.valueOf(String name) would give the QueueState object.

          2.newState.getChildren().size() state in isHierarchySame would return NullPointerException
          incase of no children.

          3.Do we need to define an extra state of UNDEFINED in QueueState , as Queues with no state would have a default state of RUNNING.
          which is same behaviour before states were introduced.
          It basically boils down to the question , what is the value of state in case of queues with children.

          CapacityTaskScheduler
          1. Remove instance of schedConf.
          CapacitySchedulerConf constructor should just be instantiated and get the value most of the time we any how use QueueSchedulingContext values.
          Incase of not passing any properties ,CapacitySchedulerConf would give default values.
          Incase of distributedUnConfiguredCapacity some of the testcases directly check the configuration xml.So they need to be taken care of.

          2. Remove SchedulingDisplayInfo object alltogether , instead pass QueueSchedulingContext directly.
          Also check incase above case do we need queueNameToQueueContextMap anymore.

          3. We can remove the synchronized block from initializeQueues , instead we can have more fine grained locking
          when we just change root or update root or not at all.

          4. We can move most of the code inside initializeQueues to QueueHierarchyBuilder.

          so initializeQueues becomes following :
          QueueHierarchyBuilder builder = new QueueHierarchyBuilder();
          if(refreshingQueues)

          { //do current stuff }

          else

          { this.root = builder.getRoot(); }

          5. instead of updateConfigurationOfAbstractQueues in CapacityTaskScheduler ,
          can we move this method to AbstractQueues class ,
          so we can call root.refresh(newRoot) such that parents let there children update themselves first then parents update themselves.
          so refreshing logic becomes simply as.
          root.sort(comparator);
          newRoot.sort(comparator);
          root.refresh(newRoot);
          6.updateQueueMaps updates jobQueueManager unnecessarily as the list of JobQueue should not be changed in case of refresh.
          this method might need to be revisited once we implement 2. This would also remove the need of synchronization in JobQueuesManager

          7.getOrderedQueue signature should be reverted back to return AbstractQueue
          8.initializationPoller instance variable scope is changed to package it shuold be private
          9.getDisplayInfo neednot be synchronized as it is only reading data.

          AbstractQueue
          1. It is incorrect to change the signature of getDescendentJobQueues to return List of JobQueue. return List of AbstractQueue provides
          the correct abstraction. JobQueue is just another specialized AbstractQueue with containers for JIP.

          JobQueuesManager
          1. We dont need to synchronize any of the methods in JobQueuesManager. As it is always called from JobTracker in synchronized block
          also the queue structure jobQueues is only created once and never changed. double check ??

          TaskSchedulingContext
          1.Removal of unused code. not related to patch though

          JobQueueInfo
          1. there is no need for maintaining the fullyQualifiedName and name separatly . Instead we shuold always use fullyQualified name everywhere
          and can provide method to get the normal queueName.

          Show
          rahul k singh added a comment - Some comments: QueueManager 1. TO DO : Comment for getQueueConfigurationParser is not required. Dont think there is any need of introducing parser.parse method. As for , there will always be 1 parser that is QueueConfigurationParser and existing DeprecatedConfigurationParser is just for deprecation and would go away. 2.The parameter name in "synchronized void refreshQueues(Configuration conf, QueueRefresher scheduler)" ) should be refresher instead of scheduler. QueueRefresher 1.QueueRefresher should be an abstract class with refreshQueues being abstract. Can it be something like this: ----------------------------- abstract class QueueRefresher { abstract void refreshQueues(List<JobQueueInfo> newRootQueues) throws Throwable; } QueueRefresher getQueueRefresher() { return null; } } Queue 1.In QueueState no need for enumMap. QueueState.valueOf(String name) would give the QueueState object. 2.newState.getChildren().size() state in isHierarchySame would return NullPointerException incase of no children. 3.Do we need to define an extra state of UNDEFINED in QueueState , as Queues with no state would have a default state of RUNNING. which is same behaviour before states were introduced. It basically boils down to the question , what is the value of state in case of queues with children. CapacityTaskScheduler 1. Remove instance of schedConf. CapacitySchedulerConf constructor should just be instantiated and get the value most of the time we any how use QueueSchedulingContext values. Incase of not passing any properties ,CapacitySchedulerConf would give default values. Incase of distributedUnConfiguredCapacity some of the testcases directly check the configuration xml.So they need to be taken care of. 2. Remove SchedulingDisplayInfo object alltogether , instead pass QueueSchedulingContext directly. Also check incase above case do we need queueNameToQueueContextMap anymore. 3. We can remove the synchronized block from initializeQueues , instead we can have more fine grained locking when we just change root or update root or not at all. 4. We can move most of the code inside initializeQueues to QueueHierarchyBuilder. so initializeQueues becomes following : QueueHierarchyBuilder builder = new QueueHierarchyBuilder(); if(refreshingQueues) { //do current stuff } else { this.root = builder.getRoot(); } 5. instead of updateConfigurationOfAbstractQueues in CapacityTaskScheduler , can we move this method to AbstractQueues class , so we can call root.refresh(newRoot) such that parents let there children update themselves first then parents update themselves. so refreshing logic becomes simply as. root.sort(comparator); newRoot.sort(comparator); root.refresh(newRoot); 6.updateQueueMaps updates jobQueueManager unnecessarily as the list of JobQueue should not be changed in case of refresh. this method might need to be revisited once we implement 2. This would also remove the need of synchronization in JobQueuesManager 7.getOrderedQueue signature should be reverted back to return AbstractQueue 8.initializationPoller instance variable scope is changed to package it shuold be private 9.getDisplayInfo neednot be synchronized as it is only reading data. AbstractQueue 1. It is incorrect to change the signature of getDescendentJobQueues to return List of JobQueue. return List of AbstractQueue provides the correct abstraction. JobQueue is just another specialized AbstractQueue with containers for JIP. JobQueuesManager 1. We dont need to synchronize any of the methods in JobQueuesManager. As it is always called from JobTracker in synchronized block also the queue structure jobQueues is only created once and never changed. double check ?? TaskSchedulingContext 1.Removal of unused code. not related to patch though JobQueueInfo 1. there is no need for maintaining the fullyQualifiedName and name separatly . Instead we shuold always use fullyQualified name everywhere and can provide method to get the normal queueName.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Attaching patch that includes the above comments. In particular:

          • Documented QueueConfigurationParser's constructor that it also does parsing itself.
          • Renamed scheduler parameter of refreshQueues() to schedulerRefresher.
          • Inner class CapacitySchedulerQueueRefresher uses 'initializationPoller' instance. Left it as is to avoid synthetic constructors.
          • getDisplayInfo() is knocked off completely.

          TestRefreshOfQueues has minor problems and is being fixed. Everything else can be reviewed. I'll fix it as the review progresses.

          Some questions:

          • There is a need for getting a QueueState from a string. QueueState().valueOf() returns exact name i.e. RUNNING, STOPPED etc, so we cannot use them directly as display names can be different. Is maintaining the Queue.QueueState.enumMap bad? What is the standard way of handling this in enums?
          • I left my earlier change which assigns the state UNDEFINED to Container-Queues. This I did mainly because state doesn't make sense to Container-Queues - they cannot be stopped, started for e.g. as of now. Is this not OK?
          • Capacity-scheduler.xml still contains the default queue specific properties. Shouldn't we remove them? If so, where shall the documentation of scheduler properties be?
          • Removed the schedConf instance in scheduler because maintaining it would add extra effort to keep it consistent during refresh. Is this fine?
          • Removed SchedulingDisplayInfo object all together and instead passing QueueSchedulingContext to the QueueManager directly. Also, the corresponding queueInfoMap completely. Is this change OK?
          • Minor: I am confused about the name TestContainerQueue and what it is actually testing. Should we rename it to TestHierarchicalQueues or something like that?
          Show
          Vinod Kumar Vavilapalli added a comment - Attaching patch that includes the above comments. In particular: Documented QueueConfigurationParser's constructor that it also does parsing itself. Renamed scheduler parameter of refreshQueues() to schedulerRefresher. Inner class CapacitySchedulerQueueRefresher uses 'initializationPoller' instance. Left it as is to avoid synthetic constructors. getDisplayInfo() is knocked off completely. TestRefreshOfQueues has minor problems and is being fixed. Everything else can be reviewed. I'll fix it as the review progresses. Some questions: There is a need for getting a QueueState from a string. QueueState().valueOf() returns exact name i.e. RUNNING, STOPPED etc, so we cannot use them directly as display names can be different. Is maintaining the Queue.QueueState.enumMap bad? What is the standard way of handling this in enums? I left my earlier change which assigns the state UNDEFINED to Container-Queues. This I did mainly because state doesn't make sense to Container-Queues - they cannot be stopped, started for e.g. as of now. Is this not OK? Capacity-scheduler.xml still contains the default queue specific properties. Shouldn't we remove them? If so, where shall the documentation of scheduler properties be? Removed the schedConf instance in scheduler because maintaining it would add extra effort to keep it consistent during refresh. Is this fine? Removed SchedulingDisplayInfo object all together and instead passing QueueSchedulingContext to the QueueManager directly. Also, the corresponding queueInfoMap completely. Is this change OK? Minor: I am confused about the name TestContainerQueue and what it is actually testing. Should we rename it to TestHierarchicalQueues or something like that?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Also, for those who are listening, this is the new QueueRefresher interface that I am adding to TaskScheduler:

          abstract class TaskScheduler implements Configurable {
             ......
             ......
          
            /**
             * Abstract QueueRefresher class. Scheduler's can extend this and return an
             * instance of this in the {@link #getQueueRefresher()} method. The
             * {@link #refreshQueues(List)} method of this instance will be invoked by the
             * {@link QueueManager} whenever it gets a request from an administrator to
             * refresh its own queue-configuration. This method has a documented contract
             * between the {@link QueueManager} and the {@link TaskScheduler}.
             */
            abstract class QueueRefresher {
          
              /**
               * Refresh the queue-configuration in the scheduler. This method has the
               * following contract.
               * <ol>
               * <li>Before this method, {@link QueueManager} does a validation of the new
               * queue-configuration. For e.g, currently addition of new queues, or
               * removal of queues at any level in the hierarchy is not supported by
               * {@link QueueManager} and so are not supported for schedulers too.</li>
               * <li>Schedulers will be passed a list of {@link JobQueueInfo}s of the root
               * queues i.e. the queues at the top level. All the descendants are properly
               * linked from these top-level queues.</li>
               * <li>Schedulers should use the scheduler specific queue properties from
               * the newRootQueues, validate the properties themselves and apply them
               * internally.</li>
               * <li>
               * Once the method returns successfully from the schedulers, it is assumed
               * that the refresh of queue properties is successful throughout and will be
               * 'committed' internally to {@link QueueManager} too. It is guaranteed that
               * at no point, after successful return from the scheduler, is the queue
               * refresh in QueueManager failed. If ever, such abnormalities happen, the
               * queue framework will be inconsistent and will need a JT restart.</li>
               * <li>If scheduler throws an exception during {@link #refreshQueues()},
               * {@link QueueManager} throws away the newly read configuration, retains
               * the old (consistent) configuration and informs the request issuer about
               * the error appropriately.</li>
               * </ol>
               * 
               * @param newRootQueues
               */
              abstract void refreshQueues(List<JobQueueInfo> newRootQueues)
                  throws Throwable;
            }
          
            /**
             * Get the {@link QueueRefresher} for this scheduler. By default, no
             * {@link QueueRefresher} exists for a scheduler and is set to null.
             * Schedulers need to return an instance of {@link QueueRefresher} if they
             * wish to refresh their queue-configuration when {@link QueueManager}
             * refreshes its own queue-configuration via an administrator request.
             * 
             * @return
             */
            QueueRefresher getQueueRefresher() {
              return null;
            }
          }
          
          Show
          Vinod Kumar Vavilapalli added a comment - Also, for those who are listening, this is the new QueueRefresher interface that I am adding to TaskScheduler: abstract class TaskScheduler implements Configurable { ...... ...... /** * Abstract QueueRefresher class. Scheduler's can extend this and return an * instance of this in the {@link #getQueueRefresher()} method. The * {@link #refreshQueues(List)} method of this instance will be invoked by the * {@link QueueManager} whenever it gets a request from an administrator to * refresh its own queue-configuration. This method has a documented contract * between the {@link QueueManager} and the {@link TaskScheduler}. */ abstract class QueueRefresher { /** * Refresh the queue-configuration in the scheduler. This method has the * following contract. * <ol> * <li>Before this method, {@link QueueManager} does a validation of the new * queue-configuration. For e.g, currently addition of new queues, or * removal of queues at any level in the hierarchy is not supported by * {@link QueueManager} and so are not supported for schedulers too.</li> * <li>Schedulers will be passed a list of {@link JobQueueInfo}s of the root * queues i.e. the queues at the top level. All the descendants are properly * linked from these top-level queues.</li> * <li>Schedulers should use the scheduler specific queue properties from * the newRootQueues, validate the properties themselves and apply them * internally.</li> * <li> * Once the method returns successfully from the schedulers, it is assumed * that the refresh of queue properties is successful throughout and will be * 'committed' internally to {@link QueueManager} too. It is guaranteed that * at no point, after successful return from the scheduler, is the queue * refresh in QueueManager failed. If ever, such abnormalities happen, the * queue framework will be inconsistent and will need a JT restart.</li> * <li>If scheduler throws an exception during {@link #refreshQueues()}, * {@link QueueManager} throws away the newly read configuration, retains * the old (consistent) configuration and informs the request issuer about * the error appropriately.</li> * </ol> * * @param newRootQueues */ abstract void refreshQueues(List<JobQueueInfo> newRootQueues) throws Throwable; } /** * Get the {@link QueueRefresher} for this scheduler. By default , no * {@link QueueRefresher} exists for a scheduler and is set to null . * Schedulers need to return an instance of {@link QueueRefresher} if they * wish to refresh their queue-configuration when {@link QueueManager} * refreshes its own queue-configuration via an administrator request. * * @ return */ QueueRefresher getQueueRefresher() { return null ; } }
          Hide
          Hemanth Yamijala added a comment -

          Capacity-scheduler.xml still contains the default queue specific properties. Shouldn't we remove them? If so, where shall the documentation of scheduler properties be?

          I think we should remove the properties that are no longer supposed to be capacity-scheduler.xml and add a new file - something like mapred-queues-for-cs.xml.template. This would only be for documentation purposes and we could comment that users of capacity scheduler can rename this to mapred-queues.xml adding more queues etc, based on the documentation provided. Works ?

          Removed SchedulingDisplayInfo object all together and instead passing QueueSchedulingContext to the QueueManager directly. Also, the corresponding queueInfoMap completely. Is this change OK?

          As discussed offline, I am not completely comfortable with this idea. I see the QueueSchedulingContext as a core object of the CS, and wouldn't want the framework to hold an instance of it (that too, in ways that are not very easily detectable). I think the existing model of providing just a 'view' of the scheduler state to the framework is better.

          Regarding the rest of the questions, I am not really too strong on an opinion. Let's try and minimize change to the extent possible at this stage in the interest of caution. smile

          Show
          Hemanth Yamijala added a comment - Capacity-scheduler.xml still contains the default queue specific properties. Shouldn't we remove them? If so, where shall the documentation of scheduler properties be? I think we should remove the properties that are no longer supposed to be capacity-scheduler.xml and add a new file - something like mapred-queues-for-cs.xml.template. This would only be for documentation purposes and we could comment that users of capacity scheduler can rename this to mapred-queues.xml adding more queues etc, based on the documentation provided. Works ? Removed SchedulingDisplayInfo object all together and instead passing QueueSchedulingContext to the QueueManager directly. Also, the corresponding queueInfoMap completely. Is this change OK? As discussed offline, I am not completely comfortable with this idea. I see the QueueSchedulingContext as a core object of the CS, and wouldn't want the framework to hold an instance of it (that too, in ways that are not very easily detectable). I think the existing model of providing just a 'view' of the scheduler state to the framework is better. Regarding the rest of the questions, I am not really too strong on an opinion. Let's try and minimize change to the extent possible at this stage in the interest of caution. smile
          Hide
          rahul k singh added a comment -

          comments :
          CapacityTestUtils:

          1. ((FakeQueueManager) qm).getQueue(q.queueName).setProperties(p);
          No need for casting qm is already FakeQueueManager.

          TestQueueManagerRefresh
          1.testRefreshFailureWithChangeOfHierarchy(JobQueueInfo[] queues => This parameter is not used.

          QueueManager
          1.Check if QueueRefresher parameter is null in refreshQueues method.

          CapacityTaskScheduler.
          1.Check for supportsPriority , it should not be updated as part of refresh.

          2.JobInitializationPoller entries are part of capacity-scheduler.xml , but we are not refreshing capacity-scheduler.xml so we need to
          queue specific JobInitializationPoller entries to QueueSchedulingContext.

          AbstractQueue
          1.addChild method should be documented as being only used for testing.

          Queue
          1. there is no need for maintaining the fullyQualifiedName and name separatly . Instead we shuold always use fullyQualified name everywhere
          and can provide method to get the normal queueName.

          Show
          rahul k singh added a comment - comments : CapacityTestUtils: 1. ((FakeQueueManager) qm).getQueue(q.queueName).setProperties(p); No need for casting qm is already FakeQueueManager. TestQueueManagerRefresh 1.testRefreshFailureWithChangeOfHierarchy(JobQueueInfo[] queues => This parameter is not used. QueueManager 1.Check if QueueRefresher parameter is null in refreshQueues method. CapacityTaskScheduler. 1.Check for supportsPriority , it should not be updated as part of refresh. 2.JobInitializationPoller entries are part of capacity-scheduler.xml , but we are not refreshing capacity-scheduler.xml so we need to queue specific JobInitializationPoller entries to QueueSchedulingContext. AbstractQueue 1.addChild method should be documented as being only used for testing. Queue 1. there is no need for maintaining the fullyQualifiedName and name separatly . Instead we shuold always use fullyQualified name everywhere and can provide method to get the normal queueName.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Attaching patch that addresses the above comments. Test cases are also fixed now. Will run it through Hudson.

          Show
          Vinod Kumar Vavilapalli added a comment - Attaching patch that addresses the above comments. Test cases are also fixed now. Will run it through Hudson.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I think we should remove the properties that are no longer supposed to be capacity-scheduler.xml and add a new file - something like mapred-queues-for-cs.xml.template. This would only be for documentation purposes and we could comment that users of capacity scheduler can rename this to mapred-queues.xml adding more queues etc, based on the documentation provided. Works ?

          I am not taking this up in this JIRA. Can do it in a follow up.

          Show
          Vinod Kumar Vavilapalli added a comment - I think we should remove the properties that are no longer supposed to be capacity-scheduler.xml and add a new file - something like mapred-queues-for-cs.xml.template. This would only be for documentation purposes and we could comment that users of capacity scheduler can rename this to mapred-queues.xml adding more queues etc, based on the documentation provided. Works ? I am not taking this up in this JIRA. Can do it in a follow up.
          Hide
          rahul k singh added a comment -

          A quick comment:
          QueueManager.java

          1. refresh of schedulingInfo in refreshQueue method is incorrect.

          this.root.copySchedulingInfo(cp.getRoot());
          //instead of above line is should be
          cp.getRoot().copySchedulingInfo(root);

          at this stage we know that cp is fully updated. so we can initialize the other containers in QM.

          Show
          rahul k singh added a comment - A quick comment: QueueManager.java 1. refresh of schedulingInfo in refreshQueue method is incorrect. this.root.copySchedulingInfo(cp.getRoot()); //instead of above line is should be cp.getRoot().copySchedulingInfo(root); at this stage we know that cp is fully updated. so we can initialize the other containers in QM.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419905/MAPREDUCE-893-20090917.4.txt
          against trunk revision 816240.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 39 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419905/MAPREDUCE-893-20090917.4.txt against trunk revision 816240. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 39 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/46/console This message is automatically generated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Attaching patch that fixes the test-case problems. Also fixing the bug that Rahuk pointed out and added a simple test to verify the same. This bug crept in while trying to fix and incorporate MAPREDUCE-996 in this issue itself.

          Show
          Vinod Kumar Vavilapalli added a comment - Attaching patch that fixes the test-case problems. Also fixing the bug that Rahuk pointed out and added a simple test to verify the same. This bug crept in while trying to fix and incorporate MAPREDUCE-996 in this issue itself.
          Hide
          rahul k singh added a comment -

          Code looks in decent shape now , some minor nits:

          QueueManager.
          1.Move update of schedulingInfo in refreshQueues
          CapacityTaskScheduler
          1.Make access of some of the method as private (addToQueueInfoMap , createDisplayInfo)

          Show
          rahul k singh added a comment - Code looks in decent shape now , some minor nits: QueueManager. 1.Move update of schedulingInfo in refreshQueues CapacityTaskScheduler 1.Make access of some of the method as private (addToQueueInfoMap , createDisplayInfo)
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Uploading patch including the above comments.

          • Also fixed some error messages, log messages.
          • Fixed a bug in refresh w.r.t copying TaskSchedulingInfos.
          • Fixed all the tests.
          • Updated patch with the changes in MAPREDUCE-777.
          Show
          Vinod Kumar Vavilapalli added a comment - Uploading patch including the above comments. Also fixed some error messages, log messages. Fixed a bug in refresh w.r.t copying TaskSchedulingInfos. Fixed all the tests. Updated patch with the changes in MAPREDUCE-777 .
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Either I am dreaming or I am just too lucky, this patch is the only one in the path-queue! Am running sanity tests (integration) locally.

          Show
          Vinod Kumar Vavilapalli added a comment - Either I am dreaming or I am just too lucky, this patch is the only one in the path-queue! Am running sanity tests (integration) locally.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          I ran run-commit-test and they ran successfully.

          Show
          Vinod Kumar Vavilapalli added a comment - I ran run-commit-test and they ran successfully.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Apparently MAPREDUCE-849 is almost ready for commit, as I've heard from Ravi/Hemanth. So I am uploading a patch that should be applied over the latest one at MAPREDUCE-849.

          Also, thanks to Jothi, I am adding a small change to suppress the unwarranted findBugs warning earlier reported by Hudson.

          Show
          Vinod Kumar Vavilapalli added a comment - Apparently MAPREDUCE-849 is almost ready for commit, as I've heard from Ravi/Hemanth. So I am uploading a patch that should be applied over the latest one at MAPREDUCE-849 . Also, thanks to Jothi, I am adding a small change to suppress the unwarranted findBugs warning earlier reported by Hudson.
          Hide
          Hemanth Yamijala added a comment -

          I ran test-patch on the updated trunk after MAPREDUCE-849 and Vinod's last patch.

               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 45 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
               [exec]
          
          Show
          Hemanth Yamijala added a comment - I ran test-patch on the updated trunk after MAPREDUCE-849 and Vinod's last patch. [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 45 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec]
          Hide
          Hemanth Yamijala added a comment -

          The ant run-commit-test target succeeded. We're also running the full suite of tests in parallel.

          Show
          Hemanth Yamijala added a comment - The ant run-commit-test target succeeded. We're also running the full suite of tests in parallel.
          Hide
          rahul k singh added a comment -

          ran ant test .
          TestJobQueueInformation failed .

          Attaching the new patch with solution for the above testcase.

          Show
          rahul k singh added a comment - ran ant test . TestJobQueueInformation failed . Attaching the new patch with solution for the above testcase.
          Hide
          rahul k singh added a comment -

          submit the patch for hudson run

          Show
          rahul k singh added a comment - submit the patch for hudson run
          Hide
          Hemanth Yamijala added a comment -

          Sigh ! I verified the fix, and it is fine. +1. Running test-patch, etc again. Rahul, can you sanity test the changes on a single node cluster ?

          Show
          Hemanth Yamijala added a comment - Sigh ! I verified the fix, and it is fine. +1. Running test-patch, etc again. Rahul, can you sanity test the changes on a single node cluster ?
          Hide
          Hemanth Yamijala added a comment -

          test-patch:

               [exec] +1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 47 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
               [exec]
          
          Show
          Hemanth Yamijala added a comment - test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 47 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec]
          Hide
          rahul k singh added a comment -

          ran ant test on the existing patch . All passed.

          Show
          rahul k singh added a comment - ran ant test on the existing patch . All passed.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12420052/MAPREDUCE-893-7.patch
          against trunk revision 816735.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 47 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420052/MAPREDUCE-893-7.patch against trunk revision 816735. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 47 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h1.grid.sp2.yahoo.net/1/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          I just committed this. Thanks, Vinod and Rahul !

          Show
          Hemanth Yamijala added a comment - I just committed this. Thanks, Vinod and Rahul !
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #57 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/57/)
          . Provides an ability to refresh queue configuration without restarting the JobTracker. Contributed by Vinod Kumar Vavilapalli and Rahul Kumar Singh.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #57 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/57/ ) . Provides an ability to refresh queue configuration without restarting the JobTracker. Contributed by Vinod Kumar Vavilapalli and Rahul Kumar Singh.

            People

            • Assignee:
              Vinod Kumar Vavilapalli
              Reporter:
              Hemanth Yamijala
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development