Hadoop Common
  1. Hadoop Common
  2. HADOOP-5396

Queue ACLs should be refreshed without requiring a restart of the job tracker

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Job Tracker queue ACLs can be changed without restarting Job Tracker.

      Description

      In large shared deployments of the M/R clusters, it is normal that new users will periodically want to get access to some queues on the M/R framework. Requiring a JT restart for each such change is operationally inconvenient and seems an overkill. There should be a way for updating ACLs with new users without requiring a JT restart.

      1. HADOOP-5396-3-svn.txt
        37 kB
        Vinod Kumar Vavilapalli
      2. HADOOP-5396-4-svn.txt
        29 kB
        Vinod Kumar Vavilapalli
      3. hadoop-5396-1.patch
        24 kB
        Sreekanth Ramakrishnan
      4. cluster_setup.pdf
        36 kB
        Sreekanth Ramakrishnan
      5. commands_manual.pdf
        43 kB
        Sreekanth Ramakrishnan
      6. hadoop-5396-2.patch
        27 kB
        Sreekanth Ramakrishnan
      7. hadoop-5396-3.patch
        30 kB
        Sreekanth Ramakrishnan
      8. hadoop-5396-4.patch
        30 kB
        Sreekanth Ramakrishnan
      9. hadoop-5396-5.patch
        30 kB
        Sreekanth Ramakrishnan
      10. 5396-20.patch
        30 kB
        Robert Chansler

        Activity

        Hide
        Vinod Kumar Vavilapalli added a comment -

        Here are the two alternatives I can think of:

        • Have queue configuration in hadoop- {default|site}

          .xml and reload only the queue related properties. The pro for this proposal is no addition of new files and the con is that changes in any parameters other than the queue related ones leads to a confusion - someone trying to see the config file won't be able to make out the parameters in use by JT and those that have changed.

        • Have a separate configuration file for queue configuration. This adds a new file to be managed but it's very clear as to what the current state of configuration is. Any change needed in queue configuration can be done separately without (accidentally) garbling up other parameters.

        And two for the way in which the file can be reloaded, whether it's the hadoop-site.xml or the separate config file:

        • Run a thread in JT every so often, and reload the config file if the modification time changes.
        • Have a mradmin utility that asks the JT to reload the configuration. This involves adding a new RPC in a new protocol that JT would implement.

        My preference is to have a separate queue configuration file with a mradmin utility to reload the configuration. I think it's clean this way. Thoughts?

        Show
        Vinod Kumar Vavilapalli added a comment - Here are the two alternatives I can think of: Have queue configuration in hadoop- {default|site} .xml and reload only the queue related properties. The pro for this proposal is no addition of new files and the con is that changes in any parameters other than the queue related ones leads to a confusion - someone trying to see the config file won't be able to make out the parameters in use by JT and those that have changed. Have a separate configuration file for queue configuration. This adds a new file to be managed but it's very clear as to what the current state of configuration is. Any change needed in queue configuration can be done separately without (accidentally) garbling up other parameters. And two for the way in which the file can be reloaded, whether it's the hadoop-site.xml or the separate config file: Run a thread in JT every so often, and reload the config file if the modification time changes. Have a mradmin utility that asks the JT to reload the configuration. This involves adding a new RPC in a new protocol that JT would implement. My preference is to have a separate queue configuration file with a mradmin utility to reload the configuration. I think it's clean this way. Thoughts?
        Hide
        Sreekanth Ramakrishnan added a comment -

        I think there was a discussion related to this issue for schedulers to be allowed to re-read the configuration it was discussed on HADOOP-4522, there is a patch also out there which actually implements the mradmin which refreshes the configuration. We should take a look at that.

        In my opinion,adding a seperate configuration file for queue acl's would increase administrators burden to deal with one extra file for getting something done, we should avoid it if required.. Best way should be to add Configuration.reload(String key) that way we can re-read the conf file and then add/update the appropriate key.

        Show
        Sreekanth Ramakrishnan added a comment - I think there was a discussion related to this issue for schedulers to be allowed to re-read the configuration it was discussed on HADOOP-4522 , there is a patch also out there which actually implements the mradmin which refreshes the configuration. We should take a look at that. In my opinion,adding a seperate configuration file for queue acl's would increase administrators burden to deal with one extra file for getting something done, we should avoid it if required.. Best way should be to add Configuration.reload(String key) that way we can re-read the conf file and then add/update the appropriate key.
        Hide
        Hemanth Yamijala added a comment -

        ... adding a seperate configuration file for queue acl's would increase administrators burden to deal with one extra file for getting something done

        Actually, in HADOOP-4348, there is feedback that indicates the other way round. A separate file is what is being viewed as not only desirable, but necessary for managing these kinds of things.

        Given that, I am +1 for Vinod's preference as well.

        Show
        Hemanth Yamijala added a comment - ... adding a seperate configuration file for queue acl's would increase administrators burden to deal with one extra file for getting something done Actually, in HADOOP-4348 , there is feedback that indicates the other way round. A separate file is what is being viewed as not only desirable, but necessary for managing these kinds of things. Given that, I am +1 for Vinod's preference as well.
        Hide
        Sreekanth Ramakrishnan added a comment -

        So, then by adding an extra file would mean that administrators would have to deal with Queue ACL's in seperate file plus Service level acl's in another file? Why don't we atleast merge these two together and use the same configuration file hadoop-policy.xml for the queue related operations. There is already refresh enabled to hadoop-policy.xml, so can we make use of the same?

        Show
        Sreekanth Ramakrishnan added a comment - So, then by adding an extra file would mean that administrators would have to deal with Queue ACL's in seperate file plus Service level acl's in another file? Why don't we atleast merge these two together and use the same configuration file hadoop-policy.xml for the queue related operations. There is already refresh enabled to hadoop-policy.xml , so can we make use of the same?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        As Hemanth pointed out, administrators themselves want a separate file for Queue ACLs for better operations. Also, queue properties cannot be merged into policy file as they are completely independent. Further, hadoop-policy.xml is applicable to all of Hadoop components whereas ACL file is applicable only to mapred JT.

        Owen also, in an offline discussion, +1'ed the proposal for a separate file with a SIGHUP like a mechanism, also reusing mradmin refresh utility.

        So, I am going with this approach, will soon upload a patch.

        Show
        Vinod Kumar Vavilapalli added a comment - As Hemanth pointed out, administrators themselves want a separate file for Queue ACLs for better operations. Also, queue properties cannot be merged into policy file as they are completely independent. Further, hadoop-policy.xml is applicable to all of Hadoop components whereas ACL file is applicable only to mapred JT. Owen also, in an offline discussion, +1'ed the proposal for a separate file with a SIGHUP like a mechanism, also reusing mradmin refresh utility. So, I am going with this approach, will soon upload a patch.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Going into the details:

        • JT implements a AdminOperationsProtocol which will encompass all admin related RPCs. Currently it only has a refreshQueues RPC. RefreshAuthoroziationPolicy cannot be used for this as it is a public interface that serves an independant purpose and used both in JT as well as NN.
               public class jobTracker implements AdminOperationsProtocol ... {
                   .....
                   .....
                  public void refreshQueues() {
                      // reload the mapred-queues.xml
                  }
               }
           
        • Adding another option to MRAdmin to refresh queue properties.
        • Moving mapred.queue.names, mapred.acls, all queue acl related properties moved to mapred-queues.xml

        One issue with the above: We want the AdminOperationsProtocol to be a package private interface so that it can evolve over time. But MRAdmin is in mapred.tools subpackage and so cannot access this protocl unless it is is public. One way to resolve this is by moving MRAdmin tool into the mapred package and deprecating the current mapred.tools.MRAdmin. Thoughts?

        Show
        Vinod Kumar Vavilapalli added a comment - Going into the details: JT implements a AdminOperationsProtocol which will encompass all admin related RPCs. Currently it only has a refreshQueues RPC. RefreshAuthoroziationPolicy cannot be used for this as it is a public interface that serves an independant purpose and used both in JT as well as NN. public class jobTracker implements AdminOperationsProtocol ... { ..... ..... public void refreshQueues() { // reload the mapred-queues.xml } } Adding another option to MRAdmin to refresh queue properties. Moving mapred.queue.names, mapred.acls, all queue acl related properties moved to mapred-queues.xml One issue with the above: We want the AdminOperationsProtocol to be a package private interface so that it can evolve over time. But MRAdmin is in mapred.tools subpackage and so cannot access this protocl unless it is is public. One way to resolve this is by moving MRAdmin tool into the mapred package and deprecating the current mapred.tools.MRAdmin. Thoughts?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Attaching patch incorporating the above.

        • Also deprecated o.a.h.m.tools.MRAdmin in favour of o.a.h.m.MRAdmin.

        `ant test-patch` gave

             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 7 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]     -1 release audit.  The applied patch generated 653 release audit warnings (more than the trunk's current 652 warnings).
        

        The release audit warning is for the newly added conf/mapred-queues.xml. The FindBugs warning "The class name org.apache.hadoop.mapred.tools.MRAdmin shadows the simple name of the superclass org.apache.hadoop.mapred.MRAdmin" is regarding the deprecated MRAdmin class.

        Show
        Vinod Kumar Vavilapalli added a comment - Attaching patch incorporating the above. Also deprecated o.a.h.m.tools.MRAdmin in favour of o.a.h.m.MRAdmin. `ant test-patch` gave [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 7 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] -1 release audit. The applied patch generated 653 release audit warnings (more than the trunk's current 652 warnings). The release audit warning is for the newly added conf/mapred-queues.xml. The FindBugs warning "The class name org.apache.hadoop.mapred.tools.MRAdmin shadows the simple name of the superclass org.apache.hadoop.mapred.MRAdmin" is regarding the deprecated MRAdmin class.
        Hide
        Doug Cutting added a comment -

        > One way to resolve this is by moving MRAdmin tool into the mapred package and deprecating the current mapred.tools.MRAdmin. Thoughts?

        Eventually we will restructure the mapred code into multiple packages, separating public APIs (for which we'll generate javadoc) from internal APIs (for which we won't). Once that's done, we'll be able to have public classes which are not documented.

        Do we yet have a Jira for this restructuring? If that doesn't happen before this issue is to be committed, then your workaround sounds reasonable.

        Show
        Doug Cutting added a comment - > One way to resolve this is by moving MRAdmin tool into the mapred package and deprecating the current mapred.tools.MRAdmin. Thoughts? Eventually we will restructure the mapred code into multiple packages, separating public APIs (for which we'll generate javadoc) from internal APIs (for which we won't). Once that's done, we'll be able to have public classes which are not documented. Do we yet have a Jira for this restructuring? If that doesn't happen before this issue is to be committed, then your workaround sounds reasonable.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Eventually we will restructure the mapred code into multiple packages, separating public APIs (for which we'll generate javadoc) from internal APIs (for which we won't). Once that's done, we'll be able to have public classes which are not documented.

        Talked to Arun offline and he is also of the same opinioin. In which case, I would let MRAdmin be in the tools sub-package as it is now and have AdminOperationsProtocol as a public interface with a javadoc warning saying that it is not to be used by users directly and that it is prone to changes.

        Show
        Vinod Kumar Vavilapalli added a comment - Eventually we will restructure the mapred code into multiple packages, separating public APIs (for which we'll generate javadoc) from internal APIs (for which we won't). Once that's done, we'll be able to have public classes which are not documented. Talked to Arun offline and he is also of the same opinioin. In which case, I would let MRAdmin be in the tools sub-package as it is now and have AdminOperationsProtocol as a public interface with a javadoc warning saying that it is not to be used by users directly and that it is prone to changes.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Uploading latest patch.

        Show
        Vinod Kumar Vavilapalli added a comment - Uploading latest patch.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Took a look at the patch, everything looks fine, except few minor issues, listed down below:

        • Remove mapred-queues.xml, should be generated when you do build. Remove it from patch.
        • Is the mapred-queues.xml really required to be default resource in JobConf? The QueueManager is the one which requires the list of queues and acls related to it. Reloading of the same is done by AdminProtocol. So this resource ideally should be removed from JobConf.
        • Should we define the queue config resource file name in JobTracker? Can we move it to QueueManager? i.e. we can just change QueueManager.initalize(conf) to add the queue resource file there and continue?

        Minor Nit:

        • There is "\ No newline at end of file" in the file MRAdmin
        Show
        Sreekanth Ramakrishnan added a comment - Took a look at the patch, everything looks fine, except few minor issues, listed down below: Remove mapred-queues.xml, should be generated when you do build. Remove it from patch. Is the mapred-queues.xml really required to be default resource in JobConf? The QueueManager is the one which requires the list of queues and acls related to it. Reloading of the same is done by AdminProtocol. So this resource ideally should be removed from JobConf. Should we define the queue config resource file name in JobTracker? Can we move it to QueueManager? i.e. we can just change QueueManager.initalize(conf) to add the queue resource file there and continue? Minor Nit: There is "\ No newline at end of file" in the file MRAdmin
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12404299/HADOOP-5396-4-svn.txt
        against trunk revision 760783.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 7 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12404299/HADOOP-5396-4-svn.txt against trunk revision 760783. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/90/console This message is automatically generated.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch incorporating comments and offline comments from Hemanth:

        • The resource file addition is done in QueueManager.
        • The list of queues and if or not acl is enabled is retained within mapred-default.xml
        • Only acls have been moved to new file as mentioned by Vinod.
        • Changes are made backward compatible whenever queue acl is found in mapred-default.xml QueueManager logs a warning. This value is overriden by the value mapred-default.xml
        • Moved documentation to commands manual and partly to cluster setup.
        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch incorporating comments and offline comments from Hemanth: The resource file addition is done in QueueManager . The list of queues and if or not acl is enabled is retained within mapred-default.xml Only acls have been moved to new file as mentioned by Vinod. Changes are made backward compatible whenever queue acl is found in mapred-default.xml QueueManager logs a warning. This value is overriden by the value mapred-default.xml Moved documentation to commands manual and partly to cluster setup.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching the changed documentation in the patch.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching the changed documentation in the patch.
        Hide
        Hemanth Yamijala added a comment -

        Some comments:

        AdminOperationsProtocol & MRAdmin:

        • Should the API be refreshQueueAcls() ?

        QueueManager:

        • QUEUE_PROPERTIES_FILE_NAME: Can be QUEUE_ACLS_FILE_NAME.
        • file name can be mapred-queue-acls.xml. The decision to keep queues and acl enabled flag in mapred-default.xml seems ok to me.
        • Some improved error message ? "Queue ACL in job tracker conf support is deprecated, configure the same in : QUEUE_PROPERTIES_FILE_NAME". Something like: "Configuring queue ACLs in mapred-site.xml or hadoop-site.xml is deprecated. Configure queue ACLs in QUEUE_PROPERTIES_FILE_NAME."
        • Code leading to backward compatibiliy, if we find that one ACL is configured, it is sufficient to warn. I think we can break out at this point.
        • Clearing the schedulerInfoObjects on refresh seems dangerous. These objects are set by the scheduler. If the objects are cleared, and the schedulers are not notified about this, they may not set them back and this would result in loss of information.
        • Since this refresh happens while the system is running, I think more care should be taken towards error handling. For e.g. if an admin makes a mistake in the queue config file - doesn't close a tag, say - and fires the command, currently I think the old information is lost and the new information will be lost too. I think it is mandatory that the old information is not lost and important that the admin be told there's a problem with the configuration.
        • Combining the two comments above, maybe one solution could be to factor out the part parsing the ACLs into a separate method, something like HashMap parseQueueAcls(). This can be called from initialize and the aclsMap member varible should be replaced iff the returned map is valid (like non-null, or so on). The refreshQueueAcls method can have an equivalent call in the QueueManager (i.e. QueueManager.refreshAcls()) which in turn can call the same parseQueueAcls() method. Would this work ?

        Tests:

        • Regarding the test case, I am wondering if we can move to a more conventional unit test model to save on test time. So, we could directly test the logic in the QueueManager by calling the hasAccess API before and after QueueManager.refreshQueueAcls is called. We could set up multiple queues and add/remove users to each queue for both types of operations and verify that the operations after the refresh are appropriately different.
        • This should also test that erroneously created files do not clear the current settings
        • The mapred-queue-acls template file should probably be described as "This is a template file for queue ACLs configuration properties"
        Show
        Hemanth Yamijala added a comment - Some comments: AdminOperationsProtocol & MRAdmin: Should the API be refreshQueueAcls() ? QueueManager: QUEUE_PROPERTIES_FILE_NAME: Can be QUEUE_ACLS_FILE_NAME. file name can be mapred-queue-acls.xml. The decision to keep queues and acl enabled flag in mapred-default.xml seems ok to me. Some improved error message ? "Queue ACL in job tracker conf support is deprecated, configure the same in : QUEUE_PROPERTIES_FILE_NAME". Something like: "Configuring queue ACLs in mapred-site.xml or hadoop-site.xml is deprecated. Configure queue ACLs in QUEUE_PROPERTIES_FILE_NAME." Code leading to backward compatibiliy, if we find that one ACL is configured, it is sufficient to warn. I think we can break out at this point. Clearing the schedulerInfoObjects on refresh seems dangerous. These objects are set by the scheduler. If the objects are cleared, and the schedulers are not notified about this, they may not set them back and this would result in loss of information. Since this refresh happens while the system is running, I think more care should be taken towards error handling. For e.g. if an admin makes a mistake in the queue config file - doesn't close a tag, say - and fires the command, currently I think the old information is lost and the new information will be lost too. I think it is mandatory that the old information is not lost and important that the admin be told there's a problem with the configuration. Combining the two comments above, maybe one solution could be to factor out the part parsing the ACLs into a separate method, something like HashMap parseQueueAcls(). This can be called from initialize and the aclsMap member varible should be replaced iff the returned map is valid (like non-null, or so on). The refreshQueueAcls method can have an equivalent call in the QueueManager (i.e. QueueManager.refreshAcls()) which in turn can call the same parseQueueAcls() method. Would this work ? Tests: Regarding the test case, I am wondering if we can move to a more conventional unit test model to save on test time. So, we could directly test the logic in the QueueManager by calling the hasAccess API before and after QueueManager.refreshQueueAcls is called. We could set up multiple queues and add/remove users to each queue for both types of operations and verify that the operations after the refresh are appropriately different. This should also test that erroneously created files do not clear the current settings The mapred-queue-acls template file should probably be described as "This is a template file for queue ACLs configuration properties"
        Hide
        Hemanth Yamijala added a comment -

        Also as discussed, please log the call when the refresh comes to the jobtracker. The log should contain the user name who issued the refresh request.

        Show
        Hemanth Yamijala added a comment - Also as discussed, please log the call when the refresh comes to the jobtracker. The log should contain the user name who issued the refresh request.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Some comments:

        • I think QueueManager.refresh(Configuration conf) should be deprecated in favour of QueueManager.refresh() which should be added in this patch.
        • Should the API be refreshQueueAcls() ?

          The command should be -refreshQueueAcls instead of -refreshQueueAcl in the same vein. Also, the command summary in MRAdmin.java should read "-refreshQueueAcls: Reload the queue acls\n\t\tJobTracker will reload the mapred-queue-acls file.\n";

        Show
        Vinod Kumar Vavilapalli added a comment - Some comments: I think QueueManager.refresh(Configuration conf) should be deprecated in favour of QueueManager.refresh() which should be added in this patch. Should the API be refreshQueueAcls() ? The command should be -refreshQueueAcls instead of -refreshQueueAcl in the same vein. Also, the command summary in MRAdmin.java should read "-refreshQueueAcls: Reload the queue acls\n\t\tJobTracker will reload the mapred-queue-acls file .\n";
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch incorporating both Vinod's and Hemanths comments.

        Following changes have been made to QueueManager:

        • Changed initalize(Configuration) to initialize()
        • Created a new method called refreshAcls() in QueueManager which only refreshes queue acls.

        Changed test cases to directly call QueueManager.refreshAcls instead of starting MiniMRCluster

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch incorporating both Vinod's and Hemanths comments. Following changes have been made to QueueManager : Changed initalize(Configuration) to initialize() Created a new method called refreshAcls() in QueueManager which only refreshes queue acls. Changed test cases to directly call QueueManager.refreshAcls instead of starting MiniMRCluster
        Hide
        Hemanth Yamijala added a comment -

        JobTracker.refreshQueueAcls():

        • Use UserGroupInformation instead of UnixUserGroupInformation.
        • The current patch stores an instance of the Configuration object which is unnecessary. I propose:
          • We remove the refresh() API completely
          • Create a package private refreshAcls(Configuration conf) API
          • Create a new configuration in JobTracker (from the original conf) and send it to the refreshAcls() API.
          • When we remove the support of the ACLs in the mapred-site.xml, we can modify the signature of refreshAcls to not take any configuration parameter.
        • Indentation in the getQueueAcls method seems inconsistent around the try..catch block
        • "Exception thrown while parsing Configuration" : modify this to "Queue ACLs could not be refreshed because there was an exception in parsing the configuration: <Message from exception>. Existing ACLs are retained."
        • Better catch Throwable ? There was an instance where the Configuration threw some uncaught exception in the Job tracker recovery code.
        • The changes in initialize are incompatible. Previously if there's a failure in the initialization code, an exception is thrown back to the JobTracker. I think it is better to retain this behavior. I think the only change required in initialize is to factor out the code related to the parsing of ACLs to the new method getQueueAcls(). From that perspective, I think getQueueAcls() should simply throw the exception and the handling (conversion to IOException with the error message) should be done in refreshAcls()
        • Tests are still validating with single queue. Having multiple queues and multiple users (many of whom are dummy, and one valid) will exercise more of the queuemanager code - just a more conventional way of testing.
        • The test for the erroneous file should be a separate test. It's very good that we are checking the old behavior is retained. It would be nicer to test that the existing user can submit jobs rather than not submit. So in this test case, the initial setup can allow the current UGI to submit jobs and verify after the test that the user can still submit.
        Show
        Hemanth Yamijala added a comment - JobTracker.refreshQueueAcls(): Use UserGroupInformation instead of UnixUserGroupInformation. The current patch stores an instance of the Configuration object which is unnecessary. I propose: We remove the refresh() API completely Create a package private refreshAcls(Configuration conf) API Create a new configuration in JobTracker (from the original conf) and send it to the refreshAcls() API. When we remove the support of the ACLs in the mapred-site.xml, we can modify the signature of refreshAcls to not take any configuration parameter. Indentation in the getQueueAcls method seems inconsistent around the try..catch block "Exception thrown while parsing Configuration" : modify this to "Queue ACLs could not be refreshed because there was an exception in parsing the configuration: <Message from exception>. Existing ACLs are retained." Better catch Throwable ? There was an instance where the Configuration threw some uncaught exception in the Job tracker recovery code. The changes in initialize are incompatible. Previously if there's a failure in the initialization code, an exception is thrown back to the JobTracker. I think it is better to retain this behavior. I think the only change required in initialize is to factor out the code related to the parsing of ACLs to the new method getQueueAcls(). From that perspective, I think getQueueAcls() should simply throw the exception and the handling (conversion to IOException with the error message) should be done in refreshAcls() Tests are still validating with single queue. Having multiple queues and multiple users (many of whom are dummy, and one valid) will exercise more of the queuemanager code - just a more conventional way of testing. The test for the erroneous file should be a separate test. It's very good that we are checking the old behavior is retained. It would be nicer to test that the existing user can submit jobs rather than not submit. So in this test case, the initial setup can allow the current UGI to submit jobs and verify after the test that the user can still submit.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch which incorporates Hemanth's comments.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch which incorporates Hemanth's comments.
        Hide
        Hemanth Yamijala added a comment -

        Close.. but little work still.

        • I still feel QueueManager.initialize() is doing more than it should. I think this snippet of code is sufficient:
            private void initialize(Configuration conf) {
              aclsEnabled = conf.getBoolean("mapred.acls.enabled", false);
              String[] queues = conf.getStrings("mapred.queue.names", 
                                            new String[] {JobConf.DEFAULT_QUEUE_NAME});
              addToSet(queueNames, queues);
              aclsMap = getQueueAcls(conf);
            }
          

        Does this not seem to work ?

        • In the current patch no warning is being printed if the queue ACLs are present in the mapred-site.xml on initialization. I think the problem will be fixed if we move the call to check deprecation inside the getQueueAcls method, just before the QUEUE_ACLS_FILE_NAME resource is added. I also think the array of queue names can be constructed in the getQueueAcls() method itself. So its signature will look like this:
           private HashMap<String, AccessControlList> getQueueAcls(
                                                        Configuration conf)  {
          
        • The test for the invalid file is printing an incorrect message. It says "User Job Submission Succeeded before refresh.", shouldn't it be failed ?
        • Another point is that the test should have a fail() call after the refreshAcls(). This will ensure the exception is actually thrown, else the test will fail.
        • Commands setup documentation still says refreshQueueAcl.
        Show
        Hemanth Yamijala added a comment - Close.. but little work still. I still feel QueueManager.initialize() is doing more than it should. I think this snippet of code is sufficient: private void initialize(Configuration conf) { aclsEnabled = conf.getBoolean( "mapred.acls.enabled" , false ); String [] queues = conf.getStrings( "mapred.queue.names" , new String [] {JobConf.DEFAULT_QUEUE_NAME}); addToSet(queueNames, queues); aclsMap = getQueueAcls(conf); } Does this not seem to work ? In the current patch no warning is being printed if the queue ACLs are present in the mapred-site.xml on initialization. I think the problem will be fixed if we move the call to check deprecation inside the getQueueAcls method, just before the QUEUE_ACLS_FILE_NAME resource is added. I also think the array of queue names can be constructed in the getQueueAcls() method itself. So its signature will look like this: private HashMap< String , AccessControlList> getQueueAcls( Configuration conf) { The test for the invalid file is printing an incorrect message. It says "User Job Submission Succeeded before refresh.", shouldn't it be failed ? Another point is that the test should have a fail() call after the refreshAcls(). This will ensure the exception is actually thrown, else the test will fail. Commands setup documentation still says refreshQueueAcl.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch incorporating Hemanth's comments:

        • The initalize method is changed to way in the comment.
        • CheckDeprecation and getQueueAcls now take only Configuration as argument. The queuenames are gotten from set queuenames in QueueManager.
        • Test case reorganized for readability.
        • Corrected the documentation.
        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch incorporating Hemanth's comments: The initalize method is changed to way in the comment. CheckDeprecation and getQueueAcls now take only Configuration as argument. The queuenames are gotten from set queuenames in QueueManager. Test case reorganized for readability. Corrected the documentation.
        Hide
        Hemanth Yamijala added a comment -

        Looks fine. One minor nit. In refreshAcls, we have an unused queues variable. I think this might flag a findbugs warning. Either way, better to remove it.

        After this, can you please make this patch available, because it is good to go ?

        Show
        Hemanth Yamijala added a comment - Looks fine. One minor nit. In refreshAcls, we have an unused queues variable. I think this might flag a findbugs warning. Either way, better to remove it. After this, can you please make this patch available, because it is good to go ?
        Hide
        Sreekanth Ramakrishnan added a comment -

        Removing the unused variables.

        Show
        Sreekanth Ramakrishnan added a comment - Removing the unused variables.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Running thro' Hudson

        Show
        Sreekanth Ramakrishnan added a comment - Running thro' Hudson
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Tested the patch with various test scenarios including 1) different acls in both mapred-site.xml and mapred-queue-acls.xml and 2) malformed config files. All the tests passed. +1.

        Show
        Vinod Kumar Vavilapalli added a comment - Tested the patch with various test scenarios including 1) different acls in both mapred-site.xml and mapred-queue-acls.xml and 2) malformed config files. All the tests passed. +1.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12405404/hadoop-5396-5.patch
        against trunk revision 765427.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 10 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405404/hadoop-5396-5.patch against trunk revision 765427. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/199/console This message is automatically generated.
        Hide
        Sreekanth Ramakrishnan added a comment -

        The test failures are not related to this patch.

        Show
        Sreekanth Ramakrishnan added a comment - The test failures are not related to this patch.
        Hide
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Sreekanth and Vinod !

        Show
        Hemanth Yamijala added a comment - I just committed this. Thanks, Sreekanth and Vinod !
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #811 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/811/ )
        Hide
        Robert Chansler added a comment -

        Attached example for earlier version not to be committed.

        Show
        Robert Chansler added a comment - Attached example for earlier version not to be committed.
        Hide
        Robert Chansler added a comment -

        Editorial pass over all release notes prior to publication of 0.21.

        Show
        Robert Chansler added a comment - Editorial pass over all release notes prior to publication of 0.21.

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Hemanth Yamijala
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development