Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1542

Deprecate mapred.permissions.supergroup in favor of hadoop.cluster.administrators

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: security
    • Labels:

      Description

      HADOOP-6568 added the configuration hadoop.cluster.administrators through which admins can configure who the superusers/supergroups for the cluster are. MAPREDUCE itself already has mapred.permissions.supergroup (which is just a single group). As agreed upon at HADOOP-6568, this should be deprecated in favor of hadoop.cluster.administrators.

      1. 1542.20S.1.patch
        39 kB
        Ravi Gummadi
      2. 1542.20S.patch
        38 kB
        Ravi Gummadi
      3. 1542.patch
        34 kB
        Ravi Gummadi
      4. 1542.v1.patch
        36 kB
        Ravi Gummadi
      5. mapreduce-1542-y20s.patch
        29 kB
        Hemanth Yamijala

        Issue Links

          Activity

          Hide
          Amareshwari Sriramadasu added a comment -

          Closing this as Won't fix in favor of HADOOP-6748 and MAPREDCE-1754.
          Please reopen if you disagree.

          Show
          Amareshwari Sriramadasu added a comment - Closing this as Won't fix in favor of HADOOP-6748 and MAPREDCE-1754. Please reopen if you disagree.
          Hide
          Amareshwari Sriramadasu added a comment -

          I created HADOOP-6748, MAPREDUCE-1754 and HDFS-1130 for the proposal @ this

          Show
          Amareshwari Sriramadasu added a comment - I created HADOOP-6748 , MAPREDUCE-1754 and HDFS-1130 for the proposal @ this
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Allen, you could say we didn't understand your intent. Clearly there are differences between the way devs and the services folks think. We could have worked better to reach a proper consensus earlier itself on the JIRA. Anyways, it's good that atleast now everyone is in agreement and we have the right solution finally..

          Show
          Vinod Kumar Vavilapalli added a comment - Allen, you could say we didn't understand your intent. Clearly there are differences between the way devs and the services folks think. We could have worked better to reach a proper consensus earlier itself on the JIRA. Anyways, it's good that atleast now everyone is in agreement and we have the right solution finally..
          Hide
          Allen Wittenauer added a comment -

          You clearly don't want to know my thoughts, given the conversation that happened in this very jira months ago.

          Show
          Allen Wittenauer added a comment - You clearly don't want to know my thoughts, given the conversation that happened in this very jira months ago.
          Hide
          Amareshwari Sriramadasu added a comment -

          After discussing with Grid operations at Yahoo, we realized that they are in favor of having separate configuration properties for MAPREDUCE and HDFS projects, because usually the conf directory is not different for MAPREDUCE and HDFS, it is just a single HADOOP_CONF_DIR; If the same property name is used for both the administrators it will be confusing and might have other problems if both files are loaded.
          Currently we have hadoop.clusrer.administrators, mapred.permissions.supergroup and dfs.permissions.supergroup configuration properties for controlling admins. I propose the following change in the configuration for fixing administrator access:
          1. Remove hadoop.cluster.administrators
          2. Replace mapred.permissions.supergroup with an acl: mapreduce.cluster.administrators
          3. The change in HDFS can be decided later, in a different HDFS issue. Will leave dfs.persmissions.supergrop as is, for now.
          So, when individual daemons create HTTPServer they will set appropriate administrator acl for the HTTPServer. Here, mapred daemons will set the value of mapreduce.cluster.administrators and HDFS daemons will set ' '(space) followed by hdfs.persmissions.supergrop.

          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - After discussing with Grid operations at Yahoo, we realized that they are in favor of having separate configuration properties for MAPREDUCE and HDFS projects, because usually the conf directory is not different for MAPREDUCE and HDFS, it is just a single HADOOP_CONF_DIR; If the same property name is used for both the administrators it will be confusing and might have other problems if both files are loaded. Currently we have hadoop.clusrer.administrators, mapred.permissions.supergroup and dfs.permissions.supergroup configuration properties for controlling admins. I propose the following change in the configuration for fixing administrator access: 1. Remove hadoop.cluster.administrators 2. Replace mapred.permissions.supergroup with an acl: mapreduce.cluster.administrators 3. The change in HDFS can be decided later, in a different HDFS issue. Will leave dfs.persmissions.supergrop as is, for now. So, when individual daemons create HTTPServer they will set appropriate administrator acl for the HTTPServer. Here, mapred daemons will set the value of mapreduce.cluster.administrators and HDFS daemons will set ' '(space) followed by hdfs.persmissions.supergrop. Thoughts?
          Hide
          Ravi Gummadi added a comment -

          > So, why can't the mrowner and acls be created inside this class when it is instantiated ? Then, isMROwnerOrAdmin can be implemented directly in JobACLsManager itself.

          Moved adminsACL and the method isMROwnerOrAdmin() to JobACLsManager. Since mrOwner is used in JobTracker and TaskTracker at few other places, keeping mrOwner as part of JobTracker and TaskTracker objects instead of moving to JobACLsManager.

          Attaching patch for earlier version of hadoop with review comments incorporated. Not for commit here.

          Show
          Ravi Gummadi added a comment - > So, why can't the mrowner and acls be created inside this class when it is instantiated ? Then, isMROwnerOrAdmin can be implemented directly in JobACLsManager itself. Moved adminsACL and the method isMROwnerOrAdmin() to JobACLsManager. Since mrOwner is used in JobTracker and TaskTracker at few other places, keeping mrOwner as part of JobTracker and TaskTracker objects instead of moving to JobACLsManager. Attaching patch for earlier version of hadoop with review comments incorporated. Not for commit here.
          Hide
          Ravi Gummadi added a comment -

          Right Hemanth.

          Show
          Ravi Gummadi added a comment - Right Hemanth.
          Hide
          Hemanth Yamijala added a comment -

          I looked again at TestWebUIAuthorization. It does look like it is covering everything required. Can you confirm if my understanding on this test case is right:

          Basically, we have configured adminGroup as the admin ACL. 'admin' user is part of the adminGroup (as defined in the custom user to group mapping). Then we drive all operations setting 'admin' as the user. All of these will cover the code path of checking the adminUser because these will be mapped to the adminGroup which is in the administrators ACL. Correct ?

          Show
          Hemanth Yamijala added a comment - I looked again at TestWebUIAuthorization. It does look like it is covering everything required. Can you confirm if my understanding on this test case is right: Basically, we have configured adminGroup as the admin ACL. 'admin' user is part of the adminGroup (as defined in the custom user to group mapping). Then we drive all operations setting 'admin' as the user. All of these will cover the code path of checking the adminUser because these will be mapped to the adminGroup which is in the administrators ACL. Correct ?
          Hide
          Ravi Gummadi added a comment -

          Thanks Hemanth for the review. It is fine to remove HDFS related stuff from this patch.

          1. I think it makes sense to add checks for queue ACLs refresh, service refresh and user-to-group mapping refresh also similar to node refresh. I don't see a semantic difference between these operations and node refresh and they should be implemented in the same way, I think. But I am OK if these are taken up in a following JIRA, as their addition may not be in the scope of this JIRA.

          Right. Let us do it in a follow-up JIRA.

          1. Regarding tests, I was expecting to see tests that set up users and groups in the hadoop.cluster.administrators ACL and then checks that various operations of view and modify succeed with combinations of both allowed and disallowed users. Exhaustive tests need not be end-to-end I think - i.e. they need not run real jobs. Since most of the code goes through JobInProgress.checkAccess, can we just create fake objects and test the checkAccess method ? And maybe have some end-to-end tests for very important functions like killJob, killTask and view job details in a JSP.

          TestWebUIAuthorization code changes of the patch already check
          (a) view-job as admin through almost all JSPs,
          (b) modify-job as admin through almost all JSPs – includes killJob, killTask, setJobPriority.
          TestNodeRefresh checks refreshNodes() as admin.
          I guess these are covering tests we want ?

          Will incorporate other comments given above and upload new patch.

          Show
          Ravi Gummadi added a comment - Thanks Hemanth for the review. It is fine to remove HDFS related stuff from this patch. I think it makes sense to add checks for queue ACLs refresh, service refresh and user-to-group mapping refresh also similar to node refresh. I don't see a semantic difference between these operations and node refresh and they should be implemented in the same way, I think. But I am OK if these are taken up in a following JIRA, as their addition may not be in the scope of this JIRA. Right. Let us do it in a follow-up JIRA. Regarding tests, I was expecting to see tests that set up users and groups in the hadoop.cluster.administrators ACL and then checks that various operations of view and modify succeed with combinations of both allowed and disallowed users. Exhaustive tests need not be end-to-end I think - i.e. they need not run real jobs. Since most of the code goes through JobInProgress.checkAccess, can we just create fake objects and test the checkAccess method ? And maybe have some end-to-end tests for very important functions like killJob, killTask and view job details in a JSP. TestWebUIAuthorization code changes of the patch already check (a) view-job as admin through almost all JSPs, (b) modify-job as admin through almost all JSPs – includes killJob, killTask, setJobPriority. TestNodeRefresh checks refreshNodes() as admin. I guess these are covering tests we want ? Will incorporate other comments given above and upload new patch.
          Hide
          Hemanth Yamijala added a comment -

          I looked at this patch, and have a few comments:

          • HttpServer: In the error message, is 'clusterOwner' a familiar term ? Maybe 'User who started the cluster' while more verbose conveys the meaning better.
          • JobTracker: JobConf.MR_SUPERGROUP reads mapred.permissions.supergroup. Error message here says mapreduce.cluster.permissions.supergroup.
          • JobTracker.checkAccess is saying that mrOwner and admins can do any operation on the job; however the code is checking queue ACLs first. Hence if the UGI does not have access to do the appropriate operation for the queue, this documentation will be incorrect.
          • JobACLsManager.checkAccess has a typo in javadoc: "The mrOwner and admins always" -> 'are' is missing.
          • One reference to mapred.permissions.supergroup in description of mapreduce.job.acl-modify-job in mapred-default.xml
          • isMROwnerOrAdmin seems out of place in JobTracker. Note that it takes the owner and adminsACL as parameters, and hence can live in any class. Had this not been the case, then the API is fine. I see JobACLsManager as the class responsible for all ACLs checks. So, why can't the mrowner and acls be created inside this class when it is instantiated ? Then, isMROwnerOrAdmin can be implemented directly in JobACLsManager itself. This will also avoid the static call dependencies between TaskTracker classes and JobTracker which also seem a little odd.
          • I think it makes sense to add checks for queue ACLs refresh, service refresh and user-to-group mapping refresh also similar to node refresh. I don't see a semantic difference between these operations and node refresh and they should be implemented in the same way, I think. But I am OK if these are taken up in a following JIRA, as their addition may not be in the scope of this JIRA.
          • Regarding tests, I was expecting to see tests that set up users and groups in the hadoop.cluster.administrators ACL and then checks that various operations of view and modify succeed with combinations of both allowed and disallowed users. Exhaustive tests need not be end-to-end I think - i.e. they need not run real jobs. Since most of the code goes through JobInProgress.checkAccess, can we just create fake objects and test the checkAccess method ? And maybe have some end-to-end tests for very important functions like killJob, killTask and view job details in a JSP.
          Show
          Hemanth Yamijala added a comment - I looked at this patch, and have a few comments: HttpServer: In the error message, is 'clusterOwner' a familiar term ? Maybe 'User who started the cluster' while more verbose conveys the meaning better. JobTracker: JobConf.MR_SUPERGROUP reads mapred.permissions.supergroup. Error message here says mapreduce.cluster.permissions.supergroup. JobTracker.checkAccess is saying that mrOwner and admins can do any operation on the job; however the code is checking queue ACLs first. Hence if the UGI does not have access to do the appropriate operation for the queue, this documentation will be incorrect. JobACLsManager.checkAccess has a typo in javadoc: "The mrOwner and admins always" -> 'are' is missing. One reference to mapred.permissions.supergroup in description of mapreduce.job.acl-modify-job in mapred-default.xml isMROwnerOrAdmin seems out of place in JobTracker. Note that it takes the owner and adminsACL as parameters, and hence can live in any class. Had this not been the case, then the API is fine. I see JobACLsManager as the class responsible for all ACLs checks. So, why can't the mrowner and acls be created inside this class when it is instantiated ? Then, isMROwnerOrAdmin can be implemented directly in JobACLsManager itself. This will also avoid the static call dependencies between TaskTracker classes and JobTracker which also seem a little odd. I think it makes sense to add checks for queue ACLs refresh, service refresh and user-to-group mapping refresh also similar to node refresh. I don't see a semantic difference between these operations and node refresh and they should be implemented in the same way, I think. But I am OK if these are taken up in a following JIRA, as their addition may not be in the scope of this JIRA. Regarding tests, I was expecting to see tests that set up users and groups in the hadoop.cluster.administrators ACL and then checks that various operations of view and modify succeed with combinations of both allowed and disallowed users. Exhaustive tests need not be end-to-end I think - i.e. they need not run real jobs. Since most of the code goes through JobInProgress.checkAccess, can we just create fake objects and test the checkAccess method ? And maybe have some end-to-end tests for very important functions like killJob, killTask and view job details in a JSP.
          Hide
          Hemanth Yamijala added a comment -

          The patch Ravi attached earlier on had changes for HDFS also. After discussing with him, Vinod and Devaraj, we feel that the changes in HDFS are not required at the moment and do not form the scope of this JIRA. I am attaching a new patch that removes those changes. I will be reviewing this. The patch is still not meant for commit here.

          Show
          Hemanth Yamijala added a comment - The patch Ravi attached earlier on had changes for HDFS also. After discussing with him, Vinod and Devaraj, we feel that the changes in HDFS are not required at the moment and do not form the scope of this JIRA. I am attaching a new patch that removes those changes. I will be reviewing this. The patch is still not meant for commit here.
          Hide
          Ravi Gummadi added a comment -

          Attaching patch for earlier version of hadoop. Not for commit here.

          Show
          Ravi Gummadi added a comment - Attaching patch for earlier version of hadoop. Not for commit here.
          Hide
          Ravi Gummadi added a comment -

          Configuration without default resources does not load *site.xml files also. So needs to change the logic in buildAdminsACL().

          Show
          Ravi Gummadi added a comment - Configuration without default resources does not load *site.xml files also. So needs to change the logic in buildAdminsACL().
          Hide
          Ravi Gummadi added a comment -

          Attaching patch with review comments incorporated.

          Show
          Ravi Gummadi added a comment - Attaching patch with review comments incorporated.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          One more thing, we really don't need the wait loops for the job state to change to KILLED in TestWebUIAuthorization, because we are disabling job-setup/job-cleanup tasks.

          Show
          Vinod Kumar Vavilapalli added a comment - One more thing, we really don't need the wait loops for the job state to change to KILLED in TestWebUIAuthorization, because we are disabling job-setup/job-cleanup tasks.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          The patch has gone stale, needs update w.r.t TestWebUIAuthorization.

          JobTracker.java

          • The log statement after building acls should also log the configured admin acls. Same in TaskTracker.java
          • buildAdminACLs()
            • Instead of getting the new configruation property and comparing it with mrOwner, it is better to create a Configuration object without loading default resources (Hemanth's smart tip!), get the configuration parameter and compare it will null.
            • If the new configuration is not present, there is no need for augmenting mrOwner also to the ACL as we specially check for mrOwner everywhere. Correspondingly buildACLs() doesn't need to take mrOwner as a parameter.

          Fix the javadoc for MR_SUPERGROUP and JT_SUPERGROUP. The link should be "

          {@link CommonConfigurationKeys#HADOOP_CLUSTER_ADMINISTRATORS_PROPERTY}

          "

          All tests should use the new configuration property only. We can add a simple unit test for handling deprecation in TestJobConf.

          Still, there are some occurrences of the old configuration property. Can you do a (case-insensitive) grep for 'supergroup' on the source code and replace'em all?

          Show
          Vinod Kumar Vavilapalli added a comment - The patch has gone stale, needs update w.r.t TestWebUIAuthorization. JobTracker.java The log statement after building acls should also log the configured admin acls. Same in TaskTracker.java buildAdminACLs() Instead of getting the new configruation property and comparing it with mrOwner, it is better to create a Configuration object without loading default resources (Hemanth's smart tip!), get the configuration parameter and compare it will null. If the new configuration is not present, there is no need for augmenting mrOwner also to the ACL as we specially check for mrOwner everywhere. Correspondingly buildACLs() doesn't need to take mrOwner as a parameter. Fix the javadoc for MR_SUPERGROUP and JT_SUPERGROUP. The link should be " {@link CommonConfigurationKeys#HADOOP_CLUSTER_ADMINISTRATORS_PROPERTY} " All tests should use the new configuration property only. We can add a simple unit test for handling deprecation in TestJobConf. Still, there are some occurrences of the old configuration property. Can you do a (case-insensitive) grep for 'supergroup' on the source code and replace'em all?
          Hide
          Ravi Gummadi added a comment -

          Attaching patch that makes mapreduce.cluster.permissions.supergroup deprecated and makes JobTracker ans TaskTracker use hadoop.cluster.administrators.

          Please review and provide your comments.

          Show
          Ravi Gummadi added a comment - Attaching patch that makes mapreduce.cluster.permissions.supergroup deprecated and makes JobTracker ans TaskTracker use hadoop.cluster.administrators. Please review and provide your comments.
          Hide
          Ravi Gummadi added a comment -

          OK. Planning to make the config property mapred.permissions.supergroup work. This is done by doing some code changes spicifically for this config property's compatinility. This should be fine as we need this config property only when daemons are starting.

          Show
          Ravi Gummadi added a comment - OK. Planning to make the config property mapred.permissions.supergroup work. This is done by doing some code changes spicifically for this config property's compatinility. This should be fine as we need this config property only when daemons are starting.
          Hide
          Ravi Gummadi added a comment -

          Removing mapred.permissions.supergroup and the recently renamed mapreduce.jobtracker.permissions.supergroup in favour of the config hadoop.cluster.administrators that can take sequence of users followed by sequence of groups.

          As mapred.permissions.supergroup is not documented anywhere and as maintaining compatibility between mapred.permissions.supergroup and hadoop.cluster.administrators seems to be very difficult(would need lot of hacking in the code as the values for these 2 keys are different and in different formats), the old config will not be supported with this patch.

          Show
          Ravi Gummadi added a comment - Removing mapred.permissions.supergroup and the recently renamed mapreduce.jobtracker.permissions.supergroup in favour of the config hadoop.cluster.administrators that can take sequence of users followed by sequence of groups. As mapred.permissions.supergroup is not documented anywhere and as maintaining compatibility between mapred.permissions.supergroup and hadoop.cluster.administrators seems to be very difficult(would need lot of hacking in the code as the values for these 2 keys are different and in different formats), the old config will not be supported with this patch.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Allen, configuration of commong components is bound to be like this because of the fact that we have a common codebase/project which both MAPREDUCE and HDFS use. The configuration of the common components like Http servers, RPC servers/clients will not have mapred/hdfs specific naming conventions. For e.g., I don't see ipc.server.listen.queue.size or ipc.client.connect.max.retries being split into mapred and hdfs specific configuration items despite the fact that JT and NN need different values. I think hadoop.cluster.administrators also falls under the same category.

          Show
          Vinod Kumar Vavilapalli added a comment - Allen, configuration of commong components is bound to be like this because of the fact that we have a common codebase/project which both MAPREDUCE and HDFS use. The configuration of the common components like Http servers, RPC servers/clients will not have mapred/hdfs specific naming conventions. For e.g., I don't see ipc.server.listen.queue.size or ipc.client.connect.max.retries being split into mapred and hdfs specific configuration items despite the fact that JT and NN need different values. I think hadoop.cluster.administrators also falls under the same category.
          Hide
          Allen Wittenauer added a comment -

          Every time someone says "oh you can use a separate config file" I need to start thanking them. It just means that Hadoop is that much more operationally complex and helps to make sure that I'll have future employment.

          Show
          Allen Wittenauer added a comment - Every time someone says "oh you can use a separate config file" I need to start thanking them. It just means that Hadoop is that much more operationally complex and helps to make sure that I'll have future employment.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Allen, it is NOT assumed that both frameworks are owned by a single set of admins. Even though the configuration property is named the same, HDFS and MAPRED can both be configured separately to use different set of admins by setting the configuration separately in their own config files.

          Show
          Vinod Kumar Vavilapalli added a comment - Allen, it is NOT assumed that both frameworks are owned by a single set of admins. Even though the configuration property is named the same, HDFS and MAPRED can both be configured separately to use different set of admins by setting the configuration separately in their own config files.
          Hide
          Allen Wittenauer added a comment -

          Wait, what? This is a horrible idea. I can't believe that such a major shift was hidden in another jira.

          This makes the assumption that both frameworks are owned by a single set of admins.

          Show
          Allen Wittenauer added a comment - Wait, what? This is a horrible idea. I can't believe that such a major shift was hidden in another jira. This makes the assumption that both frameworks are owned by a single set of admins.

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development