Hadoop Common
  1. Hadoop Common
  2. HADOOP-6944

[Herriot] Implement a functionality for getting proxy users definitions like groups and hosts.

    Details

    • Type: Task Task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.1
    • Fix Version/s: 0.21.1
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      I have just committed this to 0.21 and trunk. Thanks Vinay.
    • Tags:
      herriot

      Description

      Gridmix should require a proxy user's file for impersonating various jobs. So, implement couple of methods for getting the proxy users list and a proxy users file (it's a combination of proxy users and groups) based on cluster configuration.

      The proxy users list should require for map reduce jobs and proxy users file should require for gridmix jobs.

      The following are methods signature,
      public ProxyUserDefinitions getHadoopProxyUsers() - get the list of proxy users list based on cluster configuration.

      1. HADOOP-6944.patch
        1 kB
        Vinay Kumar Thota
      2. HADOOP-6944.patch
        4 kB
        Vinay Kumar Thota
      3. HADOOP-6944.patch
        6 kB
        Vinay Kumar Thota
      4. HADOOP-6944.patch
        7 kB
        Vinay Kumar Thota
      5. HADOOP-6944.patch
        7 kB
        Konstantin Boudnik

        Issue Links

          Activity

          Hide
          Vinay Kumar Thota added a comment -

          Initial Patch

          Show
          Vinay Kumar Thota added a comment - Initial Patch
          Hide
          Konstantin Boudnik added a comment -

          I have went over that concept of proxy users once more and I think I have some
          issues with it.

          Here they are:

          • essentially the new proxyuser file doesn't have any new information to the
            mix but is simply reformat whatever is already available in the
            local-superuser-conf.xml or else
          • because this reformatting is happening manually this is error prone
          • besides it complicates Herriot configuration without any need for it

          Instead of adding up more complexities I'd suggest to scrap the very notion of
          proxyuser file and property and change the logic of the
          public ArrayList<String> getHadoopMultiUsersList()
          to extract that information from configuration object. If a cluster has
          proxyusers configured then you'll have to have something like this
          hadoop.proxyuser.sup_user.groups: users
          hadoop.proxyuser.sup_user.hosts: host1.domain.com,host2.domain.com
          in the daemons configurations.
          It'd be easy to convert this information into the list you want/need.

          Show
          Konstantin Boudnik added a comment - I have went over that concept of proxy users once more and I think I have some issues with it. Here they are: essentially the new proxyuser file doesn't have any new information to the mix but is simply reformat whatever is already available in the local-superuser-conf.xml or else because this reformatting is happening manually this is error prone besides it complicates Herriot configuration without any need for it Instead of adding up more complexities I'd suggest to scrap the very notion of proxyuser file and property and change the logic of the public ArrayList<String> getHadoopMultiUsersList() to extract that information from configuration object. If a cluster has proxyusers configured then you'll have to have something like this hadoop.proxyuser.sup_user.groups: users hadoop.proxyuser.sup_user.hosts: host1.domain.com,host2.domain.com in the daemons configurations. It'd be easy to convert this information into the list you want/need.
          Hide
          Vinay Kumar Thota added a comment -

          I agreed your idea, however the local-super-conf.xml file contains different users which serves for other purposes. Is it ok to impersonate the test jobs with those users instead of using a user which is specific to testing purpose?
          Actually i thought that there are some users which specific to testing and those users can utilize for impersonating the test jobs. In the current functionality the testing users are available in the proxyusers file and we pick them using the getHadoopMultiUsersList methis and same way we can use the file for gridmix testing also for impersonating the gridmix jobs.

          Show
          Vinay Kumar Thota added a comment - I agreed your idea, however the local-super-conf.xml file contains different users which serves for other purposes. Is it ok to impersonate the test jobs with those users instead of using a user which is specific to testing purpose? Actually i thought that there are some users which specific to testing and those users can utilize for impersonating the test jobs. In the current functionality the testing users are available in the proxyusers file and we pick them using the getHadoopMultiUsersList methis and same way we can use the file for gridmix testing also for impersonating the gridmix jobs.
          Hide
          Konstantin Boudnik added a comment -

          Well, the fact that you are seeing different users in this file lead to the conclusion that the deployment process of configs is broken. Apparently in the correct deployment only cluster specific proxy users should be added to the list.

          Another thing about proxy users is that they normally should have a set of hosts from where they are allowed to perform their impersonation. That means that the way proxyusers are implemented for Herriot are error prone because one has to make sure that these users are accurately configured in superusers configuartaion file.

          This said, proxyusers file doesn't solve any problems but rather adds new ones. This is good idea though to have an API to extract the list of proxy users from the configuration of a daemon. It also seems to be beneficial to provide an information about allowed hosts for the users in question.

          As for gridmix testing: providing a file means that gridmix tests will have to parse it as well. grdmix uses Hadoop Configuration class and the tests should be able to benefit from this fact, e.g. it will do xml parsing and provide all needed API to access its content.

          In other words I'm strongly opposing this proxyuser thing. The fact that we have this in the code is partially my fault: I had to think it over better when I was reviewing the patch initially.

          Show
          Konstantin Boudnik added a comment - Well, the fact that you are seeing different users in this file lead to the conclusion that the deployment process of configs is broken. Apparently in the correct deployment only cluster specific proxy users should be added to the list. Another thing about proxy users is that they normally should have a set of hosts from where they are allowed to perform their impersonation. That means that the way proxyusers are implemented for Herriot are error prone because one has to make sure that these users are accurately configured in superusers configuartaion file. This said, proxyusers file doesn't solve any problems but rather adds new ones. This is good idea though to have an API to extract the list of proxy users from the configuration of a daemon. It also seems to be beneficial to provide an information about allowed hosts for the users in question. As for gridmix testing: providing a file means that gridmix tests will have to parse it as well. grdmix uses Hadoop Configuration class and the tests should be able to benefit from this fact, e.g. it will do xml parsing and provide all needed API to access its content. In other words I'm strongly opposing this proxyuser thing. The fact that we have this in the code is partially my fault: I had to think it over better when I was reviewing the patch initially.
          Hide
          Vinay Kumar Thota added a comment -

          Attached latest patch by addressing cos comments.

          Show
          Vinay Kumar Thota added a comment - Attached latest patch by addressing cos comments.
          Hide
          Konstantin Boudnik added a comment -

          Ok, looks more sensible now. One question: if a gridmix in question has the access to Herriot APIs (such as proposed one) then why the functionality in the patch can't be implemented by the test? Please correct me if I'm wrong and gridmix tests don't have an access to Herriot APIs aleady.

          It seems to me that you want to create new functionality that is required by just certain types of tests only, which doesn't look right to me.

          Besides, writing a content to a file looks potentially thread unsafe.

          Show
          Konstantin Boudnik added a comment - Ok, looks more sensible now. One question: if a gridmix in question has the access to Herriot APIs (such as proposed one) then why the functionality in the patch can't be implemented by the test? Please correct me if I'm wrong and gridmix tests don't have an access to Herriot APIs aleady. It seems to me that you want to create new functionality that is required by just certain types of tests only, which doesn't look right to me. Besides, writing a content to a file looks potentially thread unsafe.
          Hide
          Vinay Kumar Thota added a comment -

          The getProxyUsersFile() method is using in the test. Gridmix should require a proxy users file whenever it tries to impersonate girdmix jobs with different user. I am using the method in the test for one the tests.

          It seems to me that you want to create new functionality that is required by just certain types of tests only, which doesn't look right to me.

          I agreed your point. However, initially I thought of implementing this functionality in gridmix utilities. But I felt that, if I implement under common space, it would be useful for other components also if they require in future. If you are not ok to keep it in common space, let me know. So that I will move this functionality under gridmix utilities.

          Show
          Vinay Kumar Thota added a comment - The getProxyUsersFile() method is using in the test. Gridmix should require a proxy users file whenever it tries to impersonate girdmix jobs with different user. I am using the method in the test for one the tests. It seems to me that you want to create new functionality that is required by just certain types of tests only, which doesn't look right to me. I agreed your point. However, initially I thought of implementing this functionality in gridmix utilities. But I felt that, if I implement under common space, it would be useful for other components also if they require in future. If you are not ok to keep it in common space, let me know. So that I will move this functionality under gridmix utilities.
          Hide
          Konstantin Boudnik added a comment -

          I suggest to make the following changes:

          • change the signature of getHadoopMultiUsersList to
            public ProxyUserDefinitions getHadoopMultiUsersList() throws IOException
            and then
            public abstract class ProxyUserDefinitions {
              class GroupsAndHost {
                private List<String> groups;
                private List<String> hosts;
                public List<String> getGroups() {
                  return groups;
                }
                public void setGroups(List<String> groups) {
                  this.groups = groups;
                }
                public List<String> getHosts() {
                  return hosts;
                }
                public void setHosts(List<String> hosts) {
                  this.hosts = hosts;
                }
              }
              
              protected Map<String, GroupsAndHost> proxyUsers;
              protected ProxyUserDefinitions () {
                proxyUsers = new HashMap<String, GroupsAndHost>();
              }  
              public void addProxyUser (String userName, GroupsAndHost definitions) {
                proxyUsers.put(userName, definitions);
              }
              public GroupsAndHost getProxyUser (String userName) {
                return proxyUsers.get(userName);
              }
            
              /**
               * The implementation of this method has to be provided by a child of the class
               * @param filePath
               * @return
               * @throws IOException
               */
              public abstract boolean writeToFile(URI filePath) throws IOException;
            }

          so you will be able to store information in this form username: group[,group]* as well as the hostnames from where proxy users are allowed to connect. The purpose of this new method will be to extract only proxy-users related information from a daemon configuration and present it in the form a data container (above).

          The data container provides the method writeToFile which has to be overwritten by an implementer in order to provide needed logic of how the content of object is stored into a file. Thus, you can inherit the class in your testcase (or whatever) and create writeToFile implementation you need (basically, the second method getProxyUsersFile of your patch with some small modificaitons) for gridmix without adding unrelated functionality to Herriot base classes.

          Makes sense?

          Show
          Konstantin Boudnik added a comment - I suggest to make the following changes: change the signature of getHadoopMultiUsersList to public ProxyUserDefinitions getHadoopMultiUsersList() throws IOException and then public abstract class ProxyUserDefinitions { class GroupsAndHost { private List<String> groups; private List<String> hosts; public List<String> getGroups() { return groups; } public void setGroups(List<String> groups) { this.groups = groups; } public List<String> getHosts() { return hosts; } public void setHosts(List<String> hosts) { this.hosts = hosts; } } protected Map<String, GroupsAndHost> proxyUsers; protected ProxyUserDefinitions () { proxyUsers = new HashMap<String, GroupsAndHost>(); } public void addProxyUser (String userName, GroupsAndHost definitions) { proxyUsers.put(userName, definitions); } public GroupsAndHost getProxyUser (String userName) { return proxyUsers.get(userName); } /** * The implementation of this method has to be provided by a child of the class * @param filePath * @return * @throws IOException */ public abstract boolean writeToFile(URI filePath) throws IOException; } so you will be able to store information in this form username: group [,group] * as well as the hostnames from where proxy users are allowed to connect. The purpose of this new method will be to extract only proxy-users related information from a daemon configuration and present it in the form a data container (above). The data container provides the method writeToFile which has to be overwritten by an implementer in order to provide needed logic of how the content of object is stored into a file. Thus, you can inherit the class in your testcase (or whatever) and create writeToFile implementation you need (basically, the second method getProxyUsersFile of your patch with some small modificaitons) for gridmix without adding unrelated functionality to Herriot base classes. Makes sense?
          Hide
          Vinay Kumar Thota added a comment -

          Addressed cos comments.

          Show
          Vinay Kumar Thota added a comment - Addressed cos comments.
          Hide
          Konstantin Boudnik added a comment -

          Yup, I think that's it.

          One nit:

          • would it be better to throw NoSuchMethodException from the body of default writeToFile implementation here
            +    ProxyUserDefinitions pud = new ProxyUserDefinitions() {
            +      public boolean writeToFile(URI filePath) throws IOException {
            +        return false;
            +      };
            

          Other than that it looks good.

          Show
          Konstantin Boudnik added a comment - Yup, I think that's it. One nit: would it be better to throw NoSuchMethodException from the body of default writeToFile implementation here + ProxyUserDefinitions pud = new ProxyUserDefinitions() { + public boolean writeToFile(URI filePath) throws IOException { + return false; + }; Other than that it looks good.
          Hide
          Vinay Kumar Thota added a comment -

          Addressed the comment.

          Show
          Vinay Kumar Thota added a comment - Addressed the comment.
          Hide
          Konstantin Boudnik added a comment -

          Almost there except that try-catch seems to be unnecessary around throw statement. Or you want to mask the exception?

          Show
          Konstantin Boudnik added a comment - Almost there except that try-catch seems to be unnecessary around throw statement. Or you want to mask the exception?
          Hide
          Vinay Kumar Thota added a comment -

          Yes, I want to mask the exception, otherwise you can see unhandled exception type NoSuchMethodException while compiling the code because the writeToFile method handles only IOExceptions.

          Show
          Vinay Kumar Thota added a comment - Yes, I want to mask the exception, otherwise you can see unhandled exception type NoSuchMethodException while compiling the code because the writeToFile method handles only IOExceptions.
          Hide
          Konstantin Boudnik added a comment -

          I think we'll need to revert HDFS-1277 and MAPREDUCE-1896 when this is committed to 0.22. Also, we need to make sure that reversion of the above JIRAs won't affect any tests.

          Show
          Konstantin Boudnik added a comment - I think we'll need to revert HDFS-1277 and MAPREDUCE-1896 when this is committed to 0.22. Also, we need to make sure that reversion of the above JIRAs won't affect any tests.
          Hide
          Konstantin Boudnik added a comment -

          I think I know where I was wrong: when I said UnsupportedOperationException I have meant, of course, UnsupportedOperationException. I wonder how noone could've guessed that

          Show
          Konstantin Boudnik added a comment - I think I know where I was wrong: when I said UnsupportedOperationException I have meant, of course, UnsupportedOperationException . I wonder how noone could've guessed that
          Hide
          Konstantin Boudnik added a comment -

          Here's the patch for my last comment.

          Show
          Konstantin Boudnik added a comment - Here's the patch for my last comment.
          Hide
          Vinay Kumar Thota added a comment -

          +1 for the patch.

          Show
          Vinay Kumar Thota added a comment - +1 for the patch.
          Hide
          Konstantin Boudnik added a comment -

          I have manually ran test-patch.sh to make sure that patch is Ok

          +1 overall.  
          
              +1 @author.  The patch does not contain any @author tags.
          
              +1 tests included.  The patch appears to include 5 new or modified tests.
          
              +1 javadoc.  The javadoc tool did not generate any warning messages.
          
              +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          
              +1 findbugs.  The patch does not introduce any new Findbugs warnings.
          
              +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
              +1 system tests framework.  The patch passed system tests framework compile.
          
          Show
          Konstantin Boudnik added a comment - I have manually ran test-patch.sh to make sure that patch is Ok +1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 system tests framework. The patch passed system tests framework compile.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #395 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/395/)

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #395 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/395/ )
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #489 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/489/)

          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk #489 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/489/ )

            People

            • Assignee:
              Vinay Kumar Thota
              Reporter:
              Vinay Kumar Thota
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development