Hive / HIVE-2504

Warehouse table subdirectories should inherit the group permissions of the warehouse parent directory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Metastore
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the Hive Metastore creates a subdirectory in the Hive warehouse for
      a new table, it does so with the default HDFS permissions. Since the default
      dfs.umask value is 022, the new subdirectory will not inherit the group
      write permission of the Hive warehouse directory.

      We should make the umask used by Warehouse.mkdirs() configurable, and set
      it to use a default value of 002.
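A minimal sketch of the umask arithmetic behind this description, assuming the usual POSIX/HDFS rule that a new directory's mode is the requested mode (0777 for directories) with every bit that is set in the umask cleared. UmaskMath and its methods are hypothetical illustrations, not Hive or Hadoop code.

```java
// Hypothetical helper, not Hive or Hadoop code: models how a umask filters
// the mode requested for a new directory.
final class UmaskMath {
    private UmaskMath() {}

    /** Effective mode = requested mode with every bit that is set in the umask cleared. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    /** Renders the nine permission bits ls-style, e.g. 0755 -> "rwxr-xr-x". */
    static String toSymbolic(int mode) {
        char[] letters = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            // bit 8 is owner-read, bit 0 is other-execute
            sb.append((mode & (1 << bit)) != 0 ? letters[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int requested = 0777; // assumed default mode requested for directories
        for (int umask : new int[] {022, 002}) {
            int effective = applyUmask(requested, umask);
            // prints "umask 022 -> 755 rwxr-xr-x", then "umask 002 -> 775 rwxrwxr-x"
            System.out.printf("umask %03o -> %03o %s%n", umask, effective, toSymbolic(effective));
        }
    }
}
```

With 022 the group loses write access (755); with 002 it keeps it (775), which is the default this ticket asks for.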

      1. HIVE-2504.patch
        2 kB
        Chinna Rao Lalam
      2. HIVE-2504.patch
        7 kB
        Rohini Palaniswamy
      3. HIVE-2504-1.patch
        8 kB
        Rohini Palaniswamy


          Activity

          Ashutosh Chauhan added a comment -

          Can't you achieve this already by setting dfs.umask to 002 in hdfs-site.xml?

          Carl Steinbach added a comment -

          @Ashutosh: Yes, but that then shifts the configuration burden to the administrator.
          The point of this ticket is to reduce the configuration burden on admins/users by
          providing a sensible set of default configuration parameters. I'm also selfishly interested
          in heading off the inevitable stream of emails to hive-user from folks wondering
          why Hive won't let them insert data into the table they just created.

          Do you disagree with this approach?

          Ashutosh Chauhan added a comment -

          No, I don't disagree with the approach. I was just interested to know whether it's already possible to do so.

          Chinna Rao Lalam added a comment -

          For directory creation, passed the umask value as 0002.

          Namit Jain added a comment -

          +1

          Namit Jain added a comment -

          Committed. Thanks, Chinna Rao.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1198 (See https://builds.apache.org/job/Hive-trunk-h0.21/1198/)
          HIVE-2504 Warehouse table subdirectories should inherit the group permissions of the warehouse
          parent directory (Chinna Rao Lalam via namit)

          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230774
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/conf/hive-default.xml.template
          • /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21-dbg #3 (See https://builds.apache.org/job/Hive-trunk-h0.21-dbg/3/)
          HIVE-2504 Warehouse table subdirectories should inherit the group permissions of the warehouse
          parent directory (Chinna Rao Lalam via namit)

          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230774
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/conf/hive-default.xml.template
          • /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java
          Sho Shimauchi added a comment -

          I found a typo.

          hive.files.umask.vlaue

          should be:

          hive.files.umask.value
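Why a misspelled key matters, sketched under an assumption: java.util.Properties stands in here for the real Hive/Hadoop configuration classes, and ConfigKeyTypo is a hypothetical name. Whenever the key that is set and the key that is read differ, the lookup quietly returns its default and no error is raised.

```java
import java.util.Properties;

// Illustration only: java.util.Properties stands in for the real configuration
// classes; ConfigKeyTypo is a hypothetical name.
final class ConfigKeyTypo {
    private ConfigKeyTypo() {}

    /** Returns the configured value, or the default when the exact key is absent. */
    static String lookup(Properties conf, String key, String defaultValue) {
        // No error and no warning: a key spelled differently is simply "absent".
        return conf.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.files.umask.vlaue", "0022"); // misspelled key, as reported
        System.out.println(lookup(conf, "hive.files.umask.value", "0002")); // prints 0002
        System.out.println(lookup(conf, "hive.files.umask.vlaue", "0002")); // prints 0022
    }
}
```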

          Carl Steinbach added a comment -

          @Sho: Can you please file a JIRA for this? Thanks!

          Sho Shimauchi added a comment -

          Sure, I'll file a JIRA and submit a patch.

          Rohini Palaniswamy added a comment -

          This fix makes the table directory always have permission 775; it does not actually inherit the group permissions of the parent directory. In most of our cases, the owner produces the data while the group permissions are meant for users reading the data. Now hive.files.umask.value always needs to be set to ensure that users belonging to the group cannot write to or modify the table. Can't we change the fix to set the same permissions as the parent directory and get rid of this configuration?
          Also, it is always prudent to call fs.mkdirs() and then fs.setPermission(), because setting the umask in the configuration is not guaranteed to work: the DistributedFileSystem instance is cached, and it refers to the Configuration it was first initialized with.
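The mkdirs-then-setPermission pattern can be sketched on a local POSIX filesystem with java.nio.file, used here as an assumed stand-in for Hadoop's FileSystem.mkdirs()/setPermission(). This is an analogue, not the Hive patch: the mode a directory gets at creation time is filtered by the process umask, whereas an explicit chmod afterwards is not. MkdirsThenChmod is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Local-filesystem analogue (POSIX only) of "fs.mkdirs() then fs.setPermission()".
// MkdirsThenChmod is a hypothetical name; this is not Hadoop's FileSystem API.
final class MkdirsThenChmod {
    private MkdirsThenChmod() {}

    static Path mkdirsWithPermission(Path dir, Set<PosixFilePermission> perms) throws IOException {
        // Step 1: create the directory (and any missing parents); the initial
        // mode is filtered by the process umask, which this code does not control.
        Files.createDirectories(dir);
        // Step 2: set the leaf directory's mode explicitly; chmod is not filtered
        // by the umask, so the result no longer depends on process configuration.
        Files.setPosixFilePermissions(dir, perms);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("warehouse");
        Path table = mkdirsWithPermission(base.resolve("t1"), PosixFilePermissions.fromString("rwxrwxr-x"));
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(table))); // rwxrwxr-x
    }
}
```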

          Thomas Weise added a comment -

          Reopening per Rohini's request.

          Rohini Palaniswamy added a comment -

          Had Thomas reopen the issue as I could not.

          A couple of issues with the fix:

          • The config key that is actually looked up is HIVE_FILES_UMASK_VALUE instead of hive.files.umask.value, because the enum's name() function is used and not its varname:
            short umaskVal = (short) conf.getInt(HiveConf.ConfVars.HIVE_FILES_UMASK_VALUE.name(), 0002);
          • Because the value is read with getInt() and passed to new FsPermission(short mode), it has to be specified in decimal instead of as an octal umask.
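Both problems can be reproduced in isolation. The sketch below is hypothetical: ConfVarsSketch is a trimmed-down stand-in for an enum like HiveConf.ConfVars that keeps the dotted key in a separate varname field, and UmaskParsing only contrasts decimal and octal parsing of the same string with Integer.parseInt.

```java
// Hypothetical stand-in for an enum like HiveConf.ConfVars: the dotted key that
// users put in XML is a separate field, distinct from the constant's name().
enum ConfVarsSketch {
    HIVE_FILES_UMASK_VALUE("hive.files.umask.value");

    final String varname;

    ConfVarsSketch(String varname) {
        this.varname = varname;
    }
}

final class UmaskParsing {
    private UmaskParsing() {}

    /** Reads the string as a decimal number, the way an integer getter would. */
    static int parseAsDecimal(String raw) {
        return Integer.parseInt(raw.trim());
    }

    /** Reads the string as the octal umask an administrator means by "022". */
    static int parseAsOctal(String raw) {
        return Integer.parseInt(raw.trim(), 8);
    }

    public static void main(String[] args) {
        ConfVarsSketch v = ConfVarsSketch.HIVE_FILES_UMASK_VALUE;
        System.out.println(v.name());   // HIVE_FILES_UMASK_VALUE: the key the lookup used
        System.out.println(v.varname);  // hive.files.umask.value: the key users actually set
        System.out.println(Integer.toOctalString(parseAsDecimal("022"))); // 26, not the intended 22
        System.out.println(Integer.toOctalString(parseAsOctal("022")));   // 22
    }
}
```

Read as decimal, a umask written as "022" becomes 22, which is octal 026 and so also clears other-read, unlike the intended 022.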

          I would be happier if the fix did not resort to using this setting at all and instead applied the permissions of the parent directory.

          Rohini Palaniswamy added a comment -

          I haven't gotten any response on this for more than a week, so I am assuming no one is looking at it.

          Chinna,
          If you are not working on this, I can pick this one up.

          Rohini Palaniswamy added a comment -

          Made the newly created subdirectories get the same permissions as the parent by setting it explicitly on them. The table directories will get the permissions of the database directory and the partition directories will get the permissions of the table directory.
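A local-filesystem sketch of the inheritance rule described above (the table directory takes the database directory's permissions, the partition directory takes the table directory's), again using java.nio.file on a POSIX filesystem as an assumed stand-in for HDFS. InheritParentPerms and mkdirInheritingPerms are hypothetical names; the attached patches are not reproduced here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Hypothetical sketch: create a directory, then copy its parent's mode onto it.
final class InheritParentPerms {
    private InheritParentPerms() {}

    static Path mkdirInheritingPerms(Path child) throws IOException {
        Path parent = child.toAbsolutePath().getParent();
        if (parent == null) {
            throw new IllegalArgumentException("no parent directory: " + child);
        }
        Set<PosixFilePermission> parentPerms = Files.getPosixFilePermissions(parent);
        Files.createDirectory(child);                      // initial mode is filtered by the umask
        Files.setPosixFilePermissions(child, parentPerms); // overwrite with the parent's mode
        return child;
    }
}
```

Applied twice (database, then table, then partition), every level ends up with the database directory's mode, regardless of the process umask.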

          Removed the hive.files.umask.value configuration parameter.

          Rohini Palaniswamy added a comment -

          Unit tests pass with: ant clean package test

          Ashutosh Chauhan added a comment -

          I agree that fiddling with umask is not the cleanest approach here. But I am not sure about always inheriting permissions either, since this effectively implies that the whole sub-tree of the warehouse dir will have the same permissions as the warehouse dir itself. Concretely, let's consider the following example. Say the wh dir has 700 perms. Then, if I create a table (which only the owner of wh can do), I will end up with either 775 or 755 (depending on whether it was before or after the earlier patch on this JIRA). However, with your patch, the table dir will end up with 700. In the earlier case, anyone could have read the tables, but with your approach only the owner can. Which of these is the correct behavior is open for debate and depends on which security model you take as your premise. Additionally, this would be a change from the current behavior. So I suggest you define a new config variable like hive.warehouse.inherit.perms or something similar and set it to false by default, and then take your code path of inheriting parent perms only when it is set to true. Thoughts?

          Rohini Palaniswamy added a comment -

          Agree with you on not breaking backward compatibility. Will post a new patch with the config.

          Rohini Palaniswamy added a comment -

          Added hive.warehouse.subdir.inherit.perms, which is false by default. The default behaviour of hive-0.8 stays, i.e. directories will be created with the permissions given by dfs.umask or dfs.umaskmode. If hive.warehouse.subdir.inherit.perms is set to true, table directories will inherit the permissions of the default warehouse or of the custom database location. This comes in handy when you have databases created with different permissions, or when the warehouse directory has permissions like 775.
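The decision described in this comment can be modeled with plain integers. Only the property name hive.warehouse.subdir.inherit.perms and its false default come from the comment; the class, the method, and the use of java.util.Properties are assumptions for illustration, and the non-inheriting branch assumes the usual rule that a new directory gets 0777 minus the umask.

```java
import java.util.Properties;

// Hypothetical model of the policy described above, not Hive's implementation.
final class SubdirPermPolicy {
    static final String INHERIT_KEY = "hive.warehouse.subdir.inherit.perms";

    private SubdirPermPolicy() {}

    /**
     * Mode a new table/partition directory would end up with: the parent's mode
     * when inheritance is on, otherwise 0777 filtered by the umask.
     */
    static int resultingMode(Properties conf, int parentMode, int umask) {
        boolean inherit = Boolean.parseBoolean(conf.getProperty(INHERIT_KEY, "false"));
        return inherit ? parentMode & 0777 : 0777 & ~umask;
    }
}
```

With the property unset and umask 022, a directory under a 750 parent comes out as 755; with the property set to true it comes out as 750.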

          Ashutosh,
          Your argument about the warehouse having 700 permissions does not hold. If the warehouse has 700, then even if the table directories have 755 or 775 they will not be accessible by anyone other than the owner, because if you don't have access to the parent directory you cannot access its sub-directories in DFS (same as Linux). So the warehouse has to be at least 755 or 750 to start with. With the initial patch, the sub-directories would have been created with 775. With my patch, they would have been created with the same permissions as the warehouse directory (755 or 750), which would still allow read access to the group. But anyway, to avoid any confusion and to provide backward compatibility, I added the new config.
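The access argument can be checked with a simplified permission-bit model. It assumes, as the comment states, that DFS behaves like Linux here: a user needs the execute (traverse) bit on every ancestor directory and the read bit on the target. ACLs, superuser bypass, and other details are ignored, and TraversalCheck is a hypothetical name.

```java
// Simplified model: permission bits only (no ACLs, no superuser bypass).
// TraversalCheck is a hypothetical name.
final class TraversalCheck {
    // Bit offsets of each permission triplet within a mode such as 0750.
    static final int OWNER = 6;
    static final int GROUP = 3;
    static final int OTHER = 0;

    private TraversalCheck() {}

    /**
     * @param ancestorModes modes of every directory above the target, outermost first
     * @param targetMode    mode of the target directory
     * @param classShift    OWNER, GROUP or OTHER: the triplet that applies to the user
     * @return true if the user can traverse to the target and read it
     */
    static boolean canRead(int[] ancestorModes, int targetMode, int classShift) {
        for (int mode : ancestorModes) {
            if (((mode >> classShift) & 01) == 0) {
                return false; // no execute (traverse) on an ancestor blocks everything below it
            }
        }
        return ((targetMode >> classShift) & 04) != 0; // read bit on the target itself
    }

    public static void main(String[] args) {
        // Group member, warehouse 700, table 775: blocked at the warehouse. Prints false.
        System.out.println(canRead(new int[] {0700}, 0775, GROUP));
        // Group member, warehouse 750, table 775: allowed. Prints true.
        System.out.println(canRead(new int[] {0750}, 0775, GROUP));
    }
}
```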

          Carl Steinbach added a comment -

          Closing this ticket again since Chinna's patch for HIVE-2504 was committed back in January.

          Please do not reopen tickets that have been marked fixed/resolved (i.e. tickets for which a patch has already been committed). Open a new ticket if you think there's a problem with the original patch. Thanks.

          Rohini Palaniswamy added a comment -

          Sorry. I am new to the Hive community and was not aware of the process, and I did not see it on the HowToContribute page either. I went with the general practice of reopening bugs if there was an issue. Also, this patch was still in trunk and not part of any release. Thanks for pointing out the process. I have created HIVE-2936 to track the new patch.

          Ashutosh Chauhan added a comment -

          This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

          Raja Ahmed added a comment -

          Hi, I am new to Hadoop, Hive, and Linux, and I need some help. I am receiving an error in Hive which I think is permission related, and I think this patch may help. But I don't know how to install this patch. Can someone give me step-by-step directions on how to install it? The error message I get when I run the Hive test is:
          CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
          FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException org.apache.hadoop.security.AccessControlException: Permission denied: user=myID, access=WRITE, inode="/user/hive/warehouse":hdfs:hadoop:drwxr-xr-x)
          From reading up on this patch, it seems like it may be the fix. If not, any assistance will be greatly appreciated.

          Rohini Palaniswamy added a comment -

          Please send questions to the user mailing list. You cannot create the table because your warehouse directory is writable only by the hdfs user. You can use the LOCATION clause in the CREATE TABLE statement to create the table in a directory where you have access.

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-2504 Warehouse table subdirectories should inherit the group permissions of the warehouse
          parent directory (Chinna Rao Lalam via namit) (Revision 1230774)

          Result = ABORTED
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1230774
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/conf/hive-default.xml.template
          • /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java

            People

            • Assignee:
              Chinna Rao Lalam
              Reporter:
              Carl Steinbach
            • Votes:
              2
              Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:
