Hive
  1. Hive
  2. HIVE-3756

"LOAD DATA" does not honor permission inheritence

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.12.0
    • Component/s: Authorization, Security
    • Labels:
      None

      Description

      When a "LOAD DATA" operation is performed the resulting data in hdfs for the table does not maintain permission inheritance. This remains true even with the "hive.warehouse.subdir.inherit.perms" set to true.

      The issue is easily reproducible by creating a table and loading some data into it. After the load is complete just do a "dfs -ls -R" on the warehouse directory and you will see that the inheritance of permissions worked for the table directory but not for the data.

      1. HIVE-3756.patch
        6 kB
        Chaoyu Tang
      2. HIVE-3756_1.patch
        5 kB
        Chaoyu Tang
      3. HIVE-3756_2.patch
        7 kB
        Chaoyu Tang

        Issue Links

          Activity

          Johndee Burks created issue -
          Chaoyu Tang made changes -
          Field Original Value New Value
          Link This issue is related to HIVE-3094 [ HIVE-3094 ]
          Hide
          Chaoyu Tang added a comment -

          HIVE-3094 has the similar root cause to this one

          Show
          Chaoyu Tang added a comment - HIVE-3094 has the similar root cause to this one
          Chaoyu Tang made changes -
          Attachment HIVE-3756.patch [ 12589266 ]
          Chaoyu Tang made changes -
          Assignee Chaoyu Tang [ ctang.ma ]
          Hide
          Chaoyu Tang added a comment -

          Please review the changes in https://reviews.apache.org/r/12050/

          Show
          Chaoyu Tang added a comment - Please review the changes in https://reviews.apache.org/r/12050/
          Hide
          Sushanth Sowmyan added a comment -

          I've commented on the review, regarding use of FileStatus.isDirectory(). If that issue is handled, I'm +1 on this patch.

          Show
          Sushanth Sowmyan added a comment - I've commented on the review, regarding use of FileStatus.isDirectory(). If that issue is handled, I'm +1 on this patch.
          Hide
          Chaoyu Tang added a comment -

          Update the changes based on review comments

          Show
          Chaoyu Tang added a comment - Update the changes based on review comments
          Chaoyu Tang made changes -
          Attachment HIVE-3756_1.patch [ 12590459 ]
          Hide
          Ashutosh Chauhan added a comment -

          Left few comments on RB.

          Show
          Ashutosh Chauhan added a comment - Left few comments on RB.
          Sushanth Sowmyan made changes -
          Link This issue is duplicated by HIVE-3094 [ HIVE-3094 ]
          Hide
          Sushanth Sowmyan added a comment -

          I have a few more thoughts on this. Let's walk through an example:

          Let's say Parent Dir d1 has permission/group combination A.
          Let's say directory d2 inside Parent Dir has permission/group combination B.

          In the case of non-partitioned tables, d1 will be the database/warehouse dir, and d2 the table dir.
          In the case of partitioned tables, d1 will be the table directory and d2 the appropriate partition directories.

          If we did not have the flag to inherit permissions on, then whatever data is loaded, be it files inside d2 (as during a load operation) or replacing d2 and everything in it (as during an insert overwrite operation), will have yet another permission/group combination C, which is a function of the user's current umask and the user's default group

          The purpose behind the subdir inherit permissions flag is to make this behaviour go away, and to be able to use the parent dir's permissions/group when possible. So far, so good.

          Let's say, for purposes of this entire discussion from now onwards, the flag to inherit permissions is on.

          Now, if we load data into d2, without using overwrite, files inside d2 get permission B.
          If we load data into d2, using overwrite, we now overwrite d2, and thus, d2 takes on d1's permissions, and so do the files inside, thus resulting in d2 and files inside d2 having permissions/group combination A.

          While this behaviour is consistent, I find that from a user's perspective, if they create a table (say unpartitioned), then chmod/chgrp it to B, and then they try to load data into it using an Insert-Overwrite, then they still expect that they're only overwriting data inside the table dir, and their expectation is that the table still have permissions/group-combination B. They don't want it to be replaced by "A", the parent db dir's permissions/group , and they don't want "C", the umask/current-user-default-group.

          Now, as to whether this requires a new flag that overrides "hive.warehouse.subdir.inherit.perms" or whether they want "hive.warehouse.subdir.inherit.perms" to work in this way is still up for discussion, but there is now need for an additional requirement, that of the following:

          "If the directory being moved in already exists, and will be deleted so that this can be placed, then instead of going with the parent permissions, it should go with the previous dir's permissions."

          Thoughts?

          This can be a separate jira if people feel like it should be, but I think it's also a minor modification of this current jira.

          Show
          Sushanth Sowmyan added a comment - I have a few more thoughts on this. Let's walk through an example: Let's say Parent Dir d1 has permission/group combination A. Let's say directory d2 inside Parent Dir has permission/group combination B. In the case of non-partitioned tables, d1 will be the database/warehouse dir, and d2 the table dir. In the case of partitioned tables, d1 will be the table directory and d2 the appropriate partition directories. If we did not have the flag to inherit permissions on, then whatever data is loaded, be it files inside d2 (as during a load operation) or replacing d2 and everything in it (as during an insert overwrite operation), will have yet another permission/group combination C, which is a function of the user's current umask and the user's default group The purpose behind the subdir inherit permissions flag is to make this behaviour go away, and to be able to use the parent dir's permissions/group when possible. So far, so good. Let's say, for purposes of this entire discussion from now onwards, the flag to inherit permissions is on. Now, if we load data into d2, without using overwrite, files inside d2 get permission B. If we load data into d2, using overwrite, we now overwrite d2, and thus, d2 takes on d1's permissions, and so do the files inside, thus resulting in d2 and files inside d2 having permissions/group combination A. – While this behaviour is consistent, I find that from a user's perspective, if they create a table (say unpartitioned), then chmod/chgrp it to B, and then they try to load data into it using an Insert-Overwrite, then they still expect that they're only overwriting data inside the table dir, and their expectation is that the table still have permissions/group-combination B. They don't want it to be replaced by "A", the parent db dir's permissions/group , and they don't want "C", the umask/current-user-default-group. Now, as to whether this requires a new flag that overrides "hive.warehouse.subdir.inherit.perms" or whether they want "hive.warehouse.subdir.inherit.perms" to work in this way is still up for discussion, but there is now need for an additional requirement, that of the following: "If the directory being moved in already exists, and will be deleted so that this can be placed, then instead of going with the parent permissions, it should go with the previous dir's permissions." Thoughts? This can be a separate jira if people feel like it should be, but I think it's also a minor modification of this current jira.
          Hide
          Chaoyu Tang added a comment -

          Yes, IMO, the table should preserve its own permission/group B in the insert-overwrite case. Here is a use case, a database is created to allow a group to access (the mode of /dbdir can be 770) and a certain table in this db (/dbdir/tbldir) is only allowed to admin himself (say permission mode 700). If the admin insert overwrite data to this table, it will change the /dbdir/tbldir to 770, breaking the security unexpectedly.
          I can change code to preserve this permission/group of the overwritten table. It seems a minor changes.

          Show
          Chaoyu Tang added a comment - Yes, IMO, the table should preserve its own permission/group B in the insert-overwrite case. Here is a use case, a database is created to allow a group to access (the mode of /dbdir can be 770) and a certain table in this db (/dbdir/tbldir) is only allowed to admin himself (say permission mode 700). If the admin insert overwrite data to this table, it will change the /dbdir/tbldir to 770, breaking the security unexpectedly. I can change code to preserve this permission/group of the overwritten table. It seems a minor changes.
          Hide
          Sushanth Sowmyan added a comment -

          I think it is important to implement, and if you're willing to change your patch to implement this, I think this would be much appreciated.

          Ashutosh Chauhan, are you on board with this change as well?

          Show
          Sushanth Sowmyan added a comment - I think it is important to implement, and if you're willing to change your patch to implement this, I think this would be much appreciated. Ashutosh Chauhan , are you on board with this change as well?
          Hide
          Chaoyu Tang added a comment -

          Sushanth Sowmyan & Ashutosh Chauhan Upload the patch here and also post it in rb for review

          Show
          Chaoyu Tang added a comment - Sushanth Sowmyan & Ashutosh Chauhan Upload the patch here and also post it in rb for review
          Chaoyu Tang made changes -
          Attachment HIVE-3756_2.patch [ 12593233 ]
          Hide
          Chaoyu Tang added a comment -

          Ashutosh Chauhan & Sushanth Sowmyan I uploaded the patch HIVE-3756_2.patch here and also posted it in rb requesting the review. The changes incorporated preserving permission/group in insert overwrite case as discussed here and some review suggestions from Ashutosh. For answers to questions from review, please see https://reviews.apache.org/r/12050/.

          Show
          Chaoyu Tang added a comment - Ashutosh Chauhan & Sushanth Sowmyan I uploaded the patch HIVE-3756 _2.patch here and also posted it in rb requesting the review. The changes incorporated preserving permission/group in insert overwrite case as discussed here and some review suggestions from Ashutosh. For answers to questions from review, please see https://reviews.apache.org/r/12050/ .
          Hide
          Ashutosh Chauhan added a comment -

          +1

          Show
          Ashutosh Chauhan added a comment - +1
          Hide
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Chaoyu for patch. Thanks, Sushanth for the review.

          Show
          Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Chaoyu for patch. Thanks, Sushanth for the review.
          Ashutosh Chauhan made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.12.0 [ 12324312 ]
          Resolution Fixed [ 1 ]
          Hide
          Chaoyu Tang added a comment -

          Ashutosh Chauhan & Sushanth Sowmyan Thanks for the review.

          Show
          Chaoyu Tang added a comment - Ashutosh Chauhan & Sushanth Sowmyan Thanks for the review.
          Hide
          Ashutosh Chauhan added a comment -

          This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.

          Show
          Ashutosh Chauhan added a comment - This issue has been fixed and released as part of 0.12 release. If you find further issues, please create a new jira and link it to this one.
          Ashutosh Chauhan made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Szehon Ho made changes -
          Link This issue breaks HIVE-6209 [ HIVE-6209 ]
          Szehon Ho made changes -
          Link This issue is part of HIVE-6892 [ HIVE-6892 ]

            People

            • Assignee:
              Chaoyu Tang
              Reporter:
              Johndee Burks
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development