Hadoop Common / HADOOP-8101

Access Control support for Non-secure deployment of Hadoop on Windows

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: native
    • Labels:
      None
    • Target Version/s: HADOOP-1-Windows

    Attachments
    1. security1.patch (6 kB, Sanjay Radia)
    2. security.patch (6 kB, Sanjay Radia)

        Activity

        Sanjay Radia made changes -
        Status: Open → Resolved
        Target Version/s: 1.1.0, 0.24.0 → HADOOP-1-Windows
        Resolution: Fixed
        Bikas Saha added a comment -

        Problem 1: Group Mappings for HDFS. HDFS file permissions are implemented inside HDFS; there is no interaction with the local file system in order to implement these permissions. However, HDFS needs a user-to-group mapping. Currently there is a pluggable module for obtaining a mapping via LDAP and via shell commands. We need a group mapping for Windows.

        HADOOP-8234 tracks this.
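        As background for why Problem 1 matters: a Unix-style check resolves a request against the owner, group, or other class of a file, and the group class can only be selected if the requesting user's groups are known. The following is a minimal sketch of such a check under stated assumptions; the class and method names are hypothetical, superuser and sticky-bit handling are omitted, and it is not the actual HDFS permission checker.

```java
import java.util.Set;

/** Hypothetical sketch of a Unix-style permission check done inside the file system. */
final class PermissionCheckSketch {
    // Access bits in Unix order.
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    /**
     * Returns true if {@code user}, whose groups come from a user-to-group
     * mapping, may perform {@code access} on a file with the given owner,
     * group and octal mode (e.g. 0750). Superuser handling is omitted.
     */
    static boolean permits(int mode, String fileOwner, String fileGroup,
                           String user, Set<String> userGroups, int access) {
        int bits;
        if (user.equals(fileOwner)) {
            bits = (mode >> 6) & 7;          // owner class
        } else if (userGroups.contains(fileGroup)) {
            bits = (mode >> 3) & 7;          // group class: requires the group mapping
        } else {
            bits = mode & 7;                 // other class
        }
        return (bits & access) == access;
    }
}
```

        A user who is neither the owner nor a member of the file's group falls through to the "other" bits, so a missing or wrong group mapping on Windows would silently change access decisions rather than fail loudly.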

        Problem 3: Permissions for RawLocalFileSystem when using Hadoop on a local desktop (no HDFS is involved here). We need to emulate the set-permissions and get-permissions APIs of the class FileSystem.java when the local file system and desktop are Windows. Hadoop FileSystem permissions are the same as those in Unix.

        HADOOP-8235 tracks this.
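        For Problem 3, one possible approach in plain Java NIO is sketched below: convert Hadoop's Unix-style octal mode to and from a POSIX permission set when the local file system supports it, and leave a clearly marked hook for the Windows case. This is a sketch under assumptions, not the patch attached to this issue; the class name is hypothetical, and the non-POSIX branch only throws, marking where an ACL-based emulation would have to go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical sketch: Unix-style get/set permission on the local file system. */
final class LocalPermissionSketch {

    /** Converts an octal mode such as 0755 into "rwxr-xr-x". */
    static String toSymbolic(int mode) {
        StringBuilder sb = new StringBuilder(9);
        for (int shift = 6; shift >= 0; shift -= 3) {   // owner, group, other
            int bits = (mode >> shift) & 7;
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    static boolean supportsPosix(Path p) {
        return Files.getFileAttributeView(p, PosixFileAttributeView.class) != null;
    }

    /** Emulates a set-permission call for a local path. */
    static void setMode(Path p, int mode) throws IOException {
        if (!supportsPosix(p)) {
            // Windows/NTFS: Unix modes would have to be mapped onto ACLs here.
            throw new UnsupportedOperationException("ACL-based emulation not implemented");
        }
        Files.setPosixFilePermissions(p, PosixFilePermissions.fromString(toSymbolic(mode)));
    }

    /** Emulates the permission part of a get-file-status call for a local path. */
    static int getMode(Path p) throws IOException {
        if (!supportsPosix(p)) {
            throw new UnsupportedOperationException("ACL-based emulation not implemented");
        }
        int mode = 0;
        for (PosixFilePermission perm : Files.getPosixFilePermissions(p)) {
            // Enum order is OWNER_READ ... OTHERS_EXECUTE, i.e. bit 8 down to bit 0.
            mode |= 1 << (8 - perm.ordinal());
        }
        return mode;
    }
}
```

        On Linux, `setMode(path, 0640)` followed by `getMode(path)` round-trips to `0640`; on Windows both calls reach the unimplemented branch, which is exactly the gap this sub-problem describes.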

        Problem 2: HDFS and MR implementations protecting their local OS resources from tasks. The Hadoop implementation uses local OS resources such as files and tasks, and protects these resources from tasks that run on the same hosts. HDFS and MR daemons use local files and dirs, set permissions when creating them, and later check these permissions. For example, a DataNode sets the permissions of its "block dirs" so that they are unreadable by others when the DataNode is formatted. In some cases the permissions are set using RawLocalFileSystem's permission APIs. We need a way to set such protections for Windows.

        HADOOP-8235 might lead to a solution to this, or we may be able to do something simpler for it.
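        For Problem 2, the daemon-side requirement (for example, "block dirs" unreadable by others) boils down to creating a directory that only its owner can access. A minimal sketch, assuming Java NIO is acceptable: on POSIX file systems it sets `rwx------` explicitly, and on file systems that only expose ACLs (as NTFS does) it replaces the ACL with a single entry for the owner. The class name is hypothetical, and the ACL branch has not been verified on Windows; in particular, entries inherited from the parent directory may need separate handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclEntryFlag;
import java.nio.file.attribute.AclEntryPermission;
import java.nio.file.attribute.AclEntryType;
import java.nio.file.attribute.AclFileAttributeView;
import java.nio.file.attribute.PosixFilePermissions;
import java.nio.file.attribute.UserPrincipal;
import java.util.EnumSet;
import java.util.List;

/** Hypothetical sketch: create a directory that only its owner can access. */
final class PrivateDirSketch {

    static Path createPrivateDir(Path dir) throws IOException {
        Files.createDirectories(dir);
        if (dir.getFileSystem().supportedFileAttributeViews().contains("posix")) {
            // Set explicitly after creation: permissions passed at creation
            // time are subject to the process umask.
            Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
            return dir;
        }
        AclFileAttributeView acl = Files.getFileAttributeView(dir, AclFileAttributeView.class);
        if (acl == null) {
            throw new UnsupportedOperationException("no POSIX or ACL view for " + dir);
        }
        // Replace the ACL with one full-control entry for the owner that is
        // inherited by new files and subdirectories (unverified on Windows).
        UserPrincipal owner = Files.getOwner(dir);
        AclEntry ownerOnly = AclEntry.newBuilder()
                .setType(AclEntryType.ALLOW)
                .setPrincipal(owner)
                .setPermissions(EnumSet.allOf(AclEntryPermission.class))
                .setFlags(AclEntryFlag.FILE_INHERIT, AclEntryFlag.DIRECTORY_INHERIT)
                .build();
        acl.setAcl(List.of(ownerOnly));
        return dir;
    }
}
```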

        Bikas Saha added a comment -

        Moving it to top level in order to create sub-tasks.

        Bikas Saha made changes -
        Link: This issue is part of HADOOP-8079
        Bikas Saha made changes -
        Parent: HADOOP-8079 → (none)
        Issue Type: Sub-task → Improvement
        Sanjay Radia made changes -
        Summary: "Security changes for Hadoop for Windows" → "Access Control support for Non-secure deployment of Hadoop on Windows"
        Sanjay Radia added a comment -

        As Aaron suggested, I have changed the title.

        Show
        Sanjay Radia added a comment - As Aaron suggested, I have change the title.
        Hide
        Aaron T. Myers added a comment -

        > I think you mean the work done in HADOOP-8121. Can that be ported to this branch if needed? When running on Windows, I think we might be able to get away with using Windows shell commands, which use the built-in OS support (taking advantage of the efficiency and caching provided by Windows).

        I was indeed referring to HADOOP-8121. Sure, providing an implementation that shells out could make sense too, though note that Hadoop itself has built-in caching for user -> group mapping, so it might not make that much of a difference from a performance perspective.
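        To illustrate the point about caching: a group lookup that shells out only pays the process-launch cost on a cache miss. Below is a minimal sketch of a time-bounded cache in front of a pluggable provider. `GroupProvider` and the class name are hypothetical stand-ins, not Hadoop's actual `Groups` class (whose cache lifetime is controlled by `hadoop.security.groups.cache.secs`).

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch: time-bounded cache in front of a pluggable group provider. */
final class CachingGroupsSketch {

    /** Stand-in for a pluggable mapping that may shell out to the OS. */
    interface GroupProvider {
        List<String> getGroups(String user) throws IOException;
    }

    private static final class Entry {
        final List<String> groups;
        final long expiresAtMillis;

        Entry(List<String> groups, long expiresAtMillis) {
            this.groups = groups;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final GroupProvider provider;
    private final long ttlMillis;
    private final LongSupplier clockMillis;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachingGroupsSketch(GroupProvider provider, long ttlMillis, LongSupplier clockMillis) {
        this.provider = provider;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns cached groups while fresh; otherwise asks the provider and caches the answer. */
    List<String> getGroups(String user) throws IOException {
        long now = clockMillis.getAsLong();
        Entry e = cache.get(user);
        if (e != null && now < e.expiresAtMillis) {
            return e.groups;
        }
        List<String> groups = List.copyOf(provider.getGroups(user));
        cache.put(user, new Entry(groups, now + ttlMillis));
        return groups;
    }
}
```

        In production this would be built as `new CachingGroupsSketch(provider, 300_000, System::currentTimeMillis)`; the injectable clock exists only to make expiry testable. Two concurrent misses for the same user may both call the provider, which is harmless for a read-only lookup.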

        > I think, if I understand Sanjay's intent correctly, he is describing the support needed in a non-secure (non-authenticated) Hadoop setup. Hence, the LTC and related pieces are not yet in the picture. Sanjay, please correct me if needed.

        Ah, my mistake. We might want to update the description of this JIRA to make it clear exactly what the scope of this change is intended to be.

        > As an aside, LTC (LinuxTaskController) is a piece of native code that gets executed on demand. If that abstraction is correct, it can be replicated for Windows, although it might make sense to revisit the abstraction.

        Sure, makes sense.

        Bikas Saha added a comment -

        I think you mean the work done in HADOOP-8121. Can that be ported to this branch if needed? When running on Windows, I think we might be able to get away with using Windows shell commands, which use the built-in OS support (taking advantage of the efficiency and caching provided by Windows).

        I think, if I understand Sanjay's intent correctly, he is describing the support needed in a non-secure (non-authenticated) Hadoop setup. Hence, the LTC and related pieces are not yet in the picture. Sanjay, please correct me if needed.

        As an aside, LTC (LinuxTaskController) is a piece of native code that gets executed on demand. If that abstraction is correct, it can be replicated for Windows, although it might make sense to revisit the abstraction.
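        One way to read "replicate the abstraction for Windows" is as a platform-selected interface: the daemon asks a controller for the command that launches a task, and only the secure implementation delegates to a privileged native helper. The sketch below is hypothetical throughout; the interface, the class names, and the helper's argument order are assumptions and do not reflect the LTC's real command line. It only shows where a Windows implementation would plug in.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a platform-selected task-launch abstraction. */
final class TaskControllerSketch {

    interface TaskController {
        /** Command line a daemon would execute to run {@code script} for {@code user}. */
        List<String> launchCommand(String user, String script);
    }

    /** Non-secure mode: the task runs as the daemon's own OS user, so {@code user} is unused. */
    static final class DefaultController implements TaskController {
        private final boolean windows;

        DefaultController(boolean windows) { this.windows = windows; }

        @Override
        public List<String> launchCommand(String user, String script) {
            return windows ? List.of("cmd", "/c", script) : List.of("bash", script);
        }
    }

    /** Secure mode on Unix: delegate to a privileged native helper (arguments assumed). */
    static final class NativeHelperController implements TaskController {
        private final String helper;

        NativeHelperController(String helper) { this.helper = helper; }

        @Override
        public List<String> launchCommand(String user, String script) {
            return List.of(helper, user, script);
        }
    }

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    static TaskController forPlatform(String osName, boolean secure) {
        if (!secure) {
            return new DefaultController(isWindows(osName));
        }
        if (isWindows(osName)) {
            // The piece that would have to be designed for Windows.
            throw new UnsupportedOperationException("no secure task controller for Windows yet");
        }
        return new NativeHelperController("task-controller");
    }
}
```

        Keeping the platform decision in one factory means the rest of the daemon never needs its own OS check; the non-secure case covered by this issue only exercises `DefaultController`.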

        Aaron T. Myers added a comment -

        > Currently there is a pluggable module for obtaining a mapping via LDAP and via shell commands. We need a group mapping for Windows.

        FWIW, I'm fairly confident that the LDAP group mapping was specifically tested with Active Directory when it was being implemented.

        > Problem 2: HDFS and MR Impl Protecting its local OS resources from Tasks

        Perhaps you were alluding to this, but to be explicit: I think the biggest hurdle in getting secure Hadoop running on Windows will not be local file system permissions, but getting the sandboxing provided by the LTC and LCE (LinuxContainerExecutor) to function as expected. Those components rely heavily on Unix concepts such as seteuid/setegid, supplementary group lists, group execute permissions on binaries, signals, etc.

        Sanjay Radia added a comment -

        Background: Hadoop has secure and non-secure modes, and authorization is performed in both. The difference is how authentication is done.

        Three problems:

        • Problem 1: Group Mappings for HDFS
          HDFS file permissions are implemented inside HDFS; there is no interaction with the local file system in order to implement these permissions. However, HDFS needs a user-to-group mapping. Currently there is a pluggable module for obtaining a mapping via LDAP and via shell commands. We need a group mapping for Windows.
        • Problem 2: HDFS and MR implementations protecting their local OS resources from tasks
          The Hadoop implementation uses local OS resources such as files and tasks, and protects these resources from tasks that run on the same hosts. HDFS and MR daemons use local files and dirs, set permissions when creating them, and later check these permissions. For example, a DataNode sets the permissions of its "block dirs" so that they are unreadable by others when the DataNode is formatted. In some cases the permissions are set using RawLocalFileSystem's permission APIs. We need a way to set such protections for Windows.
        • Problem 3: Permissions for RawLocalFileSystem when using Hadoop on a local desktop (no HDFS is involved here)
          We need to emulate the set-permissions and get-permissions APIs of the class FileSystem.java when the local file system and desktop are Windows. Hadoop FileSystem permissions are the same as those in Unix.
        Sanjay Radia made changes -
        Attachment: security1.patch
        Sanjay Radia added a comment -

        Patch updated to current Hadoop-1 (part of the patch had pulled in some changes that are already in Hadoop-1).

        Sanjay Radia added a comment -
        • src/core/org/apache/hadoop/fs/RawLocalFileSystem.java
          Why set the status (U, G, etc.) to null? Don't you want the actual permissions for the user, group and world? There is no concept of groups in Windows, and elsewhere you have used a group called "Users".
        • src/core/org/apache/hadoop/io/SecureIOUtils.java
          If RawLocalFileSystem is fixed, then you will not need this.
        • src/core/org/apache/hadoop/security/UserGroupInformation.java
          • newLoginContext(): the patch is messed up here; you must have accidentally picked up the change from HADOOP-7982.
          • getCurrentUser(): the existing code should work; it should not need a Windows OS check.
        • src/core/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java
          The right way is to use a Java class to get the Windows groups and make the default Windows config point to this class for group mappings. If this is the only place where we need a different default config for Windows, I could live with the patch.
        • src/core/org/apache/hadoop/security/Credentials.java
          Job tokens are set in both secure and non-secure mode, so this change should not be needed. Did things break without this change?
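        The ShellBasedUnixGroupsMapping comment suggests selecting the group-mapping class through configuration rather than patching the Unix implementation. A minimal sketch of that pattern: `hadoop.security.group.mapping` is the real Hadoop configuration key, but the `Map` standing in for the configuration, the `GroupMapping` interface, and both placeholder implementations are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: load a group-mapping class by name with an OS-dependent default. */
final class GroupMappingLoaderSketch {

    /** Stand-in for a pluggable mapping interface. */
    interface GroupMapping {
        List<String> getGroups(String user) throws Exception;
    }

    /** Placeholder implementations; real ones would query the OS. */
    static final class UnixShellMapping implements GroupMapping {
        public List<String> getGroups(String user) { return List.of(); }
    }

    static final class WindowsMapping implements GroupMapping {
        public List<String> getGroups(String user) { return List.of(); }
    }

    static final String KEY = "hadoop.security.group.mapping";

    static String defaultClassFor(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows")
                ? WindowsMapping.class.getName()
                : UnixShellMapping.class.getName();
    }

    /** An explicit setting wins; otherwise the platform default is used. */
    static GroupMapping load(Map<String, String> conf, String osName)
            throws ReflectiveOperationException {
        String className = conf.getOrDefault(KEY, defaultClassFor(osName));
        return Class.forName(className)
                .asSubclass(GroupMapping.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

        With this shape, the only Windows-specific piece is the default class name; code paths that consume group mappings stay free of OS checks.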
        Sanjay Radia added a comment -

        The current patch is designed for non-secure Hadoop. It is okay to say that Windows works in non-secure mode; a secure Windows Hadoop can be addressed later. However, it is desirable for a Windows desktop to be able to securely submit jobs to a secure Hadoop cluster. All that is needed for that is for the client side to authenticate and get tokens; this should work since the login in UserGroupInformation does the right thing in a platform-independent way (to be verified).

        Sanjay Radia made changes -
        Attachment: security.patch
        Sanjay Radia added a comment -

        Attached the security patch from the parent JIRA.

        Sanjay Radia made changes -
        Component/s: native
        Sanjay Radia created issue -

          People

          • Assignee: Unassigned
            Reporter: Sanjay Radia
          • Votes: 0
            Watchers: 13

            Dates

            • Created:
              Updated:
              Resolved:
