Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: security
    • Labels:
      None

      Description

      HBase security will be layered in part on top of HDFS security, and on whatever ZK offers as well. For the sake of discussion we presume both HDFS and ZK use a Kerberos-based authentication and authorization model, as proposed in the Hadoop Security Architecture document. There are two basic options for delegating access, coarse-grained or fine-grained:

      Coarse

      There could simply be a single delegation token granted to an HBase cluster from HDFS and ZK, covering all operations on behalf of all possible users of the HBase cluster. From the perspective of HDFS and ZK, there is only a single principal for each cluster.

      Fine

      The HBase master could manage and renew HDFS and ZK delegation tokens on behalf of users authenticated to HBase via Kerberos. When a client authenticates via Kerberos to the HMaster while looking up region locations, the first step of any HBase access, the HMaster would get a delegation token from the NameNode on behalf of the user. (The user would then hand the delegation token to the HRegionServers to allow access to store data via their embedded DFSClients.) It would be ideal if ZooKeeper authentication and authorization could tie in seamlessly. For example, while the HMaster is getting an HDFS delegation token for the user, it could also get another token for ZK on behalf of the user. A wrinkle here is token renewal. If a user transacts with an HRegionServer using an expired token, the HRegionServer would renew the token transparently with the NameNode on behalf of the user (or ask the HMaster to renew it, if superuser privileges should not be delegated from HMaster to HRegionServer). Something similar would be necessary on the ZK side. To support this model, the HRegionServers and HMaster (or just the HMaster) must act as a superuser principal capable of impersonating user principals, presumably toward the ZK ensemble as well. Thus ZK, like HDFS, must provide methods for a superuser to act on behalf of others. HDFS will have this facility.
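
      The token flow described above can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not Hadoop or HBase code: the classes `DelegationToken`, `NameNode`, `Master`, and `RegionServer`, the principal name `hbase-master`, and the 60-second lifetime are all hypothetical. It shows the variant in which only the master holds superuser rights and the region server asks the master to renew an expired token.

```python
from dataclasses import dataclass


@dataclass
class DelegationToken:
    owner: str         # end user the token speaks for
    renewer: str       # principal allowed to renew the token
    expires_at: float  # absolute expiry time, in seconds


class NameNode:
    """Hypothetical stand-in for a token issuer (HDFS NameNode or ZK ensemble)."""

    def __init__(self, superusers, lifetime=60.0):
        self.superusers = set(superusers)
        self.lifetime = lifetime

    def issue(self, caller, on_behalf_of, now):
        # Only a superuser principal may obtain tokens for other users.
        if caller not in self.superusers:
            raise PermissionError(f"{caller} may not impersonate {on_behalf_of}")
        return DelegationToken(on_behalf_of, caller, now + self.lifetime)

    def renew(self, caller, token, now):
        # Only the principal that obtained the token may renew it.
        if caller != token.renewer:
            raise PermissionError(f"{caller} is not the renewer of this token")
        token.expires_at = now + self.lifetime
        return token


class Master:
    def __init__(self, namenode, principal="hbase-master"):
        self.namenode = namenode
        self.principal = principal

    def locate_region(self, user, now):
        # Region lookup is the first step of any access, so the token is
        # obtained here on behalf of the (already authenticated) user.
        return self.namenode.issue(self.principal, user, now)

    def renew(self, token, now):
        return self.namenode.renew(self.principal, token, now)


class RegionServer:
    def __init__(self, master):
        self.master = master

    def access(self, token, now):
        if now >= token.expires_at:
            # The region server is not a superuser in this variant,
            # so it asks the master to renew transparently.
            self.master.renew(token, now)
        return f"store access as {token.owner}"
```

      In the other variant from the description, the region server would itself be listed as a superuser and call the issuer's renew method directly.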

      There are pros and cons to each approach. Coarse is obviously much simpler to implement and reason about, but it requires more trust in HBase to maintain isolation between users than the fine-grained approach. With the fine-grained approach, the regionservers get HDFS and ZK delegation tokens from the HBase client, which allows a policy where files and znodes created by one user+group cannot be read or written by another at the DFS (store) level or the ZK level, assuming group-level permissions. Thus you can reason about isolation further down the stack: not just client->HBase, but also client->HBase->HDFS, client->ZK, and client->HBase->ZK.
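
      The isolation argument can be illustrated with a toy permission check. This is a sketch only: `StoreFile`, `store_allows`, `coarse_read`, and `fine_read` are hypothetical names, and the owner-or-group rule is a simplification of group-level permissions. The `hbase_acl_ok` flag stands for whatever check HBase itself performs; passing `True` for an unauthorized user simulates a bug in that layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    path: str
    owner: str
    group: str


def store_allows(principal, groups, f):
    """Store-level check: the owner or any member of the file's group may access."""
    return principal == f.owner or f.group in groups


def coarse_read(hbase_acl_ok, f):
    # The store sees only the single cluster principal, which owns every file,
    # so the only effective check is the one HBase performs itself.
    return hbase_acl_ok and store_allows("hbase", {"hbase"}, f)


def fine_read(hbase_acl_ok, user, user_groups, f):
    # The store re-checks the end user carried by the delegation token,
    # so a mistake in the HBase-level check does not automatically leak data.
    return hbase_acl_ok and store_allows(user, user_groups, f)
```

      With a file owned by `hbase`, `coarse_read(True, f)` succeeds no matter who the end user was, whereas `fine_read` still refuses a user outside the file's owner and group.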

        Activity

        Andrew Purtell created issue -
        ryan rawson added a comment -

        The fine-grained approach doesn't make sense to me. Who owns the HFiles? I would expect the 'hbase' user to own them. If so, why would we want to delegate end-user permissions all the way to HDFS, unless we wanted ACLs on HFiles that depend on the user involved?

        I would expect client > hbase and hbase > hdfs security to be separate concerns. Otherwise we'd have to do fine-grained ACLs on all HFiles, and granting a user access to a table would require granting them access to only the files involved in that table.

        Andrew Purtell added a comment -

        It's about trust for enforcement of isolation. HBase can run as a single principal across the whole cluster from the HDFS perspective, no matter the details of HBase's internal security model. So, as you put it, the "hbase" user would own the HFiles.

        "Otherwise we'd have to do fine grained ACLs on all HFiles, and granting a user access to a table would require granting them access to the files involved in that table only."

        That's one option.

        Another is a scheme where HBase users are mapped to HBase roles which are mapped to HDFS users which are aggregated as HDFS groups:

        HBase user -> HBase role -> HDFS user -> HDFS group

        This can provide some flexibility for various configurations, from simple-but-no-isolation to complex-but-paranoid. Under this scheme the DFSClients in the region servers would operate with multiple delegation tokens from HDFS in a pass-through manner.

        The trade-off is some added complexity in exchange for some assurance of isolation even if HBase is "broken" in some way.
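
        The mapping chain above could be expressed as three lookup tables. This is a sketch for discussion only: every user, role, and group name below is made up, and `resolve` and `tokens_needed` are hypothetical helpers, not part of HBase or HDFS.

```python
# Hypothetical mapping tables; all names are illustrative.
USER_TO_ROLE = {"alice": "analyst", "bob": "analyst", "carol": "admin"}
ROLE_TO_HDFS_USER = {"analyst": "hbase_ro", "admin": "hbase_rw"}
HDFS_USER_TO_GROUP = {"hbase_ro": "hbase_data", "hbase_rw": "hbase_data"}


def resolve(hbase_user):
    """Follow HBase user -> HBase role -> HDFS user -> HDFS group."""
    role = USER_TO_ROLE[hbase_user]
    hdfs_user = ROLE_TO_HDFS_USER[role]
    return role, hdfs_user, HDFS_USER_TO_GROUP[hdfs_user]


def tokens_needed(hbase_users):
    """Distinct HDFS users, i.e. the delegation tokens a region server would hold."""
    return sorted({resolve(u)[1] for u in hbase_users})
```

        Mapping every role to one HDFS user gives the simple-but-no-isolation end of the range (a single token), while one HDFS user per role, or per HBase user, moves toward the paranoid end at the cost of more tokens to pass through.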

        Andrew Purtell made changes -
        Field: Summary
        Original Value: [DAC] HDFS and ZK access delegation
        New Value: [DAC] HDFS and ZK access delegation (isolation)
        Todd Lipcon made changes -
        Component/s security [ 12314192 ]
        Andrew Purtell added a comment -

        This should have been a brainstorming JIRA. The storm fizzled out.

        Andrew Purtell made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Not a Problem [ 8 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Andrew Purtell
          • Votes:
            0
            Watchers:
            2