Affects Version/s: None
Fix Version/s: None
Consider implementing discretionary access control for HBase.
Access control has three aspects: authentication, authorization and audit.
- Authentication: Access is controlled by requiring an authentication procedure that establishes the identity of the user. At a minimum, the procedure should require a non-plaintext authentication factor (e.g. a salted password hash), and it should ideally, or at least optionally, provide cryptographically strong confidence via public key certification.
- Authorization: Access is controlled by specifying rights to resources via an access control list (ACL). An ACL is a list of permissions attached to an object. The list specifies who or what is allowed to access the object and which operations may be performed on it, e.g. create, update, read, or delete.
- Audit: Important actions taken by subjects, e.g. schema changes or data mutations, should be logged for accountability, producing a chronological record that enables the full reconstruction and examination of a sequence of events. Logging activity should be protected from all subjects except a restricted set with administrative privilege, perhaps only a single super-user.
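As an illustration of the "non-plaintext authentication factor" mentioned under Authentication, here is a minimal sketch of salted password hashing using the JDK's PBKDF2 support. The class name PasswordHasher, the iteration count, and the key length are assumptions made for this example; nothing here is an existing HBase API.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper, not part of HBase: stores and verifies salted password hashes.
final class PasswordHasher {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 256;       // assumed derived key length

    // Generate a fresh random salt for each subject.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derive a hash from the password and salt; only the salt and hash are stored.
    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 not available", e);
        }
    }

    // Recompute the hash for a login attempt and compare it to the stored value.
    static boolean verify(char[] attempt, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(attempt, salt), expected);
    }
}
```

The salt and derived hash would be what gets stored for each subject; the plaintext password is never persisted.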
Discretionary access control means the access policy for an object is determined by the owner of the object. Every object in the system must have a valid owner. Owners can assign access rights and permissions to other users. The initial owner of an object is the subject who created it. If subjects are deleted from a system, ownership of objects owned by them should revert to some super-user or otherwise valid default.
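The ownership rules above can be sketched as a small in-memory registry. This is only an illustration: the class OwnershipRegistry and its method names are hypothetical, and allowing the super-user to transfer ownership is an assumption that the text does not state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of discretionary ownership; not an HBase class.
final class OwnershipRegistry {
    private final String superUser;
    private final Map<String, String> ownerByObject = new HashMap<>();

    OwnershipRegistry(String superUser) {
        this.superUser = superUser;
    }

    // The initial owner of an object is the subject who created it.
    void create(String object, String creator) {
        if (ownerByObject.putIfAbsent(object, creator) != null) {
            throw new IllegalStateException("object already exists: " + object);
        }
    }

    String ownerOf(String object) {
        return ownerByObject.get(object);
    }

    // Only the current owner (or, by assumption, the super-user) may transfer ownership.
    void transfer(String object, String requester, String newOwner) {
        String owner = ownerByObject.get(object);
        if (owner == null) {
            throw new IllegalArgumentException("no such object: " + object);
        }
        if (!requester.equals(owner) && !requester.equals(superUser)) {
            throw new SecurityException(requester + " does not own " + object);
        }
        ownerByObject.put(object, newOwner);
    }

    // When a subject is deleted, everything it owned reverts to the super-user.
    void deleteSubject(String subject) {
        ownerByObject.replaceAll((obj, owner) -> owner.equals(subject) ? superUser : owner);
    }
}
```

Reverting to the super-user on deletion keeps the invariant that every object always has a valid owner.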
HBase can enforce access policy at table, column family, or cell granularity. Cell granularity does not make much sense. An implementation which controls access at both the table and column family levels is recommended, though a first cut could consider control at the table level only. The initial set of permissions can be: create (table schema or column family), update (table schema or column family), read (column family), delete (table or column family), execute (filters), and transfer ownership. The subject identities and access tokens could be stored in a new administrative table. ACLs on tables and column families can be stored in META.
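A permission check at the table and column family levels might look roughly like the sketch below. The Permission enum mirrors the initial permission set listed above; the AccessControlList class, its "table:family" key format, and the rule that a table-level grant covers every column family of that table are assumptions made for illustration only. A real implementation would persist these entries (for example in META) rather than hold them in memory.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// The initial permission set proposed in the text.
enum Permission { CREATE, UPDATE, READ, DELETE, EXECUTE, TRANSFER_OWNERSHIP }

// Hypothetical in-memory ACL; not an HBase class.
final class AccessControlList {
    // resource key ("table" or "table:family") -> subject -> granted permissions
    private final Map<String, Map<String, Set<Permission>>> grants = new HashMap<>();

    private static String key(String table, String family) {
        return family == null ? table : table + ":" + family;
    }

    // Pass family == null to grant at the table level.
    void grant(String subject, String table, String family, Permission... perms) {
        Set<Permission> set = grants
                .computeIfAbsent(key(table, family), k -> new HashMap<>())
                .computeIfAbsent(subject, s -> EnumSet.noneOf(Permission.class));
        for (Permission p : perms) {
            set.add(p);
        }
    }

    // Assumption: a table-level grant applies to all column families of the table.
    boolean isAllowed(String subject, String table, String family, Permission perm) {
        if (has(subject, key(table, null), perm)) {
            return true;
        }
        return family != null && has(subject, key(table, family), perm);
    }

    private boolean has(String subject, String resource, Permission perm) {
        Map<String, Set<Permission>> bySubject = grants.get(resource);
        if (bySubject == null) {
            return false;
        }
        Set<Permission> set = bySubject.get(subject);
        return set != null && set.contains(perm);
    }
}
```

A table-only first cut would simply always pass a null family.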
Access other than read access to catalog and administrative tables should be restricted to a set of administrative users or perhaps a single super-user. A data mutation on a user table by a subject without administrative or super-user privilege may result in a table split; this is an implicit, temporary privilege elevation in which the regionserver or master updates the catalog tables as necessary to support the split.
Audit logging should be configurable on a per-table basis so that its overhead can be avoided where it is not wanted.
Consider supporting external authentication and subject identification mechanisms with Java library support: RADIUS/TACACS, Kerberos, LDAP.
Consider logging audit trails to an HBase table (bigtable type schemas are natural for this), optionally with external logging options that have Java library support, such as syslog. Alternatively, commons-logging may be sufficient, leaving it to the administrator to set up appropriate commons-logging/log4j configurations for their needs.
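One way to keep the audit destination pluggable while honoring a per-table switch is sketched below. AuditSink and AuditLog are hypothetical names introduced for this example; a sink could write to an HBase table, syslog, or a commons-logging logger, none of which is shown here.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

// Hypothetical destination for audit records (HBase table, syslog, logger, ...).
interface AuditSink {
    void write(String record);
}

// Hypothetical audit log with a per-table on/off switch; not an HBase class.
final class AuditLog {
    private final Set<String> auditedTables = new HashSet<>();
    private final AuditSink sink;

    AuditLog(AuditSink sink) {
        this.sink = sink;
    }

    void enableFor(String table) {
        auditedTables.add(table);
    }

    void disableFor(String table) {
        auditedTables.remove(table);
    }

    // Records the action only if auditing is enabled for the table,
    // so tables that opt out pay no logging cost beyond this lookup.
    void record(String subject, String table, String action) {
        if (!auditedTables.contains(table)) {
            return;
        }
        sink.write(Instant.now() + " subject=" + subject
                + " table=" + table + " action=" + action);
    }
}
```

For example, `new AuditLog(System.out::println)` would print records to standard output, and only for tables passed to `enableFor`.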
If HBASE-1002 is considered, and the option to support filtering via upload of (perhaps complex) bytecode produced by some little language compiler is implemented, the execute privilege could be extended in a manner similar to how stored procedures in SQL execute with the privilege of either the current user or the (table/procedure) creator.
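The two execution modes could be modeled along the lines of the sketch below, which resolves the subject whose privileges apply when an uploaded filter runs. ExecuteAs and UploadedFilter are hypothetical names chosen for this example, loosely following the invoker/definer distinction used for SQL stored procedures.

```java
// Whose privileges apply when the uploaded filter runs.
enum ExecuteAs { INVOKER, DEFINER }

// Hypothetical descriptor for an uploaded filter; not an HBase class.
final class UploadedFilter {
    private final String name;
    private final String creator;
    private final ExecuteAs mode;

    UploadedFilter(String name, String creator, ExecuteAs mode) {
        this.name = name;
        this.creator = creator;
        this.mode = mode;
    }

    String name() {
        return name;
    }

    // Returns the subject whose ACL entries should be checked for this run:
    // the creator in DEFINER mode, otherwise the calling user.
    String effectiveSubject(String caller) {
        return mode == ExecuteAs.DEFINER ? creator : caller;
    }
}
```

The caller would still need the execute privilege on the filter itself in either mode; only the subject used for subsequent data access checks changes.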