Details
- Type: New Feature
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
Description
Given that the primary business value of Apache Ranger is to enable sharing of resources, it would help if Apache Ranger provided an abstraction that makes a set of resources/data across services (a dataset) the unit of sharing, instead of one or more resources in each service. This has several benefits:
- A single policy manages access to data in multiple services, such as HBase, Hive, Snowflake, Kafka, Google BigQuery, AWS S3, AWS Redshift, and ADLS-Gen2. This lets authorization be centered around a purpose, for example:
  - Marketing Campaign 2022 dataset
  - Sales 2021 dataset
  - CA Claims 2021 dataset
- Different sets of users can manage sharing data into a dataset and manage access to the data in a dataset:
  - Data owners share data into a dataset, with the necessary masking, row-filters and schedules; they can update the share details, including stopping sharing into a dataset.
  - Dataset admins manage who has access to the data in the dataset. This relieves data owners from having to micromanage access to the shared data, for example when a user needs access to data in multiple services to participate in a project.
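The separation of roles described above can be illustrated with a small data model. This is only a sketch under stated assumptions, not Apache Ranger's API or the design in the attached document: the names `SharedResource`, `Dataset`, `share`, `stop_sharing`, `grant` and `accessible` are hypothetical, masking and row-filter rules are kept as free-form strings, and schedules are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SharedResource:
    """One resource a data owner shares into a dataset (hypothetical model)."""
    service: str                      # e.g. "hive", "kafka", "s3"
    resource: str                     # e.g. "sales.orders_2021"
    owner: str                        # data owner who controls this share
    mask: Optional[str] = None        # optional masking rule (free-form here)
    row_filter: Optional[str] = None  # optional row-filter expression
    active: bool = True               # set to False when the owner stops sharing


@dataclass
class Dataset:
    """A named, purpose-centered set of resources across services."""
    name: str
    admins: Set[str] = field(default_factory=set)
    members: Set[str] = field(default_factory=set)
    shares: List[SharedResource] = field(default_factory=list)

    def share(self, res: SharedResource) -> None:
        # Data owners add resources; they do not decide who the members are.
        self.shares.append(res)

    def stop_sharing(self, owner: str, service: str, resource: str) -> None:
        # Only the owner of a share may withdraw it from the dataset.
        for s in self.shares:
            if s.service == service and s.resource == resource:
                if s.owner != owner:
                    raise PermissionError(f"{owner} does not own this share")
                s.active = False

    def grant(self, admin: str, user: str) -> None:
        # Dataset admins decide who may use the dataset as a whole.
        if admin not in self.admins:
            raise PermissionError(f"{admin} is not an admin of {self.name}")
        self.members.add(user)

    def accessible(self, user: str) -> List[SharedResource]:
        # A member sees every active share, regardless of the backing service.
        if user not in self.members:
            return []
        return [s for s in self.shares if s.active]
```

In this sketch a single `grant` call gives a user access to every active share in the dataset, which mirrors the idea of one policy covering data in multiple services, while each data owner keeps control over what is shared and how it is masked or filtered.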
The attached document has more details on this new abstraction, including a number of questions and answers that help explain various aspects of this feature. Please read it and add your comments/suggestions.