NiFi currently supports these group mappings: File and LDAP.
Benefits of Unix or Hadoop based group identity:
- File is not allowed in many environments because it is not integrated with the corporation's identity system (typically AD or another LDAP).
- LDAP adds considerable complexity and overhead, and requires extensive up-front configuration per environment.
- Most services, including those related to Hadoop, are moving to Unix-based group lookups.
How it works, and some possible ways to implement it:
- Linux users/groups come from the "Name Service Switch" (aka "nss").
- Here are the native Linux commands to query it:
  - show all users: `getent passwd`
  - show specific user: `getent passwd %s`
  - show all groups: `getent group`
  - show specific group: `getent group %s`
- hadoop-common has its own libraries for accessing `nss`. Example of Knox's implementation: https://github.com/apache/knox/blob/master/gateway-provider-identity-assertion-hadoop-groups/src/main/java/org/apache/knox/gateway/identityasserter/hadoop/groups/filter/HadoopGroupProviderFilter.java
- Ranger usersync calls `getent` directly: https://github.com/apache/ranger/blob/da29d1929a54b2b579a74da32e5ea074d0f8e15d/ugsync/src/main/java/org/apache/ranger/unixusersync/process/UnixUserGroupBuilder.java#L49-L51
- there are modules in most programming languages to access `nss`.
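
As a rough illustration of the `getent` approach listed above, here is a minimal Python sketch that parses `getent group` output (colon-separated `name:password:gid:members`) and inverts it into a user-to-groups map. The function names `parse_group_lines`, `groups_by_user`, and `query_getent` are hypothetical, chosen for this example only; they are not part of NiFi, Knox, or Ranger, and this is not how any of those projects implement the lookup.

```python
import subprocess
from collections import defaultdict


def parse_group_lines(text):
    """Parse `getent group` output: one `name:password:gid:m1,m2,...` per line."""
    groups = {}
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        name, gid, members = parts[0], parts[2], parts[3]
        groups[name] = {
            "gid": int(gid),
            # an empty member field yields an empty list
            "members": [m for m in members.split(",") if m],
        }
    return groups


def groups_by_user(groups):
    """Invert the group table into user -> sorted list of group names."""
    result = defaultdict(set)
    for name, info in groups.items():
        for member in info["members"]:
            result[member].add(name)
    return {user: sorted(names) for user, names in result.items()}


def query_getent(database, key=None):
    """Run `getent <database> [key]` and return its stdout as text.

    Raises subprocess.CalledProcessError if getent exits non-zero,
    for example when the requested key is not found.
    """
    cmd = ["getent", database] + ([key] if key else [])
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On a host with `getent` available, `groups_by_user(parse_group_lines(query_getent("group")))` would give the full mapping. Note that the member field typically lists only supplementary members; a user's primary group comes from the gid field of `getent passwd`, so a complete implementation would need to merge both. Python's standard `grp` and `pwd` modules offer an in-process alternative to shelling out.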
Advantages of this method:
- ability to get users/groups from multiple LDAP directories (a very common requirement).
- little to no configuration: There is nothing customer or cluster specific to configure. (i.e. it "just works" with the default configs).
- ease of change: if all services use this approach, you don't have to update dozens of services every time LDAP changes.
- a lot less overhead: The OS has the users/groups. All of the services running on the OS simply check locally for users/groups. This spares the LDAP servers and our machines from doing all the lookups.
- group names are guaranteed to be consistent across services.
- as this is becoming the standard in Knox, it makes KnoxSSO more stable.
- less concern about LDAP credentials all over the place.
- easier to pass security/compliance tests since we are utilising the customer's existing identity infrastructure.