On a cluster with an authorization provider configured (e.g. Sentry), where the paths covered by the provider are snapshottable, the way provider permissions and ACLs are applied to files in snapshots changed between the 2.x branch and Hadoop 3.0.
For example, suppose the snapshottable path /data is managed by Sentry. The ACLs below are provided by Sentry:
After taking a snapshot, the files in the snapshot do not see the provider permissions:
However, before Hadoop 3.0 (when the attribute provider code was extensively refactored), files in snapshots did get the provider permissions.
The reason is this code in FSDirectory.java which ultimately calls the attribute provider and passes the path we want permissions for:
This picks the last resolved inode. If you then call node.getPathComponents for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1. In other words, the snapshot path is resolved to its original location, but the inode is still the snapshot inode.
However, the logic passes iip.getPathComponents to the provider, which returns "/data/.snapshot/snap1/tab1".
The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence the provider only ever sees the path as "/data/tab1".
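To make the difference between the two views concrete, here is a minimal, self-contained sketch. It is not Hadoop code: the class SnapshotPathSketch and its methods are hypothetical and only model the behaviour described above, where one view keeps the path as the client requested it and the other drops the ".snapshot/<name>" pair.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the two path views described in this issue.
// It does not use or reproduce the real FSDirectory, INodesInPath or INode classes.
final class SnapshotPathSketch {
    static final String DOT_SNAPSHOT = ".snapshot";

    // Components exactly as requested by the client
    // (the view the issue attributes to iip.getPathComponents).
    static List<String> requestedComponents(String path) {
        List<String> out = new ArrayList<>();
        for (String c : path.split("/")) {
            if (!c.isEmpty()) {
                out.add(c);
            }
        }
        return out;
    }

    // Components with ".snapshot/<name>" removed
    // (the view the issue attributes to node.getPathComponents).
    static List<String> resolvedComponents(String path) {
        List<String> in = requestedComponents(path);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            if (DOT_SNAPSHOT.equals(in.get(i))) {
                i++; // also skip the snapshot name that follows ".snapshot"
                continue;
            }
            out.add(in.get(i));
        }
        return out;
    }

    static String join(List<String> components) {
        return "/" + String.join("/", components);
    }

    public static void main(String[] args) {
        String p = "/data/.snapshot/snap1/tab1";
        System.out.println(join(requestedComponents(p)));
        System.out.println(join(resolvedComponents(p)));
    }
}
```

A provider that keys its permissions on the path it receives would match its rules for /data against the second form but not necessarily against the first, which is consistent with the behaviour change reported here.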
It is debatable which path should be passed to the provider in the case of snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as the behaviour has changed, I feel we should ensure the old behaviour is retained.
It would also be fairly easy to provide a config switch so the provider gets the full snapshot path or the resolved path.
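One way such a switch could look is sketched below. This is an assumption for illustration, not a proposed patch: the class ProviderPathSwitchSketch, the method pathForProvider and the configuration key are all hypothetical, and java.util.Properties stands in for the Hadoop configuration object so the example runs on its own. The default keeps the pre-3.0 behaviour of handing the provider the resolved path.

```java
import java.util.Properties;

// Illustrative only: shows the shape of a switch between the two paths.
final class ProviderPathSwitchSketch {
    // Hypothetical key invented for this sketch; not an existing Hadoop property.
    static final String USE_SNAPSHOT_PATH_KEY =
        "example.attribute-provider.use-snapshot-path";

    // Returns the path that would be handed to the attribute provider.
    // Defaults to the resolved path, i.e. the pre-3.0 behaviour.
    static String pathForProvider(Properties conf,
                                  String snapshotPath,
                                  String resolvedPath) {
        boolean useSnapshotPath = Boolean.parseBoolean(
            conf.getProperty(USE_SNAPSHOT_PATH_KEY, "false"));
        return useSnapshotPath ? snapshotPath : resolvedPath;
    }
}
```

With an empty Properties object, pathForProvider returns the resolved path; setting the key to "true" returns the full snapshot path instead.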