Description
In a federated setting (HDFS federation, federation across multiple buckets on S3, multiple containers across Azure storage), certain system tools/pipelines require the ability to map paths to the clusters/accounts.
Consider GDPR compliance/retention jobs that scan datasets ingested over a period of T days and remove or quarantine those that are not properly annotated or have reached their retention period. Such jobs can rely on renames to a global trash/quarantine directory to accomplish their task. However, in a federated setting, efficient, atomic renames (like those within a single HDFS cluster) are not supported across the different clusters/shards in the federation. As a result, such jobs need a trash/quarantine directory per cluster/shard, and they need to map any given path to the cluster/shard that contains it.
To address such cases, this JIRA proposes to add two new methods to FileSystem: getClusterRoot and getClusterRoots().
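As a rough illustration of how a retention job might use the two methods, here is a minimal, self-contained sketch. It is not the Hadoop API: the class FederatedRootsSketch, the method signatures (getClusterRoot taking a path URI, getClusterRoots returning a list), the longest-prefix matching, and the ".quarantine" directory name are all assumptions made for the example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a federated FileSystem; NOT the Hadoop FileSystem class.
final class FederatedRootsSketch {
    private final List<URI> roots = new ArrayList<>();

    FederatedRootsSketch(List<String> clusterRoots) {
        for (String r : clusterRoots) {
            // Normalize every root to end with "/" so prefix checks stop at a path boundary.
            roots.add(URI.create(r.endsWith("/") ? r : r + "/"));
        }
    }

    // Assumed shape of getClusterRoots(): all cluster/shard roots in the federation.
    List<URI> getClusterRoots() {
        return List.copyOf(roots);
    }

    // Assumed shape of getClusterRoot: the root of the cluster/shard containing the path.
    URI getClusterRoot(URI path) {
        String p = path.toString();
        if (!p.endsWith("/")) {
            p += "/";
        }
        URI best = null;
        for (URI root : roots) {
            String r = root.toString();
            // Prefer the longest matching root in case roots are nested.
            if (p.startsWith(r) && (best == null || r.length() > best.toString().length())) {
                best = root;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No cluster root contains " + path);
        }
        return best;
    }

    // Per-cluster quarantine directory; the ".quarantine" name is hypothetical.
    URI quarantineDir(URI path) {
        return getClusterRoot(path).resolve(".quarantine/");
    }
}
```

With roots such as hdfs://ns1 and hdfs://ns2, a job would call quarantineDir on each dataset path and rename the dataset into the returned directory, so that the rename stays within a single cluster/shard.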