  1. Hadoop Common
  2. HADOOP-17072

Add getClusterRoot and getClusterRoots methods to FileSystem and ViewFileSystem

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, viewfs
    • Labels: None

    Description

      In a federated setting (HDFS federation, federation across multiple buckets on S3, multiple containers across Azure storage), certain system tools/pipelines need to map a path to the cluster/account that holds it.

      Consider the example of GDPR compliance/retention jobs that need to go over various datasets ingested over a period of T days, and remove/quarantine datasets that are not properly annotated or have reached their retention period. Such jobs can rely on renames to a global trash/quarantine directory to accomplish their task. However, in a federated setting, efficient, atomic renames (such as those within a single HDFS cluster) are not supported across the different clusters/shards in the federation. As a result, such jobs need to use a trash/quarantine directory per cluster/shard. Further, they need to map a particular path to the cluster/shard that contains it.

      To address such cases, this JIRA proposes adding two new methods to FileSystem: getClusterRoot() and getClusterRoots().
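
As a rough illustration of how a retention job might use such methods, here is a minimal Java sketch. It is not the API from the attached patch: the signatures, the use of plain strings in place of Hadoop's Path, the in-memory mount table, the longest-prefix matching rule, and the ".quarantine" directory name are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain strings stand in for Hadoop's Path and for a
// viewfs-style mount table. None of this is taken from the patch itself.
class ClusterRootSketch {
    // Mount point in the federated namespace -> root of the backing cluster.
    private final Map<String, String> mounts = new TreeMap<>();

    ClusterRootSketch addMount(String mountPoint, String clusterRoot) {
        mounts.put(mountPoint, clusterRoot);
        return this;
    }

    // Assumed meaning of getClusterRoot: the root of the cluster/shard that
    // holds the given path, chosen by the longest matching mount point.
    String getClusterRoot(String path) {
        String best = null;
        for (String mountPoint : mounts.keySet()) {
            boolean matches = path.equals(mountPoint)
                    || path.startsWith(mountPoint + "/");
            if (matches && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point covers " + path);
        }
        return mounts.get(best);
    }

    // Assumed meaning of getClusterRoots: every distinct cluster root.
    List<String> getClusterRoots() {
        List<String> roots = new ArrayList<>();
        for (String root : mounts.values()) {
            if (!roots.contains(root)) {
                roots.add(root);
            }
        }
        return roots;
    }

    // Quarantine directory on the same cluster as the path, so that moving
    // the dataset there is a rename within a single cluster.
    String quarantineDirFor(String path) {
        return getClusterRoot(path) + "/.quarantine";
    }

    public static void main(String[] args) {
        ClusterRootSketch fs = new ClusterRootSketch()
                .addMount("/data", "hdfs://ns1")
                .addMount("/data/logs", "hdfs://ns2")
                .addMount("/user", "hdfs://ns1");

        System.out.println(fs.getClusterRoots());
        System.out.println(fs.quarantineDirFor("/data/logs/2020/app.log"));
        System.out.println(fs.quarantineDirFor("/data/tables/t1"));
    }
}
```

With one quarantine directory per cluster root, the job can resolve each dataset path to its own cluster and rename it there, which avoids cross-cluster renames.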

      Attachments

        1. HADOOP-17072.001.patch
          7 kB
          Virajith Jalaparti

        Activity

          People

            Assignee: Virajith Jalaparti
            Reporter: Virajith Jalaparti
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: