  1. Hadoop Common
  2. HADOOP-17072

Add getClusterRoot and getClusterRoots methods to FileSystem and ViewFilesystem

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, viewfs
    • Labels:
      None

      Description

      In a federated setting (HDFS federation, federation across multiple buckets on S3, multiple containers across Azure storage), certain system tools/pipelines need to map a path to the cluster or account that stores it.

      Consider GDPR compliance/retention jobs that scan datasets ingested over a period of T days and remove or quarantine those that are not properly annotated or have reached their retention period. Such jobs can rely on renames to a global trash/quarantine directory to accomplish their task. In a federated setting, however, efficient, atomic renames (like those within a single HDFS cluster) are not supported across the different clusters/shards of the federation. Such jobs therefore need a trash/quarantine directory per cluster/shard, and they need to map a given path to the cluster/shard that contains it.

      To address such cases, this JIRA proposes adding two new methods to FileSystem: getClusterRoot and getClusterRoots.
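      The idea can be illustrated with a minimal, standalone sketch. It does not use the Hadoop API and is not taken from the attached patch: the class ClusterRootResolver, the String-based signatures, the addMount and quarantineDirFor helpers, and the ".quarantine" directory name are all assumptions made for illustration. It models a viewfs-style mount table and resolves a path to its cluster root by longest-prefix match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for a viewfs-style mount table; not the patch's API. */
class ClusterRootResolver {
    // Mount point (e.g. "/data") -> root URI of the backing cluster (e.g. "hdfs://nn1").
    private final Map<String, String> mounts = new TreeMap<>();

    // Drop a trailing slash so "/data/" and "/data" are treated alike.
    private static String normalize(String path) {
        if (path.length() > 1 && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    void addMount(String mountPoint, String clusterRoot) {
        mounts.put(normalize(mountPoint), clusterRoot);
    }

    /** Root of the cluster that holds the path, chosen by longest matching mount point. */
    String getClusterRoot(String path) {
        String p = normalize(path);
        String best = null;
        for (String mount : mounts.keySet()) {
            boolean matches = mount.equals("/") || p.equals(mount) || p.startsWith(mount + "/");
            if (matches && (best == null || mount.length() > best.length())) {
                best = mount;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point covers " + path);
        }
        return mounts.get(best);
    }

    /** All distinct cluster roots known to the mount table, in sorted order. */
    List<String> getClusterRoots() {
        return new ArrayList<>(new TreeSet<>(mounts.values()));
    }

    /** Per-cluster quarantine directory, so a rename never has to cross clusters. */
    String quarantineDirFor(String path) {
        return getClusterRoot(path) + "/.quarantine";
    }

    public static void main(String[] args) {
        ClusterRootResolver resolver = new ClusterRootResolver();
        resolver.addMount("/data", "hdfs://nn1");
        resolver.addMount("/data/logs", "hdfs://nn2");
        System.out.println(resolver.getClusterRoot("/data/logs/2020/06"));
        System.out.println(resolver.quarantineDirFor("/data/tables/users"));
        System.out.println(resolver.getClusterRoots());
    }
}
```

      In this sketch, a retention job would call quarantineDirFor on each expired dataset and rename it into the returned directory, and would iterate over getClusterRoots to find every per-cluster quarantine directory when purging.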

        Attachments

        1. HADOOP-17072.001.patch
          7 kB
          Virajith Jalaparti

            People

            • Assignee: Virajith Jalaparti
            • Reporter: Virajith Jalaparti
            • Votes: 0
            • Watchers: 4