Hadoop HDFS / HDFS-9806

Allow HDFS block replicas to be provided by an external storage system

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a datanode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes.
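      A minimal Python sketch of the idea in the release note, under stated assumptions. A block's replicas may sit on local media or on PROVIDED storage. A reader prefers a local replica when one exists. Raising replication adds a local copy, as in the sub-task on increasing replication for PROVIDED files. HDFS itself is written in Java, and these classes only mirror the concept. `StorageType` lists an illustrative subset of media. `Replica`, `Block`, `read_source`, `cache_locally`, and the local-first preference are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StorageType(Enum):
    """Illustrative subset of storage media; PROVIDED is the new type."""
    DISK = "DISK"
    SSD = "SSD"
    ARCHIVE = "ARCHIVE"
    PROVIDED = "PROVIDED"


@dataclass
class Replica:
    datanode: str
    storage_type: StorageType


@dataclass
class Block:
    block_id: int
    replicas: List[Replica] = field(default_factory=list)

    def read_source(self) -> Replica:
        """Hypothetical policy: prefer local media, else read through PROVIDED."""
        for replica in self.replicas:
            if replica.storage_type is not StorageType.PROVIDED:
                return replica
        if self.replicas:
            return self.replicas[0]  # every remaining replica is PROVIDED
        raise LookupError(f"block {self.block_id} has no replicas")

    def cache_locally(self, datanode: str,
                      storage_type: StorageType = StorageType.DISK) -> Replica:
        """Add a replica on local media, e.g. when replication is increased."""
        if storage_type is StorageType.PROVIDED:
            raise ValueError("a cached copy must live on local media")
        replica = Replica(datanode, storage_type)
        self.replicas.append(replica)
        return replica


if __name__ == "__main__":
    blk = Block(1, [Replica("dn1", StorageType.PROVIDED)])
    print(blk.read_source().storage_type.name)  # PROVIDED
    blk.cache_locally("dn2")
    print(blk.read_source().datanode)  # dn2
```

      With this design, the PROVIDED replica stays available as a fallback after local copies are added. Reads shift to local media without changing how the block is addressed.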

      Description

      In addition to heterogeneous media, many applications work with heterogeneous storage systems. The guarantees and semantics provided by these systems are often similar, but not identical, to those of HDFS. Any client accessing multiple storage systems is responsible for reasoning about each system independently, and must propagate and renew credentials for each store.

      Remote stores could be mounted under HDFS. Block locations could be mapped to immutable file regions, opaque IDs, or other tokens that represent a consistent view of the data. While correctness for arbitrary operations requires careful coordination between stores, in practice we can provide workable semantics with weaker guarantees.
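      Block locations mapped to immutable file regions can be sketched as an alias map from block IDs to byte ranges of remote files. This Python sketch borrows the names `FileRegion` and `BlockAliasMap` from the sub-task list. Their fields and methods here are assumptions, not the Hadoop API. The `nonce` field is a hypothetical stand-in for a version token, such as an object-store ETag. Checking it on every read is one way to obtain the weaker guarantee described above: a changed remote file is detected and rejected rather than silently read. The `remote://` path is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FileRegion:
    """Immutable byte range [offset, offset + length) of a remote file."""
    path: str
    offset: int
    length: int
    nonce: str  # hypothetical version token, e.g. an ETag or modification time


class StaleRegionError(Exception):
    """Raised when the remote file no longer matches the recorded version."""


class BlockAliasMap:
    """Hypothetical in-memory map from HDFS block ID to FileRegion."""

    def __init__(self) -> None:
        self._regions: Dict[int, FileRegion] = {}

    def put(self, block_id: int, region: FileRegion) -> None:
        self._regions[block_id] = region

    def resolve(self, block_id: int) -> FileRegion:
        return self._regions[block_id]


def map_file(alias_map: BlockAliasMap, path: str, file_size: int, nonce: str,
             block_size: int, first_block_id: int) -> int:
    """Split a remote file into block-sized regions; return the next unused ID."""
    block_id = first_block_id
    for offset in range(0, file_size, block_size):
        length = min(block_size, file_size - offset)
        alias_map.put(block_id, FileRegion(path, offset, length, nonce))
        block_id += 1
    return block_id


def read_block(alias_map: BlockAliasMap, block_id: int,
               fetch_range: Callable[[str, int, int], bytes],
               current_nonce: Callable[[str], str]) -> bytes:
    """Read one block, refusing to serve data if the remote file has changed."""
    region = alias_map.resolve(block_id)
    if current_nonce(region.path) != region.nonce:
        raise StaleRegionError(f"{region.path} changed since block {block_id} was mapped")
    return fetch_range(region.path, region.offset, region.length)


if __name__ == "__main__":
    # Stand-in for the external store: one 10-byte object at version "v1".
    store = {"remote://bucket/data.bin": b"0123456789"}
    versions = {"remote://bucket/data.bin": "v1"}
    aliases = BlockAliasMap()
    next_id = map_file(aliases, "remote://bucket/data.bin", 10, "v1",
                       block_size=4, first_block_id=100)
    print(next_id)  # 103
    print(read_block(aliases, 102,
                     lambda p, off, n: store[p][off:off + n],
                     versions.__getitem__))  # b'89'
```

      Because each region is frozen along with its version token, the map never needs to track later edits to the remote file. It only needs to notice that the token no longer matches.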

        Attachments

        1. HDFS-9806.001.patch | 501 kB | Virajith Jalaparti
        2. HDFS-9806.002.patch | 500 kB | Virajith Jalaparti
        3. HDFS-9806.003.patch | 500 kB | Virajith Jalaparti
        4. HDFS-9806-design.001.pdf | 360 kB | Chris Douglas
        5. HDFS-9806-design.002.pdf | 429 kB | Virajith Jalaparti

          Issue Links

          1. [READ] Datanode support to read from external stores. | Sub-task | Resolved | Virajith Jalaparti
          2. [READ] Add tool generating FSImage from external store | Sub-task | Resolved | Chris Douglas
          3. [READ] Namenode support for data stored in external stores. | Sub-task | Resolved | Virajith Jalaparti
          4. [READ] ProvidedReplica should return an InputStream that is bounded by its length | Sub-task | Resolved | Virajith Jalaparti
          5. [READ] Fix NullPointerException in ProvidedBlocksBuilder | Sub-task | Resolved | Virajith Jalaparti
          6. [READ] Tests for ProvidedStorageMap | Sub-task | Resolved | Virajith Jalaparti
          7. [READ] Handle failures of Datanode with PROVIDED storage | Sub-task | Resolved | Virajith Jalaparti
          8. [READ] Test for increasing replication of provided files. | Sub-task | Resolved | Virajith Jalaparti
          9. [READ] Test cases for ProvidedVolumeDF and ProviderBlockIteratorImpl | Sub-task | Resolved | Virajith Jalaparti
          10. [READ] Datanodes should use a unique identifier when reading from external stores | Sub-task | Resolved | Virajith Jalaparti
          11. [READ] Merge BlockFormatProvider and FileRegionProvider. | Sub-task | Resolved | Virajith Jalaparti
          12. [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to the correct external storage | Sub-task | Resolved | Virajith Jalaparti
          13. [READ] Share remoteFS between ProvidedReplica instances. | Sub-task | Resolved | Virajith Jalaparti
          14. [READ] HDFS-12091 breaks the tests for provided block reads | Sub-task | Resolved | Virajith Jalaparti
          15. [READ] Fix errors in image generation tool from latest rebase | Sub-task | Resolved | Virajith Jalaparti
          16. [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase | Sub-task | Resolved | Virajith Jalaparti
          17. [READ] Even one dead datanode with PROVIDED storage results in ProvidedStorageInfo being marked as FAILED | Sub-task | Resolved | Virajith Jalaparti
          18. [READ] FsVolumeImpl exception when scanning Provided storage volume | Sub-task | Resolved | Virajith Jalaparti
          19. [READ] Test NameNode restarts when PROVIDED is configured | Sub-task | Resolved | Virajith Jalaparti
          20. [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb) | Sub-task | Resolved | Ewan Higgs
          21. [READ] Implement LevelDBFileRegionFormat | Sub-task | Resolved | Ewan Higgs
          22. [READ] Refactor FileRegion and BlockAliasMap to separate out HDFS metadata and PROVIDED storage metadata | Sub-task | Resolved | Ewan Higgs
          23. [READ] Fix reporting of Provided volumes | Sub-task | Resolved | Virajith Jalaparti
          24. [READ] Increasing replication for PROVIDED files should create local replicas | Sub-task | Resolved | Virajith Jalaparti
          25. [READ] Reduce memory and CPU footprint for PROVIDED volumes. | Sub-task | Resolved | Virajith Jalaparti
          26. [READ] Report multiple locations for PROVIDED blocks | Sub-task | Resolved | Virajith Jalaparti
          27. [READ] Allow cluster id to be specified to the Image generation tool | Sub-task | Resolved | Virajith Jalaparti
          28. [READ] Image generation tool does not close an opened stream | Sub-task | Resolved | Virajith Jalaparti
          29. [READ] Fix the randomized selection of locations in {{ProvidedBlocksBuilder}}. | Sub-task | Resolved | Virajith Jalaparti
          30. [READ] Documentation for provided storage | Sub-task | Resolved | Virajith Jalaparti
          31. Add visibility/stability annotations | Sub-task | Resolved | Chris Douglas
          32. [READ] Support replication of Provided blocks with non-default topologies. | Sub-task | Resolved | Virajith Jalaparti
          33. [READ] Allow Datanodes with Provided volumes to start when blocks with the same id exist locally | Sub-task | Resolved | Virajith Jalaparti
          34. [READ] Skip setting block count of ProvidedDatanodeStorageInfo on DN registration update | Sub-task | Resolved | Virajith Jalaparti
          35. [READ] Fix closing streams in ImageWriter | Sub-task | Resolved | Virajith Jalaparti
          36. [READ] Handle decommissioning and under-maintenance Datanodes with Provided storage. | Sub-task | Resolved | Virajith Jalaparti
          37. [9806] Code style cleanup | Sub-task | Resolved | Virajith Jalaparti
          38. [READ] Fix configuration and implementation of LevelDB-based alias maps | Sub-task | Resolved | Virajith Jalaparti
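          Sub-task 4 requires that a ProvidedReplica return an InputStream bounded by its length. A block usually maps to a region in the middle of a larger remote file, so a reader must stop at the region's end instead of running on into the next block's bytes. The Java fix is not reproduced here. The following is a Python sketch of the same idea, and `BoundedReader` is a hypothetical name.

```python
import io


class BoundedReader(io.RawIOBase):
    """Read-only view of at most `length` bytes of `raw`, starting at `offset`."""

    def __init__(self, raw: io.BufferedIOBase, offset: int, length: int) -> None:
        super().__init__()
        self._raw = raw
        self._raw.seek(offset)
        self._remaining = length

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Never hand out more than the bytes left in the region.
        if self._remaining <= 0:
            return 0
        chunk = self._raw.read(min(len(buf), self._remaining))
        buf[:len(chunk)] = chunk
        self._remaining -= len(chunk)
        return len(chunk)  # 0 signals end of stream


if __name__ == "__main__":
    remote_file = io.BytesIO(b"AAAABBBBCC")  # three blocks laid out back to back
    block_1 = BoundedReader(remote_file, offset=4, length=4)
    print(block_1.read())  # b'BBBB'
    print(block_1.read())  # b''
```

          Implementing only `readinto` lets `io.RawIOBase` supply `read()` and `readall()`. The length check therefore lives in one place. A remote file shorter than the declared region simply ends the stream early.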


              People

              • Assignee: Unassigned
              • Reporter: Chris Douglas (chris.douglas)
              • Votes: 0
              • Watchers: 73

                Dates

                • Created:
                • Updated:
                • Resolved: