Hadoop HDFS / HDFS-9806

Allow HDFS block replicas to be provided by an external storage system


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed
    • Release Note:
      Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a datanode. Clients accessing data in PROVIDED storages can cache replicas in local media, enforce HDFS invariants (e.g., security, quotas), and address more data than the cluster could persist in the storage attached to DataNodes.

      Description

      In addition to heterogeneous media, many applications work with heterogeneous storage systems. The guarantees and semantics provided by these systems are often similar, but not identical, to those of HDFS. Any client accessing multiple storage systems must reason about each system independently, and must propagate and renew credentials for each store.

      Remote stores could be mounted under HDFS. Block locations could be mapped to immutable file regions, opaque IDs, or other tokens that represent a consistent view of the data. While correctness for arbitrary operations requires careful coordination between stores, in practice we can provide workable semantics with weaker guarantees.
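
      To make the block-to-region mapping concrete, here is a minimal sketch of an in-memory alias map that splits a remote file into block-sized, immutable regions and resolves a block ID back to its region. Everything in it is an assumption for illustration: the class `AliasMapSketch`, the nested `Region` type, the method names, and the `s3a://example-bucket/...` URI are hypothetical and are not the FileRegion or BlockAliasMap classes from the patch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: maps block IDs to immutable regions of files in an external store. */
public class AliasMapSketch {

    /** An immutable byte range of a remote file (illustrative type, not Hadoop's FileRegion). */
    public static final class Region {
        public final String remoteUri;
        public final long offset;
        public final long length;

        public Region(String remoteUri, long offset, long length) {
            if (offset < 0 || length < 0) {
                throw new IllegalArgumentException("offset and length must be non-negative");
            }
            this.remoteUri = remoteUri;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return remoteUri + "[" + offset + ", +" + length + "]";
        }
    }

    // Block ID -> region of the remote file that backs the block.
    private final Map<Long, Region> regions = new TreeMap<>();

    /**
     * Splits a remote file into block-sized regions and assigns consecutive
     * block IDs starting at firstBlockId. Returns the next unused block ID.
     */
    public long addFile(String remoteUri, long fileLength, long blockSize, long firstBlockId) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        long blockId = firstBlockId;
        for (long off = 0; off < fileLength; off += blockSize) {
            // The last region may be shorter than a full block.
            long len = Math.min(blockSize, fileLength - off);
            regions.put(blockId++, new Region(remoteUri, off, len));
        }
        return blockId;
    }

    /** Looks up the remote region backing a block, if the block is known. */
    public Optional<Region> resolve(long blockId) {
        return Optional.ofNullable(regions.get(blockId));
    }

    public static void main(String[] args) {
        AliasMapSketch map = new AliasMapSketch();
        // Placeholder URI and sizes, chosen only to keep the example small.
        long next = map.addFile("s3a://example-bucket/data/part-0000", 300L, 128L, 1000L);
        for (long id = 1000L; id < next; id++) {
            System.out.println(id + " -> " + map.resolve(id).get());
        }
    }
}
```

      In this sketch the regions are immutable once added, which mirrors the idea that a block location should represent a consistent view of the remote data; detecting that the remote file changed underneath the mapping is deliberately left out.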

        Attachments

        1. HDFS-9806-design.001.pdf
          360 kB
          Christopher Douglas
        2. HDFS-9806-design.002.pdf
          429 kB
          Virajith Jalaparti
        3. HDFS-9806.001.patch
          501 kB
          Virajith Jalaparti
        4. HDFS-9806.002.patch
          500 kB
          Virajith Jalaparti
        5. HDFS-9806.003.patch
          500 kB
          Virajith Jalaparti

        Issue Links

        1. [READ] Datanode support to read from external stores. (Sub-task, Resolved, Virajith Jalaparti)
        2. [READ] Add tool generating FSImage from external store (Sub-task, Resolved, Christopher Douglas)
        3. [READ] Namenode support for data stored in external stores. (Sub-task, Resolved, Virajith Jalaparti)
        4. [READ] ProvidedReplica should return an InputStream that is bounded by its length (Sub-task, Resolved, Virajith Jalaparti)
        5. [READ] Fix NullPointerException in ProvidedBlocksBuilder (Sub-task, Resolved, Virajith Jalaparti)
        6. [READ] Tests for ProvidedStorageMap (Sub-task, Resolved, Virajith Jalaparti)
        7. [READ] Handle failures of Datanode with PROVIDED storage (Sub-task, Resolved, Virajith Jalaparti)
        8. [READ] Test for increasing replication of provided files. (Sub-task, Resolved, Virajith Jalaparti)
        9. [READ] Test cases for ProvidedVolumeDF and ProviderBlockIteratorImpl (Sub-task, Resolved, Virajith Jalaparti)
        10. [READ] Datanodes should use a unique identifier when reading from external stores (Sub-task, Resolved, Virajith Jalaparti)
        11. [READ] Merge BlockFormatProvider and FileRegionProvider. (Sub-task, Resolved, Virajith Jalaparti)
        12. [READ] Check that the replicas served from a {{ProvidedVolumeImpl}} belong to the correct external storage (Sub-task, Resolved, Virajith Jalaparti)
        13. [READ] Share remoteFS between ProvidedReplica instances. (Sub-task, Resolved, Virajith Jalaparti)
        14. [READ] HDFS-12091 breaks the tests for provided block reads (Sub-task, Resolved, Virajith Jalaparti)
        15. [READ] Fix errors in image generation tool from latest rebase (Sub-task, Resolved, Virajith Jalaparti)
        16. [READ] TestNameNodeProvidedImplementation#testProvidedDatanodeFailures fails after rebase (Sub-task, Resolved, Virajith Jalaparti)
        17. [READ] Even one dead datanode with PROVIDED storage results in ProvidedStorageInfo being marked as FAILED (Sub-task, Resolved, Virajith Jalaparti)
        18. [READ] FsVolumeImpl exception when scanning Provided storage volume (Sub-task, Resolved, Virajith Jalaparti)
        19. [READ] Test NameNode restarts when PROVIDED is configured (Sub-task, Resolved, Virajith Jalaparti)
        20. [AliasMap] Create a version of the AliasMap that runs in memory in the Namenode (leveldb) (Sub-task, Resolved, Ewan Higgs)
        21. [READ] Implement LevelDBFileRegionFormat (Sub-task, Resolved, Ewan Higgs)
        22. [READ] Refactor FileRegion and BlockAliasMap to separate out HDFS metadata and PROVIDED storage metadata (Sub-task, Resolved, Ewan Higgs)
        23. [READ] Fix reporting of Provided volumes (Sub-task, Resolved, Virajith Jalaparti)
        24. [READ] Increasing replication for PROVIDED files should create local replicas (Sub-task, Resolved, Virajith Jalaparti)
        25. [READ] Reduce memory and CPU footprint for PROVIDED volumes. (Sub-task, Resolved, Virajith Jalaparti)
        26. [READ] Report multiple locations for PROVIDED blocks (Sub-task, Resolved, Virajith Jalaparti)
        27. [READ] Allow cluster id to be specified to the Image generation tool (Sub-task, Resolved, Virajith Jalaparti)
        28. [READ] Image generation tool does not close an opened stream (Sub-task, Resolved, Virajith Jalaparti)
        29. [READ] Fix the randomized selection of locations in {{ProvidedBlocksBuilder}}. (Sub-task, Resolved, Virajith Jalaparti)
        30. [READ] Documentation for provided storage (Sub-task, Resolved, Virajith Jalaparti)
        31. Add visibility/stability annotations (Sub-task, Resolved, Christopher Douglas)
        32. [READ] Support replication of Provided blocks with non-default topologies. (Sub-task, Resolved, Virajith Jalaparti)
        33. [READ] Allow Datanodes with Provided volumes to start when blocks with the same id exist locally (Sub-task, Resolved, Virajith Jalaparti)
        34. [READ] Skip setting block count of ProvidedDatanodeStorageInfo on DN registration update (Sub-task, Resolved, Virajith Jalaparti)
        35. [READ] Fix closing streams in ImageWriter (Sub-task, Resolved, Virajith Jalaparti)
        36. [READ] Handle decommissioning and under-maintenance Datanodes with Provided storage. (Sub-task, Resolved, Virajith Jalaparti)
        37. [9806] Code style cleanup (Sub-task, Resolved, Virajith Jalaparti)
        38. [READ] Fix configuration and implementation of LevelDB-based alias maps (Sub-task, Resolved, Virajith Jalaparti)


            People

            • Assignee: Unassigned
            • Reporter: Christopher Douglas (cdouglas)
