Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HDFS now provides native support for erasure coding (EC) to store data more efficiently. Each directory can be configured with an EC policy using the `hdfs erasurecode -setPolicy` command. When a file is created, it inherits the EC policy of its nearest ancestor directory, which determines how its blocks are stored. Compared to 3-way replication, the default EC policy saves 50% of storage space while also tolerating more storage failures.
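The nearest-ancestor rule above can be modeled with a short Python sketch. It is a simplified illustration, not NameNode code: `resolve_ec_policy` and the `dir_policies` mapping are hypothetical stand-ins for per-directory metadata, and the policy names are illustrative labels. It models the lookup performed once, when a file is created.

```python
import posixpath
from typing import Dict, Optional


def resolve_ec_policy(file_path: str, dir_policies: Dict[str, str]) -> Optional[str]:
    """Return the policy of the nearest ancestor directory that has one set.

    `dir_policies` is a hypothetical stand-in for per-directory metadata,
    mapping absolute directory paths to (illustrative) policy labels.
    """
    if not file_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    # Start at the file's parent directory and walk towards the root.
    directory = posixpath.dirname(posixpath.normpath(file_path))
    while True:
        if directory in dir_policies:
            return dir_policies[directory]
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the root without finding a policy
            return None          # no EC ancestor: the file would be replicated
        directory = parent


if __name__ == "__main__":
    policies = {"/data/cold": "RS-6-3", "/archive": "RS-10-4"}  # illustrative labels
    print(resolve_ec_policy("/data/cold/2015/part-0000", policies))
    print(resolve_ec_policy("/user/alice/notes.txt", policies))
```

Because the walk starts at the file's parent and moves upward, the most specific configured directory wins; `None` models a file with no EC ancestor, which keeps the default replication.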

      To support small files, the current phase of HDFS-EC stores blocks in a _striped_ layout, where each logical file block is divided into small units (64KB by default) that are distributed across a set of DataNodes. This enables parallel I/O but reduces data locality. Therefore, the cluster environment and I/O workloads should be considered before configuring EC policies.
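The striped layout can be illustrated by mapping a byte offset within a logical block to the data unit that stores it. This Python sketch assumes plain round-robin placement of 64KB cells over a hypothetical 6 data units and ignores parity units entirely; `locate_byte` and `CellLocation` are illustrative names, not HDFS classes.

```python
from typing import NamedTuple

CELL_SIZE = 64 * 1024  # 64KB striping unit, the default named in the release note


class CellLocation(NamedTuple):
    internal_block: int   # which data unit (0..num_data_units-1) holds the byte
    offset_in_block: int  # byte offset inside that internal block


def locate_byte(logical_offset: int, num_data_units: int = 6,
                cell_size: int = CELL_SIZE) -> CellLocation:
    """Map a logical offset to (data unit, offset) under round-robin striping."""
    if logical_offset < 0:
        raise ValueError("offset must be non-negative")
    cell_index = logical_offset // cell_size       # global cell number
    stripe_index = cell_index // num_data_units    # which row of cells
    internal_block = cell_index % num_data_units   # round-robin target unit
    offset_in_block = stripe_index * cell_size + logical_offset % cell_size
    return CellLocation(internal_block, offset_in_block)


if __name__ == "__main__":
    small_file = 100 * 1024  # a 100 KB file
    touched = {locate_byte(off).internal_block for off in range(0, small_file, 4096)}
    print(sorted(touched))  # [0, 1]
```

Consecutive 64KB cells land on different internal blocks, and thus on different DataNodes, which is what enables parallel I/O and also why no single DataNode holds the logical block contiguously.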

      Description

      Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with a 10+4 Reed-Solomon code, up to 4 blocks can be lost while the storage overhead is only 40%. This makes EC an attractive alternative for big data storage, particularly for cold data.
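The overhead figures above can be checked with a little arithmetic. Overhead here follows the description's convention (parity units divided by data units, so 3-way replication costs 200%); the 6+3 layout is included as an assumption consistent with the release note's 50% savings figure, and the function names are illustrative.

```python
from fractions import Fraction


def ec_overhead(data_units: int, parity_units: int) -> Fraction:
    """Extra storage beyond the raw data for a Reed-Solomon data+parity layout."""
    return Fraction(parity_units, data_units)


def replication_overhead(replicas: int) -> Fraction:
    """Extra storage for n-way replication: each additional copy costs 100%."""
    return Fraction(replicas - 1)


def savings_vs_replication(data_units: int, parity_units: int, replicas: int = 3) -> Fraction:
    """Fraction of raw storage saved by EC relative to n-way replication."""
    ec_total = 1 + ec_overhead(data_units, parity_units)  # raw bytes per logical byte
    rep_total = 1 + replication_overhead(replicas)
    return 1 - ec_total / rep_total


if __name__ == "__main__":
    for k, m in [(6, 3), (10, 4)]:
        # An RS k+m code can rebuild the data after losing any m of its k+m units.
        print(f"RS {k}+{m}: overhead {float(ec_overhead(k, m)):.0%}, "
              f"tolerates {m} lost units, "
              f"saves {float(savings_vs_replication(k, m)):.1%} vs 3-way replication")
    print(f"3-way replication: overhead {float(replication_overhead(3)):.0%}, "
          f"tolerates 2 lost replicas")
```

Exact `Fraction` arithmetic keeps results like 1/2 and 8/15 precise instead of accumulating floating-point error.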

      Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed as of Hadoop 2.0 for maintenance reasons. Its drawbacks were: 1) it sits on top of HDFS and depends on MapReduce to perform encoding and decoding tasks; 2) it can only be used for cold files that will no longer be appended to; 3) its pure Java EC coding implementation is extremely slow in practice. For these reasons, simply bringing HDFS-RAID back might not be a good idea.

      We (Intel and Cloudera) are working on a design to build EC into HDFS that eliminates external dependencies, making it self-contained and independently maintained. This design builds the EC feature on top of the storage type support and is intended to be compatible with existing HDFS features such as caching, snapshots, encryption, and high availability. It will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon.

        Attachments

        1. HDFS-7285-Consolidated-20150911.patch (1.20 MB, Zhe Zhang)
        2. HDFSErasureCodingSystemTestReport-20150826.pdf (218 kB, Rui Gao)
        3. Compare-consolidated-20150824.diff (72 kB, Zhe Zhang)
        4. HDFSErasureCodingSystemTestPlan-20150824.pdf (72 kB, Rui Gao)
        5. Consolidated-20150810.patch (1.23 MB, Zhe Zhang)
        6. Consolidated-20150806.patch (1.24 MB, Zhe Zhang)
        7. Consolidated-20150707.patch (1.02 MB, Zhe Zhang)
        8. HDFS-7285-merge-consolidated.trunk.04.patch (1.03 MB, Vinayakumar B)
        9. HDFS-7285-merge-consolidated.trunk.03.patch (1.04 MB, Vinayakumar B)
        10. HDFS-EC-merge-consolidated-01.patch (1.06 MB, Zhe Zhang)
        11. HDFS-7285-merge-consolidated-trunk-01.patch (1.06 MB, Vinayakumar B)
        12. HDFS-7285-merge-consolidated-01.patch (1.06 MB, Vinayakumar B)
        13. HDFS-EC-Merge-PoC-20150624.patch (811 kB, Zhe Zhang)
        14. HDFS-bistriped.patch (19 kB, Zhe Zhang)
        15. HDFSErasureCodingPhaseITestPlan.pdf (111 kB, Zhe Zhang)
        16. HDFS-7285-initial-PoC.patch (470 kB, Zhe Zhang)
        17. HDFSErasureCodingDesign-20150206.pdf (1.42 MB, Tsz Wo Nicholas Sze)
        18. HDFSErasureCodingDesign-20150204.pdf (1.40 MB, Tsz Wo Nicholas Sze)
        19. ECParser.py (5 kB, Zhe Zhang)
        20. ECAnalyzer.py (2 kB, Zhe Zhang)
        21. fsimage-analysis-20150105.pdf (82 kB, Zhe Zhang)
        22. HDFSErasureCodingDesign-20141217.pdf (1.59 MB, Zhe Zhang)
        23. HDFSErasureCodingDesign-20141028.pdf (1.98 MB, Zhe Zhang)

          Issue Links

          There are no Sub-Tasks for this issue.


              People

              • Assignee:
                zhz Zhe Zhang
                Reporter:
                whjiang Weihua Jiang
              • Votes:
                4
                Watchers:
                137

                Dates

                • Created:
                  Updated:
                  Resolved: