Hadoop HDFS / HDFS-7285

Erasure Coding Support inside HDFS

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      HDFS now provides native support for erasure coding (EC) to store data more efficiently. Each individual directory can be configured with an EC policy using the command `hdfs erasurecode -setPolicy`. When a file is created, it inherits the EC policy of its nearest ancestor directory, which determines how its blocks are stored. Compared to 3-way replication, the default EC policy saves 50% of storage space while also tolerating more storage failures.
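
The inheritance rule above can be modeled as a walk up the directory tree. This is a minimal sketch, not the NameNode implementation: `effective_policy`, the `dir_policies` mapping and the policy label `"RS-6-3"` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_policy(path: str, dir_policies: Dict[str, str]) -> Optional[str]:
    """Return the EC policy a new file at `path` would inherit.

    `dir_policies` maps directory paths to policy names (hypothetical model).
    Ancestors are checked from the closest directory up to the root; None
    means no ancestor has a policy set.
    """
    for ancestor in PurePosixPath(path).parents:
        policy = dir_policies.get(str(ancestor))
        if policy is not None:
            return policy
    return None


# Hypothetical namespace: only /data/cold has a policy set.
policies = {"/data/cold": "RS-6-3"}
print(effective_policy("/data/cold/2015/part-0", policies))
print(effective_policy("/data/hot/part-0", policies))
```

The sketch only covers the lookup performed when a file is created, as described in the release note.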

      To support small files, the current phase of HDFS-EC stores blocks in a _striped_ layout, where a logical file block is divided into small units (64KB by default) and distributed across a set of DataNodes. This enables parallel I/O but also decreases data locality. Therefore, the cluster environment and I/O workloads should be considered before configuring EC policies.
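
To make the striped layout concrete, the sketch below maps a logical byte offset to the data block that would hold it. It assumes units are assigned round-robin and that a group has 6 data blocks (consistent with the 50% saving quoted above, but an assumption here); `locate`, `CELL_SIZE` and `DATA_UNITS` are illustrative names, not HDFS identifiers.

```python
CELL_SIZE = 64 * 1024  # striping unit from the release note (64KB by default)
DATA_UNITS = 6         # assumption: number of data blocks per striped group


def locate(offset, cell_size=CELL_SIZE, data_units=DATA_UNITS):
    """Map a logical byte offset to (data block index, offset in that block).

    Cells are assumed to be laid out round-robin across the data blocks,
    so consecutive 64KB units land on different DataNodes.
    """
    cell = offset // cell_size                       # which striping unit
    stripe, block_index = divmod(cell, data_units)   # row and column of the unit
    offset_in_block = stripe * cell_size + offset % cell_size
    return block_index, offset_in_block


# The first 6 units go to blocks 0..5; the 7th unit wraps back to block 0.
for unit in range(7):
    print(unit, locate(unit * CELL_SIZE))
```

Because adjacent units sit on different nodes, a sequential read touches many DataNodes in parallel, which is also why data locality drops.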

    Description

      Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared with the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks while the storage overhead is only 40%. This makes EC an attractive alternative for big data storage, particularly for cold data.
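
The arithmetic behind these figures is simple enough to check directly. The helper names below are illustrative only; overhead is taken to mean extra stored bytes divided by user data bytes.

```python
def ec_overhead(data_units, parity_units):
    """Extra storage per byte of user data for a data+parity code."""
    return parity_units / data_units


def replication_overhead(replicas):
    """Extra storage per byte of user data for n-way replication."""
    return float(replicas - 1)


# 10+4 Reed-Solomon: 4 parity blocks per 10 data blocks, tolerates 4 lost blocks.
print(f"RS 10+4 overhead:   {ec_overhead(10, 4):.0%}")
# 3-replica: 2 extra copies of every block, tolerates 2 lost copies.
print(f"3-replica overhead: {replication_overhead(3):.0%}")
```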

      Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but has been removed since Hadoop 2.0 for maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will no longer be appended to; 3) the pure Java EC coding implementation is extremely slow in practical use. Because of these, it might not be a good idea to simply bring HDFS-RAID back.

      We (Intel and Cloudera) are working on a design to build EC into HDFS that removes any external dependencies, making it self-contained and independently maintained. This design builds the EC feature on top of storage type support and takes into account compatibility with existing HDFS features such as caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon.

      Attachments

        1. 1619363340018.png
          76 kB
          Tsz-wo Sze
        2. HDFS-7285-Consolidated-20150911.patch
          1.20 MB
          Zhe Zhang
        3. HDFSErasureCodingSystemTestReport-20150826.pdf
          218 kB
          Rui Gao
        4. Compare-consolidated-20150824.diff
          72 kB
          Zhe Zhang
        5. HDFSErasureCodingSystemTestPlan-20150824.pdf
          72 kB
          Rui Gao
        6. Consolidated-20150810.patch
          1.23 MB
          Zhe Zhang
        7. Consolidated-20150806.patch
          1.24 MB
          Zhe Zhang
        8. Consolidated-20150707.patch
          1.02 MB
          Zhe Zhang
        9. HDFS-7285-merge-consolidated.trunk.04.patch
          1.03 MB
          Vinayakumar B
        10. HDFS-7285-merge-consolidated.trunk.03.patch
          1.04 MB
          Vinayakumar B
        11. HDFS-EC-merge-consolidated-01.patch
          1.06 MB
          Zhe Zhang
        12. HDFS-7285-merge-consolidated-trunk-01.patch
          1.06 MB
          Vinayakumar B
        13. HDFS-7285-merge-consolidated-01.patch
          1.06 MB
          Vinayakumar B
        14. HDFS-EC-Merge-PoC-20150624.patch
          811 kB
          Zhe Zhang
        15. HDFS-bistriped.patch
          19 kB
          Zhe Zhang
        16. HDFSErasureCodingPhaseITestPlan.pdf
          111 kB
          Zhe Zhang
        17. HDFS-7285-initial-PoC.patch
          470 kB
          Zhe Zhang
        18. HDFSErasureCodingDesign-20150206.pdf
          1.42 MB
          Tsz-wo Sze
        19. HDFSErasureCodingDesign-20150204.pdf
          1.40 MB
          Tsz-wo Sze
        20. ECParser.py
          5 kB
          Zhe Zhang
        21. ECAnalyzer.py
          2 kB
          Zhe Zhang
        22. fsimage-analysis-20150105.pdf
          82 kB
          Zhe Zhang
        23. HDFSErasureCodingDesign-20141217.pdf
          1.59 MB
          Zhe Zhang
        24. HDFSErasureCodingDesign-20141028.pdf
          1.98 MB
          Zhe Zhang

            People

              Assignee: Zhe Zhang (zhz)
              Reporter: Weihua Jiang (whjiang)
              Votes: 4
              Watchers: 129
