[HDFS-7285] Erasure Coding Support inside HDFS - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha1
Component/s: None
Labels: None

Target Version/s:
Hadoop Flags: Reviewed
Release Note:

HDFS now provides native support for erasure coding (EC) to store data more efficiently. Each directory can be configured with an EC policy using the command `hdfs erasurecode -setPolicy`. When a file is created, it inherits the EC policy of its nearest ancestor directory, which determines how its blocks are stored. Compared to 3-way replication, the default EC policy saves 50% of storage space while also tolerating more storage failures.
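The inheritance rule can be pictured as a walk from the file's parent directory toward the root, stopping at the first directory that has a policy set. The following is a minimal Python model under that assumption, not NameNode code: the function `resolve_ec_policy`, the `dir_policies` dictionary, and policy names such as `RS-6-3` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def resolve_ec_policy(path: str, dir_policies: Dict[str, str]) -> Optional[str]:
    """Return the EC policy of the nearest ancestor directory that has one.

    `dir_policies` is a hypothetical map from directory path to policy name,
    standing in for whatever `hdfs erasurecode -setPolicy` records.
    Returns None when no ancestor has a policy (the file stays replicated).
    """
    # PurePosixPath.parents yields the parent directory first, then each
    # higher ancestor, ending with "/".
    for ancestor in PurePosixPath(path).parents:
        policy = dir_policies.get(str(ancestor))
        if policy is not None:
            return policy
    return None
```

With `{"/data/cold": "RS-6-3", "/data/cold/sub": "RS-10-4"}`, a file created at `/data/cold/sub/f` picks up `RS-10-4` because the deeper directory is found first, while `/data/hot/f` gets no policy.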

To support small files, the current phase of HDFS-EC stores blocks in a _striped_ layout, where a logical file block is divided into small units (64KB by default) that are distributed across a set of DataNodes. This enables parallel I/O but reduces data locality, so the cluster environment and I/O workloads should be considered before configuring EC policies.
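Assuming the 64KB cells are assigned round-robin to the data blocks of a block group, a logical byte offset maps to an internal block and an offset within that block as sketched below. The `locate` helper and the 6-data-block group used in the example are illustrative assumptions, and parity blocks are left out.

```python
from typing import NamedTuple

CELL_SIZE = 64 * 1024  # 64KB, the default cell size stated above


class StripedLocation(NamedTuple):
    block_index: int      # internal data block within the block group (0..k-1)
    offset_in_block: int  # byte offset inside that internal block


def locate(logical_offset: int, num_data_blocks: int,
           cell_size: int = CELL_SIZE) -> StripedLocation:
    """Map a logical offset to (internal block, offset), assuming round-robin cells."""
    if logical_offset < 0 or num_data_blocks < 1:
        raise ValueError("need offset >= 0 and at least one data block")
    cell_index = logical_offset // cell_size
    stripe_index = cell_index // num_data_blocks  # full rows of cells before this one
    block_index = cell_index % num_data_blocks    # round-robin position in the row
    offset_in_block = stripe_index * cell_size + logical_offset % cell_size
    return StripedLocation(block_index, offset_in_block)
```

For example, with 6 data blocks, `locate(6 * 64 * 1024 + 10, 6)` returns block 0 at offset 65546, because the seventh cell wraps around to start the second stripe. A 200KB file spans four 64KB cells and therefore four different internal blocks, which is what enables parallel I/O and also what spreads the file away from any single node.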

Description

Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding, any 4 of the 14 blocks can be lost while the storage overhead is only 40%, versus 200% for 3 replicas. This makes EC an attractive alternative for big data storage, particularly for cold data.
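The overhead figures follow from simple ratios: a k+m Reed-Solomon scheme stores m parity units for every k data units and survives any m losses, while n-way replication stores n-1 extra copies and survives n-1 losses. The helper names below are illustrative.

```python
from fractions import Fraction
from typing import Tuple


def reed_solomon(k: int, m: int) -> Tuple[Fraction, int]:
    """(extra storage relative to raw data, block losses tolerated) for k+m RS."""
    return Fraction(m, k), m


def replication(n: int) -> Tuple[Fraction, int]:
    """(extra storage relative to raw data, block losses tolerated) for n replicas."""
    return Fraction(n - 1), n - 1
```

`reed_solomon(10, 4)` gives an overhead of 2/5 (40%) and tolerates 4 losses, whereas `replication(3)` gives 2 (200%) and tolerates 2. If the default policy is assumed to be a 6+3 scheme, its physical footprint is 1.5x the raw data versus 3x for replication, which is consistent with the 50% savings quoted in the release note.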

Facebook had a related open-source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed as of Hadoop 2.0 for maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to perform encoding and decoding tasks; 2) it can only be used for cold files that will not be appended to anymore; 3) its pure-Java EC implementation is extremely slow in practice. For these reasons, simply bringing HDFS-RAID back might not be a good idea.

We (Intel and Cloudera) are working on a design to build EC into HDFS that eliminates external dependencies, making it self-contained and independently maintained. The design builds the EC feature on top of storage type support and considers compatibility with existing HDFS features such as caching, snapshots, encryption, and high availability. It will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By using advanced libraries (e.g., the Intel ISA-L library), an implementation can greatly improve EC encoding/decoding performance and make the EC solution even more attractive. We will post the design document soon.
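Supporting several coding schemes and implementations suggests a pluggable codec abstraction: callers depend on an encode/decode interface, and a scheme (or a faster native library) is swapped in behind it. The sketch below illustrates that idea only. `ErasureCodec` and `XorCodec` are hypothetical names, not the Hadoop API, and single-parity XOR is deliberately swapped in for Reed-Solomon to keep it short, so it tolerates just one lost unit.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ErasureCodec(ABC):
    """Hypothetical pluggable codec: k data units in, m parity units out."""

    def __init__(self, data_units: int, parity_units: int) -> None:
        if data_units < 1 or parity_units < 1:
            raise ValueError("need at least one data and one parity unit")
        self.k = data_units
        self.m = parity_units

    @abstractmethod
    def encode(self, data: List[bytes]) -> List[bytes]:
        """Return m parity units computed from k equal-length data units."""

    @abstractmethod
    def decode(self, units: List[Optional[bytes]]) -> List[bytes]:
        """Given k+m units with lost ones set to None, return the k data units."""


class XorCodec(ErasureCodec):
    """Single-parity XOR codec (k+1). A simple stand-in, NOT Reed-Solomon."""

    def __init__(self, data_units: int) -> None:
        super().__init__(data_units, 1)

    @staticmethod
    def _xor(chunks: List[bytes]) -> bytes:
        # Byte-wise XOR of equal-length chunks.
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    def encode(self, data: List[bytes]) -> List[bytes]:
        if len(data) != self.k or len({len(d) for d in data}) != 1:
            raise ValueError("expected k equal-length data units")
        return [self._xor(data)]

    def decode(self, units: List[Optional[bytes]]) -> List[bytes]:
        if len(units) != self.k + self.m:
            raise ValueError("expected k+m units")
        missing = [i for i, u in enumerate(units) if u is None]
        if len(missing) > self.m:
            raise ValueError("too many lost units to recover")
        units = list(units)
        if missing:
            # XOR of all surviving units (data plus parity) equals the lost unit.
            units[missing[0]] = self._xor([u for u in units if u is not None])
        return list(units[: self.k])
```

For instance, `XorCodec(3)` encodes `[b"abc", b"def", b"ghi"]` into one parity unit and can rebuild any single lost data unit from the other three. A Reed-Solomon or ISA-L-backed codec would implement the same two methods with m greater than 1.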

Attachments

- 1619363340018.png (Tsz-wo Sze, 29/Apr/21 06:19, 76 kB)
- HDFS-7285-Consolidated-20150911.patch (Zhe Zhang, 11/Sep/15 21:25, 1.20 MB)
- HDFSErasureCodingSystemTestReport-20150826.pdf (Rui Gao, 28/Aug/15 07:53, 218 kB)
- Compare-consolidated-20150824.diff (Zhe Zhang, 24/Aug/15 18:51, 72 kB)
- HDFSErasureCodingSystemTestPlan-20150824.pdf (Rui Gao, 24/Aug/15 02:18, 72 kB)
- Consolidated-20150810.patch (Zhe Zhang, 10/Aug/15 19:49, 1.23 MB)
- Consolidated-20150806.patch (Zhe Zhang, 07/Aug/15 06:38, 1.24 MB)
- Consolidated-20150707.patch (Zhe Zhang, 07/Jul/15 23:44, 1.02 MB)
- HDFS-7285-merge-consolidated.trunk.04.patch (Vinayakumar B, 04/Jul/15 12:21, 1.03 MB)
- HDFS-7285-merge-consolidated.trunk.03.patch (Vinayakumar B, 02/Jul/15 11:00, 1.04 MB)
- HDFS-EC-merge-consolidated-01.patch (Zhe Zhang, 01/Jul/15 23:30, 1.06 MB)
- HDFS-7285-merge-consolidated-trunk-01.patch (Vinayakumar B, 01/Jul/15 17:05, 1.06 MB)
- HDFS-7285-merge-consolidated-01.patch (Vinayakumar B, 01/Jul/15 12:12, 1.06 MB)
- HDFS-EC-Merge-PoC-20150624.patch (Zhe Zhang, 24/Jun/15 22:59, 811 kB)
- HDFS-bistriped.patch (Zhe Zhang, 19/Jun/15 19:25, 19 kB)
- HDFSErasureCodingPhaseITestPlan.pdf (Zhe Zhang, 09/Jun/15 05:33, 111 kB)
- HDFS-7285-initial-PoC.patch (Zhe Zhang, 06/Mar/15 22:18, 470 kB)
- HDFSErasureCodingDesign-20150206.pdf (Tsz-wo Sze, 07/Feb/15 02:04, 1.42 MB)
- HDFSErasureCodingDesign-20150204.pdf (Tsz-wo Sze, 04/Feb/15 21:27, 1.40 MB)
- ECParser.py (Zhe Zhang, 16/Jan/15 19:29, 5 kB)
- ECAnalyzer.py (Zhe Zhang, 16/Jan/15 19:29, 2 kB)
- fsimage-analysis-20150105.pdf (Zhe Zhang, 05/Jan/15 19:58, 82 kB)
- HDFSErasureCodingDesign-20141217.pdf (Zhe Zhang, 17/Dec/14 19:51, 1.59 MB)
- HDFSErasureCodingDesign-20141028.pdf (Zhe Zhang, 29/Oct/14 03:04, 1.98 MB)

Issue Links

incorporates
- HADOOP-11264 Common side changes for HDFS Erasure coding support [Resolved]

is depended upon by
- HDFS-8030 HDFS Erasure Coding Phase II -- EC with contiguous layout [In Progress]
- HDFS-8031 Follow-on work for erasure coding phase I (striping layout) [Open]

is related to
- HBASE-19954 Separate TestBlockReorder into individual tests to avoid ShutdownHook suppression error against hadoop3 [Resolved]
- HDFS-503 Implement erasure coding as a layer on HDFS [Closed]
- HDFS-2832 Enable support for heterogeneous storages in HDFS - DN as a collection of storages [Closed]
- HDFS-6584 Support Archival Storage [Closed]
- HDFS-7343 HDFS smart storage management [Open]

relates to
- HADOOP-12633 Extend Erasure Code to support POWER Chip acceleration [Open]

supersedes
- MAPREDUCE-3868 Reenable Raid [Resolved]

(3 is related to, 1 relates to, 1 supercedes)

Sub-Tasks

There are no Sub-Tasks for this issue.

Erasure Coding Support inside HDFS

Details

Description

Attachments

Attachments

Issue Links

Sub-Tasks

Activity

People

Dates