Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Hadoop Flags: Reviewed
Description
The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the amount of storage is huge. One idea is to reduce the replication factor and erasure-code a set of blocks so that the overall probability of losing a block remains the same as before (a small worked sketch follows this description).
Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Recent research from CMU also describes DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
My opinion is that we should discuss implementation strategies that are not part of base HDFS, but rather a layer on top of HDFS.
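To make the replication-versus-coding trade-off concrete, here is a minimal sketch of single-parity (XOR) erasure coding over a stripe of blocks, in the spirit of this proposal. The class name, stripe size, and use of in-memory byte arrays as stand-ins for HDFS blocks are illustrative assumptions, not part of any HDFS API.

{code:java}
// Illustrative only: byte arrays stand in for HDFS blocks of a stripe.
public class XorParitySketch {

    // XOR all data blocks together to produce one parity block.
    static byte[] encodeParity(byte[][] dataBlocks) {
        byte[] parity = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    // Recover a single missing block by XOR-ing the parity
    // with all surviving blocks of the stripe.
    static byte[] recover(byte[][] survivingBlocks, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] block : survivingBlocks) {
            for (int i = 0; i < missing.length; i++) {
                missing[i] ^= block[i];
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        byte[][] stripe = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = encodeParity(stripe);
        // Suppose stripe[1] is lost; rebuild it from the others plus parity.
        byte[] recovered = recover(new byte[][] { stripe[0], stripe[2] }, parity);
        System.out.println(java.util.Arrays.toString(recovered)); // prints [4, 5, 6]
    }
}
{code}

With one parity block per stripe of k blocks, any single lost block can be rebuilt from the survivors, so the replication factor of the data blocks can be lowered. For example, if a stripe of 10 blocks keeps 2 replicas each and the parity block is also replicated twice, the effective overhead is roughly 2.2x instead of 3x, while single-block losses remain recoverable.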
Attachments
Issue Links
- blocks
  - HDFS-600 Support for pluggable erasure coding policy for HDFS (Resolved)
- is depended upon by
  - HDFS-582 Create a fsckraid tool to verify the consistency of erasure codes for HDFS-503 (Resolved)
- relates to
  - HDFS-7285 Erasure Coding Support inside HDFS (Resolved)
  - MAPREDUCE-1837 Raid should store the metadata in HDFS (Resolved)
  - MAPREDUCE-2036 Enable Erasure Code in Tool similar to Hadoop Archive (Resolved)