Here is a preliminary version of implementing Erasure coding in HDFS.
This package implements a Distributed Raid File System. It is used alongwith
an instance of the Hadoop Distributed File System (HDFS). It can be used to
provide better protection against data corruption. It can also be used to
reduce the total storage requirements of HDFS.
Distributed Raid File System consists of two main software components. The first component
is the RaidNode, a daemon that creates parity files from specified HDFS files.
The second component "raidfs" is a software that is layered over a HDFS client and it
intercepts all calls that an application makes to the HDFS client. If HDFS encounters
corrupted data while reading a file, the raidfs client detects it; it uses the
relevant parity blocks to recover the corrupted data (if possible) and returns
the data to the application. The application is completely transparent to the
fact that parity data was used to satisfy it's read request.
The primary use of this feature is to save disk space for HDFS files.
HDFS typically stores data in triplicate.
The Distributed Raid File System can be configured in such a way that a set of
data blocks of a file are combined together to form one or more parity blocks.
This allows one to reduce the replication factor of a HDFS file from 3 to 2
while keeping the failure probabilty relatively same as before. This typically
results in saving 25% to 30% of storage space in a HDFS cluster.
The RaidNode periodically scans all the specified paths in the configuration
file. For each path, it recursively scans all files that have more than 2 blocks
and that has not been modified during the last few hours (default is 24 hours).
It picks the specified number of blocks (as specified by the stripe size),
from the file, generates a parity block by combining them and
stores the results as another HDFS file in the specified destination
directory. There is a one-to-one mapping between a HDFS
file and its parity file. The RaidNode also periodically finds parity files
that are orphaned and deletes them.
The Distributed Raid FileSystem is layered over a DistributedFileSystem
instance intercepts all calls that go into HDFS. HDFS throws a ChecksumException
or a BlocMissingException when a file read encounters bad data. The layered
Distributed Raid FileSystem catches these exceptions, locates the corresponding
parity file, extract the original data from the parity files and feeds the
extracted data back to the application in a completely transparent way.
The layered Distributed Raid FileSystem does not fix the data-loss that it
encounters while serving data. It merely make the application transparently
use the parity blocks to re-create the original data. A command line tool
"fsckraid" is currently under development that will fix the corrupted files
by extracting the data from the associated parity files. An adminstrator
can run "fsckraid" manually as and when needed.
More details in src/contrib/raid/README