Integrating Bookkeeper into HDFS
BookKeeper is a system to reliably log streams of records (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a natural target for such a system for being the metadata repository of the entire file system for HDFS.
The NameNode works by logging any modification to the file system on a separate stream and periodically merge the stream with the latest image of the file system. The standard version of HDFS only supports file-based logging on any number of devices. This patch provides BookKeeper logging.
The advantages of BookKeeper-logging over file-logging are:
- higher throughput performance in large deployments
- higher availability though externally recoverable log files (ledgers)
This patch provides:
HOW TO APPLY THE PATCH
The patch is available against Hadoop release 0.19 (http://svn.apache.org/repos/asf/hadoop/core/tags/release-0.19.0/). This patch includes https://issues.apache.org/jira/browse/HADOOP-5188 (Modifications to enable multiple types of logging)
The patch does not support file-based logging, nor it offers a way to switch from one system to the other, so please apply with caution. The patch does not support HDFS deployment older than 0.19 and has been tested only over pre-formatted file-systems; although it should work without many troubles, we suggest not to apply this patch to production systems.
To compile, just run ant. To compile and run properly you will need to add the zookeeper and bookkeeper jar to you hadoop/lib directory. Please refer to http://hadoop.apache.org/zookeeper/ for information on how to get, compile and run Zookeeper and Bookkeeper.
HOW TO USE THE PATCHED HDFS
One you patch Hadoop there will be some new properties to configure in hadoop-site.xml; a description follows:
Specifies the type of logging to use. At the moment, only BKEditLogThreadBuf is provided.
Specifies the number of bookies to use. Default is 3.
Specifies the size of the quorum. Default is 2.
Specifies the logging mode. A string in
. For more information on this see TODO
Specifies the ZooKeeper server containing info about bookies
Once HDFS is configured, the subsequent steps are:
We tested our prototype against vanilla HDFS using a benchmark provided by the Hadoop folks and
available in the Hadoop distribution as org.apache.hadoop.hdfs.NNThroughputBenchmark. BookKeeper
logging was configured to use 3 bookies with a quorum of 2 in the verifiable mode. File-logging was
configured to write on a single device (FS) or two devices (FS-NFS, one of them being mounted through NFS).
The machines were dual core 4G with 2 300G SATA drives.
The variables involved are the number of concurrent threads and the number of operations to log. We tested logging 400k ops of type create, and the throughput (measured in Ops/s) is depicted in the attached graph.