Goal: The goal of the HighTideNode is to keep only one physical replica per data center. This is mostly for older files that change very infrequently. The HighTide server watches over two HDFS namespaces served by two different NameNodes in two different data centers. These two equivalent namespaces will be populated by means that are external to HighTide. The HighTide server verifies (via checksums of the crc files) that two directories in the two HDFS clusters contain identical data and, if so, reduces the replication factor to 2 on both clusters. (One or both clusters could be using HDFS-RAID too.) The HighTideNode monitors both NameNodes for missing replicas, and if it finds any, it will repair them by copying data from the other cluster in the remote data center.
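The verify-then-reduce step can be sketched roughly as follows. This is only an in-memory model, not HDFS code: FakeCluster, directories_match and reduce_if_identical are hypothetical names, and the checksum strings stand in for whatever the crc-file comparison would actually produce.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for one HDFS namespace (hypothetical, not an HDFS API)."""
    checksums: dict                                   # relative path -> checksum string
    replication: dict = field(default_factory=dict)   # relative path -> replication factor


def directories_match(a, b):
    """True only if both sides hold the same paths with the same checksums."""
    return a.checksums == b.checksums


def reduce_if_identical(a, b, target=2):
    """Lower the replication factor on both clusters, but only after verification.

    Returns True if the factor was reduced, False if the directories differ
    (in which case nothing is changed).
    """
    if not directories_match(a, b):
        return False
    for cluster in (a, b):
        for path in cluster.checksums:
            cluster.replication[path] = target
    return True
```

The point of the sketch is the ordering: the replication factor is touched only after both namespaces have been shown to agree, so a mismatch leaves both clusters exactly as they were.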
In short, replication within an HDFS cluster will occur via the NameNode as usual. Each NameNode will maintain fewer than 3 copies of the data. Replication across HDFS clusters will be coordinated by the HighTideNode, which invokes the -list-corruptFiles RPC on each NameNode periodically (every minute) to detect missing replicas.
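A minimal sketch of one polling pass, assuming the corrupt-file lists have already been fetched from the two NameNodes. The names plan_repairs, poll_forever, fetch_corrupt and copy_file are hypothetical, and the rule that a file damaged on both sides is skipped is my assumption rather than something stated above.

```python
import time

POLL_INTERVAL_SECONDS = 60  # "every minute" in the text above


def plan_repairs(corrupt_by_cluster):
    """Turn per-cluster corrupt-file lists into cross-cluster copy tasks.

    corrupt_by_cluster maps exactly two cluster names to the paths each
    NameNode reported as having missing replicas. Returns a sorted list of
    (source_cluster, destination_cluster, path) tuples.
    """
    if len(corrupt_by_cluster) != 2:
        raise ValueError("this sketch handles exactly two clusters")
    first, second = corrupt_by_cluster
    tasks = []
    for dest, source in ((first, second), (second, first)):
        broken_at_source = set(corrupt_by_cluster[source])
        for path in corrupt_by_cluster[dest]:
            # Assumption: a file damaged on both sides cannot be fixed by copying.
            if path not in broken_at_source:
                tasks.append((source, dest, path))
    return sorted(tasks)


def poll_forever(fetch_corrupt, copy_file):
    """Hypothetical driver loop; both callables are supplied by the caller."""
    while True:
        for source, dest, path in plan_repairs(fetch_corrupt()):
            copy_file(source, dest, path)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Keeping plan_repairs free of I/O makes the pairing logic easy to check on its own; the loop around it only fetches, copies and sleeps.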
DataNodeGateway: I envision a single HighTideNode coordinating replication between multiple HDFS clusters. An alternative would be a gateway approach: a specialized DataNode that exports the DataNode protocol and appears to an HDFS cluster as one very large DataNode, but instead of storing blocks on local disks, the GatewayDataNode would store data in a remote HDFS cluster. This is similar to existing NFS gateways, e.g. NFS-CIFS interaction. The downside is that this design is more complex and more intrusive to HDFS, rather than being a layer on top of it.
Mean-Time-To-Recover (MTR): Will this approach of having remote replicas increase the probability of data loss? My claim is that we should try to keep the MTR practically the same as it is today. If all the replicas of a block on HDFS1 go missing, the HighTideNode will first increase the replication factor of the equivalent file in HDFS2. This ensures that we get back to 3 copies overall as soon as possible, thus keeping the MTR the same as it is now. The HighTideNode will then copy this block from HDFS2 to HDFS1 and wait for HDFS1 to attain a replica count of 2 before decreasing the replica count on HDFS2 from 3 back to 2.
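The recovery sequence can be traced with a small simulation that tracks how many live copies exist after each step. This is a sketch under simplifying assumptions: the recover function and the replica-count dictionary are hypothetical, each step is treated as completing instantly, and the cross-cluster copy is modeled as landing one replica before HDFS1 re-replicates to 2.

```python
def recover(replicas, path, lost="HDFS1", healthy="HDFS2"):
    """Simulate recovery of `path` after `lost` has lost all of its replicas.

    replicas maps cluster name -> {path: live replica count} and is mutated
    in place. Returns the total number of live copies after each step.
    """
    totals = []

    def snapshot():
        totals.append(replicas[lost][path] + replicas[healthy][path])

    snapshot()                      # state before recovery starts
    replicas[healthy][path] = 3     # 1. raise replication on the healthy cluster
    snapshot()
    replicas[lost][path] = 1        # 2. cross-cluster copy lands one replica
    snapshot()
    replicas[lost][path] = 2        # 3. wait until the lost side is back at 2
    snapshot()
    replicas[healthy][path] = 2     # 4. only now drop the healthy side back to 2
    snapshot()
    return totals


state = {"HDFS1": {"/data/f": 0}, "HDFS2": {"/data/f": 2}}
print(recover(state, "/data/f"))    # [2, 3, 4, 5, 4]
```

The trace shows why the order matters: the very first action brings the total back to 3 copies, and the count never drops below 3 again for the rest of the procedure.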
HDFS-RAID: HighTide can co-exist with HDFS-RAID. HDFS-RAID allows us to keep fewer physical copies of data plus parity. The MTR with RAID is smaller than with HighTide, but the savings from HighTide are much larger because the savings percentage does not depend on the RAID stripe size or on file lengths. One can use RAID to achieve a replication factor of 1.2 in each HDFS cluster and then use HighTide to have an additional 1.2 replicas on the remote HDFS cluster.
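To make the storage arithmetic concrete, here is a small sketch. The baseline of 3 replicas in each of the two clusters is my assumption for the comparison (the text above does not fix a baseline), and the function names are hypothetical.

```python
def total_replicas(per_cluster_factor, clusters=2):
    """Effective number of physical copies summed over all clusters."""
    return per_cluster_factor * clusters


def savings_fraction(optimized_total, baseline_total):
    """Fraction of raw storage saved relative to the baseline."""
    return 1 - optimized_total / baseline_total


# Assumed baseline: plain 3x replication in each of two clusters.
baseline = total_replicas(3)             # 6 copies in total
hightide_only = total_replicas(2)        # 2 + 2 copies
raid_plus_hightide = total_replicas(1.2) # 1.2 + 1.2 effective copies

for label, total in (("HighTide only", hightide_only),
                     ("RAID + HighTide", raid_plus_hightide)):
    saved = savings_fraction(total, baseline)
    print(f"{label}: {total} copies, {saved:.0%} less storage than baseline")
```

Under that assumed baseline, HighTide alone stores 4 copies instead of 6 (about one third less), while combining it with RAID at 1.2 per cluster stores about 2.4 effective copies (about 60% less).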
Performance: Map-reduce jobs could see a performance impact if the number of replicas is reduced from 3 to 2. So, the tradeoff is a reduction in the total amount of storage at the cost of possibly higher job latencies.
HBase: With the current HBase design it is difficult to use HighTide to replicate across data centers. This is something that we need to investigate further.